How to plot crosstab in python

In this article, we will discuss how to create a bar plot from a pandas crosstab in Python. First, let us look at what a crosstab is: a simple cross-tabulation of two or more variables.

What is cross-tabulation?

A cross-tabulation is a table that helps us understand the relationship between two or more variables by counting how often each combination of their values occurs. It gives a clear overview of the data and makes analysis easier.

For example, take a data set on the handedness of people that includes each person's nationality, sex, age, and name. Suppose we want to analyze the relationship between nationality and handedness. A crosstab shows how many people of each nationality are left-handed or right-handed.
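A minimal sketch of this idea, assuming a tiny hand-made data set (the names and values below are hypothetical and only mirror the columns described above):

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the article's example.
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Chen", "Dev", "Eli", "Fay"],
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

# Rows are nationalities, columns are handedness values, cells are counts.
crosstb = pd.crosstab(df["Nationality"], df["Handedness"])
print(crosstb)
```

Each cell counts the people with that combination, so a combination that never occurs (here, a left-handed person from Japan) shows up as 0.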


Crosstab using pandas

Before creating the bar plot, we first build the cross-tabulation with pandas.

Syntax: pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
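To make a few of these parameters concrete, here is a small sketch on hypothetical inline data (the data and the labels "Total", "Country" and "Hand" are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

# margins=True appends row and column totals, labelled by margins_name.
with_totals = pd.crosstab(df["Nationality"], df["Handedness"],
                          margins=True, margins_name="Total")

# rownames / colnames set the axis names of the resulting table.
renamed = pd.crosstab(df["Nationality"], df["Handedness"],
                      rownames=["Country"], colnames=["Hand"])

print(with_totals)
print(renamed)
```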

Code:

Python

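import pandas as pd

# Data.csv is expected to contain Nationality and Handedness columns
df = pd.read_csv('Data.csv')
# Count the rows for each (Nationality, Handedness) combination
crosstb = pd.crosstab(df.Nationality, df.Handedness)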

Output: [Image: the resulting cross-tabulation table]

Creating bar plots

Bar graphs are most often used to compare different groups or to track changes over time. Plotting the crosstab as a bar plot is an efficient way to summarize it and makes the analysis easier.

Syntax: DataFrame.plot.bar(x=None, y=None, **kwargs)

Code:

Python3

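import pandas as pd

df = pd.read_csv('Data.csv')
crosstb = pd.crosstab(df.Nationality, df.Handedness)
# One group of bars per nationality, one bar per handedness value;
# rot=0 keeps the x-axis labels horizontal
barplot = crosstb.plot.bar(rot=0)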

Output: [Image: grouped bar plot of handedness counts per nationality]
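Since Data.csv is not included here, the following is a self-contained sketch of the same grouped bar plot on hypothetical inline data. The Agg backend line is an assumption added so that the snippet also runs without a display; drop it in a notebook or interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove for interactive use
import pandas as pd

# Hypothetical sample data standing in for Data.csv.
df = pd.DataFrame({
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})
crosstb = pd.crosstab(df["Nationality"], df["Handedness"])

# plot.bar returns a matplotlib Axes: one bar group per row of the crosstab,
# one bar per column.
ax = crosstb.plot.bar(rot=0)
ax.set_ylabel("Number of people")
```

To view or keep the figure, call ax.figure.savefig with a file name of your choice, or matplotlib.pyplot.show() in an interactive session.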

Stacked barplot

Here we will create a stacked bar plot from the DataFrame by passing stacked=True.

Syntax: DataFrame.plot(kind="bar", stacked=True, rot=0)

Code:

Python

import pandas as pd

df = pd.read_csv('Data.csv')
crosstb = pd.crosstab(df.Nationality, df.Handedness)
# stacked=True piles the handedness counts on top of each other,
# giving one bar per nationality
pl = crosstb.plot(kind="bar", stacked=True, rot=0)

Output: [Image: stacked bar plot of handedness counts per nationality]
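A common variation, sketched here as an assumption rather than as part of the original example, is a 100% stacked bar plot: normalizing the crosstab by row first makes every bar the same height, so the shares can be compared across nationalities. The data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove for interactive use
import pandas as pd

# Hypothetical sample data standing in for Data.csv.
df = pd.DataFrame({
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

# normalize="index" turns each row into proportions that sum to 1;
# multiplying by 100 gives percentages.
crosstb_pct = pd.crosstab(df["Nationality"], df["Handedness"],
                          normalize="index") * 100

ax = crosstb_pct.plot(kind="bar", stacked=True, rot=0)
ax.set_ylabel("Share of people (%)")
```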

Creating bar plot using more than two variables from the crosstab

In the above example, we found the relationship between nationality and handedness. We can also create a crosstab with more than two variables by passing a list of columns, as in the following example.

Python3

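import pandas as pd

df = pd.read_csv('Data.csv')

# Rows: Sex; columns: every (Nationality, Handedness) combination
crosstb = pd.crosstab(df.Sex, [df.Nationality, df.Handedness])

a = crosstb.plot(kind='bar', rot=0)
# Each legend entry is a (Nationality, Handedness) pair;
# bbox_to_anchor places the legend outside the plot area
a.legend(title='Nationality, Handedness', bbox_to_anchor=(1, 1.02),
         loc='upper left')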

Output: [Image: bar plot grouped by sex, with one bar per nationality and handedness combination]
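Passing a list as the columns argument produces a two-level column index. The sketch below, on hypothetical inline data, shows that structure and one possible way to flatten it into plain legend labels. The " / " separator is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove for interactive use
import pandas as pd

# Hypothetical sample data standing in for Data.csv.
df = pd.DataFrame({
    "Sex": ["F", "M", "F", "M", "F", "M"],
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

crosstb = pd.crosstab(df["Sex"], [df["Nationality"], df["Handedness"]])
print(crosstb.columns.names)  # the two column levels

# Flatten the (Nationality, Handedness) tuples into single strings
# so the legend entries read naturally.
flat = crosstb.copy()
flat.columns = [f"{nat} / {hand}" for nat, hand in crosstb.columns]

ax = flat.plot(kind="bar", rot=0)
ax.legend(title="Nationality / Handedness",
          bbox_to_anchor=(1, 1.02), loc="upper left")
```

Only combinations that occur in the data become columns, so the number of bars per group depends on the data set.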


  • The guy who created Seaborn doesn't like stacked bar charts (but that link has a hack which uses Seaborn + Matplotlib to make them anyway).
  • If you're willing to accept a grouped bar chart instead of a stacked one, here are two approaches:
  • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
# first some sample data
import numpy as np 
import pandas as pd
import seaborn as sns

N = 1000
np.random.seed(365)
mark = np.random.choice([True, False], N)
periods = np.random.choice(['BASELINE', 'WEEK 12', 'WEEK 24', 'WEEK 4'], N)

df = pd.DataFrame({'mark':mark,'period':periods})
ct = pd.crosstab(df.period, df.mark)

mark      False  True
period               
BASELINE    124   126
WEEK 12     102   118
WEEK 24     118   133
WEEK 4      140   139

# now stack and reset
stacked = ct.stack().reset_index().rename(columns={0:'value'})

# plot grouped bar chart
p = sns.barplot(x=stacked.period, y=stacked.value, hue=stacked.mark, order=['BASELINE', 'WEEK 4', 'WEEK 12', 'WEEK 24'])
sns.move_legend(p, bbox_to_anchor=(1, 1.02), loc='upper left')
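The stack-and-reset step turns the wide crosstab into a long table with one row per (period, mark) pair, which is the shape seaborn's barplot expects. A small sketch on hypothetical data, using reset_index(name=...) as an equivalent shortcut for the rename:

```python
import pandas as pd

# Hypothetical sample data, much smaller than the random data above.
df = pd.DataFrame({
    "period": ["BASELINE", "BASELINE", "WEEK 4", "WEEK 4", "WEEK 4", "WEEK 12"],
    "mark": [True, False, True, True, False, True],
})
ct = pd.crosstab(df["period"], df["mark"])

# stack() moves the column level into the index, giving a Series;
# reset_index(name="value") turns it back into a DataFrame and names the counts.
stacked = ct.stack().reset_index(name="value")
print(stacked)
```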


  • The point of using pandas.crosstab is to get the counts per group; however, this can be bypassed by passing the original dataframe, df, to seaborn.countplot, which counts the rows itself.
ax = sns.countplot(data=df, x='period', hue='mark', order=['BASELINE', 'WEEK 4', 'WEEK 12', 'WEEK 24'])
sns.move_legend(ax, bbox_to_anchor=(1, 1.02), loc='upper left')

for c in ax.containers:
    # set the bar label
    ax.bar_label(c, label_type='center')
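As a quick sanity check on what "counts per group" means, the sketch below (hypothetical data, pandas only) shows that grouping and counting rows by hand gives the same numbers as pandas.crosstab. These are the per-group counts that a count plot draws as bar heights.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "period": ["BASELINE", "BASELINE", "WEEK 4", "WEEK 4", "WEEK 4", "WEEK 12"],
    "mark": [True, False, True, True, False, True],
})

ct = pd.crosstab(df["period"], df["mark"])

# Count rows per (period, mark) pair, then pivot mark into columns;
# fill_value=0 covers combinations that never occur.
by_group = df.groupby(["period", "mark"]).size().unstack(fill_value=0)

print((by_group.values == ct.values).all())
```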


How do you plot cross tables in Python?

In Python, a crosstab is a tabulation of two different categorical variables. With pandas, pd.crosstab builds the table, and the resulting DataFrame can be plotted with its plot methods, as shown above.

What is cross tab Python?

The crosstab() function is used to compute a simple cross-tabulation of two (or more) factors. By default, it computes a frequency table of the factors unless an array of values and an aggregation function are passed.
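A short sketch of the non-default case, assuming a hypothetical Age column: passing values together with aggfunc makes each cell an aggregate of those values instead of a count.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
    "Age": [30, 40, 25, 35, 45, 50],
})

# Each cell holds the mean Age of the people in that combination;
# combinations with no rows become NaN.
mean_age = pd.crosstab(df["Nationality"], df["Handedness"],
                       values=df["Age"], aggfunc="mean")
print(mean_age)
```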

How do you show percentages in crosstab Python?

Pass normalize='index' to get row-wise proportions, then round and multiply by 100:

pd.crosstab(df.A, df.B, normalize='index').round(4) * 100

B          A      B      C
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33
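The normalize argument also accepts 'columns' and 'all'. A runnable sketch on hypothetical data that compares the three modes:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Nationality": ["USA", "USA", "India", "India", "India", "Japan"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})
rows, cols = df["Nationality"], df["Handedness"]

# Each row sums to 100: the share of each handedness within a nationality.
by_row = (pd.crosstab(rows, cols, normalize="index") * 100).round(2)

# Each column sums to 100: the share of each nationality within a handedness.
by_col = (pd.crosstab(rows, cols, normalize="columns") * 100).round(2)

# The whole table sums to 100: the share of each cell in the full data set.
overall = (pd.crosstab(rows, cols, normalize="all") * 100).round(2)

print(by_row, by_col, overall, sep="\n\n")
```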

Is a crosstab a Dataframe?

Yes, pd.crosstab returns a DataFrame. The crosstab function can operate on NumPy arrays, Series, or columns in a DataFrame. For this example, I pass in df.make for the crosstab index and df.
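A minimal check of both points, using plain NumPy arrays as input. The array contents and the names "make" and "body_style" are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs given as NumPy arrays rather than DataFrame columns.
make = np.array(["audi", "audi", "bmw", "bmw", "bmw"])
body = np.array(["sedan", "wagon", "sedan", "sedan", "wagon"])

# Arrays carry no names, so rownames / colnames label the axes.
ct = pd.crosstab(make, body, rownames=["make"], colnames=["body_style"])

print(type(ct))            # the result is an ordinary DataFrame
print(ct.loc["bmw"])       # so the usual DataFrame indexing applies
```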