In this article, we will discuss how to create a bar plot from a pandas crosstab in Python. First, let us look at what a crosstab is: a simple cross-tabulation of two or more variables.
What is cross-tabulation?
A cross-tabulation is a table that counts how often each combination of values of two or more variables occurs. It helps us understand the relationship between those variables, gives a clear overview of the data, and makes analysis easier.
As an example, take a data set on the handedness of people that includes each person's nationality, sex, age, and name. Suppose we want to analyze the relationship between nationality and handedness. A crosstab shows how many people of each nationality are left-handed or right-handed.
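As a minimal sketch of this idea, the snippet below builds a tiny made-up data set and cross-tabulates nationality against handedness. The names and values are hypothetical and are not taken from the article's Data.csv.

```python
import pandas as pd

# Hypothetical sample data; the article's Data.csv is not reproduced here.
people = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chloe", "Dev", "Emma", "Farid"],
    "Nationality": ["India", "USA", "USA", "India", "UK", "UK"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

# Rows are nationalities, columns are handedness categories, and each
# cell is the number of people with that combination.
counts = pd.crosstab(people["Nationality"], people["Handedness"])
print(counts)
```

Each row of the result answers a question such as "how many people from India are right-handed?" without any manual grouping.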
Crosstab using pandas
Before creating the bar plot, we first create the cross-tabulation using pandas.
Syntax: pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
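Two of the optional parameters in this signature, margins and normalize, are easy to try out. The sketch below uses hypothetical data (not the article's Data.csv) to show totals and row-wise proportions.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df_demo = pd.DataFrame({
    "Nationality": ["India", "USA", "USA", "India", "UK", "UK"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

# margins=True appends an "All" row and an "All" column holding the totals.
with_totals = pd.crosstab(
    df_demo["Nationality"], df_demo["Handedness"], margins=True
)

# normalize="index" converts each row into proportions that sum to 1.
row_share = pd.crosstab(
    df_demo["Nationality"], df_demo["Handedness"], normalize="index"
)

print(with_totals)
print(row_share)
```

Row-wise proportions are often a better basis for a stacked bar plot when the groups differ a lot in size.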
Code:
Python
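import pandas as pd
# Load the data set (it must contain Nationality and Handedness columns)
df = pd.read_csv('Data.csv')
# Count the people in each Nationality/Handedness combination
crosstb = pd.crosstab(df.Nationality, df.Handedness)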
Output:
Creating bar plots
Bar graphs are most often used to compare different groups or to track changes over time. Plotting a crosstab as a bar plot is an efficient way to summarize it and makes it easier to analyze.
Syntax: DataFrame.plot.bar(x=None, y=None, **kwargs)
Code:
Python3
import pandas as pd
df = pd.read_csv('Data.csv')
crosstb = pd.crosstab(df.Nationality, df.Handedness)
# rot=0 keeps the x-axis tick labels horizontal
barplot = crosstb.plot.bar(rot=0)
Output:
Stacked barplot
Here we will create a stacked bar plot from the DataFrame by passing stacked=True.
Syntax: DataFrame.plot(kind="bar", stacked=True, rot=0)
Code:
Python
import pandas as pd
df = pd.read_csv('Data.csv')
crosstb = pd.crosstab(df.Nationality, df.Handedness)
# stacked=True stacks the Handedness counts within each Nationality bar
pl = crosstb.plot(kind="bar", stacked=True, rot=0)
Output:
Creating bar plot using more than two variables from the crosstab
In the examples above, we looked at the relationship between nationality and handedness. We can also create a crosstab with more than two variables by passing a list of columns, as the following example shows.
Python3
import pandas as pd
df = pd.read_csv('Data.csv')
# Rows: Sex; columns: every (Nationality, Handedness) combination
crosstb = pd.crosstab(df.Sex, [df.Nationality, df.Handedness])
a = crosstb.plot(kind='bar', rot=0)
# Place the legend outside the axes, to the right of the plot
a.legend(title='Handedness', bbox_to_anchor=(1, 1.02), loc='upper left')
Output:
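To see what a crosstab with more than two variables looks like before it is plotted, here is a small sketch on hypothetical data (again not the article's Data.csv). Passing a list of Series as the columns argument produces two-level column labels, one per observed (Nationality, Handedness) pair, which is why each bar in the plot corresponds to a pair rather than a single category.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
demo = pd.DataFrame({
    "Sex": ["F", "M", "F", "M", "F", "M"],
    "Nationality": ["India", "USA", "USA", "India", "UK", "UK"],
    "Handedness": ["Right", "Left", "Right", "Right", "Left", "Right"],
})

# Index by Sex; columns are (Nationality, Handedness) pairs.
multi = pd.crosstab(demo["Sex"], [demo["Nationality"], demo["Handedness"]])

print(multi.columns.nlevels)   # number of column levels
print(list(multi.columns))     # the (Nationality, Handedness) pairs
print(multi)
```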
- The guy who created Seaborn doesn't like stacked bar charts (but that link has a hack which uses Seaborn + Matplotlib to make them anyway).
- If you're willing to accept a grouped bar chart instead of a stacked one, following are two approaches:
- Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
# first some sample data
import numpy as np
import pandas as pd
import seaborn as sns
N = 1000
np.random.seed(365)
mark = np.random.choice([True, False], N)
periods = np.random.choice(['BASELINE', 'WEEK 12', 'WEEK 24', 'WEEK 4'], N)
df = pd.DataFrame({'mark':mark,'period':periods})
ct = pd.crosstab(df.period, df.mark)
mark      False  True
period
BASELINE    124   126
WEEK 12     102   118
WEEK 24     118   133
WEEK 4      140   139
# now stack and reset
stacked = ct.stack().reset_index().rename(columns={0:'value'})
# plot grouped bar chart
p = sns.barplot(x=stacked.period, y=stacked.value, hue=stacked.mark, order=['BASELINE', 'WEEK 4', 'WEEK 12', 'WEEK 24'])
sns.move_legend(p, bbox_to_anchor=(1, 1.02), loc='upper left')
- The point of using pandas.crosstab is to get the counts per group; however, this can be bypassed by passing the original dataframe, df, to seaborn.countplot
ax = sns.countplot(data=df, x='period', hue='mark', order=['BASELINE', 'WEEK 4', 'WEEK 12', 'WEEK 24'])
sns.move_legend(ax, bbox_to_anchor=(1, 1.02), loc='upper left')
for c in ax.containers:
    # set the bar label
    ax.bar_label(c, label_type='center')
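If a stacked chart is still wanted, one option that avoids Seaborn is to plot the crosstab with pandas directly and label the segments with matplotlib's bar_label (available from matplotlib 3.4). This is a rough sketch on freshly generated random data with hypothetical variable names, not part of the tested approaches above, so the counts will differ from the table shown earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

# Random sample data in the same shape as above (values will differ).
rng = np.random.default_rng(365)
demo = pd.DataFrame({
    "mark": rng.choice([True, False], 1000),
    "period": rng.choice(["BASELINE", "WEEK 4", "WEEK 12", "WEEK 24"], 1000),
})

# Count per group, then put the periods in chronological order.
order = ["BASELINE", "WEEK 4", "WEEK 12", "WEEK 24"]
ct_demo = pd.crosstab(demo["period"], demo["mark"]).reindex(order)

# One stacked bar per period, one segment per value of mark.
ax = ct_demo.plot(kind="bar", stacked=True, rot=0)
for container in ax.containers:
    # write each segment's count in the middle of the segment
    ax.bar_label(container, label_type="center")
ax.legend(title="mark", bbox_to_anchor=(1, 1.02), loc="upper left")
```

The reindex call only reorders the rows; without it the periods are sorted alphabetically, which puts WEEK 4 last.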