This article describes how to get the number of rows, columns, and total number of elements (size) of pandas.DataFrame and pandas.Series.
pandas.DataFrame
- Display number of rows, columns, etc.: df.info()
- Get the number of rows: len(df)
- Get the number of columns: len(df.columns)
- Get the number of rows and columns: df.shape
- Get the number of elements: df.size
- Notes when specifying index
pandas.Series
- Get the number of elements: len(s), s.size
As an example, use Titanic survivor data. It can be downloaded from Kaggle.
import pandas as pd
df = pd.read_csv('data/src/titanic_train.csv')
print(df.head())
# PassengerId Survived Pclass \
# 0 1 0 3
# 1 2 1 1
# 2 3 1 3
# 3 4 1 1
# 4 5 0 3
#
# Name Sex Age SibSp \
# 0 Braund, Mr. Owen Harris male 22.0 1
# 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1
# 2 Heikkinen, Miss. Laina female 26.0 0
# 3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1
# 4 Allen, Mr. William Henry male 35.0 0
#
# Parch Ticket Fare Cabin Embarked
# 0 0 A/5 21171 7.2500 NaN S
# 1 0 PC 17599 71.2833 C85 C
# 2 0 STON/O2. 3101282 7.9250 NaN S
# 3 0 113803 53.1000 C123 S
# 4 0 373450 8.0500 NaN S
Get the number of rows, columns, elements of pandas.DataFrame
Display number of rows, columns, etc.: df.info()
The info() method of pandas.DataFrame displays information such as the number of rows and columns, the total memory usage, the data type of each column, and the number of non-NaN elements.
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
# PassengerId 891 non-null int64
# Survived 891 non-null int64
# Pclass 891 non-null int64
# Name 891 non-null object
# Sex 891 non-null object
# Age 714 non-null float64
# SibSp 891 non-null int64
# Parch 891 non-null int64
# Ticket 891 non-null object
# Fare 891 non-null float64
# Cabin 204 non-null object
# Embarked 889 non-null object
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.6+ KB
The result is written to standard output. info() itself returns None, so the summary cannot be assigned to a variable directly.
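If the summary is needed as a string, one option is the buf parameter of info(), which sends the output to a writable buffer instead of standard output. The following is a minimal sketch; df_small is a small hypothetical DataFrame used in place of the Titanic data so that the example runs on its own.

```python
import io

import pandas as pd

# Small hypothetical DataFrame standing in for the Titanic data
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [0.5, None, 1.5]})

# Write the summary into an in-memory text buffer instead of standard output
buffer = io.StringIO()
df_small.info(buf=buffer)
info_text = buffer.getvalue()

print(type(info_text))
# <class 'str'>
```

The captured text can then be logged or written to a file like any other string.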
Get the number of rows: len(df)
The number of rows of pandas.DataFrame can be obtained with the Python built-in function len().
In the example, it is displayed using print(), but len() returns an integer value, so it can be assigned to another variable or used for calculation.
print(len(df))
# 891
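As an illustration of using the row count in a calculation, the sketch below computes the fraction of missing values in a column. The DataFrame df_small and its age column are hypothetical stand-ins, not the Titanic data.

```python
import pandas as pd

# Small hypothetical DataFrame; None becomes a missing value (NaN)
df_small = pd.DataFrame({'age': [22.0, None, 26.0, 35.0]})

n_rows = len(df_small)  # a plain Python int
n_missing = int(df_small['age'].isna().sum())
missing_ratio = n_missing / n_rows

print(n_rows)
# 4
print(missing_ratio)
# 0.25
```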
Get the number of columns: len(df.columns)
The number of columns of pandas.DataFrame can be obtained by applying len() to the columns attribute.
print(len(df.columns))
# 12
Get the number of rows and columns: df.shape
The shape attribute of pandas.DataFrame stores the number of rows and columns as a tuple (number of rows, number of columns).
print(df.shape)
# (891, 12)
print(df.shape[0])
# 891
print(df.shape[1])
# 12
It is also possible to unpack and store them in separate variables.
- Unpack a tuple and list in Python
row, col = df.shape
print(row)
# 891
print(col)
# 12
Get the number of elements: df.size
The total number of elements of pandas.DataFrame is stored in the size attribute. This is equal to row_count * column_count.
print(df.size)
# 10692
print(df.shape[0] * df.shape[1])
# 10692
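Note that size counts every cell, including missing values. To count only the non-missing elements, one approach is to sum the per-column results of count(). The sketch below uses a small hypothetical DataFrame, df_small, rather than the Titanic data.

```python
import pandas as pd

# Small hypothetical DataFrame with one missing value in column 'b'
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [0.5, None, 1.5]})

# size counts all cells: 3 rows * 2 columns
print(df_small.size)
# 6

# count() returns the number of non-missing values per column
n_non_missing = int(df_small.count().sum())
print(n_non_missing)
# 5
```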
Notes when specifying index
When columns are specified as the index with the set_index() method, they are removed from the data body (the values attribute), so they are no longer counted as columns.
df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])
print(len(df_multiindex))
# 891
print(len(df_multiindex.columns))
# 8
print(df_multiindex.shape)
# (891, 8)
print(df_multiindex.size)
# 7128
See the following article for set_index().
- pandas: Assign existing column to the DataFrame index with set_index()
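The effect of the index on the column count can be checked on a small scale. The sketch below uses a hypothetical three-column DataFrame, df_small, and shows that reset_index() moves the index back into the columns, restoring the original shape.

```python
import pandas as pd

# Small hypothetical DataFrame with three columns
df_small = pd.DataFrame({
    'key': ['x', 'y', 'z'],
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})

df_indexed = df_small.set_index('key')
print(df_small.shape)
# (3, 3)
print(df_indexed.shape)
# (3, 2)

# The number of index levels is tracked separately from the columns
print(df_indexed.index.nlevels)
# 1

# reset_index() moves the index back into the columns
print(df_indexed.reset_index().shape)
# (3, 3)
```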
Get the number of elements of pandas.Series
As an example of pandas.Series, select one column from pandas.DataFrame.
s = df['PassengerId']
print(s.head())
# 0 1
# 1 2
# 2 3
# 3 4
# 4 5
# Name: PassengerId, dtype: int64
Get the number of elements: len(s), s.size
Since pandas.Series is one-dimensional, you can get the total number of elements (size) with either len() or the size attribute.
Note that the shape attribute is a tuple with one element.
print(len(s))
# 891
print(s.size)
# 891
print(s.shape)
# (891,)
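Because shape is a one-element tuple for a Series, the count is read with shape[0]. The sketch below uses a small hypothetical Series, s_small, and also shows that converting it to a one-column DataFrame with to_frame() yields a two-element shape.

```python
import pandas as pd

# Small hypothetical Series
s_small = pd.Series([10, 20, 30], name='x')

# The single element of the shape tuple is the number of elements
print(s_small.shape[0])
# 3

# Converting to a one-column DataFrame gives a two-element shape
print(s_small.to_frame().shape)
# (3, 1)
```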
In pandas versions before 1.4.0, pandas.Series has no info() method. Series.info() was added in pandas 1.4.0.