Guide to using duplicates in Python

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes, are ignored.
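
As a minimal sketch of what "indexes are ignored" means in practice, the example below uses a small hypothetical frame (the names ts, idx, deduped and with_index are illustrative, not from the original) with a time index. Rows are compared on column values only, so moving the index into a column is one possible way to make it count.

```python
import pandas as pd

# Hypothetical data: the first two rows hold the same values
# but carry different timestamps in the index.
idx = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"])
ts = pd.DataFrame(
    {"brand": ["Yum Yum", "Yum Yum", "Indomie"], "rating": [4.0, 4.0, 3.5]},
    index=idx,
)

# Only column values are compared, so the second row is dropped
# even though its index label differs from the first.
deduped = ts.drop_duplicates()

# Assumption for illustration: to let the index take part in the
# comparison, turn it into an ordinary column first.
with_index = ts.reset_index().drop_duplicates()
```

Here deduped keeps two rows, while with_index keeps all three because the timestamps now distinguish them.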

Parameters

subset : column label or sequence of labels, optional

Only consider certain columns for identifying duplicates; by default, all of the columns are used.

keep : {'first', 'last', False}, default 'first'

Determines which duplicates (if any) to keep.

- first : Drop duplicates except for the first occurrence.
- last : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.

inplace : bool, default False

Whether to drop duplicates in place or to return a copy.

ignore_index : bool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.

New in version 1.0.0.

Returns

DataFrame or None

DataFrame with duplicates removed, or None if inplace=True.
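
The difference between the two return modes can be sketched as follows. The frame and the variable names (n_before, result, returned) are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"brand": ["Yum Yum", "Yum Yum", "Indomie"], "rating": [4.0, 4.0, 3.5]}
)
n_before = len(df)

# Default (inplace=False): a new DataFrame is returned, df is unchanged.
result = df.drop_duplicates()
n_after_copy = len(df)

# inplace=True: df itself is modified and the call returns None,
# so the return value should not be assigned back to df.
returned = df.drop_duplicates(inplace=True)
```

A common mistake is writing df = df.drop_duplicates(inplace=True), which would rebind df to None.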

Examples

Consider a dataset containing ramen ratings.

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
    brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
    brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
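
The examples above do not exercise keep=False or ignore_index. A short sketch on the same ramen data follows; the variable names no_repeats and relabeled are illustrative, and ignore_index assumes pandas 1.0.0 or later, as noted in the parameter description.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# keep=False drops every row that has a duplicate, including the
# first occurrence, so both 'Yum Yum' rows disappear.
no_repeats = df.drop_duplicates(keep=False)

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1
# instead of keeping their original labels (1 and 4 here).
relabeled = df.drop_duplicates(subset=['brand'], keep='last', ignore_index=True)
```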

DataFrame.duplicated(subset=None, keep='first')

Return boolean Series denoting duplicate rows.

Considering certain columns is optional.

Parameters

subset : column label or sequence of labels, optional

Only consider certain columns for identifying duplicates; by default, all of the columns are used.

keep : {'first', 'last', False}, default 'first'

Determines which duplicates (if any) to mark.

- first : Mark duplicates as True except for the first occurrence.
- last : Mark duplicates as True except for the last occurrence.
- False : Mark all duplicates as True.

Returns

Series

Boolean Series indicating, for each row, whether it is a duplicate.

Examples

Consider a dataset containing ramen ratings.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
    brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, for each set of duplicated values, the first occurrence is set to False and all others to True.

>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool

By using 'last', the last occurrence of each set of duplicated values is set to False and all others to True.

>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool

By setting keep to False, all duplicates are marked True.

>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool

To find duplicates on specific column(s), use subset.

>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
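
Because duplicated returns a boolean Series aligned with the frame, it can serve as a row mask. The sketch below, with illustrative variable names (dupes, firsts, n_dropped), shows one way to inspect duplicate rows and how the inverted default mask relates to drop_duplicates.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# Select every row that belongs to a group of identical rows.
dupes = df[df.duplicated(keep=False)]

# Inverting the default mask keeps the first occurrence of each row,
# which gives the same rows as df.drop_duplicates().
firsts = df[~df.duplicated()]

# Number of rows that drop_duplicates() would remove.
n_dropped = int(df.duplicated().sum())
```

Inspecting dupes before dropping anything is a cheap way to check that the rows being removed are really redundant.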
