pandas

To find or remove duplicate rows in a DataFrame, use duplicated and drop_duplicates.

Finding duplicates

Example:
    col1 col2         c
0    one    x -1.067137
1    one    y  0.309500
2    two    x -0.211056
3    two    y -1.842023
4    two    x -0.390820
5  three    x -1.964475
6   four    x  1.298329
In:
# Single column
df.duplicated("col1", keep="first")

# Multiple columns
# df.duplicated(["col1", "col2"], keep="first")

Out:
0    False
1     True
2    False
3     True
4     True
5    False
6    False
dtype: bool

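# keep="first" (the default): the first occurrence is not marked as a duplicate and returns False
# keep="last": the last occurrence is not marked as a duplicate
# keep=False: every row that has a duplicate returns True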
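
To make the three keep options easier to compare, here is a minimal sketch that rebuilds the example DataFrame from the table above and runs duplicated with each setting. The variable names (first, last, every, pairs) are only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame shown above.
df = pd.DataFrame({
    "col1": ["one", "one", "two", "two", "two", "three", "four"],
    "col2": ["x", "y", "x", "y", "x", "x", "x"],
    "c": [-1.067137, 0.309500, -0.211056, -1.842023,
          -0.390820, -1.964475, 1.298329],
})

# Each call returns a boolean Series aligned with df's index.
first = df.duplicated("col1", keep="first")  # first occurrence is False
last = df.duplicated("col1", keep="last")    # last occurrence is False
every = df.duplicated("col1", keep=False)    # all repeated values are True

# With several columns, a row is a duplicate only if the whole
# (col1, col2) combination has been seen before.
pairs = df.duplicated(["col1", "col2"])

# The boolean Series can be used as a mask to inspect the repeated rows.
repeated_rows = df[every]
print(repeated_rows)
```

Using the mask this way is a convenient check before deleting anything, since it shows exactly which rows would be considered duplicates.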

Removing duplicates

In:
df.drop_duplicates('col1')

Out:
    col1 col2         c
0    one    x -1.067137
2    two    x -0.211056
5  three    x -1.964475
6   four    x  1.298329
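
drop_duplicates accepts the same keep argument as duplicated. The sketch below, which again rebuilds the example DataFrame, shows how each setting changes which rows survive. The variable names are illustrative, and the reset_index call is an optional extra step not used in the example above.

```python
import pandas as pd

# Rebuild the example DataFrame shown earlier.
df = pd.DataFrame({
    "col1": ["one", "one", "two", "two", "two", "three", "four"],
    "col2": ["x", "y", "x", "y", "x", "x", "x"],
    "c": [-1.067137, 0.309500, -0.211056, -1.842023,
          -0.390820, -1.964475, 1.298329],
})

# Keep the first row for each col1 value (the default).
kept_first = df.drop_duplicates("col1")

# Keep the last row for each col1 value instead.
kept_last = df.drop_duplicates("col1", keep="last")

# Drop every row whose col1 value appears more than once.
unique_only = df.drop_duplicates("col1", keep=False)

# Compare on the (col1, col2) combination rather than a single column.
by_pair = df.drop_duplicates(["col1", "col2"])

# The original index labels are preserved, so gaps appear (0, 2, 5, 6).
# reset_index(drop=True) renumbers the rows if that is not wanted.
renumbered = kept_first.reset_index(drop=True)
print(renumbered)
```

By default drop_duplicates returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a variable to be kept.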