pandas Notes (2)

The previous post gave a brief introduction to the Series and DataFrame structures in Pandas. This post focuses on common operations on DataFrame, the core Pandas structure.

Dropping rows and columns from a DataFrame:

>>> import numpy as np
>>> from pandas import DataFrame
>>> data = DataFrame(np.arange(16).reshape((4, 4)),
...                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
...                  columns=['one', 'two', 'three', 'four'])
>>> data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
>>> print(data.drop(['Colorado', 'Ohio']))  # drop rows by index label
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15
>>> print(data.drop('two', axis=1))  # axis=1 drops a column instead
          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15
>>> print(data.drop(['two', 'four'], axis=1))  # drop several columns at once
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
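
Note that every `drop` call above returns a new DataFrame and leaves `data` unchanged. The following is a minimal sketch of the same operations using the `index=` and `columns=` keywords, which avoid having to remember the `axis` number. The variable names `trimmed` and `safe`, and the label `'missing'`, are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Drop rows and columns in one call; the result is a new DataFrame.
trimmed = data.drop(index=["Colorado", "Ohio"], columns=["two"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
# "missing" is a hypothetical label used only to show this behavior.
safe = data.drop(columns=["two", "missing"], errors="ignore")

print(trimmed)
print(safe)
print(data.shape)  # the original still has all rows and columns
```

To keep the result, assign it back to a name (for example `data = data.drop(...)`).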

Filtering data

>>> print(data[data.three < 10])  # keep rows where column 'three' is below 10; rows with values >= 10 are dropped
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
>>> data  # the original is unchanged by the filter
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
>>> print(data.loc[data.three < 5, :])  # the same kind of boolean filter through .loc
      one  two  three  four
Ohio    0    1      2     3
>>> data[data > 10] = 0  # in-place assignment: every element greater than 10 becomes 0
>>> data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10     0
New York    0    0      0     0
>>> data + 100  # scalar arithmetic applies to every element
          one  two  three  four
Ohio      100  101    102   103
Colorado  104  105    106   107
Utah      108  109    110   100
New York  100  100    100   100
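
Two related patterns may be useful alongside the filters above: combining several conditions, and replacing values without modifying the original. The sketch below rebuilds the original `data` so that it runs on its own. The thresholds and the names `subset` and `capped` are chosen only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses because & binds tighter than < and >.
subset = data[(data["three"] > 2) & (data["four"] < 15)]

# where() keeps elements where the condition is True and replaces the rest.
# Unlike data[data > 10] = 0, it returns a new DataFrame and leaves data alone.
capped = data.where(data <= 10, 0)

print(subset)
print(capped)
```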

>>> # Arithmetic between a DataFrame and a Series
>>> frame = DataFrame(np.arange(12.).reshape((4, 3)),
...                   columns=list('bde'),
...                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])
>>> s = frame.iloc[0]
>>> print(frame)
           b     d     e
Utah     0.0   1.0   2.0
Ohio     3.0   4.0   5.0
Texas    6.0   7.0   8.0
Oregon   9.0  10.0  11.0
>>> print(s)
b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64
>>> print(frame - s)  # s is subtracted from every row; its index labels are matched against the column labels of frame
          b    d    e
Utah    0.0  0.0  0.0
Ohio    3.0  3.0  3.0
Texas   6.0  6.0  6.0
Oregon  9.0  9.0  9.0
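
The subtraction above matches the Series index against the DataFrame columns. As a rough sketch of two related cases, the code below subtracts a column from every column with `sub(..., axis="index")`, and shows what happens when the labels only partly overlap. The Series `s2` and its label `'f'` are invented for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12.0).reshape((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# Subtract column 'd' from every column: match on the row index instead of the columns.
col = frame["d"]
by_rows = frame.sub(col, axis="index")

# Labels that appear on only one side produce NaN in the result.
# s2 is a hypothetical Series that lacks 'd' and has an extra label 'f'.
s2 = pd.Series([1.0, 2.0, 3.0], index=["b", "e", "f"])
partial = frame - s2

print(by_rows)
print(partial)
```

In `partial`, columns `d` and `f` are entirely NaN because each of those labels exists on only one side of the operation.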