pandas Notes (2)

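The previous post gave a brief introduction to the pandas Series and DataFrame structures. This post focuses on common operations on DataFrame, the core pandas data structure.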

Deleting rows and columns from a DataFrame:

>>> import numpy as np
>>> from pandas import DataFrame
>>> data = DataFrame(np.arange(16).reshape((4, 4)),
...                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
...                  columns=['one', 'two', 'three', 'four'])
>>> data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
>>> print(data.drop(['Colorado', 'Ohio']))
          one  two  three  four
Utah        8    9     10    11
New York   12   13     14    15
>>> print(data.drop('two', axis=1))
          one  three  four
Ohio        0      2     3
Colorado    4      6     7
Utah        8     10    11
New York   12     14    15
>>> print(data.drop(['two', 'four'], axis=1))
          one  three
Ohio        0      2
Colorado    4      6
Utah        8     10
New York   12     14
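
One detail the session above relies on without stating it: drop returns a new DataFrame and leaves data unchanged, which is why every call starts again from the full 4x4 frame. Below is a minimal sketch of that behaviour. It assumes a reasonably recent pandas in which the columns= keyword is available; the names trimmed and no_two are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# drop returns a new object; the original frame keeps all rows and columns
trimmed = data.drop(['Colorado', 'Ohio'])
print(trimmed.shape)
print(data.shape)

# columns= is an alternative spelling of axis=1 (assumed available in your pandas)
no_two = data.drop(columns=['two'])
print(list(no_two.columns))

# To make the deletion stick, assign the result back to the name
data = data.drop('two', axis=1)
print(list(data.columns))
```

Assigning the result back is usually easier to follow than mutating in place, because each step of the session still has a visible input and output.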

Filtering data

>>> print(data[data.three < 10])  # keep rows where column three is below 10; rows with 10 or more are dropped
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
>>> data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
>>> print(data.loc[data.three < 5, :])
      one  two  three  four
Ohio    0    1      2     3
>>> data[data > 10] = 0
>>> data
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10     0
New York    0    0      0     0
>>> data + 100
          one  two  three  four
Ohio      100  101    102   103
Colorado  104  105    106   107
Utah      108  109    110   100
New York  100  100    100   100
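
Note that data[data > 10] = 0 in the session above modifies data in place, which is why the later data + 100 shows 100 wherever a value was zeroed. Here is a hedged sketch of two related patterns: combining several conditions in one filter, and using DataFrame.mask to get the same replacement without touching the original. The variable names subset and zeroed are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
subset = data[(data.three < 10) & (data.one > 0)]
print(subset)

# mask returns a new frame in which values are replaced where the condition is True
zeroed = data.mask(data > 10, 0)
print(zeroed)

# The original frame is untouched
print(data.loc['New York', 'one'])
```

Python's plain and / or do not work on boolean Series, which is why the element-wise operators & and | are used here.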

>>> # Arithmetic between a DataFrame and a Series
... frame = DataFrame(np.arange(12.).reshape((4, 3)),
...                   columns=list('bde'),
...                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])
>>> s = frame.iloc[0]
>>> print(frame)
          b     d     e
Utah    0.0   1.0   2.0
Ohio    3.0   4.0   5.0
Texas   6.0   7.0   8.0
Oregon  9.0  10.0  11.0
>>> print(s)
b    0.0
d    1.0
e    2.0
Name: Utah, dtype: float64
>>> print(frame - s)  # s is subtracted from every row; values are matched by column label
          b    d    e
Utah    0.0  0.0  0.0
Ohio    3.0  3.0  3.0
Texas   6.0  6.0  6.0
Oregon  9.0  9.0  9.0
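
The subtraction above matches the Series index against the frame's column labels. As a rough sketch of two neighbouring cases: subtracting a column from every column needs the sub method with axis=0, and labels that exist on only one side yield NaN. The Series s2 and its label 'f' are invented for the example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# Subtract column d from every column: axis=0 aligns the Series on the row index
col = frame['d']
by_rows = frame.sub(col, axis=0)
print(by_rows)

# Labels that do not match on both sides produce NaN in the result
s2 = pd.Series([0.0, 1.0, 2.0], index=['b', 'e', 'f'])  # 'f' is not a column of frame
misaligned = frame - s2
print(misaligned)
```

Writing frame - col directly would try to match the row labels of col against the column labels of frame, which is why the explicit axis argument is needed for the column case.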
