pandas common usage

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

This post records some common pandas usage patterns I have run into in practice.

Notes

df denotes a pandas.DataFrame.

DataFrame

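rolling
- To make a reading more robust, the value at a single point is widened to an interval that contains that point, and the calculation is made over the interval. This interval is the window. A moving window slides along the data one row at a time. By default the window is right-aligned: it ends at the current row and extends back over the preceding rows (center=False).
- The object returned by rolling supports many numeric operations, such as mean(), std(), and sum(); DataFrame itself has methods with the same names.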
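As a minimal sketch of the window alignment described above (the column name `x` and the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Default (center=False): each result uses the current row and the two rows before it.
rolled_mean = df["x"].rolling(window=3).mean()

# center=True puts the current row in the middle of the window instead.
centered_mean = df["x"].rolling(window=3, center=True).mean()

print(rolled_mean.tolist())    # first two entries are NaN: fewer than 3 observations
print(centered_mean.tolist())  # first and last entries are NaN
```

If partial windows at the edges are acceptable, passing `min_periods=1` to `rolling` avoids the leading NaN values.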
dropna
- Remove missing values (NaN / None).
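A short sketch of the most common dropna variants; the frame and its column names are invented for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in columns "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", "y", None],
    "c": [1, 2, 3],
})

rows_complete = df.dropna()            # drop rows that have any missing value
a_present = df.dropna(subset=["a"])    # only look at column "a"
cols_complete = df.dropna(axis=1)      # drop columns that have any missing value
```

dropna returns a new object by default, so the original df is left unchanged.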
Merge, join, and concatenate
- Combine multiple DataFrames, either along rows or along columns.
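The three families can be sketched as follows, assuming two small made-up frames that share a `key` column:

```python
import pandas as pd

# Hypothetical frames, only for illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# merge: SQL-style join on a column (inner join by default).
inner = pd.merge(left, right, on="key")
outer = pd.merge(left, right, on="key", how="outer")

# join: index-based, left join by default.
joined = left.set_index("key").join(right.set_index("key"))

# concat: stack along rows (axis=0) or place side by side (axis=1).
stacked = pd.concat([left, left], ignore_index=True)
side_by_side = pd.concat([left, right], axis=1)
```

As a rule of thumb, merge matches on column values, join matches on the index, and concat only glues frames together without matching keys.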
duplicated
- Flag duplicate rows (returns a boolean Series).
drop_duplicates
- Remove duplicate rows.
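A minimal sketch of both methods on a made-up frame (the column names `k` and `v` are assumptions for the example):

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 are identical.
df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1, 1, 2]})

mask = df.duplicated()          # True for every repeat after the first occurrence
unique_rows = df.drop_duplicates()

# Restrict the comparison to some columns and keep the last occurrence instead.
last_per_k = df.drop_duplicates(subset=["k"], keep="last")
```

`df[~df.duplicated()]` gives the same rows as `df.drop_duplicates()`, which is handy when the mask is needed for something else as well.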
groupby
- Group rows by one or more columns and aggregate within each group.
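For instance, a grouped aggregation might look like this; the `city` and `sales` columns are invented for the sketch:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "city": ["A", "A", "B"],
    "sales": [10, 20, 5],
})

# One aggregate per group.
totals = df.groupby("city")["sales"].sum()

# Several named aggregates at once.
summary = df.groupby("city").agg(
    total=("sales", "sum"),
    mean=("sales", "mean"),
)
```

The group keys become the index of the result; pass `as_index=False` to groupby to keep them as ordinary columns.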
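~~as_matrix~~
- Convert to a NumPy array; note that the columns argument is a list of column names. as_matrix is deprecated and was removed in pandas 1.0; select the columns first and use to_numpy() or values instead.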
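A sketch of the replacement for as_matrix, assuming a small made-up frame: select the wanted columns, then convert.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Roughly what df.as_matrix(columns=["a", "b"]) used to do.
arr = df[["a", "b"]].to_numpy()

# The older attribute form gives the same array here.
arr_values = df[["a", "b"]].values
```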
columns
- Get the column labels.
Accessing the element at a given position
- DataFrame.iat
  Fast integer location scalar accessor.
- DataFrame.loc
  Purely label-location based indexer for selection by label.
- Series.iloc (also available as DataFrame.iloc)
  Purely integer-location based indexing for selection by position.
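The difference between the three accessors is easiest to see on a frame whose index labels are not integers. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample data with string index labels.
df = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)

scalar = df.iat[1, 0]             # row position 1, column position 0
by_label = df.loc["y", "b"]       # row label "y", column label "b"
first_two = df["a"].iloc[0:2]     # positions 0 and 1 of a Series
first_row = df.iloc[0]            # first row of the DataFrame, as a Series
```

iat only returns single scalars, which is why it is the fast path; loc and iloc also accept slices, lists, and boolean masks.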
values
- Return the underlying data as a NumPy array.

apply

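import numpy as np

df.apply(np.sum, axis=0)  # apply to each column: column sums
df.apply(lambda x: [1, 2], axis=1)  # apply to each row: a Series holding one list per row
# le is presumably an encoder with fit_transform, e.g. sklearn.preprocessing.LabelEncoder
X[categorical_cols] = X[categorical_cols].apply(lambda col: le.fit_transform(col))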
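A self-contained version of the calls above, as a sketch under a few assumptions: the sample frames are invented, and pandas' own category codes stand in for `le.fit_transform` so the example runs without scikit-learn. Both assign integer codes in sorted label order, but they are not the same API.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col_sums = df.apply(np.sum, axis=0)                # function receives each column
row_lists = df.apply(lambda row: [1, 2], axis=1)   # function receives each row

# Encode several categorical columns at once.
X = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "S"]})
categorical_cols = ["color", "size"]

# Stand-in for le.fit_transform(col): integer codes in sorted label order.
X[categorical_cols] = X[categorical_cols].apply(
    lambda col: col.astype("category").cat.codes
)
```

With axis=0 (the default) the function is called once per column; with axis=1 it is called once per row, which is usually much slower on large frames than a vectorized expression.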
