Time handling in pandas

Create datetime as index in pandas

Read a CSV file and use its date column as the index

pd.read_csv('hs300.csv',index_col='date',parse_dates=True,thousands=',',dtype={'price':np.float64})
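
To make the call above concrete, here is a minimal sketch that reads from an in-memory string instead of a file. The two data rows are made up for illustration and are not the contents of hs300.csv; only the column names date and price come from the call above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for hs300.csv: a date column and a quoted price
# column that uses ',' as the thousands separator.
csv_text = (
    'date,price\n'
    '2011-01-04,"3,189.68"\n'
    '2011-01-05,"3,175.66"\n'
)

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col='date',      # use the date column as the index
    parse_dates=True,      # parse the index into datetime values
    thousands=',',         # read "3,189.68" as 3189.68
    dtype={'price': np.float64},
)

print(df.index)    # a DatetimeIndex
print(df.dtypes)
```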

Generating Ranges of Timestamps

  1. calendar day: date_range
  2. business day: bdate_range
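import pandas as pd
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)
# freq='B' yields business days, the same index as pd.bdate_range(start, end)
ts = pd.date_range(start, end, freq='B')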
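
A short sketch of the difference between the two generators, using a ten-day span chosen for illustration because it starts on a weekend:

```python
import pandas as pd

# 2011-01-01 is a Saturday, so the first business day is Monday 2011-01-03.
calendar_days = pd.date_range('2011-01-01', '2011-01-10')
business_days = pd.bdate_range('2011-01-01', '2011-01-10')

print(len(calendar_days))   # 10
print(len(business_days))   # 6

# date_range with freq='B' gives the same index as bdate_range
same = business_days.equals(pd.date_range('2011-01-01', '2011-01-10', freq='B'))
print(same)
```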

Convert

To convert a Series or list-like object of date-like values (e.g. strings or epochs), use the to_datetime function:

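import pandas as pd
# The strings use different formats; format='mixed' needs pandas 2.0+ (drop it on older versions)
pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None]), format='mixed')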
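
The epoch case mentioned above can be sketched as follows. The epoch values and the unparseable string are illustrative inputs, and errors='coerce' is shown as one possible way to handle bad values, not the only one.

```python
import pandas as pd

# Epoch seconds: unit tells pandas how to interpret the integers
from_epoch = pd.to_datetime([1490195805, 1490195806], unit='s')
print(from_epoch[0])   # 2017-03-22 15:16:45

# errors='coerce' turns values that cannot be parsed into NaT instead of raising
coerced = pd.to_datetime(pd.Series(['2010-01-10', 'not a date']), errors='coerce')
print(coerced)
```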

Indexing with datetime in pandas

Indexing

For convenient access to longer time series, you can pass the year, or the year and month, as a string:

  • select all data in year 2011: ts['2011']
  • select all data in June 2011: ts['2011-6']
  • select all data from 2013/01 to 2013/02 (both months included): dft['2013-1':'2013-2']
  • select from 2013/1/1 10:12:00 to 2013/2/28 10:12:00: dft[datetime(2013, 1, 1, 10, 12, 0):datetime(2013, 2, 28, 10, 12, 0)]
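
A minimal sketch of partial-string selection on a daily Series. The data is synthetic, and .loc is used here as an assumption about current practice: recent pandas versions no longer accept a date string inside plain brackets to select rows of a DataFrame, while .loc works for both Series and DataFrame.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2011-01-01', '2011-12-31', freq='D')
ts = pd.Series(np.arange(len(idx)), index=idx)

year = ts.loc['2011']                    # every row in 2011
june = ts.loc['2011-06']                 # every row in June 2011
jan_feb = ts.loc['2011-01':'2011-02']    # both endpoints are inclusive

print(len(year), len(june), len(jan_feb))   # 365 30 59
```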

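A truncate convenience function is provided that is similar to slicing. Unlike partial-string slicing, it treats any unspecified date component as 0, so after='12/31/2011' stops at midnight on that day:
ts.truncate(before='10/31/2011', after='12/31/2011')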
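
The following sketch compares truncate with string slicing on intraday data, where the two give different results. The 12-hour spacing and the dates are made up for illustration.

```python
import pandas as pd

# Six timestamps, 12 hours apart, starting at 2011-12-30 00:00
idx = pd.date_range('2011-12-30', periods=6, freq=pd.Timedelta(hours=12))
ts = pd.Series(range(6), index=idx)

truncated = ts.truncate(after='2011-12-31')   # stops at 2011-12-31 00:00
sliced = ts.loc[:'2011-12-31']                # keeps all of 2011-12-31

print(len(truncated))   # 3
print(len(sliced))      # 4
```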

DateOffset

from datetime import datetime
from pandas.tseries.offsets import DateOffset, Week
d = datetime(2008, 8, 18, 9, 0)
# add 4 months and 5 days
d + DateOffset(months=4, days=5)
# add one week
d + Week()
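
For reference, a sketch of what these offsets evaluate to. BMonthEnd is an extra offset class that does not appear in the notes above; it is included only to show that offsets can also follow calendar rules.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import BMonthEnd, DateOffset, Week

d = datetime(2008, 8, 18, 9, 0)   # a Monday

plus_months_days = d + DateOffset(months=4, days=5)
plus_week = d + Week()
month_end = d + BMonthEnd()       # roll forward to the last business day of the month

print(plus_months_days)   # 2008-12-23 09:00:00
print(plus_week)          # 2008-08-25 09:00:00
print(month_end)          # 2008-08-29 09:00:00
```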

Shifting / lagging

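Passing freq shifts the index by the given offset instead of realigning the data: ts.shift(5, freq='BM') (from pandas 2.2 on, the business-month-end alias is spelled 'BME')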
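
A small sketch of the two shift modes on a three-row Series. The data is made up, and freq='D' is used instead of a business-month alias so that the example behaves the same across pandas versions.

```python
import pandas as pd

ts = pd.Series([10, 20, 30], index=pd.date_range('2011-01-01', periods=3, freq='D'))

lagged = ts.shift(1)            # values move down one row; the index is unchanged
moved = ts.shift(1, freq='D')   # values are unchanged; the index moves forward one day

print(lagged)
print(moved)
```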

Resampling

.resample() is a time-based groupby, followed by a reduction method on each of its groups.

import pandas as pd
from datetime import datetime
start = datetime(2011, 1, 1)
end = datetime(2011, 3, 15)
idx = pd.date_range(start, end, freq='B')
ts = pd.DataFrame(list(range(len(idx))), index=idx)
# The reduction is chosen by calling a method on the resampler, e.g.
# sum, mean, std, sem, max, min, median, first, last, ohlc.
# 'M' means month end; pandas 2.2+ spells this alias 'ME'.
ts.resample('M').mean()
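
A minimal sketch of the groupby-then-reduce behaviour on two weeks of synthetic daily data. The weekly alias 'W' is used here because its spelling is the same across pandas versions.

```python
import pandas as pd

# Two full weeks of daily data; 2011-01-03 is a Monday
idx = pd.date_range('2011-01-03', periods=14, freq='D')
s = pd.Series(range(14), index=idx)

weekly_sum = s.resample('W').sum()                          # bins end on Sunday
weekly_stats = s.resample('W').agg(['min', 'max', 'mean'])  # several reductions at once

print(weekly_sum)     # 21 for the week ending 2011-01-09, 70 for 2011-01-16
print(weekly_stats)
```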

Window Function

pandas provides a number of window functions for computing common rolling statistics on time series data.

import numpy as np
import pandas as pd
s = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
r = s.rolling(window=60)  # 60-observation rolling window
s.plot(style='k--')
r.mean().plot(style='k')

The apply() function takes an extra func argument and performs generic rolling computations.
The func argument should be a single function that produces a single value from each window; the window is passed as a Series by default, or as an ndarray when raw=True.

mad = lambda x: np.fabs(x - x.mean()).mean()
s.rolling(window=60).apply(mad).plot(style='k')
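
The same rolling calls on a five-element Series, small enough to check by hand. The data and the window size of 3 are chosen for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

def mad(window):
    # mean absolute deviation of one window
    return np.abs(window - window.mean()).mean()

rolling_mean = s.rolling(window=3).mean()
rolling_mad = s.rolling(window=3).apply(mad, raw=True)   # raw=True passes ndarrays

print(rolling_mean.tolist())   # [nan, nan, 2.0, 3.0, 4.0]
print(rolling_mad.tolist())
```

The first two results are NaN because a window of 3 is not yet full at those positions.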

Difference between resampling and window functions

Both perform reductive operations on time-indexed pandas objects.
When .rolling() is given an offset, the offset is a time delta: each point is aggregated over a backward-looking window of that length, so the result has the same size as the input.

When .resample() is given an offset, a new index is constructed at the frequency of the offset, and the input points that fall into each frequency bin are aggregated.

To summarize, .rolling() is a time-based window operation, while .resample() is a frequency-based window operation.
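
The contrast can be sketched on ten days of synthetic daily data. The offsets '3D' and '5D' are arbitrary choices for the example.

```python
import pandas as pd

idx = pd.date_range('2011-01-01', periods=10, freq='D')
s = pd.Series(range(10), index=idx)

rolled = s.rolling('3D').sum()       # one value per input row
resampled = s.resample('5D').sum()   # one value per 5-day bin

print(len(rolled))          # 10, same as the input
print(resampled.tolist())   # [10, 35]
```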

Group

A single group can be selected using GroupBy.get_group():
grouped.get_group('bar')

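With a grouped Series you can also pass a list of functions to aggregate with, or use named aggregation (keyword=function) to name the output columns; both return a DataFrame. Passing a dict of output names to a grouped Series is no longer supported since pandas 1.0. On a grouped DataFrame, a dict maps column names to functions:

grouped = df.groupby('A')
grouped['C'].agg(['sum', 'mean', 'std'])
grouped['D'].agg(result1='sum', result2='mean')
grouped.agg({'C': 'sum',
             'D': lambda x: np.std(x, ddof=1)})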
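
The snippets in this section assume an existing DataFrame df. A runnable sketch with a made-up four-row frame follows; the column names A, C and D match the snippets, the values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with the column names used in this section
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1, 2, 3, 4],
    'D': [10.0, 20.0, 30.0, 40.0],
})
grouped = df.groupby('A')

bar_rows = grouped.get_group('bar')                         # rows where A == 'bar'
c_stats = grouped['C'].agg(['sum', 'mean'])                 # one column per function
d_named = grouped['D'].agg(result1='sum', result2='mean')   # named output columns
per_column = grouped.agg({'C': 'sum', 'D': lambda x: x.std(ddof=1)})

print(c_stats)
print(d_named)
print(per_column)
```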

The filter method returns a subset of the original object.

Suppose we want to take only elements that belong to groups with a group sum greater than 2.

sf = pd.Series([1, 1, 2, 3, 3, 3])
sf.groupby(sf).filter(lambda x: x.sum() > 2)
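
For reference, a sketch of what the filter call returns, followed by the same idea on a small DataFrame. The DataFrame and its group-size condition are made up for illustration.

```python
import pandas as pd

sf = pd.Series([1, 1, 2, 3, 3, 3])
kept = sf.groupby(sf).filter(lambda x: x.sum() > 2)
print(kept.tolist())         # [3, 3, 3]
print(kept.index.tolist())   # [3, 4, 5]

# The same idea on a DataFrame: keep rows of groups with at least two members
df = pd.DataFrame({'A': ['foo', 'bar', 'foo'], 'C': [1, 2, 3]})
big_groups = df.groupby('A').filter(lambda g: len(g) >= 2)
print(big_groups)
```

The groups keyed 1 and 2 each sum to exactly 2, so only the group keyed 3 (sum 9) passes the condition.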

Apply

Some operations on grouped data fit neither the aggregate nor the transform category. apply handles these general cases:
grouped['C'].apply(lambda x: x.describe())
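
A runnable sketch of the apply call, again on a made-up frame with columns A and C. It shows the shape of the result rather than any particular dataset.

```python
import pandas as pd

# Hypothetical data; only the column names come from the snippet above
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'], 'C': [1, 2, 3, 4]})
grouped = df.groupby('A')

# describe() returns several statistics per group, so the result is a Series
# with a two-level index: (group key, statistic)
described = grouped['C'].apply(lambda x: x.describe())

print(described)
print(described.loc[('bar', 'mean')])   # 3.0
```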