Pandas is a library for fast data analysis, cleaning, and preparation.
It is built on top of NumPy.
It can work with data from a wide variety of sources.
1. Pandas Series
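A Series behaves much like a Python dictionary: each value is stored under an index label. It can be built from a list of values plus a list of labels, or directly from a dict:
import pandas as pd
pd.Series(data=mylist, index=labelList)
pd.Series(myDict)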
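To make the dictionary comparison concrete, here is a minimal sketch. The names my_list, label_list and my_dict and their values are hypothetical sample data, not taken from the original notes.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
my_list = [10, 20, 30]
label_list = ['a', 'b', 'c']
my_dict = {'a': 10, 'b': 20, 'c': 30}

# From a list of values plus a list of index labels
s1 = pd.Series(data=my_list, index=label_list)

# From a dict: keys become the index, values become the data
s2 = pd.Series(my_dict)

# Label-based lookup works like a dict lookup
value_b = s1['b']

# Unlike a dict, a Series supports arithmetic that is aligned on the labels
total = s1 + s2
```

Both constructors end up with the same labels and values here, so s1 and s2 are interchangeable in this example.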
2. Pandas DataFrames
Two equivalent ways to build the same DataFrame (randn comes from numpy.random):
df = pd.DataFrame(randn(5,4),index=['A','B','C','D','E'],columns=['W','X','Y','Z'])
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
DataFrames
DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let’s use pandas to explore this topic!
import pandas as pd
import numpy as np
from numpy.random import randn
np.random.seed(101)
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
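The idea that a DataFrame is a set of Series sharing one index can be checked directly. The sketch below rebuilds the frame from the text and looks at a single column; the variable names col_w, same_index and rebuilt are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(101)  # same seed as in the text
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

# Selecting one column returns a Series
col_w = df['W']
col_type = type(col_w).__name__

# Every column Series carries the DataFrame's row index
same_index = col_w.index.equals(df.index)

# Putting the column Series back together gives a frame with the same contents
rebuilt = pd.DataFrame({name: df[name] for name in df.columns})
```

Because each column already carries the labels A to E, the rebuilt frame lines up row by row without any extra work.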
Selection and Indexing
Let’s learn the various methods to grab data from a DataFrame.
The drop method defaults to axis=0, which refers to the row labels (the index), so to delete a column we must set axis=1. The axis numbers follow NumPy, on which pandas is built: for df.shape = (5, 4), df.shape[0] = 5 is the number of rows (axis 0) and df.shape[1] = 4 is the number of columns (axis 1).
drop does not modify df in place; it returns a new DataFrame. To change df itself, pass inplace=True: df.drop('new',axis=1,inplace=True)
df.drop('new',axis=1)
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
df.drop('new',axis=1,inplace=True)
df
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
Rows can be dropped the same way, using axis=0:
df.drop('E',axis=0)
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
```
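The drop behaviour described above can be sketched end to end. The text never shows where the new column is created, so this sketch assumes it is W + Y, which matches the new values printed in the reset_index output; the other variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

# Assumption: 'new' is W + Y, inferred from the values shown in the text
df['new'] = df['W'] + df['Y']

# shape is (rows, columns): position 0 is axis 0, position 1 is axis 1
n_rows, n_cols = df.shape

# drop returns a new DataFrame; df itself still has the 'new' column
without_new = df.drop('new', axis=1)
still_there = 'new' in df.columns

# Reassigning the result is a common alternative to inplace=True
df = df.drop('new', axis=1)

# axis=0 (the default) drops a row by its index label
without_e = df.drop('E', axis=0)

# The columns= and index= keywords express the same thing without an axis number
same_cols = df.drop(columns=['Z'])
same_rows = df.drop(index=['E'])
```

Reassignment and inplace=True leave df in the same state; the keyword forms are simply easier to read when the axis numbers are hard to remember.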
Selecting Rows
df.loc['A']
W 2.706850
X 0.628133
Y 0.907969
Z 0.503826
Name: A, dtype: float64
Or select by position instead of label with iloc (integer location, 0-based):
df.iloc[2]
W -2.018168
X 0.740122
Y 0.528813
Z -0.589001
Name: C, dtype: float64
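As a rough side-by-side of the two selectors, the sketch below fetches the same row, cell, and sub-frame once by label and once by position. It rebuilds the DataFrame from the text; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

# One row: label 'C' sits at position 2
row_by_label = df.loc['C']
row_by_position = df.iloc[2]

# One cell: row 'B' is position 1, column 'Y' is position 2
cell_by_label = df.loc['B', 'Y']
cell_by_position = df.iloc[1, 2]

# A sub-frame: loc slices include the end label,
# iloc slices exclude the end position
sub_by_label = df.loc['A':'B', ['W', 'Y']]
sub_by_position = df.iloc[0:2, [0, 2]]
```

The slice difference is the usual stumbling block: 'A':'B' covers two rows, and so does 0:2, because the label slice is inclusive while the position slice is not.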
# Reset to the default 0, 1, ..., n-1 index
# Note: 1. the current index is moved into a new column named 'index'
#       2. to do it in place, use df.reset_index(inplace=True)
df.reset_index()
  index         W         X         Y         Z       new
0     A  2.706850  0.628133  0.907969  0.503826  3.614819
1     B  0.651118 -0.319318 -0.848077  0.605965 -0.196959
2     C -2.018168  0.740122  0.528813 -0.589001 -1.489355
3     D  0.188695 -0.758872 -0.933237  0.955057 -0.744542
4     E  0.190794  1.978757  2.605967  0.683509  2.796762
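The two notes on reset_index can be seen on a much smaller frame. The data below is hypothetical and kept tiny so the result is easy to check by eye; drop=True is a standard reset_index option that the original notes do not mention.

```python
import pandas as pd

# Small hypothetical frame, not the one from the text
df = pd.DataFrame({'W': [1.0, 2.0, 3.0]}, index=['A', 'B', 'C'])

# Default: the old index moves into a column named 'index'; df is unchanged
with_column = df.reset_index()

# drop=True discards the old index instead of keeping it as a column
discarded = df.reset_index(drop=True)
```

Neither call touches df itself, which matches the note that an in-place reset needs inplace=True (or a reassignment).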
Set a New Index from a Column
newind = 'CA NY WY OR CO'.split()
df['States'] = newind
df.set_index('States',inplace=True)
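One thing worth seeing in code is what happens to the old index. The sketch below uses the non-inplace form so both frames can be compared; by_state, kept_old and row_ny are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

df['States'] = 'CA NY WY OR CO'.split()

# set_index returns a new frame: 'States' becomes the index
# and is no longer an ordinary column
by_state = df.set_index('States')

# The old A..E labels are not kept in by_state.
# Resetting the index first preserves them as a column named 'index'.
kept_old = df.reset_index().set_index('States')

# Rows can now be selected by state label
row_ny = by_state.loc['NY']
```

Unlike reset_index, set_index overwrites the existing index rather than saving it, so the reset-then-set pattern is a simple way to keep the old labels around if they are still needed.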