月落丹枫

Q1:Estimated total cost for each event

-Use Python to solve:

Steps
1.create a new dataframe called ‘Total_cost’, Groupby ‘EVENT TYPE’ and get each type’s ‘ESTIMATED TOTAL COST’
2.Filter out ‘ESTIMATED TOTAL COST’>0 and assign it to ‘Total_cost_new’
3.Get the index of ‘Total_cost_new’ and assign it to ‘event_type’
4.Use sns.barplot() to plot the figture

Total_cost=pd.DataFrame(data.groupby('EVENT TYPE')['ESTIMATED TOTAL COST'].sum())
Total_cost_new=Total_cost.loc[Total_cost['ESTIMATED TOTAL COST']>0]

event_type=Total_cost_new.index.values
plt1=sns.barplot(x='ESTIMATED TOTAL COST',y=event_type,data=Total_cost_new)
图片名称

-Use PowerBI to solve:

Steps
1.create a new dataframe called ‘Total cost for each event’, Groupby ‘EVENT TYPE’ and get each type’s ‘ESTIMATED TOTAL COST’

图片名称

2.Filter out ‘ESTIMATED TOTAL COST’>0

图片名称

3.Plot figture of Estimated total cost for each event

图片名称

Q2: Number of event happend per year where DFAA payments (both federal and provincial) have been applied.

-Use Python to solve:

Steps
1.create a new dataframe called ‘Event_per_year’
2.Conver the type of ‘EVENT START DATE’ from string to date format
3.Extract the year from the ‘EVENT START DATE’ column
4.Filter out ‘FEDERAL DFAA PAYMENTS’>0 and ‘PROVINCIAL DFAA PAYMENTS’>0 assign it to ‘Event_per_year_new’
5.Count the num of events happened per year, get the index and value for the following figture
6.Use sns.barplot() to plot the figture

Event_per_year=pd.DataFrame(data)
Event_per_year.head()
Event_per_year['EVENT START DATE']=pd.to_datetime(Event_per_year['EVENT START DATE'],dayfirst=True)
Event_per_year['YEAR']=Event_per_year['EVENT START DATE'].apply(lambda x:x.year)
Event_per_year_new=pd.DataFrame(Event_per_year[(Event_per_year["FEDERAL DFAA PAYMENTS"]>0 )&
 (Event_per_year["PROVINCIAL DFAA PAYMENTS"]>0)])
Event_per_year_new

Num_event_yearly=Event_per_year_new['YEAR'].value_counts()
print(Num_event_yearly)

print(Num_event_yearly.index)
print(Num_event_yearly.values)

plt2=sns.barplot(x=Num_event_yearly.index,y=Num_event_yearly.values,data=Event_per_year_new)
plt2.set(xlabel='Year',ylabel='Number of events happened', title='Number of events happened
each year where DFAA payments applied')
图片名称

-Use PowerBI to solve:

Steps
1.Split the ‘EVENT START DATE’ column

图片名称

2.Based on delimiter=’/‘to split

图片名称
图片名称

3.Filter out the “FEDERAL DFAA PAYMENTS”]>0 and “PROVINCIAL DFAA PAYMENTS”]>0 rows

图片名称

4.plot figture of Number of event happend per year where DFAA payments (both federal and provincial) have been applied

图片名称

Q3:Duration of events

-Use Python to solve:

Steps
1.create a new dataframe called ‘Duration’
2.Conver the type of ‘EVENT START DATE’ and ‘EVENT END DATE’ from string to date format
3.Calculate the result of Duration[“EVENT END DATE”]-Duration[“EVENT START DATE”], converted the difference’s type and assign it to Duration[‘DURATION’]
4.Use sns.hist() to plot the figture

Duration=pd.DataFrame(data)
Duration['EVENT START DATE']=pd.to_datetime(Duration['EVENT START DATE'],dayfirst=True)
Duration['EVENT END DATE']=pd.to_datetime(Duration['EVENT END DATE'],dayfirst=True)

Duration['DURATION_DAYS']=(Duration["EVENT END DATE"]-Duration["EVENT START DATE"]).astype('timedelta64[D]')

duration_max=data['DURATION_DAYS'].max()
duration_max
id=data['DURATION_DAYS'].idxmax()
id

plt3=plt.boxplot(data['DURATION_DAYS'])
plt.show()
plt.hist(data["DURATION_DAYS"],bins=10,range=(0,50))
plt.show()

图片名称

-Use PowerBI to solve:

1.Convert the type of EVENT END DATE and EVENT START DATE to date, if occurs errors, change the local option

图片名称

2.Create a new column and calculate the difference of Duration[“EVENT END DATE”]-Duration[“EVENT START DATE”]

图片名称

Q4 Get the province from the ‘PLACE’ column

-Use Python to solve:

Steps
1.create a new dataframe called ‘Province’
2.Create a new column ‘ORGANIZED’ and assign 0 to it
3.Slice the last two letter from the ‘PLACE’ column and assign to ‘Province’
4.For every entry in ‘Province’ column if it is in abbre list then we label as ‘ORGANIZED’=1 otherwise ‘ORGANIZED’=0
5.Filter out all these unorganized row to create a new dataframe ‘Unorg’
6.Filter out ‘British Columbia’ from Unorg[‘PLACE’], then label the corresponding province as ‘BC’

abbre=['NL','PE','NS','NB','QC','ON','MB','SK','AB','BC','YT','NT','NU']
province=['Nunavut','Quebec','Nothwest Territories','Ontario','British Columbia','Alberta',
'Saskatchewan','Manitoba','Yukon','Newfoundland and Labrador','New Brunswick','Nova Scotia',
'Prince Edward Island']
import sys
i=0
Province=pd.DataFrame(data)
Province['ORGANIZED']=0
Province['Province']=Province['PLACE'].apply(lambda x: x[-2:])
for entry in Province['Province']:
    if entry in abbre:
        Province.ix[i,'ORGANIZED']=1
    else:
        Province.ix[i,'ORGANIZED']=0
    i=i+1
Province.head()

Unorg= pd.DataFrame(Province.loc[Province['ORGANIZED'] == 0])
Unorg['Province']=0
Unorg.head()

BC=Unorg[Unorg['PLACE'].str.contains('British Columbia')]
BC.index
Unorg.ix[BC.index,'Province']='BC'

图片名称