
Q1:Estimated total cost for each event
-Use Python to solve:
Steps
1.create a new dataframe called ‘Total_cost’, Groupby ‘EVENT TYPE’ and get each type’s ‘ESTIMATED TOTAL COST’
2.Filter out ‘ESTIMATED TOTAL COST’>0 and assign it to ‘Total_cost_new’
3.Get the index of ‘Total_cost_new’ and assign it to ‘event_type’
4.Use sns.barplot() to plot the figture
Total_cost=pd.DataFrame(data.groupby('EVENT TYPE')['ESTIMATED TOTAL COST'].sum())
Total_cost_new=Total_cost.loc[Total_cost['ESTIMATED TOTAL COST']>0]
event_type=Total_cost_new.index.values
plt1=sns.barplot(x='ESTIMATED TOTAL COST',y=event_type,data=Total_cost_new)

-Use PowerBI to solve:
Steps
1.create a new dataframe called ‘Total cost for each event’, Groupby ‘EVENT TYPE’ and get each type’s ‘ESTIMATED TOTAL COST’

2.Filter out ‘ESTIMATED TOTAL COST’>0

3.Plot figture of Estimated total cost for each event

Q2: Number of event happend per year where DFAA payments (both federal and provincial) have been applied.
-Use Python to solve:
Steps
1.create a new dataframe called ‘Event_per_year’
2.Conver the type of ‘EVENT START DATE’ from string to date format
3.Extract the year from the ‘EVENT START DATE’ column
4.Filter out ‘FEDERAL DFAA PAYMENTS’>0 and ‘PROVINCIAL DFAA PAYMENTS’>0 assign it to ‘Event_per_year_new’
5.Count the num of events happened per year, get the index and value for the following figture
6.Use sns.barplot() to plot the figture
Event_per_year=pd.DataFrame(data)
Event_per_year.head()
Event_per_year['EVENT START DATE']=pd.to_datetime(Event_per_year['EVENT START DATE'],dayfirst=True)
Event_per_year['YEAR']=Event_per_year['EVENT START DATE'].apply(lambda x:x.year)
Event_per_year_new=pd.DataFrame(Event_per_year[(Event_per_year["FEDERAL DFAA PAYMENTS"]>0 )&
(Event_per_year["PROVINCIAL DFAA PAYMENTS"]>0)])
Event_per_year_new
Num_event_yearly=Event_per_year_new['YEAR'].value_counts()
print(Num_event_yearly)
print(Num_event_yearly.index)
print(Num_event_yearly.values)
plt2=sns.barplot(x=Num_event_yearly.index,y=Num_event_yearly.values,data=Event_per_year_new)
plt2.set(xlabel='Year',ylabel='Number of events happened', title='Number of events happened
each year where DFAA payments applied')

-Use PowerBI to solve:
Steps
1.Split the ‘EVENT START DATE’ column

2.Based on delimiter=’/‘to split


3.Filter out the “FEDERAL DFAA PAYMENTS”]>0 and “PROVINCIAL DFAA PAYMENTS”]>0 rows

4.plot figture of Number of event happend per year where DFAA payments (both federal and provincial) have been applied

Q3:Duration of events
-Use Python to solve:
Steps
1.create a new dataframe called ‘Duration’
2.Conver the type of ‘EVENT START DATE’ and ‘EVENT END DATE’ from string to date format
3.Calculate the result of Duration[“EVENT END DATE”]-Duration[“EVENT START DATE”], converted the difference’s type and assign it to Duration[‘DURATION’]
4.Use sns.hist() to plot the figture
Duration=pd.DataFrame(data)
Duration['EVENT START DATE']=pd.to_datetime(Duration['EVENT START DATE'],dayfirst=True)
Duration['EVENT END DATE']=pd.to_datetime(Duration['EVENT END DATE'],dayfirst=True)
Duration['DURATION_DAYS']=(Duration["EVENT END DATE"]-Duration["EVENT START DATE"]).astype('timedelta64[D]')
duration_max=data['DURATION_DAYS'].max()
duration_max
id=data['DURATION_DAYS'].idxmax()
id
plt3=plt.boxplot(data['DURATION_DAYS'])
plt.show()
plt.hist(data["DURATION_DAYS"],bins=10,range=(0,50))
plt.show()

-Use PowerBI to solve:
1.Convert the type of EVENT END DATE and EVENT START DATE to date, if occurs errors, change the local option

2.Create a new column and calculate the difference of Duration[“EVENT END DATE”]-Duration[“EVENT START DATE”]

Q4 Get the province from the ‘PLACE’ column
-Use Python to solve:
Steps
1.create a new dataframe called ‘Province’
2.Create a new column ‘ORGANIZED’ and assign 0 to it
3.Slice the last two letter from the ‘PLACE’ column and assign to ‘Province’
4.For every entry in ‘Province’ column if it is in abbre list then we label as ‘ORGANIZED’=1 otherwise ‘ORGANIZED’=0
5.Filter out all these unorganized row to create a new dataframe ‘Unorg’
6.Filter out ‘British Columbia’ from Unorg[‘PLACE’], then label the corresponding province as ‘BC’
abbre=['NL','PE','NS','NB','QC','ON','MB','SK','AB','BC','YT','NT','NU']
province=['Nunavut','Quebec','Nothwest Territories','Ontario','British Columbia','Alberta',
'Saskatchewan','Manitoba','Yukon','Newfoundland and Labrador','New Brunswick','Nova Scotia',
'Prince Edward Island']
import sys
i=0
Province=pd.DataFrame(data)
Province['ORGANIZED']=0
Province['Province']=Province['PLACE'].apply(lambda x: x[-2:])
for entry in Province['Province']:
if entry in abbre:
Province.ix[i,'ORGANIZED']=1
else:
Province.ix[i,'ORGANIZED']=0
i=i+1
Province.head()
Unorg= pd.DataFrame(Province.loc[Province['ORGANIZED'] == 0])
Unorg['Province']=0
Unorg.head()
BC=Unorg[Unorg['PLACE'].str.contains('British Columbia')]
BC.index
Unorg.ix[BC.index,'Province']='BC'





近期评论