In this chapter we will go through the essential steps that you will need to take before beginning to build predictive models.
How it works
Get the Data with Pandas
Understanding your data
Rose vs Jack, or Female vs Male
Does age play a role?
First Prediction
Predicting with Decision Tree
Intro to decision trees
Cleaning and Formatting your Data
Creating your first decision tree
Interpreting your decision tree
Predict and submit to Kaggle
Overfitting and how to control it
Feature-engineering for our Titanic data set
Data Science is an art that benefits from a human element. Enter feature engineering: creatively engineering your own features by combining the different existing variables.
While feature engineering is a discipline in itself, too broad to be covered here in detail, you will have a look at a simple example by creating your own new predictive attribute: family_size.
A valid assumption is that larger families need more time to get together on a sinking ship, and hence have a lower probability of surviving. Family size is determined by the variables SibSp and Parch, which indicate the number of family members a certain passenger is traveling with. So when doing feature engineering, you add a new variable family_size, which is the sum of SibSp and Parch plus one (the observation itself), to the test and train set.
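A minimal sketch of this step in pandas, using a small hypothetical DataFrame in place of the real Kaggle train.csv (in practice you would apply the same line to both the train and test DataFrames):

```python
import pandas as pd

# Hypothetical sample mirroring the two Titanic columns used here;
# the real tutorial loads train.csv and test.csv from Kaggle.
train = pd.DataFrame({
    "SibSp": [1, 0, 3],   # siblings/spouses aboard
    "Parch": [0, 0, 2],   # parents/children aboard
})

# family_size = siblings/spouses + parents/children + the passenger themself
train["family_size"] = train["SibSp"] + train["Parch"] + 1

print(train["family_size"].tolist())  # [2, 1, 6]
```

The new column can then be included alongside the other features when fitting the decision tree.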
Improving your predictions
What techniques can you use to improve your predictions even more? One possible way is by making use of the machine learning method Random Forest. As the name suggests, a forest is just a collection of trees…
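A minimal sketch of the idea with scikit-learn, using synthetic data in place of the cleaned Titanic features (the feature matrix and hyperparameter values here are assumptions for illustration, not the tutorial's exact setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the Titanic features; the real tutorial would use
# cleaned columns such as Pclass, Sex, Age, and Fare.
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# A forest of 100 trees: each tree is trained on a bootstrap sample of
# the rows and considers a random subset of features at each split, and
# the forest averages the trees' votes -- which tames the overfitting
# that a single deep decision tree is prone to.
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
forest.fit(X, y)

print(forest.score(X, y))  # training accuracy
```

The same `fit`/`predict` interface as the decision tree applies, so swapping the model into the earlier pipeline requires only changing the estimator.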