Underfitting (high bias) and overfitting (high variance) are both undesirable; regularization is one way to address overfitting.
If we have too many features, the learned hypothesis may fit the training set very well, i.e. $J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2\approx 0$, but fail to generalize to new examples.
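As a concrete illustration (a hypothetical sketch; the data, polynomial degree, and helper names below are made up, not from the notes), fitting a high-degree polynomial to a handful of noisy points drives the training cost $J(\theta)$ toward zero while the cost on unseen points stays much larger:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(theta, X, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(y)
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def poly_features(x, degree):
    """Map a scalar feature x to [1, x, x^2, ..., x^degree]."""
    return np.vstack([x ** d for d in range(degree + 1)]).T

# Few training examples, many features (a degree-9 polynomial has 10 parameters).
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, x_train.shape)
X_train = poly_features(x_train, degree=9)

# Least-squares fit: with as many parameters as examples, it interpolates the data.
theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Evaluate on unseen points drawn from the same underlying curve (without noise).
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)
X_test = poly_features(x_test, degree=9)

print("training cost J(theta):", cost(theta, X_train, y_train))  # approx 0
print("test cost             :", cost(theta, X_test, y_test))    # much larger
```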
Occurrence of overfitting
If there are too many features and only a little training data, overfitting is likely to occur.
Addressing overfitting
Reduce number of features
- Manually select which features to keep.
- Model selection algorithm.
But the disadvantage of throwing away features is that we also throw away some of the information in the data.
Regularization
- Keep all the features $(x_1, x_2, \ldots, x_n)$, but reduce the magnitude/values of the parameters $\theta_j$.
- Regularization works well when we have a lot of features, each of which contributes a bit to predicting $y$.
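For reference, the usual regularized cost function (the $\lambda$ term below follows the standard convention; it is not spelled out in the notes above) adds a penalty on the magnitude of the parameters:

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$

where the regularization parameter $\lambda$ trades off fitting the training data against keeping the $\theta_j$ small (by convention $\theta_0$ is not penalized). A minimal sketch of the shrinking effect, assuming the closed-form ridge solution $\theta=(X^{T}X+\lambda I)^{-1}X^{T}y$ on made-up data (for brevity this version penalizes $\theta_0$ as well):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 10))  # few examples, many features
y = rng.normal(size=10)

for lam in (0.0, 1.0, 10.0):
    # Ridge solution: (X^T X + lambda * I) theta = X^T y
    theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda={lam:5.1f}  ||theta|| = {np.linalg.norm(theta):.3f}")
```

As $\lambda$ grows, the norm of $\theta$ shrinks: all features are kept, but each parameter's influence is damped.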