Abstract
In this post, we derive the bias-variance decomposition of the mean squared error for regression.
Regression Decomposition
In regression analysis, it is common to decompose the observed value $y$ as follows:
$$y = f(x) + \epsilon \tag{0}$$
where the true regression $f(x)$ is regarded as a constant given $x$.
The error (or data noise) $\epsilon$ is independent of $x$, and we assume that it follows a Gaussian distribution with mean 0 and variance $\sigma^2$, i.e. $\epsilon \sim \mathcal{N}(0, \sigma^2)$.
Note that (0) is only a description of the data.
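When we replace $f(x)$ with its estimate $\hat{f}(x)$, (0) turns into a more practical form:
$$y = \hat{f}(x) + r \tag{1}$$
where the estimate $\hat{f}(x)$ of the true regression is not a constant given $x$, because it depends on the randomly drawn training data, and the residual $r$ describes the gap between $y$ and $\hat{f}(x)$.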
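To see why $\hat{f}(x)$ is random rather than constant at a fixed $x$, here is a minimal Python sketch using only the standard library. It assumes a hypothetical true regression $f(x) = \sin(2\pi x)$, Gaussian noise with an assumed standard deviation of 0.3, and a straight-line least-squares fit as the estimator; none of these choices come from the post itself. Refitting on freshly drawn training sets gives a different prediction at the same point each time.

```python
import math
import random
import statistics

SIGMA = 0.3  # assumed noise standard deviation (sigma in the text)


def f(x):
    """Hypothetical true regression f(x), chosen only for illustration."""
    return math.sin(2 * math.pi * x)


def sample_training_set(n, rng):
    """Draw n pairs (x, y) with y = f(x) + eps and eps ~ N(0, SIGMA^2)."""
    xs = [rng.random() for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ a + b*x; returns f_hat."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def predictions_at(x0, n_sets=200, n=30, seed=0):
    """Fit f_hat on n_sets independent training sets and evaluate each at x0."""
    rng = random.Random(seed)
    return [fit_line(*sample_training_set(n, rng))(x0) for _ in range(n_sets)]


preds = predictions_at(0.25)
print(f"f(0.25)                = {f(0.25):.3f}")
print(f"mean of f_hat(0.25)    = {statistics.fmean(preds):.3f}")
print(f"std dev of f_hat(0.25) = {statistics.stdev(preds):.3f}")
```

Because a straight line cannot follow the sine curve, the average prediction at 0.25 also lands noticeably below $f(0.25) = 1$; that systematic gap is the bias term that shows up in the decomposition.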
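Based on (0) and (1), we can make the following observations (all expectations and variances below are taken with $x$ held fixed):
$$\mathbb{E}[y] = f(x), \qquad \mathrm{Var}[y] = \mathrm{Var}[\epsilon] = \sigma^2 \tag{2}$$
$$r = y - \hat{f}(x) = f(x) - \hat{f}(x) + \epsilon \tag{3}$$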
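A quick numerical check of what (0) implies at a fixed $x$: repeated observations of $y$ should have a mean close to $f(x)$ and a variance close to $\sigma^2$. The sketch assumes the same hypothetical $f(x) = \sin(2\pi x)$ and a noise standard deviation of 0.3; the point $x_0 = 0.1$ and the sample size are arbitrary.

```python
import math
import random
import statistics

SIGMA = 0.3  # assumed noise standard deviation (sigma in the text)


def f(x):
    """Hypothetical true regression f(x), chosen only for illustration."""
    return math.sin(2 * math.pi * x)


rng = random.Random(42)
x0 = 0.1
# Observe y repeatedly at the same x0: y = f(x0) + eps, eps ~ N(0, SIGMA^2).
ys = [f(x0) + rng.gauss(0.0, SIGMA) for _ in range(50_000)]

sample_mean = statistics.fmean(ys)
sample_var = statistics.pvariance(ys)
print(f"f(x0)   = {f(x0):.4f}   sample mean     = {sample_mean:.4f}")
print(f"sigma^2 = {SIGMA ** 2:.4f}   sample variance = {sample_var:.4f}")
```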
Bias-variance decomposition
Using (2) and (3), we can show why minimizing the mean squared error is useful for regression problems.
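For the derivation, we need a few more identities related to expectation and variance. Given any two independent random variables $u$ and $v$ (named this way to avoid a clash with $x$ and $y$ above) and a constant $c$, we have:
$$
\begin{aligned}
&\mathbb{E}[c] = c, \quad \mathbb{E}[u + v] = \mathbb{E}[u] + \mathbb{E}[v], \quad \mathbb{E}[uv] = \mathbb{E}[u]\,\mathbb{E}[v],\\
&\mathrm{Var}[u] = \mathbb{E}[u^2] - \mathbb{E}[u]^2, \quad \mathrm{Var}[u + c] = \mathrm{Var}[u], \quad \mathrm{Var}[cu] = c^2\,\mathrm{Var}[u]
\end{aligned}
\tag{4}
$$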
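These identities can be sanity-checked by Monte Carlo. The sketch draws two independent samples, here called `u` (normal) and `v` (uniform), both arbitrary choices, and compares the two sides of each expectation and variance identity. The product rule is the only one that needs independence; with finite samples it holds only approximately, while the others hold up to floating-point rounding.

```python
import random
import statistics

rng = random.Random(7)
N = 50_000
c = 5.0
u = [rng.gauss(1.0, 2.0) for _ in range(N)]    # u ~ N(1, 2^2), arbitrary choice
v = [rng.uniform(0.0, 3.0) for _ in range(N)]  # v ~ U(0, 3), drawn independently of u

mean = statistics.fmean
var = statistics.pvariance
mean_u, var_u = mean(u), var(u)

# Each entry pairs the left-hand and right-hand side of one identity.
checks = {
    "E[u + v] = E[u] + E[v]": (mean([a + b for a, b in zip(u, v)]), mean_u + mean(v)),
    "E[uv] = E[u] E[v]": (mean([a * b for a, b in zip(u, v)]), mean_u * mean(v)),
    "Var[u] = E[u^2] - E[u]^2": (var_u, mean([a * a for a in u]) - mean_u ** 2),
    "Var[u + c] = Var[u]": (var([a + c for a in u]), var_u),
    "Var[cu] = c^2 Var[u]": (var([c * a for a in u]), c ** 2 * var_u),
}
for name, (lhs, rhs) in checks.items():
    print(f"{name:26s} {lhs:9.4f} vs {rhs:9.4f}")
```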
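We begin with the definition of mean squared error and rewrite it in the form of an expected value, using (3):
$$\mathrm{MSE} = \mathbb{E}[r^2] = \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \mathbb{E}\big[(f(x) - \hat{f}(x) + \epsilon)^2\big] \tag{5}$$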
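By expanding $(f(x) - \hat{f}(x) + \epsilon)^2$, we get
$$\mathrm{MSE} = \mathbb{E}[\epsilon^2] + 2\,\mathbb{E}\big[\epsilon\,(f(x) - \hat{f}(x))\big] + \mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] \tag{6}$$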
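Note that
$$\mathbb{E}\big[\epsilon\,(f(x) - \hat{f}(x))\big] = \mathbb{E}[\epsilon]\,\mathbb{E}\big[f(x) - \hat{f}(x)\big] = 0$$
because $\epsilon$ is independent of $\hat{f}(x)$ (the noise on a new observation does not depend on the estimate fitted to the training data) and $\mathbb{E}[\epsilon] = 0$. Moreover, $\mathbb{E}[\epsilon^2] = \mathrm{Var}[\epsilon] + \mathbb{E}[\epsilon]^2 = \sigma^2$, and since $f(x)$ is a constant,
$$\mathbb{E}\big[(f(x) - \hat{f}(x))^2\big] = \mathrm{Var}\big[f(x) - \hat{f}(x)\big] + \big(\mathbb{E}[f(x) - \hat{f}(x)]\big)^2 = \mathrm{Var}[\hat{f}(x)] + \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2.$$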
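The two facts used in this step can be checked by simulation at a fixed point $x_0$: in each trial, fit a straight line on a fresh training set, then draw the noise $\epsilon$ of a new observation independently. The averages of $\epsilon\,(f(x_0) - \hat{f}(x_0))$ and $\epsilon^2$ should approach 0 and $\sigma^2$. Here $f(x) = \sin(2\pi x)$, the noise level 0.3, the straight-line estimator, and the sample sizes are illustrative assumptions.

```python
import math
import random
import statistics

SIGMA = 0.3  # assumed noise standard deviation (sigma in the text)


def f(x):
    """Hypothetical true regression f(x), chosen only for illustration."""
    return math.sin(2 * math.pi * x)


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ a + b*x; returns f_hat."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


rng = random.Random(1)
x0, n, trials = 0.25, 30, 5_000
cross, noise_sq = [], []
for _ in range(trials):
    xs = [rng.random() for _ in range(n)]
    f_hat = fit_line(xs, [f(x) + rng.gauss(0.0, SIGMA) for x in xs])
    eps = rng.gauss(0.0, SIGMA)  # noise of a new observation, drawn independently of f_hat
    cross.append(eps * (f(x0) - f_hat(x0)))
    noise_sq.append(eps ** 2)

print(f"mean of eps * (f - f_hat) = {statistics.fmean(cross):.4f}  (theory: 0)")
print(f"mean of eps^2             = {statistics.fmean(noise_sq):.4f}  (theory: {SIGMA ** 2:.4f})")
```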
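Combining these results, we reach our final form:
$$\mathrm{MSE} = \sigma^2 + \mathrm{Var}[\hat{f}(x)] + \big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2 \tag{7}$$
which is the sum of the data noise variance $\sigma^2$, the prediction variance $\mathrm{Var}[\hat{f}(x)]$, and the squared prediction bias $\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2$. This result is the bias-variance decomposition.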
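The decomposition (7) can be verified end to end with a Monte Carlo sketch. At a fixed $x_0$, it estimates the mean squared error of a straight-line fit directly from new observations, then separately estimates the prediction variance and squared bias from the same predictions and adds the noise variance. The true function, noise level, estimator, and sample sizes are illustrative assumptions, not part of the derivation.

```python
import math
import random
import statistics

SIGMA = 0.3  # assumed noise standard deviation (sigma in the text)


def f(x):
    """Hypothetical true regression f(x), chosen only for illustration."""
    return math.sin(2 * math.pi * x)


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ a + b*x; returns f_hat."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def decompose(x0=0.25, n=30, trials=5_000, seed=11):
    """Return (Monte Carlo MSE, prediction variance, squared bias) at x0."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        f_hat = fit_line(xs, [f(x) + rng.gauss(0.0, SIGMA) for x in xs])
        y_new = f(x0) + rng.gauss(0.0, SIGMA)  # new observation at x0
        preds.append(f_hat(x0))
        sq_errors.append((y_new - f_hat(x0)) ** 2)
    mse = statistics.fmean(sq_errors)
    variance = statistics.pvariance(preds)
    bias_sq = (f(x0) - statistics.fmean(preds)) ** 2
    return mse, variance, bias_sq


mse, variance, bias_sq = decompose()
print(f"Monte Carlo MSE             = {mse:.4f}")
print(f"sigma^2 + variance + bias^2 = {SIGMA ** 2 + variance + bias_sq:.4f}")
print(f"  (variance = {variance:.4f}, bias^2 = {bias_sq:.4f})")
```

The two printed totals agree up to Monte Carlo error, which comes from the finite-sample version of the cross term that vanishes in expectation.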
Why does variance matter in regression?
Lowering the prediction bias gives the model higher accuracy on the training dataset; however, to obtain similar performance outside the training dataset, we also want to prevent the model from overfitting the training dataset. Since the true regression $f(x)$ is a constant given $x$ and therefore has zero variance, a robust model should keep its prediction variance as small as possible. This is consistent with the objective of minimizing the mean squared error, because the prediction variance is one of the terms in (7).
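To illustrate the trade-off between the two terms, the sketch below swaps in k-nearest-neighbour averaging as the estimator, a technique not discussed in this post: a small k follows the training data closely (low bias, high variance), while a large k smooths over it (higher bias, lower variance). It reports the squared bias and prediction variance at one point for several values of k. The function $f(x) = \sin(2\pi x)$, the noise level 0.3, and all sizes are illustrative assumptions.

```python
import math
import random
import statistics

SIGMA = 0.3  # assumed noise standard deviation (sigma in the text)


def f(x):
    """Hypothetical true regression f(x), chosen only for illustration."""
    return math.sin(2 * math.pi * x)


def knn_predict(xs, ys, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return statistics.fmean([ys[i] for i in nearest])


def bias_variance(k, x0=0.25, n=50, trials=2_000, seed=3):
    """Estimate (squared bias, variance) of the k-NN prediction at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [f(x) + rng.gauss(0.0, SIGMA) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    bias_sq = (f(x0) - statistics.fmean(preds)) ** 2
    return bias_sq, statistics.pvariance(preds)


results = {k: bias_variance(k) for k in (1, 5, 25)}
for k, (bias_sq, variance) in results.items():
    total = SIGMA ** 2 + bias_sq + variance
    print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}  expected MSE={total:.4f}")
```

Increasing k lowers the variance term but raises the bias term, so the lowest mean squared error typically comes from an intermediate k that balances the two rather than from driving either one to zero.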