
$(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),…,(x^{(m)},y^{(m)})$
$m$ training examples.
$L$
Total number of layers in the network.
$s_l$
Number of units (not counting the bias unit) in layer $l$ of the network, so $s_L$ is the number of units in the output layer.
For example, in a binary classification problem, $s_L=1$ and $y$ can only be $0$ or $1$ for the single output unit, meaning the result indicates whether or not the input belongs to a specific class. Furthermore, in a multi-class problem with, say, $K$ distinct classes, $y\in\mathbb{R}^{K}$, there are $K$ output units, so $s_L=K$ and $h_{\Theta}(x)\in\mathbb{R}^{K}$.
As a note, we only use the one-vs-all method when the number of classes is at least three, i.e. $K\ge3$ in a multi-class problem (a small encoding sketch follows these definitions).
$K$
$K=s_L$ is also the number of units in the output layer.
$(h_{\Theta}(x))_i$
The $i^{th}$ output.
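For illustration, here is a minimal NumPy sketch of encoding integer class labels as the $K$-dimensional one-vs-all vectors described above; the `one_hot` helper is an assumption for this note, not part of the course material:

```python
import numpy as np

def one_hot(y, K):
    """Encode integer labels y (values 0..K-1) as K-dimensional one-vs-all vectors."""
    Y = np.zeros((y.size, K))
    Y[np.arange(y.size), y] = 1  # set the k-th entry of each row to 1
    return Y

# e.g. K = 3 distinct classes
Y = one_hot(np.array([0, 2, 1]), K=3)
# Y == [[1, 0, 0],
#       [0, 0, 1],
#       [0, 1, 0]]
```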
Cost function
Regularized logistic regression
$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$
where
$j=1,2,3,…,n$
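A minimal NumPy sketch of this regularized cost, assuming a design matrix `X` whose first column is the intercept term (all ones); the function and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix, first column all ones; y: (m,) labels in {0, 1};
    lam: regularization parameter lambda (assumed names, for illustration).
    """
    m = y.size
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every training example
    unreg = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)  # j starts at 1: skip theta_0
    return unreg + reg
```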
Generalization of regularized logistic regression
For $h_{\Theta}(x)\in\mathbb{R}^K$, i.e. $h_{\Theta}(x)$ is a $K$-dimensional vector,
$$J(\Theta)=-\frac{1}{m}\left[\sum_{i=1}^m\sum_{k=1}^K y_k^{(i)}\log\left(h_{\Theta}(x^{(i)})\right)_k+(1-y_k^{(i)})\log\left(1-\left(h_{\Theta}(x^{(i)})\right)_k\right)\right]+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
Same as above, $i$ starts from $1$, so we don't regularize the bias weights $\Theta_{j0}^{(l)}$.
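A minimal NumPy sketch of this generalized cost, assuming the network outputs `H` (from forward propagation) and the list of weight matrices `Thetas` are already available; all names here are illustrative assumptions:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Generalized regularized cost J(Theta) for K output units.

    H:      (m, K) outputs, H[i, k] = (h_Theta(x^(i)))_k
    Y:      (m, K) one-vs-all labels, Y[i, k] = y_k^(i)
    Thetas: weight matrices Theta^(l), shape (s_{l+1}, s_l + 1),
            bias weights in column 0
    lam:    regularization parameter lambda
    """
    m = Y.shape[0]
    unreg = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # triple regularization sum, excluding column 0 (the bias weights)
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return unreg + reg
```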



