
$(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),…,(x^{(m)},y^{(m)})$
$m$ training examples.
$L$
Total number of layers in the network.
$s_l$
Number of units (not counting the bias unit) in layer $l$ of the network, so $s_L$ is the number of units in the output layer.
For example, in a binary classification problem, $s_L=1$ and $y$ can only be $0$ or $1$ for the single output unit, meaning the result indicates whether or not the input belongs to a specific class. Furthermore, in a multi-class problem with, say, $K$ distinct classes, $y\in\mathbb{R}^{K}$, there are $K$ output units, so $s_L=K$ and $h_{\Theta}(x)\in\mathbb{R}^{K}$.
As a note, we only use the one-vs-all method when the number of classes is at least three, i.e. $K\ge3$ in a multi-class problem (a small encoding sketch follows these definitions).
$K$
$K=s_L$ is also the number of units in the output layer.
$(h_{\Theta}(x))_i$
The $i^{th}$ output.
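For illustration, here is a minimal NumPy sketch of encoding integer class labels as the $K$-dimensional one-vs-all vectors described above; the `one_hot` helper is an assumption for this note, not part of the course material:

```python
import numpy as np

def one_hot(y, K):
    """Encode integer labels y (values 0..K-1) as K-dimensional one-vs-all vectors."""
    Y = np.zeros((y.size, K))
    Y[np.arange(y.size), y] = 1  # set the k-th entry of each row to 1
    return Y

# e.g. K = 3 distinct classes
Y = one_hot(np.array([0, 2, 1]), K=3)
# Y == [[1, 0, 0],
#       [0, 0, 1],
#       [0, 1, 0]]
```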
Cost function
Regularized logistic regression
$$J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log\left(h_{\theta}(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_{\theta}(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$
where
$j=1,2,3,…,n$
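A minimal NumPy sketch of this regularized cost, assuming a design matrix `X` whose first column is the intercept term (all ones); the function and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix, first column all ones; y: (m,) labels in {0, 1};
    lam: regularization parameter lambda (assumed names, for illustration).
    """
    m = y.size
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every training example
    unreg = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg = lam / (2 * m) * np.sum(theta[1:] ** 2)  # j starts at 1: skip theta_0
    return unreg + reg
```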
Generalization of regularized logistic regression
For $h_{\Theta}(x)\in\mathbb{R}^K$, i.e. $h_{\Theta}(x)$ is a $K$-dimensional vector,
$$J(\Theta)=-\frac{1}{m}\left[\sum_{i=1}^m\sum_{k=1}^K y_k^{(i)}\log\left(h_{\Theta}(x^{(i)})\right)_k+(1-y_k^{(i)})\log\left(1-\left(h_{\Theta}(x^{(i)})\right)_k\right)\right]+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
Same as above, $i$ starts from $1$, so we don't regularize the bias weights $\Theta_{j0}^{(l)}$.
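A minimal NumPy sketch of this generalized cost, assuming the network outputs `H` (from forward propagation) and the list of weight matrices `Thetas` are already available; all names here are illustrative assumptions:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    """Generalized regularized cost J(Theta) for K output units.

    H:      (m, K) outputs, H[i, k] = (h_Theta(x^(i)))_k
    Y:      (m, K) one-vs-all labels, Y[i, k] = y_k^(i)
    Thetas: weight matrices Theta^(l), shape (s_{l+1}, s_l + 1),
            bias weights in column 0
    lam:    regularization parameter lambda
    """
    m = Y.shape[0]
    unreg = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # triple regularization sum, excluding column 0 (the bias weights)
    reg = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return unreg + reg
```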



