
Given code that computes $J(\theta)$ and $\frac{\partial}{\partial\theta_j}J(\theta)$ for any given $\theta$, here are some algorithms that can minimize $J(\theta)$ more efficiently:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS
For the three algorithms after gradient descent, the advantages are that there is no need to manually pick $\alpha$ and that they are often faster than gradient descent, but they are also more sophisticated.
Think of these three algorithms as having a clever inner loop called a line search algorithm, which automatically tries out different learning rates $\alpha$ and picks a good one for every iteration, so we don't have to choose it ourselves (a rough sketch of the idea follows below).
Because a better $\alpha$ is chosen at each iteration, these algorithms end up converging much faster than gradient descent.
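As an illustration only (this is one simple line-search rule, not the actual inner loop these solvers use), a backtracking line search keeps shrinking a trial $\alpha$ until the step decreases $J(\theta)$ sufficiently. A minimal sketch, assuming a costFunction(theta) that returns the cost and the gradient:

```matlab
% Minimal backtracking line search sketch (illustrative only; the line
% search inside conjugate gradient / (L-)BFGS is more sophisticated).
% costFunction is a handle returning [jVal, gradient]; theta is the current point.
function alpha = backtrackingLineSearch(costFunction, theta)
  [j0, grad] = costFunction(theta);
  alpha = 1;                                % start from a large trial step
  % Shrink alpha until J(theta - alpha*grad) shows a sufficient decrease
  while alpha > 1e-10 && ...
        costFunction(theta - alpha * grad) > j0 - 1e-4 * alpha * (grad' * grad)
    alpha = alpha / 2;
  end
end
```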
Example
For $n=2$
Given
$$\mathbf{\theta}=
\begin{Bmatrix}
\theta_1 \\
\theta_2
\end{Bmatrix}$$
$$J(\theta)=(\theta_1-5)^2+(\theta_2-5)^2\tag{1}$$
Derivative terms
$$\frac{\partial}{\partial\theta_1}J(\theta)=2(\theta_1-5)$$
$$\frac{\partial}{\partial\theta_2}J(\theta)=2(\theta_2-5)$$
We can find $\min_{\theta}J(\theta)$ easily by inspecting $(1)$, i.e.
$$\mathbf{\theta}=
\begin{Bmatrix}
5 \\
5
\end{Bmatrix}$$
Implementation
costFunction returns two values: the cost function value jVal, and the derivative terms collected into a vector assigned to gradient.
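In Octave this might look as follows (a minimal sketch for the objective $(1)$ above):

```matlab
function [jVal, gradient] = costFunction(theta)
  % Cost: J(theta) = (theta_1 - 5)^2 + (theta_2 - 5)^2
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  % Derivative terms (note that Octave indexes from 1)
  gradient = zeros(2, 1);
  gradient(1) = 2 * (theta(1) - 5);
  gradient(2) = 2 * (theta(2) - 5);
end
```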
fminunc stands for "function minimization unconstrained" in Octave.
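A sketch of calling it with the costFunction above; optimset('GradObj', 'on', ...) tells fminunc that our function also returns the gradient:

```matlab
% 'GradObj','on': costFunction also supplies the gradient;
% 'MaxIter',100 : run at most 100 iterations.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
% @costFunction is a function handle (a pointer) to costFunction
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
% Expected: optTheta close to [5; 5], functionVal close to 0,
% and exitFlag = 1 indicating convergence.
```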
Keep in mind that $\theta\in\mathbb{R}^d$ with $d\ge2$.
For general $n$ (parameters $\theta_0,\theta_1,\dots,\theta_n$)
Implementation
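A minimal sketch, assuming a logistic-regression hypothesis $h_\theta(x)=g(\theta^Tx)$, a design matrix X of size $m\times(n+1)$ whose first column is all ones, and a label vector y; the gradient entries are the derivative terms given below, and because Octave indexes from 1, $\theta_0$ lives in theta(1):

```matlab
function [jVal, gradient] = costFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));   % h_theta(x^(i)) for every example i
  % Logistic regression cost J(theta) (assumed form of the cost)
  jVal = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
  % Derivative terms: gradient(j+1) = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
  gradient = (1 / m) * (X' * (h - y));
end
```

It can then be minimized with, e.g., fminunc(@(t) costFunction(t, X, y), zeros(n + 1, 1), options), using the same options as before.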
Where
$$\frac{\partial}{\partial\theta_0}J(\theta)=\frac{1}{m}\sum_{i=1}^m\left[(h_{\theta}(x^{(i)})-y^{(i)})x_0^{(i)}\right]$$
$$\frac{\partial}{\partial\theta_1}J(\theta)=\frac{1}{m}\sum_{i=1}^m\left[(h_{\theta}(x^{(i)})-y^{(i)})x_1^{(i)}\right]$$



