$$\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$
$$=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(\theta_0+\theta_1x^{(i)}-y^{(i)})^2\tag{1}$$
for $j=0,$
$$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot 1\tag{1.1}$$
for $j=1,$
$$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x^{(i)}\tag{1.2}$$
Gradient descent algorithm
Using the gradients from $(1.1)$ and $(1.2)$, repeat until convergence (updating $\theta_0$ and $\theta_1$ simultaneously):
$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})$$
$$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x^{(i)}$$
Batch gradient descent
"Batch" means that every step of gradient descent runs through all $m$ training examples as we approach the minimum of $J(\theta_0,\theta_1)$.
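The update rules above can be sketched in plain Python. The sample data, learning rate $\alpha$, and iteration count here are illustrative assumptions, not values from the notes; each iteration computes the gradients $(1.1)$ and $(1.2)$ over all $m$ examples and updates both parameters simultaneously:

```python
# A minimal sketch of batch gradient descent for one-variable linear
# regression, h_theta(x) = theta0 + theta1 * x.
# alpha and the iteration count are illustrative choices.

def batch_gradient_descent(x, y, alpha=0.1, iterations=1000):
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors h_theta(x^(i)) - y^(i) for all m examples
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        # Gradients (1.1) and (1.2): averages over the whole batch
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        # Simultaneous update of both parameters
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Example: points on the line y = 1 + 2x, so theta should approach (1, 2)
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = batch_gradient_descent(x, y)
```

Because every step averages over the full batch of $m$ examples, each iteration costs $O(m)$; that is the trade-off "batch" refers to.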