Gradient descent for linear regression (Stanford CS229)

$$\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2$$
$$=\frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^m(\theta_0+\theta_1x^{(i)}-y^{(i)})^2\tag{1}$$

for $j=0,$
$$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot 1\tag{1.1}$$

for $j=1,$
$$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x^{(i)}\tag{1.2}$$
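
As a quick numerical check of $(1.1)$ and $(1.2)$, here is a minimal Python sketch that evaluates both partial derivatives for a given $(\theta_0,\theta_1)$; the function name `gradients` and the toy data are illustrative assumptions, not part of the lecture notes.

```python
# Sketch: evaluate the partial derivatives (1.1) and (1.2) on a toy dataset.
# Function name and data are illustrative assumptions.

def gradients(theta0, theta1, x, y):
    """Return (dJ/dtheta0, dJ/dtheta1) for h_theta(x) = theta0 + theta1 * x."""
    m = len(x)
    d_theta0 = sum((theta0 + theta1 * x[i]) - y[i] for i in range(m)) / m            # (1.1)
    d_theta1 = sum(((theta0 + theta1 * x[i]) - y[i]) * x[i] for i in range(m)) / m   # (1.2)
    return d_theta0, d_theta1

# Points lying exactly on y = 1 + 2x, so the gradient at (theta0, theta1) = (1, 2) is (0, 0).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(gradients(1.0, 2.0, x, y))  # -> (0.0, 0.0), already at the minimum
print(gradients(0.0, 0.0, x, y))  # -> nonzero gradient away from the minimum
```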

Gradient descent algorithm

Using $(1.1)$ and $(1.2)$, repeat until convergence, updating $\theta_0$ and $\theta_1$ simultaneously:
$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})$$
$$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\cdot x^{(i)}$$
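
A minimal sketch of this repeat-until-convergence loop in plain Python, assuming a fixed learning rate `alpha` and a simple tolerance on the parameter change as the stopping rule (both are my choices, not prescribed by the notes):

```python
# Batch gradient descent for h_theta(x) = theta0 + theta1 * x.
# alpha, tol, max_iters, and the toy data are illustrative assumptions.

def batch_gradient_descent(x, y, alpha=0.1, tol=1e-9, max_iters=100_000):
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(max_iters):
        errors = [(theta0 + theta1 * x[i]) - y[i] for i in range(m)]
        grad0 = sum(errors) / m                               # (1.1)
        grad1 = sum(errors[i] * x[i] for i in range(m)) / m   # (1.2)
        # Simultaneous update: both gradients use the *old* theta values.
        new_theta0 = theta0 - alpha * grad0
        new_theta1 = theta1 - alpha * grad1
        converged = abs(new_theta0 - theta0) < tol and abs(new_theta1 - theta1) < tol
        theta0, theta1 = new_theta0, new_theta1
        if converged:
            break
    return theta0, theta1

# Data generated from y = 1 + 2x, so the fit should approach theta0 = 1, theta1 = 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
print(batch_gradient_descent(x, y))  # approximately (1.0, 2.0)
```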

Batch gradient descent

"Batch" means that every gradient descent step runs through all $m$ training examples as the parameters approach the minimum of $J(\theta_0,\theta_1)$.
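
Because each step uses the whole batch, the two update rules are often written as a single vectorized operation over all $m$ examples. The NumPy sketch below is one way to express that; the variable names and data are mine, not from the notes.

```python
# One vectorized formulation of the batch update: every step touches all m examples.
# Variable names and data are illustrative assumptions.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
theta = np.array([0.0, 0.0])            # [theta0, theta1]
alpha = 0.1
m = len(x)

X = np.column_stack([np.ones(m), x])    # design matrix: a column of 1s, then x
for _ in range(2000):
    errors = X @ theta - y              # h_theta(x^(i)) - y^(i) for all i at once
    theta = theta - alpha * (X.T @ errors) / m   # simultaneous update of both parameters
print(theta)                            # approaches [1.0, 2.0]
```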