Tips in programming (CS229)

Normal equation

$$h_\theta(x)=\sum_{j=0}^n\theta_jx_j\tag{1}$$
$$=\mathbf{\theta^\top x}\tag{2}$$

where

$$\mathbf{\theta}=
\begin{bmatrix}
\theta_0\\
\theta_1\\
\theta_2
\end{bmatrix}
$$

$$\mathbf{x}=
\begin{bmatrix}
x_0\\
x_1\\
x_2
\end{bmatrix}
$$

With the unvectorized equation $(1)$, we implement $h_\theta(x)$ as:

prediction = 0.0;
for j = 1:(n+1)
  prediction = prediction + theta(j) * x(j);
end;

To simplify the implementation and avoid the for loop, the vectorized equation $(2)$ can be rewritten as:

prediction = theta' * x;

theta' stands for the transpose of the theta vector.

C++ example

Unvectorized implementation

double prediction = 0.0;
// theta and x each hold n + 1 entries (index 0 is the bias term x_0 = 1)
for (int j = 0; j <= n; j++)
  prediction += theta[j] * x[j];

Vectorized implementation

double prediction = theta.transpose() * x;
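This one-liner assumes a C++ linear algebra library that provides vector types and overloaded operators. A minimal self-contained sketch, assuming Eigen (the vector values are made up for illustration):

#include <Eigen/Dense>
#include <iostream>

int main() {
    // Hypothetical 3-element parameter and feature vectors; x(0) = 1 is the bias term.
    Eigen::Vector3d theta(0.5, 1.2, -0.3);
    Eigen::Vector3d x(1.0, 2.0, 4.0);

    // theta^T * x in a single call; the library evaluates the inner product
    // with optimized code instead of an explicit loop.
    double prediction = theta.transpose() * x;
    // Equivalent: double prediction = theta.dot(x);

    std::cout << "prediction = " << prediction << "\n";  // 0.5 + 2.4 - 1.2 = 1.7
    return 0;
}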

Gradient descent

Gradient descent equations for $n=2$:

$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\,x_0^{(i)}$$
$$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\,x_1^{(i)}$$
$$\theta_2:=\theta_2-\alpha\frac{1}{m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})\,x_2^{(i)}$$
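As a baseline, here is a minimal unvectorized sketch of one such update step in C++. The function name, container layout, and variable names are illustrative assumptions; each row of X is one training example $[x_0, x_1, x_2]$ with $x_0 = 1$:

#include <vector>

// One gradient descent step for n = 2 features plus the bias term x_0 = 1.
// X has m rows of [x_0, x_1, x_2], y has m labels, theta has 3 entries,
// alpha is the learning rate.
void gradientDescentStepUnvectorized(const std::vector<std::vector<double> >& X,
                                     const std::vector<double>& y,
                                     std::vector<double>& theta,
                                     double alpha)
{
    const std::size_t m = X.size();
    const std::size_t nPlus1 = theta.size();
    std::vector<double> newTheta(nPlus1);

    for (std::size_t j = 0; j < nPlus1; j++) {
        double sum = 0.0;
        for (std::size_t i = 0; i < m; i++) {
            // h_theta(x^{(i)}) = theta^T x^{(i)}, computed with yet another loop
            double h = 0.0;
            for (std::size_t k = 0; k < nPlus1; k++)
                h += theta[k] * X[i][k];
            sum += (h - y[i]) * X[i][j];
        }
        newTheta[j] = theta[j] - alpha * sum / m;
    }
    theta = newTheta;  // update all theta_j simultaneously
}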

Vectorized implementation

We can simplify the equations above to:

$$\theta:=\theta-\alpha\delta$$

Where

$$\delta_j=\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}$$

$$\mathbf{\delta}=
\begin{bmatrix}
\delta_0\\
\delta_1\\
\delta_2
\end{bmatrix}$$

$$\mathbf{x^{(i)}}=
\begin{bmatrix}
x_0^{(i)}\\
x_1^{(i)}\\
x_2^{(i)}
\end{bmatrix}$$

$\theta$ and $x^{(i)}$ are vectors, so $h_\theta(x^{(i)})=\theta^\top x^{(i)}$ is a scalar, and so is the error $h_\theta(x^{(i)})-y^{(i)}$.

$(h_\theta(x^{(i)})-y^{(i)})\,x^{(i)}$ is a vector (a scalar times the feature vector $x^{(i)}$), and $\delta$ is the average of these vectors over the $m$ training examples.
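Putting it together, one vectorized update step reduces to a few matrix operations. A minimal sketch, assuming Eigen and a design matrix X whose $i$-th row is $x^{(i)\top}$ (the names and layout are assumptions for illustration):

#include <Eigen/Dense>

// One vectorized gradient descent step: theta := theta - alpha * delta,
// where delta = (1/m) * X^T * (X*theta - y).
// X is m x (n+1) (row i is x^{(i)T}), y is the m-vector of labels,
// theta is the (n+1)-vector of parameters, alpha is the learning rate.
void gradientDescentStepVectorized(const Eigen::MatrixXd& X,
                                   const Eigen::VectorXd& y,
                                   Eigen::VectorXd& theta,
                                   double alpha)
{
    const double m = static_cast<double>(X.rows());

    // errors(i) = h_theta(x^{(i)}) - y^{(i)}, for all examples at once
    Eigen::VectorXd errors = X * theta - y;

    // delta_j = (1/m) * sum_i errors(i) * x_j^{(i)}  <=>  delta = (1/m) * X^T * errors
    Eigen::VectorXd delta = X.transpose() * errors / m;

    // simultaneous update of every theta_j
    theta -= alpha * delta;
}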

Vectorizing the computation is much more efficient than writing explicit for loops when there are many features to consider.