
Training set
$$\lbrace(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})\rbrace$$
$m$ examples
$$\mathbf{x}=
\begin{Bmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n
\end{Bmatrix}\in\mathbb{R}^{n+1}$$
Labels
$$y\in\lbrace0,1\rbrace$$
Hypothesis
$$h_{\theta}(x)=\frac{1}{1+e^{-\theta^\top x}}$$
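As a concrete sketch of this hypothesis in NumPy (the function name `hypothesis` and the sample values below are illustrative, not from the notes):

```python
import numpy as np

def hypothesis(theta, x):
    """Sigmoid hypothesis: h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-theta @ x))

# x includes the bias term x_0 = 1, so both vectors live in R^{n+1}.
theta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.5])        # x_0 = 1, x_1 = 0.5
print(hypothesis(theta, x))     # a value in (0, 1), read as P(y=1 | x; theta)
```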
How to choose $\theta$?
Logistic Regression
If we take the cost function to be a squared-error function such as:
$$Cost(h_{\theta}(x^{(i)}),y^{(i)})=\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2$$
then the overall cost $J(\theta)$ takes the same form as in linear regression:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^m\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2\\=\frac{1}{m}\sum_{i=1}^m Cost(h_{\theta}(x^{(i)}),y^{(i)})$$
For brevity, we drop the superscripts:
$$Cost(h_{\theta}(x),y)=\frac{1}{2}(h_{\theta}(x)-y)^2$$
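A minimal NumPy sketch of this squared-error cost (the names `squared_error_cost`, `X`, and `y` are my own, assuming a design matrix `X` whose first column is the bias feature $x_0=1$):

```python
import numpy as np

def squared_error_cost(theta, X, y):
    """J(theta) = (1/m) * sum_i (1/2) * (h_theta(x^(i)) - y^(i))^2."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # sigmoid hypothesis, applied row-wise
    return np.sum(0.5 * (h - y) ** 2) / m

X = np.array([[1.0, 0.2], [1.0, 1.5]])   # each row is [x_0, x_1] with x_0 = 1
y = np.array([0.0, 1.0])
print(squared_error_cost(np.array([0.0, 1.0]), X, y))
```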
However, because the hypothesis is the (nonlinear) logistic function, this squared-error cost is non-convex in $\theta$, which means gradient descent is not guaranteed to converge to the global minimum.
It can be shown that if the cost function is convex, gradient descent (with a suitable learning rate) will reach the global minimum.
Choosing a convex cost function for logistic regression
Define
$$
Cost(h_{\theta}(x),y)=
\begin{cases}
-\log(h_{\theta}(x)), & \text{if $y=1$} \\
-\log(1-h_{\theta}(x)), & \text{if $y=0$}
\end{cases}
$$

With this choice the cost function is convex in $\theta$, so gradient descent can find the global minimum.
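A minimal sketch of this piecewise cost (function names are mine; the second form, $-y\log h_{\theta}(x)-(1-y)\log(1-h_{\theta}(x))$, is the standard single-expression way to write the two cases):

```python
import numpy as np

def cost(h, y):
    """Piecewise logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

def cost_combined(h, y):
    """Equivalent single expression; one term vanishes for each value of y."""
    return -y * np.log(h) - (1 - y) * np.log(1.0 - h)

print(cost(0.9, 1), cost_combined(0.9, 1))  # ~0.105: confident and correct
print(cost(0.9, 0), cost_combined(0.9, 0))  # ~2.303: confident but wrong
```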



