
Training set
$$\lbrace(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})\rbrace$$
$m$ examples
$$\mathbf{x}=
\begin{Bmatrix}
x_0 \\
x_1 \\
\vdots \\
x_n
\end{Bmatrix}\in\mathbb{R}^{n+1}$$
Labels
$$y\in\lbrace0,1\rbrace$$
Hypothesis
$$h_{\theta}(x)=\frac{1}{1+e^{-\theta^\top x}}$$
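As a concrete sketch of this hypothesis in NumPy (the function name `hypothesis` and the sample values below are illustrative, not from the notes):

```python
import numpy as np

def hypothesis(theta, x):
    """Sigmoid hypothesis: h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return 1.0 / (1.0 + np.exp(-theta @ x))

# x includes the bias term x_0 = 1, so both vectors live in R^{n+1}.
theta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.5])        # x_0 = 1, x_1 = 0.5
print(hypothesis(theta, x))     # a value in (0, 1), read as P(y=1 | x; theta)
```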
How to choose $\theta$?
Logistic Regression
If we take the cost function to be a squared-error function such as:
$$Cost(h_{\theta}(x^{(i)}),y^{(i)})=\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2$$
then the overall cost $J(\theta)$ takes the same form as in linear regression:
$$J(\theta)=\frac{1}{m}\sum_{i=1}^m\frac{1}{2}(h_{\theta}(x^{(i)})-y^{(i)})^2\\=\frac{1}{m}\sum_{i=1}^m Cost(h_{\theta}(x^{(i)}),y^{(i)})$$
For brevity, we drop the superscripts:
$$Cost(h_{\theta}(x),y)=\frac{1}{2}(h_{\theta}(x)-y)^2$$
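A minimal NumPy sketch of this squared-error cost (the names `squared_error_cost`, `X`, and `y` are my own, assuming a design matrix `X` whose first column is the bias feature $x_0=1$):

```python
import numpy as np

def squared_error_cost(theta, X, y):
    """J(theta) = (1/m) * sum_i (1/2) * (h_theta(x^(i)) - y^(i))^2."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # sigmoid hypothesis, applied row-wise
    return np.sum(0.5 * (h - y) ** 2) / m

X = np.array([[1.0, 0.2], [1.0, 1.5]])   # each row is [x_0, x_1] with x_0 = 1
y = np.array([0.0, 1.0])
print(squared_error_cost(np.array([0.0, 1.0]), X, y))
```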
However, because the hypothesis is the (nonlinear) logistic function, this squared-error cost is non-convex in $\theta$, which means gradient descent is not guaranteed to converge to the global minimum.
It can be shown that if the cost function is convex, gradient descent (with a suitable learning rate) will reach the global minimum.
Choosing a convex cost function for logistic regression
Define
$$
Cost(h_{\theta}(x),y)=
\begin{cases}
-\log(h_{\theta}(x)), & \text{if $y=1$} \\
-\log(1-h_{\theta}(x)), & \text{if $y=0$}
\end{cases}
$$

With this choice the cost function is convex in $\theta$, so gradient descent can find the global minimum.
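A minimal sketch of this piecewise cost (function names are mine; the second form, $-y\log h_{\theta}(x)-(1-y)\log(1-h_{\theta}(x))$, is the standard single-expression way to write the two cases):

```python
import numpy as np

def cost(h, y):
    """Piecewise logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

def cost_combined(h, y):
    """Equivalent single expression; one term vanishes for each value of y."""
    return -y * np.log(h) - (1 - y) * np.log(1.0 - h)

print(cost(0.9, 1), cost_combined(0.9, 1))  # ~0.105: confident and correct
print(cost(0.9, 0), cost_combined(0.9, 0))  # ~2.303: confident but wrong
```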



