
Logistic Regression
Logistic regression is a learning algorithm used in supervised learning problems where every output label y is either zero or one (binary classification). The goal of logistic regression is to minimize the error between its predictions and the training labels.
Given \( x \), \( \hat{y} = P(y=1 \mid x) \), where \( 0 \le \hat{y} \le 1 \)
The quantities used in logistic regression are:
- The input features vector: \( x \in \mathbb{R}^{n_x} \), where \( n_x \) is the number of features
- The training label: \( y \in \{0,1\} \)
- The weights: \( w \in \mathbb{R}^{n_x} \), where \( n_x \) is the number of features
- The threshold (bias): \( b \in \mathbb{R} \)
- The output: \( \hat{y} = \sigma(w^Tx+b) \)
- Sigmoid function: \( s = \sigma(w^Tx+b) = \sigma(z) = \frac{1}{1+e^{-z}} \)
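
As a minimal sketch of how these quantities fit together, the snippet below computes \( \hat{y} = \sigma(w^Tx+b) \) for a single input using only the Python standard library. The function names `sigmoid` and `predict_proba`, and the values chosen for `w`, `b`, and `x`, are illustrative assumptions rather than anything prescribed by the notes.

```python
import math

def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x, b):
    """Return y_hat = sigma(w^T x + b) for one feature vector x."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters and input with n_x = 3 features.
w = [0.5, -1.0, 2.0]
b = 0.25
x = [1.0, 2.0, 0.5]

# Here z = 0.5 - 2.0 + 1.0 + 0.25 = -0.25, so y_hat falls slightly below 0.5.
y_hat = predict_proba(w, x, b)
print(y_hat)
```

Because \( z \) is slightly negative in this example, the predicted probability of \( y = 1 \) comes out a little under one half.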

\( w^Tx+b \) is a linear function of the form \( ax+b \), but since we are looking for a probability constrained to [0,1], the sigmoid function is applied to it. The sigmoid's output is bounded between 0 and 1, as shown in the graph above.
Some observations from the graph:
- if \( z \) is a large positive number, then \( \sigma(z) \approx 1 \)
- if \( z \) is a large negative number, then \( \sigma(z) \approx 0 \)
- if \( z = 0 \), then \( \sigma(z) = 0.5 \)
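
These observations can be checked numerically. The sketch below uses a hypothetical helper, `stable_sigmoid`, which branches on the sign of \( z \) so that `math.exp` is never called with a large positive argument; the naive formula would raise `OverflowError` for a very negative \( z \) such as -1000. The branching trick is a common implementation choice assumed here, not something the notes specify.

```python
import math

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in exp() for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) lies in [0, 1)
    return e / (1.0 + e)

# Evaluate at large negative, zero, and large positive inputs.
for z in (-1000.0, -10.0, 0.0, 10.0, 1000.0):
    print(z, stable_sigmoid(z))
```

Mathematically the sigmoid never reaches 0 or 1 exactly, but in floating point the result rounds to exactly 0.0 or 1.0 once \( |z| \) is large enough.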
Logistic Regression: Cost Function
To train the parameters w and b, we need to define a cost function.
\( \hat{y}^{(i)} = \sigma(w^Tx^{(i)}+b) \), where \( \sigma(z^{(i)}) = \frac{1}{1+e^{-z^{(i)}}} \)
(\( x^{(i)} \) is the i-th training example)
Given \( \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\} \), we want \( \hat{y}^{(i)} \approx y^{(i)} \)
Loss (error) function
The loss function measures the discrepancy between the prediction \( \hat{y}^{(i)} \) and the desired output \( y^{(i)} \). In other words, the loss function computes the error for a single training example.
One candidate is the squared error:
\( L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}(\hat{y}^{(i)} - y^{(i)})^2 \)
Combined with the sigmoid output, however, squared error makes the optimization problem non-convex, so logistic regression uses the following loss instead:
\( L(\hat{y}^{(i)}, y^{(i)}) = -\left[ y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)}) \right] \)
- If \( y^{(i)} = 1 \): \( L(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)}) \). Minimizing the loss means making \( \log(\hat{y}^{(i)}) \) as large as possible, so \( \hat{y}^{(i)} \) should be close to 1
- If \( y^{(i)} = 0 \): \( L(\hat{y}^{(i)}, y^{(i)}) = -\log(1-\hat{y}^{(i)}) \). Minimizing the loss means making \( \log(1-\hat{y}^{(i)}) \) as large as possible, so \( \hat{y}^{(i)} \) should be close to 0
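
The two cases can be illustrated with a short sketch of the log loss for a single example. The function name `log_loss` and the clipping constant `eps` are assumptions added for this illustration; clipping is only there so that `math.log` never receives exactly 0.

```python
import math

def log_loss(y_hat, y, eps=1e-12):
    """Loss for one example: -[y log(y_hat) + (1 - y) log(1 - y_hat)]."""
    # Clip y_hat away from 0 and 1 so log() is always defined (eps is an assumption).
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

# With label y = 1, a prediction near 1 is cheap and a prediction near 0 is costly.
confident_right = log_loss(0.9, 1)  # equals -log(0.9)
confident_wrong = log_loss(0.1, 1)  # equals -log(0.1)
print(confident_right, confident_wrong)
```

The loss grows without bound as the prediction moves toward the wrong label, which is what pushes \( \hat{y}^{(i)} \) toward \( y^{(i)} \) during training.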
Cost function
The cost function is the average of the loss function over the entire training set. We are going to find the parameters w and b that minimize the overall cost function.
$$ J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)}) \right] $$
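
A minimal sketch of \( J(w,b) \) as the mean per-example loss is given below. The toy dataset `X`, `Y` and the two parameter settings are hypothetical values chosen only to show that parameters which fit the labels better yield a lower cost; the function name `cost` is likewise an assumption.

```python
import math

def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, X, Y):
    """J(w, b): mean log loss over the m training examples."""
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        y_hat = sigmoid(z)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))
    return total / m

# Hypothetical toy training set: m = 4 examples, n_x = 2 features.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0, 0, 0, 1]

# With all-zero parameters every y_hat is 0.5, so J = log(2).
J_zero = cost([0.0, 0.0], 0.0, X, Y)
# Parameters that place only the last example on the positive side.
J_better = cost([4.0, 4.0], -6.0, X, Y)
print(J_zero, J_better)
```

Training amounts to searching for the `w` and `b` that make this value as small as possible.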
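Difference between the loss function and the cost function
The loss function is defined for a single training example and measures the difference between \( \hat{y} \) and \( y \), whereas the cost function is defined over the entire training set.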



