Deep Learning Course, Chapter 2 (Andrew Ng): Study Notes (1)

Logistic Regression

Logistic regression is a learning algorithm used in supervised learning problems where the output labels y are all either zero or one. The goal of logistic regression is to minimize the error between its predictions and the training labels.

Given \(x\), the model outputs \(\hat{y} = P(y=1 \mid x)\), where \(0 \le \hat{y} \le 1\).

The parameters used in logistic regression are:

  • The input feature vector: \(x \in \mathbb{R}^{n_x}\), where \(n_x\) is the number of features
  • The training label: \(y \in \{0, 1\}\)
  • The weights: \(w \in \mathbb{R}^{n_x}\), where \(n_x\) is the number of features
  • The threshold (bias): \(b \in \mathbb{R}\)
  • The output: \(\hat{y} = \sigma(w^Tx + b)\)
  • Sigmoid function: \(s = \sigma(w^Tx + b) = \sigma(z) = \frac{1}{1 + e^{-z}}\), where \(z = w^Tx + b\)
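To make the forward computation concrete, here is a minimal NumPy sketch for a single example. It assumes `x` and `w` are 1-D arrays of length \(n_x\) and `b` is a plain float; the function names `sigmoid` and `predict_proba` and the numbers are illustrative only, not taken from the course.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Return y_hat = sigma(w^T x + b) for one example x of shape (n_x,)."""
    z = np.dot(w, x) + b  # linear part: w^T x + b
    return sigmoid(z)     # squash into (0, 1)

# Hypothetical example with n_x = 3 features
w = np.array([0.5, -0.25, 1.0])
b = -0.5
x = np.array([1.0, 2.0, 0.5])
y_hat = predict_proba(w, b, x)
print(y_hat)  # z = 0.5 - 0.5 + 0.5 - 0.5 = 0, so this prints 0.5
```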

Figure: graph of the sigmoid function

\(w^Tx + b\) is a linear function (of the form \(ax + b\)) and can take any real value, but since we want a probability in the range [0, 1], we pass it through the sigmoid function. As shown in the graph above, the sigmoid's output is bounded between 0 and 1.

Some observations from the graph:

  • If \(z\) is a large positive number, then \(\sigma(z) \approx 1\)
  • If \(z\) is a large negative number, then \(\sigma(z) \approx 0\)
  • If \(z = 0\), then \(\sigma(z) = 0.5\)
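These observations can be checked numerically. The sketch below (plain NumPy; the sample values of \(z\) are arbitrary choices) evaluates \(\sigma(z)\) at a few points and also shows that the sigmoid approaches 0 and 1 without ever reaching them, which is why the bullets above use \(\approx\).

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Values move toward 0 for large negative z, equal 0.5 at z = 0,
# and move toward 1 for large positive z.
for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"z = {z:6.1f}   sigma(z) = {sigmoid(z):.5f}")
```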

Logistic Regression: Cost Function

To train the parameters w and b, we need to define a cost function.

\(\hat{y}^{(i)} = \sigma(w^Tx^{(i)} + b)\), where \(\sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}\) and \(z^{(i)} = w^Tx^{(i)} + b\)

(\(x^{(i)}\) denotes the i-th training example.)

Given \(\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}\), we want \(\hat{y}^{(i)} \approx y^{(i)}\).
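Computing \(\hat{y}^{(i)}\) for all \(m\) examples does not need an explicit loop. This sketch assumes the training inputs are stacked as columns of a matrix `X` of shape \((n_x, m)\), so that `w @ X + b` gives every \(z^{(i)}\) at once; `predict_all` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_all(w, b, X):
    """Return y_hat^(i) for every column x^(i) of X.

    Assumes X has shape (n_x, m) with one training example per column,
    w has shape (n_x,), and b is a scalar. Returns an array of shape (m,).
    """
    Z = w @ X + b     # z^(i) = w^T x^(i) + b for all i at once
    return sigmoid(Z)

# Hypothetical toy data: n_x = 2 features, m = 4 examples
X = np.array([[0.0, 1.0, -1.0, 2.0],
              [0.0, 1.0, -1.0, 2.0]])
w = np.array([1.0, 1.0])
b = 0.0
Y_hat = predict_all(w, b, X)
print(Y_hat)
```

The vectorized form gives the same numbers as applying the single-example formula to each column in turn; it is just faster in NumPy.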

Loss (error) function

The loss function measures the discrepancy between the prediction ( {hat{y}}^{(i)} ) and the desired output ( y^{(i)} ). In other words, the loss function computes the error for a single training example.

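A natural first choice is the squared error:

\(L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}(\hat{y}^{(i)} - y^{(i)})^2\)

However, combined with the sigmoid this makes the optimization problem non-convex (it can have multiple local optima), so logistic regression uses the cross-entropy loss instead:

\(L(\hat{y}^{(i)}, y^{(i)}) = -\left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right]\)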

  • If \(y^{(i)} = 1\): \(L(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)})\). To make the loss small, \(\log(\hat{y}^{(i)})\) should be as large as possible, so \(\hat{y}^{(i)}\) should be close to 1.
  • If \(y^{(i)} = 0\): \(L(\hat{y}^{(i)}, y^{(i)}) = -\log(1 - \hat{y}^{(i)})\). To make the loss small, \(\log(1 - \hat{y}^{(i)})\) should be as large as possible, so \(\hat{y}^{(i)}\) should be close to 0.
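A small NumPy sketch of the cross-entropy loss for one example illustrates the two cases. It assumes \(0 < \hat{y} < 1\) (at exactly 0 or 1 the logarithm is \(-\infty\)) and \(y \in \{0, 1\}\); the name `logistic_loss` and the sample predictions are illustrative.

```python
import numpy as np

def logistic_loss(y_hat, y):
    """Cross-entropy loss for one example: -[y log(y_hat) + (1 - y) log(1 - y_hat)].

    Assumes 0 < y_hat < 1 and y in {0, 1}.
    """
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# y = 1: the loss is small when y_hat is close to 1, large when close to 0
print(logistic_loss(0.9, 1), logistic_loss(0.1, 1))

# y = 0: the loss is small when y_hat is close to 0, large when close to 1
print(logistic_loss(0.1, 0), logistic_loss(0.9, 0))
```

For \(y = 1\) the second term vanishes and only \(-\log(\hat{y})\) remains; for \(y = 0\) only \(-\log(1 - \hat{y})\) remains, matching the two bullets above.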

Cost function

The cost function is the average of the loss function over the entire training set. We want to find the parameters \(w\) and \(b\) that minimize the overall cost function.

$$ J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)}\log(\hat{y}^{(i)}) + (1 - y^{(i)})\log(1 - \hat{y}^{(i)})\right] $$
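The formula for \(J(w, b)\) translates almost term by term into NumPy. This is a sketch under the same assumptions as before: `X` has shape \((n_x, m)\) with one example per column, `Y` holds the \(m\) labels in \(\{0, 1\}\), and the function name `cost` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """J(w, b): average cross-entropy loss over the m training examples.

    Assumes X has shape (n_x, m), Y has shape (m,) with entries in {0, 1},
    w has shape (n_x,), and b is a scalar.
    """
    m = X.shape[1]
    Y_hat = sigmoid(w @ X + b)                                   # y_hat^(i) for all i
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))  # L(y_hat^(i), y^(i))
    return np.sum(losses) / m                                    # (1/m) * sum over i

# Hypothetical toy data: n_x = 2, m = 3
X = np.array([[1.0, -1.0, 2.0],
              [0.5, -0.5, 1.0]])
Y = np.array([1, 0, 1])

# With w = 0 and b = 0 every y_hat is 0.5, so each loss (and J) equals log 2.
print(cost(np.zeros(2), 0.0, X, Y))
# Weights that push predictions toward the correct labels give a lower cost.
print(cost(np.array([1.0, 1.0]), 0.0, X, Y))
```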

Difference between the loss function and the cost function

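The loss function is defined for a single training example and measures the discrepancy between \(\hat{y}\) and \(y\); the cost function is defined over the entire training set, as the average of the per-example losses.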