Deep Learning Course (Andrew Ng), Chapter 2: Study Notes (1)

Logistic Regression

Logistic regression is a learning algorithm used in supervised learning problems where every output label y is either 0 or 1. The goal of logistic regression is to minimize the error between its predictions and the labels of the training data.

Given \( x \), \( \hat{y} = P(y=1 \mid x) \), where \( 0 \le \hat{y} \le 1 \)

The parameters used in Logistic regression are:

The input features vector: \( x \in \mathbb{R}^{n_x} \), where \( n_x \) is the number of features
The training label: \( y \in \{0, 1\} \)
The weights: \( w \in \mathbb{R}^{n_x} \), where \( n_x \) is the number of features
The threshold: \( b \in \mathbb{R} \)
The output: \( \hat{y} = \sigma(w^Tx+b) \)
Sigmoid function: \( s = \sigma(w^Tx+b) = \sigma(z) = \frac{1}{1+e^{-z}} \)
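
The definitions above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `sigmoid` and `predict` and the numeric values of `w`, `x`, and `b` are hypothetical and do not come from the course.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z)).

    Note: math.exp(-z) overflows for very negative z (below about -709);
    this sketch does not guard against that.
    """
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, b):
    """Return y_hat = sigmoid(w^T x + b) for a single feature vector x."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters and input with n_x = 3 features.
w = [0.5, -0.25, 0.1]
x = [1.0, 2.0, 3.0]
b = -0.3

# Here w^T x + b is approximately 0, so y_hat is approximately 0.5.
y_hat = predict(w, x, b)
```

The returned value can be read as the estimated probability that the label is 1 for this input.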

\( w^Tx+b \) is a linear function (of the form \( ax+b \)), but since we want a probability constrained to [0,1], the sigmoid function is applied to it. The output of the sigmoid is bounded between 0 and 1, as shown in the graph above.

Some observations from the graph:

if \( z \) is a large positive number, then \( \sigma(z) \approx 1 \)
if \( z \) is a large negative number, then \( \sigma(z) \approx 0 \)
if \( z = 0 \), then \( \sigma(z) = 0.5 \)
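
As a quick numerical check of these observations, taking \( z = 10 \) and \( z = -10 \) as arbitrary examples of "large" values (results rounded):

$$ \sigma(10) = \frac{1}{1+e^{-10}} \approx 0.99995, \qquad \sigma(-10) = \frac{1}{1+e^{10}} \approx 0.000045, \qquad \sigma(0) = \frac{1}{1+e^{0}} = 0.5 $$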

Logistic Regression: Cost Function

To train the parameters w and b, we need to define a cost function.

\( \hat{y}^{(i)} = \sigma(w^Tx^{(i)}+b) \), where \( \sigma(z^{(i)}) = \frac{1}{1+e^{-z^{(i)}}} \)

(\( x^{(i)} \) is the i-th training example.)

Given \( \{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\} \), we want \( \hat{y}^{(i)} \approx y^{(i)} \)

Loss (error) function

The loss function measures the discrepancy between the prediction \( \hat{y}^{(i)} \) and the desired output \( y^{(i)} \). In other words, the loss function computes the error for a single training example.

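One candidate is the squared error:

\( L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}(\hat{y}^{(i)} - y^{(i)})^2 \)

With a sigmoid output, however, the squared error makes the optimization problem non-convex, so logistic regression uses the following loss instead:

\( L(\hat{y}^{(i)}, y^{(i)}) = -\left[ y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)}) \right] \)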

If \( y^{(i)} = 1 \): \( L(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)}) \). Minimizing the loss means making \( \log(\hat{y}^{(i)}) \) as large as possible, so \( \hat{y}^{(i)} \) should be close to 1.
If \( y^{(i)} = 0 \): \( L(\hat{y}^{(i)}, y^{(i)}) = -\log(1 - \hat{y}^{(i)}) \). Minimizing the loss means making \( \log(1 - \hat{y}^{(i)}) \) as large as possible, so \( \hat{y}^{(i)} \) should be close to 0.
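
The two cases can be checked numerically with a small sketch. The function name `loss` and the sample predictions 0.9 and 0.1 are assumptions chosen for illustration, and the code assumes \( 0 < \hat{y} < 1 \) so that both logarithms are defined.

```python
import math

def loss(y_hat, y):
    """Logistic (cross-entropy) loss for one example.

    y is the label (0 or 1); y_hat is the prediction, assumed to lie
    strictly between 0 and 1 so that both logarithms are defined.
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Label y = 1: a prediction near 1 gives a small loss, one near 0 a large loss.
good_1 = loss(0.9, 1)  # equals -log(0.9)
bad_1 = loss(0.1, 1)   # equals -log(0.1)

# Label y = 0: the situation is mirrored.
good_0 = loss(0.1, 0)  # equals -log(1 - 0.1)
bad_0 = loss(0.9, 0)   # equals -log(1 - 0.9)
```

In both cases the loss grows as the prediction moves away from the label, which is the behaviour the two bullet cases describe.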

Cost function

The cost function is the average of the loss function over the entire training set. We want to find the parameters w and b that minimize the overall cost function.

$$ J(w,b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)}) \right] $$
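
A minimal sketch of computing \( J(w,b) \) in plain Python follows. The function name `cost`, the toy training set `X`, `Y`, and the two parameter settings are all hypothetical, picked only to show that parameters that fit the labels better yield a lower cost.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, X, Y):
    """Average logistic loss J(w, b) over the m training examples.

    X is a list of m feature vectors; Y is the list of m labels (0 or 1).
    """
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        y_hat = sigmoid(z)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / m

# Hypothetical toy training set: the label equals the second feature.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
Y = [1, 0, 1, 0]

# With all-zero parameters every prediction is 0.5, so J = log(2).
J_zero = cost([0.0, 0.0], 0.0, X, Y)

# Parameters that put weight on the second feature fit the labels better.
J_fit = cost([0.0, 4.0], -2.0, X, Y)
```

Training amounts to searching for the `w` and `b` that make this value as small as possible.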

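Difference between the loss function and the cost function

The loss function is defined for a single training example and measures the difference between \( \hat{y} \) and \( y \), whereas the cost function is defined over the entire training set.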
