Arthur Samuel(1959): Field of study that gives compuers the ability to learn without being explicitly programmed.

Tom Mitchell(1998): A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Machine learning algorithms

Suspervised learning
Unsuspervised learning
Others like: Reinforcement learning, recommender systems.

Example: Playing chess

E = the Experience of playing many chess games.
T = Task of playing chess
P = the probability of win the chess game

Suspervised learning

Given a data set and know how the correct output should look like. Having the idea that there is a relationship between the “input” and “output”.

“Regression” Problems

input variables –> continuous function

Examples:

① Data about house size & price. –> Predict the price of a house sized 100 m^2.

② Pictures of many people with different ages. –> Predict a picture of “John”, and predict his age.

“Classification” Problems

input variables –> discrete categories

Examples:

① Data about a house price & time. –> Predict if tomorrow’s price is higher than $100,000

② Data about humors —— malignant / benign –> Predict the patient’s tumor is malignant or benig.

Unsuspervised learning

Allow us to approach problems with little or no idea what out results should be like. We can get some information by using the data.

Clustering the data on relationships among the variables in the data.
No feedback based on the prediction results.

Example:

① Find a way to group 1,000,000 different genes into similar or related groups. (Clustering)

② The “Cocktail Party Algorithm”, allows you to find structure in a chaotic environment. (Non-clustering)

Model Representation

Some symbols

x⁽ⁱ⁾: “input” variables (features)
y⁽ⁱ⁾: “output/target” variables
(x⁽ⁱ⁾, y⁽ⁱ⁾): a training example
m: the training set
X: the space of input
Y: the space of output

The following is a simple example picture of machine learning ↓

where h(x) refers to “Hypothesis Function“

The Cost Function

J(θ₀, θ₁) = 1/(2m)*(∑(h_θ(x_i)- y_i)²

This function is otherwise called the “Squared error function” or “Mean squared error”.

The mean is halved (1/2) as a convenience for the computation of the gradient descent, as the derivative term of the square function will cancel out the 1/2 term.

So the key to solve the problem is to find the global minimum of J(θ₀, θ₁).

The simpliest way to find minimum - Gradient Descent Algorithm

θ_j := θ_j - α∂/∂θ_j*J(θ₀,θ₁)

:= means assignment

where j = 0, 1 represents the feature index number. At each iteration of j, one should simulaneously update the parameters, θ. Until we reach the minimum. One thing we should be considered is that we must calculated all the value of θ first and then try to assign a new value to all the θ. Otherwise, it can not be called Gradient Descent Algorithm and may lead to errors.

How does Gradient Descent converge with a fixed step size α？

Updated @ 2019.04.24 17点03分

machine-learning introduction Machine learning algorithms Model Representation

Machine learning algorithms

Suspervised learning

“Regression” Problems

“Classification” Problems

Unsuspervised learning

Model Representation

Some symbols

The Cost Function

The simpliest way to find minimum - Gradient Descent Algorithm

近期文章

近期评论

标签

热门

文章归档

分类目录

功能