Learning to Answer Yes or No

What hypothesis set can we use?

(figure: question.png)

A Simple Hypothesis Set: the “Perceptron”

  • Perceptron in $\mathbb{R}^2$

(figure: perceptron in $\mathbb{R}^2$)

  • features x: points on the plane (or points in $\mathbb{R}^d$)
  • labels y: ○ (+1), × (−1)
  • hypothesis h: lines (or hyperplanes in $\mathbb{R}^d$), positive on one side of a line, negative on the other side
  • different lines classify samples differently

perceptrons <=> linear (binary) classifiers
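
As a concrete illustration, each perceptron hypothesis is just the sign of a weighted sum. Here is a minimal NumPy sketch (the function name and the bias convention are my own assumptions, not from the notes):

```python
import numpy as np

def perceptron_h(w, x):
    """Perceptron hypothesis h(x) = sign(w^T x).

    Assumes x already carries a bias coordinate x_0 = 1,
    so the threshold is folded into w_0 (an assumed convention).
    """
    return 1 if np.dot(w, x) > 0 else -1
```

Different weight vectors $w$ give different lines, i.e. different hypotheses in $\mathscr{H}$.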

Select $g$ from $\mathscr{H}$

  • want: $g \approx f$ (hard when $f$ is unknown)
  • almost necessary: $g \approx f$ on $D$, ideally $g(x_n) = f(x_n) = y_n$
  • difficult: $\mathscr{H}$ is of infinite size
  • idea: start from some $g_0$, and correct its mistakes on $D$

Perceptron Learning Algorithm

(figures: PLA01.png, PLA02.png)
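
In words, PLA starts from some $w_0$ (say, $0$), repeatedly finds a sample where $\text{sign}(w_t^\top x_n) \ne y_n$, and corrects that mistake with $w_{t+1} \leftarrow w_t + y_n x_n$. A minimal NumPy sketch of this loop (the names and the `max_iters` safeguard are my own assumptions):

```python
import numpy as np

def pla(X, y, max_iters=10000):
    """Perceptron Learning Algorithm (sketch).

    X: (N, d+1) array; each row is a sample with bias coordinate x_0 = 1.
    y: (N,) array of labels in {+1, -1}.
    Returns the final weights (w_PLA) once no mistakes remain on D.
    """
    w = np.zeros(X.shape[1])              # start from w_0 = 0
    for _ in range(max_iters):
        mistakes = np.sign(X @ w) != y    # sign(0) != y also counts as a mistake
        if not mistakes.any():
            return w                      # PLA halts: no more mistakes
        n = np.flatnonzero(mistakes)[0]   # pick the first mistake
        w = w + y[n] * X[n]               # correct it: w <- w + y_n x_n
    return w                              # may reach here if D is not separable
```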

  • if PLA halts (i.e. no more mistakes),
    (necessary condition) $D$ allows some $w$ to make no mistakes
  • call such $D$ linearly separable
  • as long as $D$ is linearly separable and PLA corrects by mistake
    • the inner product of $w_f$ and $w_t$ grows fast, while the length of $w_t$ grows slowly
    • PLA ‘lines’ become more and more aligned with $w_f$ $\Rightarrow$ PLA halts (see the sketch below)
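
Filling in the reasoning behind “grows fast / grows slowly” (a standard sketch; the symbols $\rho$ and $R$ are my own shorthand, assuming $w_0 = 0$, $\rho = \min_n y_n w_f^\top x_n > 0$, and $R^2 = \max_n \|x_n\|^2$): each correction $w_{t+1} \leftarrow w_t + y_n x_n$ gives

$$w_f^\top w_{t+1} \ge w_f^\top w_t + \rho, \qquad \|w_{t+1}\|^2 \le \|w_t\|^2 + R^2$$

(the second inequality uses $y_n w_t^\top x_n \le 0$ at a mistake). After $T$ corrections, $w_f^\top w_T \ge T\rho$ and $\|w_T\|^2 \le T R^2$, so

$$1 \ge \frac{w_f^\top w_T}{\|w_f\|\,\|w_T\|} \ge \sqrt{T}\cdot\frac{\rho}{\|w_f\|\,R} \quad\Longrightarrow\quad T \le \frac{R^2\,\|w_f\|^2}{\rho^2},$$

i.e. the number of corrections is bounded, so PLA halts.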

Line with Noise Tolerance

What if $D$ is not linearly separable (e.g. when $D$ contains noise)?

(figure: data that is not linearly separable)

Pocket Algorithm

Keep the best weights seen so far “in the pocket” while making PLA-style corrections; after enough iterations, return the pocket weights.

(figure: pocket algorithm)
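
A minimal NumPy sketch of this idea (the names, the random-mistake choice, and `max_iters` are my own assumptions):

```python
import numpy as np

def pocket(X, y, max_iters=1000, seed=0):
    """Pocket algorithm (sketch): PLA plus a 'pocket' for the best w so far.

    X: (N, d+1) array with bias coordinate x_0 = 1; y: (N,) labels in {+1, -1}.
    Returns w_hat, the weights with the fewest mistakes seen on D.
    """
    rng = np.random.default_rng(seed)

    def num_mistakes(w):
        return int(np.sum(np.sign(X @ w) != y))

    w = np.zeros(X.shape[1])
    w_hat, best = w.copy(), num_mistakes(w)   # the pocket
    for _ in range(max_iters):
        wrong = np.flatnonzero(np.sign(X @ w) != y)
        if wrong.size == 0:
            return w                          # D is linearly separable: done
        n = rng.choice(wrong)                 # pick a (random) mistake
        w = w + y[n] * X[n]                   # PLA-style correction
        m = num_mistakes(w)                   # extra full pass over D each step
        if m < best:                          # better than the pocket weights?
            w_hat, best = w.copy(), m
    return w_hat
```

The extra `num_mistakes` check on every iteration is exactly why pocket is slower than PLA on the same data.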

Question

Since we do not know whether $D$ is linearly separable in advance, we may decide to just go with pocket instead of PLA. If $D$ is actually linearly separable, what is the difference between the two?
1. pocket on $D$ is slower than PLA
2. pocket on $D$ is faster than PLA
3. pocket on $D$ returns a better $g$ in approximating $f$ than PLA
4. pocket on $D$ returns a worse $g$ in approximating $f$ than PLA

answer: 1. Because pocket needs to check whether $w_{t+1}$ is better than $\hat{w}$ in each iteration, it is slower than PLA. On a linearly separable $D$, $w_{POCKET}$ is the same as $w_{PLA}$, with both making no mistakes.