Machine Learning Theory

There are five important inequalities in machine learning theory:


Markov’s Inequality

Let $X$ be a nonnegative random variable and $\epsilon>0$. Then

$$P(X\geq\epsilon)\leq\frac{E[X]}{\epsilon}$$

Proof for Markov’s Inequality

First of all, we define the indicator function. For an event $A$, let $I(A)$ be the indicator of the event:

$$I(A)=\begin{cases}1, & \text{if } A \text{ occurs}\\ 0, & \text{otherwise}\end{cases}$$

We have $X\geq\epsilon I(X\geq\epsilon)$. Taking expectations on both sides, we get

$$E[X]\geq\epsilon E[I(X\geq\epsilon)]=\epsilon P(X\geq\epsilon)$$

which rearranges to Markov's inequality.
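As a quick numerical sanity check (not part of the proof), the sketch below draws samples from an exponential distribution — an arbitrary choice of nonnegative random variable, not one from the text — and compares the empirical tail probability with Markov's bound:

```python
import numpy as np

# Empirical sanity check of Markov's inequality.
# Assumed setup (not from the text): X ~ Exponential(1), a nonnegative
# random variable with E[X] = 1.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)

eps = 2.0
empirical = np.mean(x >= eps)      # estimate of P(X >= eps)
markov_bound = x.mean() / eps      # E[X] / eps, with E[X] estimated from the sample

print(empirical, markov_bound)
```

Because the bound uses the sample mean, the inequality holds exactly for the empirical distribution, not just in expectation.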

Chebyshev’s Inequality

Let $X$ be a random variable with finite expected value and finite non-zero variance. Then for any real number $\epsilon>0$:

$$P(\vert X-E[X]\vert\geq\epsilon)\leq\frac{\mathrm{Var}(X)}{\epsilon^2}$$

Proof for Chebyshev’s Inequality

From Markov's inequality: define $Y=(X-E[X])^2$, which is nonnegative. Then

$$P(\vert X-E[X]\vert\geq\epsilon)=P(Y\geq\epsilon^2)\leq\frac{E[Y]}{\epsilon^2}=\frac{\mathrm{Var}(X)}{\epsilon^2}$$
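The same kind of numerical check works here. The uniform distribution below is an arbitrary choice (not from the text), with $\mathrm{Var}(X)=1/12$:

```python
import numpy as np

# Empirical sanity check of Chebyshev's inequality.
# Assumed setup (not from the text): X ~ Uniform[0, 1], so Var(X) = 1/12.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)

eps = 0.4
empirical = np.mean(np.abs(x - x.mean()) >= eps)   # P(|X - E[X]| >= eps)
chebyshev_bound = x.var() / eps**2                 # Var(X) / eps^2

print(empirical, chebyshev_bound)
```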


The previous two inequalities are easier to understand than the following three.

Chernoff-Hoeffding Bound

The following theorem is a special case of the Chernoff bound.

Let $x_1,\cdots,x_m$ be independent Bernoulli random variables, where for every $i$:

$$P(x_i=1)=p_i,\quad P(x_i=0)=1-p_i$$

Let

$$\bar{x}=\frac{1}{m}\sum_{i=1}^m x_i,\quad \bar{p}=\frac{1}{m}\sum_{i=1}^m p_i$$

We have, for any $\epsilon>0$,

$$P(\vert\bar{x}-\bar{p}\vert>\epsilon)\leq 2e^{-2\epsilon^2 m}$$
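Before going through the proof, it helps to see the bound numerically. The parameter choices below ($m=50$, $p_i=0.5$, $\epsilon=0.1$) are arbitrary, not from the text:

```python
import numpy as np

# Empirical sanity check of the Chernoff-Hoeffding bound.
# Assumed setup (not from the text): m = 50 Bernoulli variables with
# p_i = 0.5 for every i, and eps = 0.1.
rng = np.random.default_rng(0)
m, p, eps, trials = 50, 0.5, 0.1, 20_000

samples = rng.binomial(1, p, size=(trials, m))   # each row is one draw of x_1..x_m
dev = np.abs(samples.mean(axis=1) - p)           # |x_bar - p_bar| for each trial
empirical = np.mean(dev > eps)
bound = 2 * np.exp(-2 * eps**2 * m)              # 2 e^{-2 eps^2 m}

print(empirical, bound)
```

Note the bound depends on the distribution only through $m$ and $\epsilon$, which is what makes it useful for sample-complexity arguments.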

Moment-Generating Functions (MGFs)

The MGF of a random variable $X$ is defined as

$$M_X(t)=E[e^{tX}]$$

MGF has some important properties:

  1. If $M_X(t)$ is defined for $t\in(-t^\ast,t^\ast)$ for some $t^\ast>0$, then
    • All the moments of $X$ exist, and $M_X(t)$ has derivatives of all orders at $t=0$, where $\forall k$, $E[X^k]=\frac{\partial^k M_X}{\partial t^k}\big\vert_{t=0}$
    • $\forall t\in(-t^\ast,t^\ast)$, $M_X(t)$ has the Taylor expansion $M_X(t)=\sum_k\frac{E[X^k]}{k!}t^k$
  2. If $X$ and $Y$ are independent random variables, then $M_{X+Y}(t)=M_X(t)M_Y(t)$
  3. For any constants $a$ and $b$, $M_{aX+b}(t)=e^{bt}M_X(at)$
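Property 1 can be checked numerically. The sketch below uses the standard closed-form MGF of a Bernoulli variable, $M_X(t)=pe^t+(1-p)$ (a well-known fact, not derived in the text), and recovers the first two moments by finite differences at $t=0$:

```python
import math

# Numerical check of MGF property 1 for X ~ Bernoulli(p), whose MGF is
# M_X(t) = p e^t + (1 - p). Finite differences of M_X at t = 0 should
# recover E[X] = p and E[X^2] = p.
def mgf(p, t):
    return p * math.exp(t) + (1.0 - p)

p, h = 0.3, 1e-5
first_moment = (mgf(p, h) - mgf(p, -h)) / (2 * h)                  # ~ M'(0) = E[X]
second_moment = (mgf(p, h) - 2 * mgf(p, 0.0) + mgf(p, -h)) / h**2  # ~ M''(0) = E[X^2]

print(first_moment, second_moment)
```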

Jensen’s Inequality

Let $X$ be a random variable and $f$ be a convex function. Then

$$f(E[X])\leq E[f(X)]$$
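A quick empirical illustration, using the convex function $f(x)=x^2$ and an arbitrary distribution choice (uniform, not from the text):

```python
import numpy as np

# Empirical sanity check of Jensen's inequality with the convex function
# f(x) = x^2. Assumed setup (not from the text): X ~ Uniform[0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)

lhs = x.mean() ** 2        # f(E[X])
rhs = (x ** 2).mean()      # E[f(X)]
print(lhs, rhs)
```

For $f(x)=x^2$ the gap $E[X^2]-E[X]^2$ is exactly the variance, so the inequality is strict whenever $X$ is non-degenerate.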

Proof for Chernoff-Hoeffding Bound

Define $y_i=x_i-p_i$ and $\bar{y}=\frac{1}{m}\sum_{i=1}^m y_i$. We need to show $P(\vert\bar{y}\vert>\epsilon)\leq 2e^{-2\epsilon^2 m}$.

Consider each of $P(\bar{y}>\epsilon)$ and $P(\bar{y}<-\epsilon)$ separately.

For $t>0$, look at

$$P(\bar{y}>\epsilon)=P(e^{t\bar{y}}>e^{t\epsilon})\leq\frac{E[e^{t\bar{y}}]}{e^{t\epsilon}}=e^{-t\epsilon}M_{\bar{y}}(t)$$

where the middle step is Markov's inequality applied to $e^{t\bar{y}}$.

Lemma 1: For all $t>0$, $M_{y_i}(t)\leq e^{t^2/8}$

Proof: Recall

$$M_{y_i}(t)=E[e^{t(x_i-p_i)}]=p_ie^{t(1-p_i)}+(1-p_i)e^{-tp_i}=e^{-tp_i}\left(p_ie^t+(1-p_i)\right)$$
Look at $g(t)=-tp_i+\ln(p_ie^t+(1-p_i))$, so that $M_{y_i}(t)=e^{g(t)}$. By Taylor's theorem with remainder, for some $t^\ast\in[0,t]$:

$$g(t)=g(0)+g^{\prime}(0)t+\frac{g^{\prime\prime}(t^\ast)}{2}t^2$$

and one can check that $g(0)=0$ and $g^{\prime}(0)=0$.

We can show that $\forall t^\ast$, $g^{\prime\prime}(t^\ast)\leq\frac{1}{4}$.

Thus $g(t)\leq\frac{t^2}{8}$

Using properties of MGFs, $M_{\bar{y}}(t)=\prod_{i=1}^m M_{y_i}(\frac{t}{m})$

Using Lemma 1, $M_{\bar{y}}(t)\leq\prod_{i=1}^m\exp(\frac{t^2}{8m^2})=\exp(\frac{t^2}{8m})$



Combining, $P(\bar{y}>\epsilon)\leq e^{-t\epsilon}M_{\bar{y}}(t)\leq\exp(-t\epsilon+\frac{t^2}{8m})$. Pick the $t$ that minimizes this bound: $t=4m\epsilon$, which gives $P(\bar{y}>\epsilon)\leq e^{-2\epsilon^2 m}$.

The other direction, $P(\bar{y}<-\epsilon)$, is symmetric, and a union bound over the two events gives the factor of 2.
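The last optimization step can be confirmed numerically: over $t>0$, the exponent $-t\epsilon+\frac{t^2}{8m}$ is minimized at $t=4m\epsilon$, where the bound equals $e^{-2\epsilon^2 m}$. The parameter values below are arbitrary choices for illustration:

```python
import numpy as np

# Numerically confirm the final step of the proof: over t > 0, the bound
# exp(-t*eps + t^2 / (8m)) is minimized at t = 4*m*eps, where it equals
# exp(-2 eps^2 m). Parameter values (m = 100, eps = 0.1) are arbitrary.
m, eps = 100, 0.1
ts = np.linspace(0.01, 100.0, 10_000)
bounds = np.exp(-ts * eps + ts**2 / (8 * m))

t_star = ts[np.argmin(bounds)]                   # should be near 4*m*eps = 40
print(t_star, bounds.min(), np.exp(-2 * eps**2 * m))
```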

Hoeffding’s Inequality

Let $x_1,\cdots,x_m$ be independent random variables, where for every $i$:

$$P(x_i\in[a_i,b_i])=1$$

Let

$$\bar{x}=\frac{1}{m}\sum_{i=1}^m x_i$$

For any $\epsilon>0$, we have

$$P(\vert\bar{x}-E[\bar{x}]\vert\geq\epsilon)\leq 2\exp\left(-\frac{2m^2\epsilon^2}{\sum_{i=1}^m(b_i-a_i)^2}\right)$$
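A numerical illustration with unequal ranges, so the $\sum_i(b_i-a_i)^2$ term actually matters. The interval choices $[a_i,b_i]=[0,i]$ are arbitrary, not from the text:

```python
import numpy as np

# Empirical sanity check of Hoeffding's inequality with unequal ranges.
# Assumed setup (not from the text): x_i ~ Uniform[0, i] for i = 1..m,
# so [a_i, b_i] = [0, i].
rng = np.random.default_rng(0)
m, eps, trials = 10, 1.5, 20_000
b = np.arange(1, m + 1, dtype=float)                # b_i = i, a_i = 0

samples = rng.uniform(0.0, b, size=(trials, m))     # each row: one draw of x_1..x_m
dev = np.abs(samples.mean(axis=1) - b.mean() / 2)   # |x_bar - E[x_bar]|
empirical = np.mean(dev >= eps)
bound = 2 * np.exp(-2 * m**2 * eps**2 / np.sum(b**2))

print(empirical, bound)
```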


McDiarmid’s Inequality

Let $f:\mathbb{R}^m\rightarrow\mathbb{R}$ be such that $\forall i$ and all $x_1,\cdots,x_i,\cdots,x_m,x^{\prime}_i$:

$$\vert f(x_1,\cdots,x_i,\cdots,x_m)-f(x_1,\cdots,x^{\prime}_i,\cdots,x_m)\vert\leq c_i$$

Let $x_1,\cdots,x_m$ be independent random variables. Then for any $\epsilon>0$:

$$P(\vert f(x_1,\cdots,x_m)-E[f(x_1,\cdots,x_m)]\vert\geq\epsilon)\leq 2\exp\left(-\frac{2\epsilon^2}{\sum_{i=1}^m c_i^2}\right)$$


Note: Hoeffding's inequality is a special case of McDiarmid's inequality: just set

$$f(x_1,\cdots,x_m)=\frac{1}{m}\sum_{i=1}^m x_i,\quad c_i=\frac{b_i-a_i}{m}$$
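With that substitution, the two exponents coincide algebraically, which the sketch below verifies for arbitrary example intervals (the values of $a_i$, $b_i$, $\epsilon$ are mine, not from the text):

```python
import numpy as np

# With f(x_1,...,x_m) = (1/m) * sum_i x_i, changing one coordinate x_i inside
# [a_i, b_i] moves f by at most c_i = (b_i - a_i) / m, so McDiarmid's exponent
# 2 eps^2 / sum c_i^2 equals Hoeffding's 2 m^2 eps^2 / sum (b_i - a_i)^2.
# The intervals below are arbitrary example values.
a = np.array([0.0, 1.0, -2.0, 0.5])
b = np.array([1.0, 3.0, 0.0, 2.5])
m, eps = len(a), 0.3

c = (b - a) / m
mcdiarmid_exp = 2 * eps**2 / np.sum(c**2)
hoeffding_exp = 2 * m**2 * eps**2 / np.sum((b - a)**2)
print(mcdiarmid_exp, hoeffding_exp)
```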

Thanks

Thanks to the following articles:

  • 切比雪夫不等式到底是个什么概念? ("What exactly is Chebyshev's inequality?") — Zhihu answer by 姚岑卓
  • 我只要这么多样本! ("I only need this many samples!") — Zhihu column 比生活简单多了 by 张熤