
There are five important inequalities in machine learning theory:
- Markov’s Inequality
- Chebyshev’s Inequality
- Chernoff-Hoeffding Bound
- Hoeffding’s Inequality
- McDiarmid’s Inequality
Markov’s Inequality
If $X$ is a nonnegative random variable and $\epsilon>0$, then
$$P(X\geq\epsilon)\leq\frac{E[X]}{\epsilon}$$
Proof for Markov’s Inequality
First, we define the indicator function. For an event $A$, let $I(A)$ be the indicator of the event:
$$I(A)=\begin{cases}1 & \text{if }A\text{ occurs}\\0 & \text{otherwise}\end{cases}$$
We have $X\geq\epsilon I(X\geq\epsilon)$. Taking expectations on both sides, we get
$$E[X]\geq\epsilon E[I(X\geq\epsilon)]=\epsilon P(X\geq\epsilon)$$
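As a quick numerical sanity check of Markov's inequality (the exponential distribution and sample size below are illustrative choices, not part of the original argument):

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonnegative samples: exponential with mean 1.
x = rng.exponential(scale=1.0, size=100_000)
mean = x.mean()

# Markov: the empirical tail P(X >= eps) never exceeds E[X] / eps.
for eps in [0.5, 1.0, 2.0, 4.0]:
    tail = np.mean(x >= eps)   # empirical P(X >= eps)
    bound = mean / eps         # Markov bound E[X] / eps
    assert tail <= bound
```

For the exponential distribution the true tail is $e^{-\epsilon}$, so the Markov bound is quite loose here; its value is that it needs nothing but nonnegativity and a finite mean.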
Chebyshev’s Inequality
Let $X$ be a random variable with finite expected value and finite nonzero variance. Then for any real number $\epsilon>0$:
$$P(\vert X-E[X]\vert\geq\epsilon)\leq\frac{\mathrm{Var}(X)}{\epsilon^2}$$
Proof for Chebyshev’s Inequality
Apply Markov's inequality to the nonnegative random variable $(X-E[X])^2$ with threshold $\epsilon^2$. Then
$$P(\vert X-E[X]\vert\geq\epsilon)=P\big((X-E[X])^2\geq\epsilon^2\big)\leq\frac{E[(X-E[X])^2]}{\epsilon^2}=\frac{\mathrm{Var}(X)}{\epsilon^2}$$
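A similar empirical check for Chebyshev's inequality, using standard normal samples as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard normal: Var(X) = 1, so Chebyshev gives P(|X - E[X]| >= eps) <= 1/eps^2.
x = rng.standard_normal(200_000)
mu, var = x.mean(), x.var()

for eps in [1.0, 2.0, 3.0]:
    tail = np.mean(np.abs(x - mu) >= eps)  # empirical two-sided tail
    bound = var / eps**2                   # Chebyshev bound
    assert tail <= bound
```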
The previous two inequalities are easier to understand than the following three.
Chernoff-Hoeffding Bound
The following theorem is a special case of the Chernoff bound.
Let $x_1,\cdots,x_m$ be independent Bernoulli random variables, where for every $i$, $P(x_i=1)=p_i$ and $P(x_i=0)=1-p_i$.
Let
$$\bar{x}=\frac{1}{m}\sum_{i=1}^m x_i,\qquad \mu=E[\bar{x}]=\frac{1}{m}\sum_{i=1}^m p_i$$
We have, for any $\epsilon>0$,
$$P(\vert\bar{x}-\mu\vert>\epsilon)\leq 2e^{-2\epsilon^2 m}$$
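The bound can be checked by simulation. Below is a sketch assuming i.i.d. Bernoulli($p$) draws, with illustrative values of $m$, $p$, and $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(2)

m, p, eps, trials = 100, 0.3, 0.1, 20_000
# Each row is one experiment: m Bernoulli(p) draws.
samples = rng.random((trials, m)) < p
means = samples.mean(axis=1)

deviation = np.mean(np.abs(means - p) > eps)  # empirical P(|x̄ - p| > eps)
bound = 2 * np.exp(-2 * eps**2 * m)           # Chernoff-Hoeffding bound
assert deviation <= bound
```

Note the bound decays exponentially in $m$, which is what makes it much stronger than Chebyshev for large samples.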
Moment-Generating Functions (MGFs)
The MGF of a random variable $X$ is defined as
$$M_X(t)=E[e^{tX}]$$
MGF has some important properties:
- If $M_X(t)$ is defined for $t\in(-t^\ast,t^\ast)$ for some $t^\ast>0$, then all the moments of $X$ exist, and $M_X(t)$ has derivatives of all orders at $t=0$, where $\forall k$, $E[X^k]=\frac{\partial^k M_X}{\partial t^k}\big\vert_{t=0}$
- $\forall t\in(-t^\ast,t^\ast)$, $M_X(t)$ has the Taylor expansion $M_X(t)=\sum_k\frac{E[X^k]}{k!}t^k$
- If $x$ and $y$ are independent random variables, then $M_{x+y}(t)=M_x(t)M_y(t)$
- For any constants $a$ and $b$, $M_{ax+b}(t)=e^{bt}M_x(at)$
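The last two properties can be verified exactly for Bernoulli variables, whose MGF has the closed form $M_X(t)=(1-p)+pe^t$ (the values of $p$, $q$, $a$, $b$, $t$ below are arbitrary):

```python
import math

def mgf_bernoulli(p, t):
    # M_X(t) = E[e^{tX}] = (1 - p) + p * e^t for X ~ Bernoulli(p)
    return (1 - p) + p * math.exp(t)

p, q, t = 0.3, 0.6, 0.7

# Independence: M_{x+y}(t) = M_x(t) * M_y(t).
# Direct expectation over the four joint outcomes of (X, Y):
direct = sum(
    px * py * math.exp(t * (x + y))
    for x, px in [(0, 1 - p), (1, p)]
    for y, py in [(0, 1 - q), (1, q)]
)
assert math.isclose(direct, mgf_bernoulli(p, t) * mgf_bernoulli(q, t))

# Affine transform: M_{ax+b}(t) = e^{bt} * M_x(at).
a, b = 2.0, -1.0
direct_ab = (1 - p) * math.exp(t * b) + p * math.exp(t * (a + b))
assert math.isclose(direct_ab, math.exp(b * t) * mgf_bernoulli(p, a * t))
```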
Jensen’s Inequality
Let $X$ be a random variable and $f$ a convex function. Then
$$E[f(X)]\geq f(E[X])$$
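Jensen's inequality holds in particular for the empirical distribution of any sample, so a direct check with convex functions such as $f(u)=u^2$ and $f(u)=e^u$ always succeeds (the exponential sample below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=100_000)

# f(u) = u^2 is convex: E[X^2] >= (E[X])^2.
# (The gap is exactly Var(X), which is nonnegative.)
assert np.mean(x**2) >= np.mean(x)**2

# f(u) = e^u is convex: E[e^X] >= e^{E[X]}.
assert np.mean(np.exp(x)) >= np.exp(np.mean(x))
```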
Proof for Chernoff-Hoeffding Bound
Define $y_i=x_i-p_i$ and $\bar{y}=\frac{1}{m}\sum_{i=1}^m y_i$. We need to show $P(\vert\bar{y}\vert>\epsilon)\leq 2e^{-2\epsilon^2 m}$.
Consider each of $P(\bar{y}>\epsilon)$ and $P(\bar{y}<-\epsilon)$ separately.
For $t>0$, applying Markov's inequality to $e^{t\bar{y}}$,
$$P(\bar{y}>\epsilon)=P(e^{t\bar{y}}>e^{t\epsilon})\leq e^{-t\epsilon}M_{\bar{y}}(t)$$
Lemma 1: For all $t>0$, $M_{y_i}(t)\leq e^{t^2/8}$
Proof: Recall
$$M_{y_i}(t)=E[e^{t(x_i-p_i)}]=e^{-tp_i}\big(p_ie^t+(1-p_i)\big)$$
Look at $g(t)=-tp_i+\ln\big(p_ie^t+(1-p_i)\big)$, so that $M_{y_i}(t)=e^{g(t)}$. By Taylor's theorem, $g(t)=g(0)+tg^{\prime}(0)+\frac{t^2}{2}g^{\prime\prime}(t^\ast)$ for some $t^\ast\in[0,t]$, and one checks $g(0)=g^{\prime}(0)=0$.
We can show that $\forall t^\ast$, $g^{\prime\prime}(t^\ast)\leq\frac{1}{4}$.
Thus $g(t)\leq\frac{t^2}{8}$, i.e. $M_{y_i}(t)\leq e^{t^2/8}$.
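Lemma 1 can be verified numerically by evaluating $M_{y_i}(t)=e^{-tp_i}(p_ie^t+(1-p_i))$ on a grid of $p_i$ and $t$ (the grid is an arbitrary choice; the small tolerance absorbs floating-point error at $t=0$, where the two sides are equal):

```python
import numpy as np

# Check M_{y_i}(t) = e^{-t p} (p e^t + 1 - p) <= e^{t^2 / 8}
# over a grid of Bernoulli parameters p and exponents t.
for p in np.linspace(0.05, 0.95, 19):
    for t in np.linspace(-5.0, 5.0, 101):
        mgf = np.exp(-t * p) * (p * np.exp(t) + 1 - p)
        assert mgf <= np.exp(t**2 / 8) + 1e-12
```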
Using the properties of MGFs, $M_{\bar{y}}(t)=\prod_{i=1}^m M_{y_i}(\frac{t}{m})$.
Using Lemma 1, $M_{\bar{y}}(t)\leq\prod_{i=1}^m\exp(\frac{t^2}{8m^2})=\exp(\frac{t^2}{8m})$.
So $P(\bar{y}>\epsilon)\leq\exp(-t\epsilon+\frac{t^2}{8m})$. Pick the $t$ that minimizes this bound: $t=4m\epsilon$, which gives $P(\bar{y}>\epsilon)\leq e^{-2m\epsilon^2}$.
The other direction, $P(\bar{y}<-\epsilon)$, is handled the same way (apply the argument to $-\bar{y}$), and a union bound gives the factor of $2$.
Hoeffding’s Inequality
Let $x_1,\cdots,x_m$ be independent random variables, where for every $i$, $x_i\in[a_i,b_i]$ with probability $1$.
Let
$$\bar{x}=\frac{1}{m}\sum_{i=1}^m x_i$$
For any $\epsilon>0$, we have
$$P(\vert\bar{x}-E[\bar{x}]\vert\geq\epsilon)\leq 2\exp\left(-\frac{2m^2\epsilon^2}{\sum_{i=1}^m(b_i-a_i)^2}\right)$$
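A simulation sketch of Hoeffding's inequality with heterogeneous ranges $[a_i,b_i]$ (uniform variables and the specific sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

m, trials = 50, 20_000
# Independent uniform variables with different ranges [a_i, b_i].
a = rng.uniform(-1, 0, size=m)
b = rng.uniform(0, 1, size=m)
x = rng.uniform(a, b, size=(trials, m))  # each row: one draw of x_1..x_m

means = x.mean(axis=1)
mu = (a + b).mean() / 2                  # E[x̄] for these uniforms

eps = 0.1
deviation = np.mean(np.abs(means - mu) >= eps)
bound = 2 * np.exp(-2 * m**2 * eps**2 / np.sum((b - a)**2))
assert deviation <= bound
```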
McDiarmid’s Inequality
Let $f:\mathbb{R}^m\rightarrow\mathbb{R}$ be such that $\forall i$ and all $x_1,\cdots,x_i,\cdots,x_m,x'_i$:
$$\vert f(x_1,\cdots,x_i,\cdots,x_m)-f(x_1,\cdots,x'_i,\cdots,x_m)\vert\leq c_i$$
Let $x_1,\cdots,x_m$ be independent random variables. Then for any $\epsilon>0$:
$$P\big(\vert f(x_1,\cdots,x_m)-E[f(x_1,\cdots,x_m)]\vert\geq\epsilon\big)\leq 2\exp\left(-\frac{2\epsilon^2}{\sum_{i=1}^m c_i^2}\right)$$
Note: Hoeffding's inequality is a special case of McDiarmid's inequality: just set $f(x_1,\cdots,x_m)=\frac{1}{m}\sum_{i=1}^m x_i$ and $c_i=\frac{b_i-a_i}{m}$.
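This reduction can be checked directly in code: with $f$ the sample mean and $c_i=(b_i-a_i)/m$, McDiarmid's bound equals Hoeffding's bound term for term (the ranges below are arbitrary):

```python
import numpy as np

# f(x_1,...,x_m) = (1/m) * sum_i x_i with x_i in [a_i, b_i]:
# changing one coordinate moves f by at most c_i = (b_i - a_i) / m,
# so McDiarmid's bound coincides with Hoeffding's.
m, eps = 20, 0.3
a = np.zeros(m)
b = np.arange(1.0, m + 1.0)   # example ranges: x_i in [0, i]

c = (b - a) / m               # bounded differences for the sample mean
mcdiarmid = 2 * np.exp(-2 * eps**2 / np.sum(c**2))
hoeffding = 2 * np.exp(-2 * m**2 * eps**2 / np.sum((b - a)**2))
assert np.isclose(mcdiarmid, hoeffding)
```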
Thanks to the Zhihu column article 我只要这么多样本! - 比生活简单多了 ("I only need this many samples! - much simpler than life") by 张熤.