I’d like to share my understanding of the EM algorithm.
This post is mainly based on Richard Xu’s machine learning course.
Gaussian Mixture Model
A Gaussian Mixture Model (GMM) with $k$ mixture components is defined as:

$$p(x \mid \Theta) = \sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)$$

and

$$\sum_{l=1}^{k} \alpha_l = 1, \qquad \Theta = \{ \alpha_1, \dots, \alpha_k, \mu_1, \dots, \mu_k, \Sigma_1, \dots, \Sigma_k \}.$$
For data $X = \{ x_1, \dots, x_n \}$, we introduce a latent variable $Z = \{ z_1, \dots, z_n \}$, where each $z_i$ indicates which mixture component $x_i$ belongs to. (The introduction of the latent variable should not change the marginal distribution $p(X)$.)
Then we can use MLE to estimate $\Theta$:

$$\hat{\Theta}_{\mathrm{MLE}} = \arg\max_{\Theta} \sum_{i=1}^{n} \log \sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)$$
This formula is difficult to solve because it is in ‘log-of-sum’ form. So, we solve this problem in an iterative way, called Expectation Maximization.
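To make the ‘log-of-sum’ structure concrete, here is a minimal NumPy/SciPy sketch of the GMM log-likelihood. The function name `gmm_log_likelihood` and the array layout are my own choices for illustration, not from the course notes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, alphas, mus, Sigmas):
    """Log-likelihood of data X under a k-mixture GMM.

    X:      (n, d) data matrix
    alphas: (k,)   mixture weights, summing to 1
    mus:    (k, d) component means
    Sigmas: (k, d, d) component covariances
    """
    n, k = X.shape[0], len(alphas)
    # densities[i, l] = alpha_l * N(x_i | mu_l, Sigma_l)
    densities = np.zeros((n, k))
    for l in range(k):
        densities[:, l] = alphas[l] * multivariate_normal.pdf(X, mean=mus[l], cov=Sigmas[l])
    # 'log-of-sum' form: sum_i log sum_l alpha_l N(x_i | mu_l, Sigma_l)
    return np.sum(np.log(densities.sum(axis=1)))
```

The sum over components sits inside the logarithm, which is exactly why the maximization has no closed-form solution.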
Expectation Maximization
Instead of performing

$$\hat{\Theta}_{\mathrm{MLE}} = \arg\max_{\Theta} \log p(X \mid \Theta)$$

directly, we introduce a latent variable $Z$ into the model, and generate a series of estimates $\{ \Theta^{(1)}, \Theta^{(2)}, \dots, \Theta^{(t)}, \dots \}$.

For each iteration of the EM algorithm, we perform:

$$\Theta^{(t+1)} = \arg\max_{\Theta} \int_Z \log p(X, Z \mid \Theta) \, p(Z \mid X, \Theta^{(t)}) \, dZ$$
We must ensure convergence:

$$\log p(X \mid \Theta^{(t+1)}) \geq \log p(X \mid \Theta^{(t)})$$
Proof:

Since $p(X \mid \Theta) = \dfrac{p(X, Z \mid \Theta)}{p(Z \mid X, \Theta)}$, we have $\log p(X \mid \Theta) = \log p(X, Z \mid \Theta) - \log p(Z \mid X, \Theta)$. Integrating both sides with respect to $p(Z \mid X, \Theta^{(t)})$ leaves the left-hand side unchanged, because it does not depend on $Z$. Denote

$$Q(\Theta, \Theta^{(t)}) = \int_Z p(Z \mid X, \Theta^{(t)}) \log p(X, Z \mid \Theta) \, dZ, \qquad H(\Theta, \Theta^{(t)}) = \int_Z p(Z \mid X, \Theta^{(t)}) \log p(Z \mid X, \Theta) \, dZ,$$

then we have

$$\log p(X \mid \Theta) = Q(\Theta, \Theta^{(t)}) - H(\Theta, \Theta^{(t)}).$$

Because

$$Q(\Theta^{(t+1)}, \Theta^{(t)}) \geq Q(\Theta^{(t)}, \Theta^{(t)}) \quad \text{and} \quad H(\Theta^{(t+1)}, \Theta^{(t)}) \leq H(\Theta^{(t)}, \Theta^{(t)}),$$

where the first inequality holds because $\Theta^{(t+1)}$ maximizes $Q(\cdot, \Theta^{(t)})$, and the second inequality can be derived using Jensen’s inequality:

$$H(\Theta^{(t+1)}, \Theta^{(t)}) - H(\Theta^{(t)}, \Theta^{(t)}) = \int_Z p(Z \mid X, \Theta^{(t)}) \log \frac{p(Z \mid X, \Theta^{(t+1)})}{p(Z \mid X, \Theta^{(t)})} \, dZ \leq \log \int_Z p(Z \mid X, \Theta^{(t+1)}) \, dZ = \log 1 = 0.$$

Hence,

$$\log p(X \mid \Theta^{(t+1)}) = Q(\Theta^{(t+1)}, \Theta^{(t)}) - H(\Theta^{(t+1)}, \Theta^{(t)}) \geq Q(\Theta^{(t)}, \Theta^{(t)}) - H(\Theta^{(t)}, \Theta^{(t)}) = \log p(X \mid \Theta^{(t)}).$$
Using the EM algorithm to solve GMM
Now we put GMM into this framework.
E-Step:
Define $p(X, Z \mid \Theta)$:

$$p(X, Z \mid \Theta) = \prod_{i=1}^{n} p(x_i, z_i \mid \Theta) = \prod_{i=1}^{n} \alpha_{z_i} \, \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i})$$
Define $p(Z \mid X, \Theta)$:

$$p(Z \mid X, \Theta) = \prod_{i=1}^{n} p(z_i \mid x_i, \Theta), \qquad p(z_i = l \mid x_i, \Theta) = \frac{\alpha_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}{\sum_{j=1}^{k} \alpha_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$
Then

$$Q(\Theta, \Theta^{(t)}) = \sum_{Z} p(Z \mid X, \Theta^{(t)}) \log p(X, Z \mid \Theta) = \sum_{i=1}^{n} \sum_{l=1}^{k} p(z_i = l \mid x_i, \Theta^{(t)}) \log \alpha_l + \sum_{i=1}^{n} \sum_{l=1}^{k} p(z_i = l \mid x_i, \Theta^{(t)}) \log \mathcal{N}(x_i \mid \mu_l, \Sigma_l)$$
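As a concrete illustration, here is a minimal sketch of the E-step: computing the responsibilities $p(z_i = l \mid x_i, \Theta^{(t)})$ for every data point. The name `e_step` and the array layout are assumptions of mine, matching the earlier log-likelihood sketch.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, alphas, mus, Sigmas):
    """Responsibilities gamma[i, l] = p(z_i = l | x_i, Theta^(t))."""
    n, k = X.shape[0], len(alphas)
    gamma = np.zeros((n, k))
    for l in range(k):
        # unnormalised posterior: alpha_l * N(x_i | mu_l, Sigma_l)
        gamma[:, l] = alphas[l] * multivariate_normal.pdf(X, mean=mus[l], cov=Sigmas[l])
    # normalise each row so the posterior over components sums to 1
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```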
M-Step:
The first term contains only $\alpha$ and the second term contains only $\mu, \Sigma$, so we can maximize the two terms independently.
Maximizing over $\alpha$ means solving:

$$\max_{\alpha} \sum_{i=1}^{n} \sum_{l=1}^{k} p(z_i = l \mid x_i, \Theta^{(t)}) \log \alpha_l$$

subject to $\sum_{l=1}^{k} \alpha_l = 1$.

Solving this problem via a Lagrange multiplier, we have

$$\alpha_l^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} p(z_i = l \mid x_i, \Theta^{(t)}).$$
Similarly, we can solve for $\mu$ and $\Sigma$.
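A matching sketch of the M-step under the same assumed array layout (again, `m_step` and the variable names are illustrative, not from the course notes); the updates below are the standard closed-form weighted-average solutions for a GMM.

```python
import numpy as np

def m_step(X, gamma):
    """Update Theta = (alphas, mus, Sigmas) given responsibilities gamma from the E-step."""
    n, d = X.shape
    k = gamma.shape[1]
    Nk = gamma.sum(axis=0)                 # effective number of points per component
    alphas = Nk / n                        # alpha_l = (1/n) * sum_i gamma[i, l]
    mus = (gamma.T @ X) / Nk[:, None]      # responsibility-weighted means
    Sigmas = np.zeros((k, d, d))
    for l in range(k):
        diff = X - mus[l]
        # responsibility-weighted covariance for component l
        Sigmas[l] = (gamma[:, l, None] * diff).T @ diff / Nk[l]
    return alphas, mus, Sigmas
```

Iterating `e_step` and `m_step`, and stopping once the log-likelihood no longer increases, gives the full EM procedure for the GMM.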