Expectation Maximization Principles

Today I am going to talk about the Expectation Maximization algorithm (EM algorithm), which is an optimization algorithm based on an interesting idea. However, it might be a little difficult to fully understand if this is your first time hearing about it. Don’t worry about that. Honestly, I still haven’t figured out every detail of the EM algorithm even though I have gone through the whole process many times. OK, let’s start!

Generally speaking, Expectation Maximization can be considered a kind of non-convex optimization technique, one that tries to find an optimum of a non-convex objective function. Specifically, EM applies and performs well when the model contains latent variables, i.e., variables that are unobservable. EM is applied and extended in the parameter learning of Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), and mixtures of regressions. I will present the principles of EM first, and then explain its implementations in GMM and HMM in the sequel.

Principles

For a general problem, suppose we can get the log-likelihood function $\ell(\theta)$:

$$\ell(\theta) = \sum_{i=1}^{n} \log p\left(x^{(i)}; \theta\right)$$

This can be considered a function of $\theta$, since the $x^{(i)}$ are the observed data samples while $\theta$ is the unknown parameter.

Remember, there are latent variables in the model. So for the above log-likelihood function, we can introduce the latent variable $z$ and sum (marginalize) over it:

$$\ell(\theta) = \sum_{i=1}^{n} \log \sum_{z^{(i)}} p\left(x^{(i)}, z^{(i)}; \theta\right)$$

It is difficult to optimize the above objective function with traditional gradient-based methods, because the sum over $z^{(i)}$ sits inside the logarithm and is hard to handle: if $z^{(i)}$ were observed, the log would act on a single joint term and the objective would decompose, but with $z^{(i)}$ hidden the marginalization stays trapped inside the log.

The following figure shows the basic intuition behind the optimization process. That is, for a given $\theta$, try to find an approximate function $r(x \mid \theta)$ that takes the same value as the objective at the given $\theta$ and a value no larger than the objective everywhere else. The approximate function $r(x \mid \theta)$ is easy to optimize, so we maximize it, move to the new $\theta$, and repeat this process until convergence. Finally, we will reach a local optimum.

[Figure: the surrogate $r(x \mid \theta)$ lower-bounds the log-likelihood $\ell(\theta)$, touching it at the current $\theta$; maximizing the surrogate yields the next $\theta$.]
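To make the picture concrete, here is a minimal numerical sketch (my own illustration, with toy parameters I chose, not part of the original derivation): for a mixture of two known Gaussians with an unknown mixing weight $\pi$, the Jensen lower bound built at the current $\pi_0$ equals the true log-likelihood at $\pi_0$ and never exceeds it elsewhere.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy data: mixture of N(-2, 1) and N(3, 1) with true mixing weight 0.3.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# The component densities are assumed known; only the mixing weight pi is unknown.
p1 = norm.pdf(x, loc=-2, scale=1)
p2 = norm.pdf(x, loc=3, scale=1)

def log_likelihood(pi):
    """Exact log-likelihood l(pi) of the mixture."""
    return np.sum(np.log(pi * p1 + (1 - pi) * p2))

def lower_bound(pi, pi0):
    """Jensen lower bound constructed at pi0 and evaluated at pi."""
    # E-step at pi0: responsibilities r_i = Q_i(z_i = 1), clipped for numerical safety.
    r = np.clip(pi0 * p1 / (pi0 * p1 + (1 - pi0) * p2), 1e-12, 1 - 1e-12)
    # sum_i sum_z Q_i(z) * log( p(x_i, z; pi) / Q_i(z) )
    return np.sum(r * np.log(pi * p1 / r)
                  + (1 - r) * np.log((1 - pi) * p2 / (1 - r)))

pi0 = 0.5  # current guess
for pi in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"pi={pi:.1f}  l(pi)={log_likelihood(pi):10.2f}  bound={lower_bound(pi, pi0):10.2f}")
# The bound coincides with l(pi) at pi = pi0 = 0.5 (up to floating-point error)
# and stays below it everywhere else, which is exactly the picture above.
```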

OK, by now you may have grasped the intuition of the EM algorithm and raised a question: how do we determine the approximate function $r(x \mid \theta)$? Let’s get back to the log-likelihood function:

$$\ell(\theta) = \sum_{i=1}^{n} \log \sum_{z^{(i)}} Q_i\left(z^{(i)}\right) \frac{p\left(x^{(i)}, z^{(i)}; \theta\right)}{Q_i\left(z^{(i)}\right)} \;\ge\; \sum_{i=1}^{n} \sum_{z^{(i)}} Q_i\left(z^{(i)}\right) \log \frac{p\left(x^{(i)}, z^{(i)}; \theta\right)}{Q_i\left(z^{(i)}\right)}$$

In the derivation above, we begin by introducing a distribution $Q_i(z^{(i)})$ over the latent variable $z^{(i)}$, and then apply Jensen’s inequality to obtain a lower-bound function. Actually, we need the bound to attain the equal sign at the current $\theta$, which means we need:

$$\frac{p\left(x^{(i)}, z^{(i)}; \theta\right)}{Q_i\left(z^{(i)}\right)} = c$$

where $c$ is a constant that does not depend on $z^{(i)}$. Furthermore, because $\sum_{z^{(i)}} Q_i(z^{(i)}) = 1$, we get:

$$Q_i\left(z^{(i)}\right) = \frac{p\left(x^{(i)}, z^{(i)}; \theta\right)}{\sum_{z} p\left(x^{(i)}, z; \theta\right)} = \frac{p\left(x^{(i)}, z^{(i)}; \theta\right)}{p\left(x^{(i)}; \theta\right)} = p\left(z^{(i)} \mid x^{(i)}; \theta\right)$$
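Putting the pieces together, the surrogate from the figure can now be written out explicitly (the notation $\theta^{(t)}$ for the current estimate is mine):

$$r(x \mid \theta) = \sum_{i=1}^{n} \sum_{z^{(i)}} p\left(z^{(i)} \mid x^{(i)}; \theta^{(t)}\right) \log \frac{p\left(x^{(i)}, z^{(i)}; \theta\right)}{p\left(z^{(i)} \mid x^{(i)}; \theta^{(t)}\right)}$$

By construction it equals $\ell(\theta)$ at $\theta = \theta^{(t)}$ and lower-bounds it everywhere else.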

So far I have shown the fundamental ideas of the EM algorithm; now I will give you the pseudocode of Expectation Maximization:

Repeat until convergence:
    E-step: for each $i$, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$
    M-step: set $\theta := \arg\max_{\theta} \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \dfrac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$
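As a concrete, deliberately minimal sketch of how these two steps can look in code, here is one possible implementation for a mixture of two univariate Gaussians (the function name, initialization, and convergence check are my own choices; the full GMM treatment comes later in the post):

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, n_iter=100, tol=1e-6, seed=0):
    """Minimal EM for a two-component 1D Gaussian mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Crude initialization of theta = (pi, mu_1, mu_2, sigma_1, sigma_2).
    pi = 0.5
    mu = rng.choice(x, size=2, replace=False).astype(float)
    sigma = np.array([x.std(), x.std()])
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: Q_i(z) = p(z | x_i; theta), i.e. the responsibilities.
        w1 = pi * norm.pdf(x, mu[0], sigma[0])
        w2 = (1 - pi) * norm.pdf(x, mu[1], sigma[1])
        r = w1 / (w1 + w2)

        # M-step: maximize the lower bound; for Gaussians the update is closed-form.
        pi = r.mean()
        mu[0] = np.sum(r * x) / np.sum(r)
        mu[1] = np.sum((1 - r) * x) / np.sum(1 - r)
        sigma[0] = np.sqrt(np.sum(r * (x - mu[0]) ** 2) / np.sum(r))
        sigma[1] = np.sqrt(np.sum((1 - r) * (x - mu[1]) ** 2) / np.sum(1 - r))

        # The log-likelihood never decreases across iterations; stop when the gain is tiny.
        ll = np.sum(np.log(w1 + w2))
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return pi, mu, sigma

# Toy usage: the estimates should recover mixing weights {0.3, 0.7}, means near -2 and 3,
# and unit standard deviations (the component labels may come out swapped).
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])
print(em_two_gaussians(data))
```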


Why is it called Expectation Maximization? Because it repeats a cycle of computing an expectation (the E-step, which builds the expected complete-data log-likelihood under $Q$) and then maximizing it (the M-step).