Today I am going to talk about the Expectation-Maximization algorithm (EM algorithm), an optimization algorithm built on an interesting idea. It might be a little difficult to fully understand if this is your first time hearing about it. Don't worry about that. Honestly, I haven't figured out every detail of the EM algorithm myself, even after going through the whole derivation many times. OK, let's start!
Generally speaking, Expectation-Maximization can be considered a non-convex optimization technique: it tries to find a local optimum of a non-convex objective function. Specifically, EM is applicable and performs well when the model contains latent variables, i.e., variables that are unobservable. EM is applied and extended in parameter learning for Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), and mixtures of regressions. I will first present the principles of EM, and then explain its application to GMM and HMM in the sequel.
Principles
For a general problem, suppose we observe data samples x^{(1)}, ..., x^{(m)} and can write down the log likelihood function l(θ):

$$l(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta)$$

This can be considered a function of θ, because the x^{(i)} are the observed data samples, while θ is unknown.
Remember, there are latent variables in the model. So in the log likelihood function above, we can introduce a latent variable z and marginalize over it:

$$l(\theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$$
It is difficult to optimize this objective with traditional gradient-based methods, because the sum sits inside the logarithm, which makes the expression hard to compute and differentiate cleanly.
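To make the "sum inside the log" concrete, here is a minimal sketch (not from the original post) that evaluates this marginal log likelihood for an assumed two-component 1-D Gaussian mixture, where the latent variable z is the component index:

```python
import math

def gmm_log_likelihood(xs, weights, means, stds):
    """l(theta) = sum_i log sum_z  pi_z * N(x_i; mu_z, sigma_z).

    The inner sum over components z sits inside the log, which is
    exactly what makes direct gradient-based optimization awkward.
    """
    total = 0.0
    for x in xs:
        mix = sum(
            w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, stds)
        )
        total += math.log(mix)
    return total

# Toy data drawn loosely around -1 and +1 (illustrative values only)
xs = [-1.2, -0.8, 1.1, 0.9]
print(gmm_log_likelihood(xs, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```

The function names and the two-component setup here are assumptions for illustration; the derivation in the text applies to any latent-variable model.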
The following figure shows the basic intuition for the optimization process. That is, for a given θ, try to find an approximating function r(x|θ) that has the same value as the objective at the current θ while taking a smaller value everywhere else. The approximating function r(x|θ) is easy to optimize, so we can repeat this process until convergence. Finally, we obtain a local optimum.
OK, by now you may have the intuition behind the EM algorithm, and a question may arise: how do we determine the approximating function r(x|θ)? Let's get back to the log likelihood function:
Starting from the log likelihood above, we introduce a distribution Q_i(z) over the latent variable z, and then apply Jensen's inequality to obtain a lower-bounding function:

$$l(\theta) = \sum_{i} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \, \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \;\ge\; \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$$

Actually, we need the bound to be tight, i.e., the inequality should hold with equality at the current θ, which requires:

$$\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$$
where c is a constant that does not depend on z^{(i)}. Furthermore, because Q_i(z) sums to one over z, we get:

$$Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)$$
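In the Gaussian-mixture setting, this choice of Q_i(z) = p(z | x_i; θ) is just the posterior "responsibility" of each component for a point. A minimal sketch of that computation (the two-component 1-D setup is my assumption, not from the original post):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def e_step(x, weights, means, stds):
    """Q(z) = p(z | x; theta): normalize the joint p(x, z; theta) over z."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # = p(x; theta), the marginal
    return [j / total for j in joint]

q = e_step(0.9, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(q)  # responsibilities sum to 1; the component at mean 1.0 dominates
```

Dividing the joint by its sum over z is exactly the Bayes-rule normalization in the equation above.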
So far I have shown the fundamental ideas of the EM algorithm. The pseudocode of Expectation-Maximization is then: repeat until convergence — in the E-step, for each i set Q_i(z^{(i)}) := p(z^{(i)} | x^{(i)}; θ); in the M-step, set θ := argmax over θ of Σ_i Σ_{z^{(i)}} Q_i(z^{(i)}) log [ p(x^{(i)}, z^{(i)}; θ) / Q_i(z^{(i)}) ].
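The E- and M-steps can be sketched end to end for an assumed two-component 1-D Gaussian mixture, where the M-step has a closed form (weighted means and variances). This is a minimal illustration under my own assumptions, not the original post's implementation:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm(xs, weights, means, stds, n_iter=50):
    """EM for a 1-D Gaussian mixture; mutates and returns the parameter lists."""
    for _ in range(n_iter):
        # E-step: Q_i(z) = p(z | x_i; theta) for every sample
        resp = []
        for x in xs:
            joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: maximize the expected complete-data log likelihood
        for k in range(len(weights)):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            stds[k] = math.sqrt(max(var, 1e-12))  # guard against degenerate variance
    return weights, means, stds

# Toy data around -1 and +1; starting from a deliberately poor initialization
xs = [-1.1, -0.9, -1.0, 1.0, 0.9, 1.1]
w, mu, sd = em_gmm(xs, [0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])
print(mu)  # the two means move toward the two clusters
```

Each iteration never decreases the log likelihood, which is the convergence guarantee the lower-bound argument above provides.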
Why is it called Expectation-Maximization? Because it repeats a cycle of computing an expectation (the E-step) and maximizing it (the M-step).