linregvb

This paper proposes a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence (KLD) of an approximating distribution to the intractable posterior distribution.

In Bayesian analysis the form of the posterior distribution is often not analytically tractable. To obtain quantities of interest under such a distribution, such as moments or marginal distributions, we typically need to use Monte Carlo (MC) methods or approximate the posterior with a more convenient distribution.

Fixed-form VB

A different approach is to restrict the optimization problem to a reduced set of more convenient distributions \(Q\). If \(p(x,y)\) is of conjugate exponential form, choosing \(Q\) to be the set of factorized distributions \(q(x) = q(x_1)q(x_2)\cdots q(x_k)\) often leads to a tractable optimization problem that can be solved efficiently using an algorithm called Variational Bayes Expectation Maximization (VBEM).

An alternative choice for \(Q\) is the set of distributions of a certain parametric form \(q_{\eta}(x)\), where \(\eta\) denotes the vector of parameters governing the shape of the posterior approximation. This approach is known as structured or fixed-form VB.

Usually, the posterior approximation is chosen to be a specific member of the exponential family of distributions:

\[ q_{\eta}(x) = \exp[T(x)\eta - U(\eta)]\,\nu(x). \]
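Here \(T(x)\) is a row vector of sufficient statistics, \(U(\eta)\) is the log normalizer, and \(\nu(x)\) is a base measure. As a concrete illustration (a standard example, not spelled out in this summary): a univariate Gaussian \(N(\mu, \sigma^2)\) fits this form with

\[ T(x) = (x, x^2), \qquad \eta = \left( \frac{\mu}{\sigma^2},\ -\frac{1}{2\sigma^2} \right), \]

so restricting \(Q\) to the Gaussian family turns the variational problem into a finite-dimensional optimization over \((\mu, \sigma^2)\).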

Using this approach, the variational optimization problem reduces to a parametric optimization problem in \(\eta\):

\[ \hat{\eta} = \arg\min_{\eta}\, E_{q_{\eta}(x)}[\log q_{\eta}(x) - \log p(x,y)]. \]
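This objective can be estimated by simple Monte Carlo: draw samples from \(q_{\eta}\) and average \(\log q_{\eta}(x) - \log p(x,y)\). Below is a minimal sketch for the Gaussian example above; the names `kl_objective_mc` and `log_p` are illustrative, not from the paper or repository.

```python
import numpy as np

def kl_objective_mc(log_p, mu, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of E_q[log q(x) - log p(x, y)] for a
    univariate Gaussian approximation q = N(mu, sigma^2).
    log_p is the (possibly unnormalized) log joint as a function of x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)  # draws from q_eta
    log_q = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((x - mu) / sigma) ** 2
    return np.mean(log_q - log_p(x))

# Example: a standard-normal target; the estimate is smallest
# (up to MC noise) near mu = 0, sigma = 1.
print(kl_objective_mc(lambda x: -0.5 * x**2, mu=0.5, sigma=1.2))
```

Since \(\log p(x,y)\) may be unnormalized, the estimate is only defined up to an additive constant, which does not affect the minimizer.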

VB as linear regression

For notational convenience we will write our posterior approximation in the adjusted form,

\[ \hat{q}_{\hat{\eta}}(x) = \exp[\hat{T}(x)\hat{\eta}], \]

where \(\hat{T}(x) = (1, T(x))\), so that the first element of \(\hat{\eta}\) absorbs the normalizing constant. Because \(\hat{q}_{\hat{\eta}}(x)\) is unnormalized, we use the unnormalized version of the KLD, whose minimizer satisfies the fixed-point condition

\[ \hat{\eta} = E_q[\hat{T}(x)'\hat{T}(x)]^{-1}\, E_q[\hat{T}(x)' \log p(x,y)]. \]

Our key insight is to notice the similarity between this fixed-point condition and the maximum likelihood estimator for linear regression:

\[ \hat{\beta} = (X'X)^{-1}X'Y. \]
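Read as a regression, the sufficient statistics \(\hat{T}(x)\) play the role of the design matrix \(X\), and \(\log p(x,y)\) plays the role of the response \(Y\). A minimal one-shot Monte Carlo sketch for the Gaussian example, with illustrative names (`eta_hat_ols` is not from the repository):

```python
import numpy as np

def eta_hat_ols(log_p, mu, sigma, n_samples=10_000, seed=0):
    """One-shot Monte Carlo version of the fixed-point condition:
    regress log p(x, y) on T_hat(x) = (1, x, x^2) under draws from q."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n_samples)
    T = np.column_stack([np.ones_like(x), x, x**2])  # design matrix X
    y = log_p(x)                                     # regression response Y
    # eta_hat = (T'T)^{-1} T'y, i.e. an ordinary least-squares fit
    return np.linalg.solve(T.T @ T, T.T @ y)
```

Note that both expectations are estimated from the same draws, which keeps the estimator in the exact form of an OLS fit.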

A stochastic approximation algorithm

The link between VB and linear regression is in itself interesting, but it does not yet provide us with a solution to the variational optimization problem. We propose solving this optimization problem by viewing the fixed-point condition as a fixed-point update. Let \(C = E_q[\hat{T}(x)'\hat{T}(x)]\) and \(g = E_q[\hat{T}(x)' \log p(x,y)]\), so that the condition can be written \(\hat{\eta} = C^{-1}g\). We iteratively approximate \(C\) and \(g\) by weighted Monte Carlo, drawing a single sample from the current posterior approximation at each iteration \(t\) and using the update equations

\[ g_{t+1} = (1 - w)g_t + w\hat{g}_t, \]

\[ C_{t+1} = (1 - w)C_t + w\hat{C}_t, \]

where \(\hat{g}_t\) and \(\hat{C}_t\) are the single-sample estimates of \(g\) and \(C\) at iteration \(t\), and \(w\) is a fixed step size.
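Putting the pieces together, here is a minimal sketch of the resulting algorithm for the Gaussian example, with \(\hat{T}(x) = (1, x, x^2)\). All names are illustrative, and the paper's refinements (step-size choice, averaging the estimates over later iterations) are omitted:

```python
import numpy as np

def natural_to_moments(eta):
    """Map natural parameters (eta0, eta1, eta2) of exp(eta0 + eta1*x + eta2*x^2)
    to the mean and variance of the corresponding Gaussian (requires eta2 < 0)."""
    var = -1.0 / (2.0 * eta[2])
    return eta[1] * var, var

def linregvb(log_p, eta0, n_iter=2000, w=0.05, seed=0):
    """Stochastic-approximation sketch: maintain running averages C_t, g_t
    and set eta = C^{-1} g after each single-sample update."""
    rng = np.random.default_rng(seed)
    eta = np.asarray(eta0, dtype=float)
    C = np.eye(eta.size)  # running estimate of E_q[T_hat' T_hat]
    g = C @ eta           # running estimate of E_q[T_hat' log p], consistent with eta0
    for _ in range(n_iter):
        mu, var = natural_to_moments(eta)
        x = rng.normal(mu, np.sqrt(var))      # single draw from the current q
        t = np.array([1.0, x, x**2])          # T_hat(x) for this draw
        C = (1 - w) * C + w * np.outer(t, t)  # C_{t+1}
        g = (1 - w) * g + w * t * log_p(x)    # g_{t+1}
        eta = np.linalg.solve(C, g)
    return eta

# Example: an unnormalized standard-normal target, started from N(2, 2);
# the recovered moments should be close to (0, 1).
eta = linregvb(lambda x: -0.5 * x**2, eta0=[0.0, 1.0, -0.25])
print(natural_to_moments(eta))
```

Because \(\log p\) in this example lies exactly in the span of \(\hat{T}(x)\), the regression fit is exact at the fixed point; for general targets the iterates approximate the KL-optimal member of the chosen family.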