For more detail, refer to generative models.
Definition of generative models
Given training data, the goal is to generate new samples from the same distribution:
\begin{aligned}
p_{\text{model}}(x) \approx p_{\text{data}}(x)
\end{aligned}
Taxonomy of Generative models
PixelRNN and PixelCNN
- This method belongs to the family of explicit density models.
- Using the chain rule, decompose the likelihood of an image $x$ into a product of 1-d conditional distributions:
\begin{aligned}
p_{\theta}(x)=\prod_{i=1}^{n} p_{\theta}\left(x_{i} \mid x_{1}, \ldots, x_{i-1}\right)
\end{aligned}
where $x_i$ denotes the $i$-th pixel and $p_{\theta}(x)$ denotes the likelihood of image $x$.
- It is important to define an ordering of the pixels (e.g. raster-scan order).
- The complex distribution over pixel values can be modeled with a neural network.
- Pros: can explicitly compute the likelihood $p(x)$.
- Cons: sequential generation is slow.
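As a toy illustration of the chain-rule factorization above, the following sketch computes $\log p(x)$ for a tiny binary "image" as a sum of 1-d conditional log-probabilities. The logistic conditionals and the weights are hypothetical stand-ins, not the actual PixelRNN/PixelCNN model:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 4                                # number of pixels, in raster-scan order
x = np.array([1, 0, 1, 1])           # a tiny binary "image"

# Hypothetical conditionals: p(x_i = 1 | x_<i) is a logistic function
# of a weighted sum of the previously generated pixels.
w = rng.normal(size=(n, n))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(x):
    """log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1})."""
    total = 0.0
    for i in range(n):
        logit = w[i, :i] @ x[:i]     # depends only on earlier pixels
        p1 = sigmoid(logit)          # p(x_i = 1 | x_<i)
        total += np.log(p1 if x[i] == 1 else 1.0 - p1)
    return total

print(log_likelihood(x))             # an explicit, exact log-likelihood
```

Because each conditional is a valid Bernoulli distribution, the likelihoods of all $2^n$ configurations sum to 1, which is what makes this an explicit, normalized density model.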
PixelRNN
- Generate image pixels starting from a corner, using an RNN (LSTM) to model the dependency on previously generated pixels.
- The drawback is that sequential generation is slow.
PixelCNN
- Generate image pixels starting from a corner, using a CNN over a context region of already-generated pixels.
- Generation still proceeds sequentially, so it remains slow.
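The context region can be enforced with a masked convolution kernel, so that each output depends only on pixels earlier in the raster-scan order. A minimal numpy sketch of the PixelCNN-style mask (type "A" for the first layer, which also blocks the center pixel; type "B" for later layers):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """Binary mask for a k x k convolution kernel, raster-scan order.

    Type "A" (first layer) also zeroes the center pixel;
    type "B" (later layers) allows it.
    """
    m = np.ones((k, k))
    c = k // 2
    m[c, c + 1:] = 0          # pixels to the right of the center
    m[c + 1:, :] = 0          # all rows below the center
    if mask_type == "A":
        m[c, c] = 0           # the center pixel itself
    return m

print(causal_mask(3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```

Multiplying the convolution weights elementwise by this mask before applying them guarantees the autoregressive ordering holds at every layer.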
Variational Autoencoders
- A general explanation can be found here: VAE
- This method defines an intractable density function with a latent variable $z$:
\begin{aligned}
p_{\theta}(x)=\int p_{\theta}(z)\, p_{\theta}(x \mid z)\, dz
\end{aligned}
Autoencoders
- Map an image $x$ to features $z$ with a deep neural network; $z$ is expected to capture meaningful factors of variation in the data.
- Learn the features $z$ from the reconstruction error: $x \rightarrow z \rightarrow \hat{x}$ with an encoder and a decoder.
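A minimal sketch of this reconstruction objective, using a linear encoder/decoder and plain gradient descent in place of deep networks (the dimensions, data, and learning rate are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d, h = 8, 2                         # input dim, bottleneck dim (h < d)
X = rng.normal(size=(100, d))       # toy data standing in for images

# Linear encoder/decoder weights; real autoencoders use deep nets.
W_enc = 0.1 * rng.normal(size=(d, h))
W_dec = 0.1 * rng.normal(size=(h, d))

lr, losses = 0.01, []
for step in range(500):
    Z = X @ W_enc                   # x -> z      (features)
    X_hat = Z @ W_dec               # z -> x_hat  (reconstruction)
    err = X_hat - X
    losses.append((err ** 2).sum() / len(X))   # reconstruction error
    g = 2.0 * err / len(X)          # d(loss) / d(X_hat)
    grad_dec = Z.T @ g              # backprop through the decoder
    grad_enc = X.T @ (g @ W_dec.T)  # backprop through the encoder
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec
```

Because $h < d$, the bottleneck forces $z$ to keep only the directions of variation that matter most for reconstructing $x$.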
Variational Autoencoders for generation
- Assume each training sample $x^{(i)}$ is generated from a latent representation $z$: first sample $z^{(i)}$ from a true prior $p_{\theta}(z)$, then sample $x^{(i)}$ from the true conditional $p_{\theta}(x \mid z^{(i)})$.
- The aim is to estimate the true parameters $\theta$ by maximizing the likelihood of the training data:
\begin{equation}
p_{\theta}(x)=\int p_{\theta}(z)\, p_{\theta}(x \mid z)\, dz
\end{equation}
where $p_{\theta}(z)$ is a Gaussian prior and $p_{\theta}(x \mid z)$ is the decoder network.
- This optimization problem is intractable: the integral requires evaluating $p_{\theta}(x \mid z)$ for every possible $z$, which is infeasible.
- The posterior density $p_{\theta}(z \mid x)=p_{\theta}(x \mid z)\, p_{\theta}(z) / p_{\theta}(x)$ is also intractable, since the data likelihood $p_{\theta}(x)$ is intractable.
Lower bound of VAE
- Why: the data likelihood is intractable, so we instead optimize a tractable lower bound.
- How: in addition to the decoder network modeling $p_{\theta}(x \mid z)$, define an encoder network $q_{\phi}(z \mid x)$ that approximates the intractable posterior $p_{\theta}(z \mid x)$.
- The final optimization objective:
\begin{aligned}
\log p_{\theta}\left(x^{(i)}\right) &=\mathbf{E}_{z \sim q_{\phi}\left(z \mid x^{(i)}\right)}\left[\log p_{\theta}\left(x^{(i)}\right)\right] \\
&=\mathbf{E}_{z}\left[\log \frac{p_{\theta}\left(x^{(i)} \mid z\right) p_{\theta}(z)}{p_{\theta}\left(z \mid x^{(i)}\right)}\right] \\
&=\mathbf{E}_{z}\left[\log \frac{p_{\theta}\left(x^{(i)} \mid z\right) p_{\theta}(z)}{p_{\theta}\left(z \mid x^{(i)}\right)} \frac{q_{\phi}\left(z \mid x^{(i)}\right)}{q_{\phi}\left(z \mid x^{(i)}\right)}\right] \\
&=\mathbf{E}_{z}\left[\log p_{\theta}\left(x^{(i)} \mid z\right)\right]-\mathbf{E}_{z}\left[\log \frac{q_{\phi}\left(z \mid x^{(i)}\right)}{p_{\theta}(z)}\right]+\mathbf{E}_{z}\left[\log \frac{q_{\phi}\left(z \mid x^{(i)}\right)}{p_{\theta}\left(z \mid x^{(i)}\right)}\right] \\
&=\mathbf{E}_{z}\left[\log p_{\theta}\left(x^{(i)} \mid z\right)\right]-D_{KL}\left(q_{\phi}\left(z \mid x^{(i)}\right) \,\|\, p_{\theta}(z)\right)+D_{KL}\left(q_{\phi}\left(z \mid x^{(i)}\right) \,\|\, p_{\theta}\left(z \mid x^{(i)}\right)\right)
\end{aligned}
where $\mathbf{E}_{z}\left[\log p_{\theta}\left(x^{(i)} \mid z\right)\right]$ is the reconstruction term, and $D_{KL}\left(q_{\phi}\left(z \mid x^{(i)}\right) \,\|\, p_{\theta}(z)\right)$ pushes the approximate posterior toward the prior.
- Lower bound: the intractable problem is transformed into a tractable one. Introducing the encoder $q_{\phi}$ produces two KL-divergence terms: the first is tractable, while the second, $D_{KL}\left(q_{\phi}\left(z \mid x^{(i)}\right) \,\|\, p_{\theta}\left(z \mid x^{(i)}\right)\right)$, is intractable but always $\geq 0$. Dropping it yields a tractable lower bound (the ELBO), and training maximizes this bound.
- The optimization procedure maximizes the lower bound jointly over $\theta$ and $\phi$ with stochastic gradient ascent, using the reparameterization trick to backpropagate through the sampling of $z$.
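For the common choice of a diagonal-Gaussian encoder $q_{\phi}(z \mid x) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and a standard normal prior, the two tractable terms of the bound can be computed directly. The sketch below (the toy linear decoder and all dimensions are hypothetical) uses the closed-form KL and the reparameterization trick $z = \mu + \sigma \odot \epsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_terms(x, mu, log_var, decode, n_samples=1):
    """Monte-Carlo reconstruction term and closed-form KL term.

    q_phi(z|x) = N(mu, diag(exp(log_var))), prior p(z) = N(0, I).
    `decode(z)` returns the mean of a unit-variance Gaussian
    p_theta(x|z), so log p(x|z) = -0.5 * ||x - decode(z)||^2 + const.
    """
    # KL( N(mu, sigma^2) || N(0, I) ), summed over dimensions.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # so gradients can flow through the sampling step.
    recon = 0.0
    for _ in range(n_samples):
        eps = rng.normal(size=mu.shape)
        z = mu + np.exp(0.5 * log_var) * eps
        recon += -0.5 * np.sum((x - decode(z)) ** 2)
    recon /= n_samples

    return recon, kl            # ELBO = recon - kl

# Hypothetical toy decoder: a fixed linear map from z-space to x-space.
W = rng.normal(size=(2, 4))
decode = lambda z: z @ W

x = rng.normal(size=4)
recon, kl = elbo_terms(x, mu=np.zeros(2), log_var=np.zeros(2), decode=decode)
```

When $\mu = 0$ and $\log \sigma^2 = 0$, the encoder already matches the prior and the KL term is exactly zero, so only the reconstruction term remains.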
GAN
GANs give up explicit density modeling and instead train a generator $G$ against a discriminator $D$ with a two-player minimax objective:
\begin{equation}
\min_{G} \max_{D} V(D, G)=\mathbb{E}_{\boldsymbol{x} \sim p_{\text{data}}(\boldsymbol{x})}[\log D(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log (1-D(G(\boldsymbol{z})))]
\end{equation}
where $D(x)$ represents the probability that $x$ comes from the real data distribution.
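The value $V(D, G)$ can be estimated by Monte Carlo with samples from the data distribution and from the noise prior. The 1-d discriminator and generator below are hypothetical stand-ins, not trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def gan_value(D, G, x_real, z):
    """Monte-Carlo estimate of V(D, G) from the minimax objective."""
    eps = 1e-12                               # numerical safety in the logs
    real_term = np.mean(np.log(D(x_real) + eps))
    fake_term = np.mean(np.log(1.0 - D(G(z)) + eps))
    return real_term + fake_term

# Hypothetical 1-d players: a logistic discriminator, a linear generator.
D = lambda x: 1.0 / (1.0 + np.exp(-(2.0 * x - 1.0)))   # D(x) in (0, 1)
G = lambda z: 0.5 * z                                  # maps noise to samples

x_real = rng.normal(loc=1.0, size=1000)   # samples from the "data"
z = rng.normal(size=1000)                 # samples from the noise prior p_z

print(gan_value(D, G, x_real, z))
```

A maximally confused discriminator with $D(x) \equiv 0.5$ gives $V = 2\log 0.5 \approx -1.386$, which is the value at the theoretical optimum where the generator's distribution matches the data.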