Similarity
- Both aim at constructing a model that maps latent variables $z$ to the data distribution $p_{data}$. Specifically, the trained model is $\mathbf{X} = g(\mathbf{Z})$, where $\mathbf{Z}$ follows a simple prior such as the standard normal (Gaussian) distribution, and $\mathbf{X}$ represents the distribution of the training data. So both aim at distribution transformation.
- Both face the same generative-modeling difficulty: closed-form expressions for the generated distribution and the true distribution are unavailable; only samples from the two distributions can be obtained.
- The $KL$ divergence can only measure the difference between two distributions when their full density expressions are available, so it is inapplicable in this scenario.
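To illustrate why density expressions matter, here is a minimal sketch (the helper `kl_gaussians` is a hypothetical name) of the closed-form KL divergence between two 1-D Gaussians; this formula is only computable because both densities are known analytically, which is exactly what is missing when only samples are available:

```python
import math

def kl_gaussians(mu1, var1, mu2, var2):
    """Closed-form KL(N(mu1, var1) || N(mu2, var2)) for 1-D Gaussians.

    Evaluating this requires the full density parameters of BOTH
    distributions; with samples alone it cannot be computed directly.
    """
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

# KL of a distribution with itself is zero.
print(kl_gaussians(0.0, 1.0, 0.0, 1.0))  # → 0.0
```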
Difference
- Measurement method: VAE uses a hand-designed measurement rule, while GAN's measurement rule is learned by a neural network.
- GAN: proposed to leverage deep neural networks to measure the distribution difference, since no suitable analytic measurement method exists.
- VAE: adopts a roundabout trick to make the $KL$ divergence applicable.
VAE
Important points
- The details are in the VAE blog post.
- Notes: each sample gets its own Gaussian posterior, constructed as a multivariate Gaussian, and $z$ is then obtained by sampling from this distribution:
\begin{equation}
\log q_{\phi}\left(\mathbf{z} | \mathbf{x}^{(i)}\right)=\log \mathcal{N}\left(\mathbf{z} ; \boldsymbol{\mu}^{(i)}, \boldsymbol{\sigma}^{2(i)} \mathbf{I}\right)
\end{equation}
- Notes: $\boldsymbol{\mu}^{(k)} = f_1(x_k)$ and $\log \boldsymbol{\sigma}^{2(k)} = f_2(x_k)$, both fitted by neural networks.
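The two encoder heads and the sampling step can be sketched as follows; this is a minimal NumPy sketch in which the linear maps `W_mu` and `W_logvar` are toy stand-ins (my own names, not from the source) for the networks $f_1$ and $f_2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the encoder networks:
# f1(x) -> mu, f2(x) -> log sigma^2 (deep nets in a real VAE).
d_in, d_z = 8, 2
W_mu = rng.normal(size=(d_z, d_in))
W_logvar = rng.normal(size=(d_z, d_in))

def encode(x):
    mu = W_mu @ x            # f1(x_k): posterior mean
    log_var = W_logvar @ x   # f2(x_k): posterior log variance
    return mu, log_var

def sample_z(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients can flow through mu and log_var during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

x = rng.normal(size=d_in)
mu, log_var = encode(x)
z = sample_z(mu, log_var)
print(z.shape)  # → (2,)
```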
- The noise in the sampled $z$ is governed by $\sigma$, which the network could drive to zero; then the noise would take no effect and the model would degenerate into a plain autoencoder.
- The generative ability rests on the condition that every posterior $p(Z|X)$ stays close to the standard Gaussian:
\begin{equation}
p(Z)=\sum_{X} p(Z | X) p(X)=\sum_{X} \mathcal{N}(0, I) p(X)=\mathcal{N}(0, I) \sum_{X} p(X)=\mathcal{N}(0, I)
\end{equation}
so $p(Z)$ follows the standard normal distribution, which satisfies the prior.
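The marginalization above can be checked empirically with a small sketch: if every posterior were exactly $\mathcal{N}(0, I)$, then pooling the $z$-samples over any data distribution $p(X)$ leaves an aggregate that is itself standard normal (the sample sizes below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw one z per data point; since each posterior is N(0, I) regardless
# of x, the x values are irrelevant and the pooled z's are N(0, I).
n_samples = 100_000
z = rng.standard_normal(n_samples)

print(round(z.mean(), 2), round(z.var(), 2))  # close to 0.0 and 1.0
```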
- How: introducing a loss that pulls each posterior toward $\mathcal{N}(0, I)$.
The direct method:
\begin{equation}
\mathcal{L}_{\mu}=\left\|f_{1}\left(X_{k}\right)\right\|^{2}
\end{equation}
\begin{equation}
\mathcal{L}_{\sigma^{2}}=\left\|f_{2}\left(X_{k}\right)\right\|^{2}
\end{equation}
It is difficult to balance these two losses, so it is more reasonable to use the $KL$ divergence between the posterior and the standard Gaussian, $KL\left(N\left(\mu, \sigma^{2}\right) \,\|\, N(0, I)\right)$:
\begin{equation}
\mathcal{L}_{\mu, \sigma^{2}}=\frac{1}{2} \sum_{i=1}^{d}\left(\mu_{(i)}^{2}+\sigma_{(i)}^{2}-\log \sigma_{(i)}^{2}-1\right)
\end{equation}
where $d$ is the dimension of $z$.
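The formula above translates directly into code; a minimal sketch (the function name `kl_to_standard_normal` is my own), taking the encoder outputs $\mu = f_1(x_k)$ and $\log \sigma^2 = f_2(x_k)$:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """L_{mu, sigma^2} = 1/2 * sum_i (mu_i^2 + sigma_i^2 - log sigma_i^2 - 1).

    mu and log_var are the encoder outputs f1(x_k) and f2(x_k),
    each of dimension d (the dimension of z).
    """
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)

# The loss is zero exactly when the posterior is already N(0, I),
# i.e. mu = 0 and sigma^2 = 1 (log_var = 0).
print(kl_to_standard_normal(np.zeros(4), np.zeros(4)))  # → 0.0
```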
Essence of VAE
- Two encoders: $f_1$ for $\boldsymbol{\mu}$ and $f_2$ for $\boldsymbol{\sigma}$.
- Reconstruction process: the decoder's loss assumes there is no noise; sampling-$z$ process: the encoder's loss assumes there is Gaussian noise.
GAN
Important points
- GAN is used to map a normal distribution $p(z)$ into a specific data distribution $p(x)$.
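That mapping role of the generator can be sketched as follows; the affine parameters `W` and `b` (my own toy stand-ins) replace the deep network that would actually be trained adversarially against a discriminator (not shown):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy generator g: maps z ~ N(0, I) into a sample meant to mimic p(x).
d_z, d_x = 2, 4
W = rng.normal(size=(d_x, d_z))
b = rng.normal(size=d_x)

def generator(z):
    # tanh bounds the fake sample, a common choice for image-like outputs.
    return np.tanh(W @ z + b)

z = rng.standard_normal(d_z)
x_fake = generator(z)
print(x_fake.shape)  # → (4,)
```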