Goal
Accelerate the computation of kernel matrix by directly approximating the kernel function.
Map x into a low-dimension random feature space, $z: mathcal { R } ^ { d } rightarrow mathcal { R } ^ { D }$
Basic Thought
Rewrite k(x, y) as an expectation over a certain probability space, and hence we can use sample mean to approximate it.
Method: Random Fourier Features
Some kernels depend only on $Delta = abs(x-y)$, for example, Gaussian kernel,
There is a theorem by Bochner: A continuous kernel $k ( mathrm { x } , mathrm { y } ) = k ( mathrm { x } - mathrm { y } )$ on $mathcal { R } ^ { d }$ is positive definite if and only if $k ( delta )$ is the Fourier transform of a non-negative measure.
And recall the Fourier Transform, $X ( j omega ) = int _ { - infty } ^ { + infty } x ( t ) e ^ { - j omega t } d t$, we let
And use the inverse formula:
By sampling different $omega$ from $p(omega)$, z(x) is a vector of D $zeta _ { omega } ( mathbf { x } )$, and
Plus, note that both $k()$ and $p(omega)$ are real-valued, so we could ignore the imaginary part,
After adding a random variable uniformly drawn from $[0, 2pi]$, (to avoid additionally calculating $sin(omega^{ T } x)$, since $cos(omega^{T} (x-y)) = cos(omega^{T} x) cos(omega^{T} y) + sin(omega^{T} x) sin(omega^{T} y)$)
So the final map is $z _ { omega } ( x ) = sqrt { 2 } cos left( omega ^ { T } x + b right)$
How close is the approximation to the original kernel function?
By Hoeffding inequality, (since the rv, i.e. the integral, is bounded)
Method: Random binning Features
First try to approximate a special “hat” kernel
Partition the real number line with a grid of pitch δ, and shift this grid randomly by an amount u drawn uniformly at random from [0,δ]. This grid partitions the real number line into intervals [u + nδ,u + (n + 1)δ] for all integers n.
Extend it to more general kernel function
And we can rewrite a kernel function depending only on the L1 distance between pairs of points,
the distribution p could be set as, $p ( delta ) = delta ddot { k } ( delta )$, which is guarenteed by the lemma 1 in this paper.
Random maps for separable multivariate shift-invariant kernels of the form can also utilize this method.
近期评论