# Fixed Points of SVGD


This note is inspired by a discussion of the properties of linear kernel functions. Although linear kernels are not expected to perform well in SVGD, they can yield exact estimates for some functions, including the mean and variance. Three kinds of kernels (constant, linear, and polynomial) are explored here to see which functions they estimate exactly and how well the resulting particles approximate the target distribution.

## What are fixed points in SVGD?

A fixed point $X^* = \{x_i\}_{i=1}^n$ of SVGD is a set of particles that remains unchanged under the SVGD update, i.e.

$\phi^*(x_i) = \frac{1}{n}\sum_{j=1}^n \mathcal{A}_p^{x_j}k(x_j,x_i) = 0, \quad \forall i = 1, \dotsc, n.$
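To make the condition concrete, here is a minimal one-dimensional sketch that evaluates $\phi^*(x_i)$ for every particle and tests whether all of them vanish. The standard normal target, the RBF kernel with bandwidth `H = 1`, and the helper names `svgd_direction` and `is_fixed_point` are illustrative assumptions, not part of the note.

```python
import math

def svgd_direction(particles, score, kernel, dkernel_dx):
    """Return [phi*(x_i)] for 1-D particles, where
    phi*(x_i) = (1/n) * sum_j [score(x_j) * k(x_j, x_i) + d/dx_j k(x_j, x_i)].
    """
    n = len(particles)
    return [
        sum(score(xj) * kernel(xj, xi) + dkernel_dx(xj, xi) for xj in particles) / n
        for xi in particles
    ]

# Assumed target: standard normal, so grad log p(x) = -x.
def score(x):
    return -x

H = 1.0  # assumed RBF bandwidth

def rbf(x, y):
    return math.exp(-((x - y) ** 2) / (2 * H ** 2))

def drbf_dx(x, y):
    # Derivative of rbf(x, y) with respect to its first argument.
    return -(x - y) / H ** 2 * rbf(x, y)

def is_fixed_point(particles, tol=1e-10):
    # A fixed point has phi*(x_i) = 0 (up to rounding) for every particle.
    direction = svgd_direction(particles, score, rbf, drbf_dx)
    return max(abs(v) for v in direction) < tol

# For two symmetric particles {-a, a}, setting phi* = 0 by hand gives
# exp(-2 a^2) = 1/3, i.e. a = sqrt(ln(3) / 2).
a = math.sqrt(math.log(3.0) / 2.0)

print(is_fixed_point([-1.0, 1.0]))  # False: these particles would still move
print(is_fixed_point([-a, a]))      # True: the update vanishes
```

Under these assumptions the pair $\{-a, a\}$ with $a = \sqrt{\ln 3 / 2}$ is a fixed point, while $\{-1, 1\}$ is not. The location of the fixed point depends on both the target and the kernel.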

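If a fixed point $X^*$ exists, then it exactly estimates $\mathbb{E}_p[f]$, in the sense that $\frac{1}{n}\sum_{i=1}^n f(x_i) = \mathbb{E}_p[f]$, for every $f \in \mathcal{F}_{X^*}$ of the form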

$\mathcal{F}_{X^*}^{scal} =\{ x \mapsto \sum_{i=1}^n a_i^\top \mathcal{A}_p^x k(x,x_i) + b : \quad b\in \mathds{R}, a_i \in \mathds{R}^d, \forall i = 1, \dotsc, n \}$

or of the form

$\mathcal{F}_{X^*}^{vec} =\{ x \mapsto \sum_{i=1}^n a_i \mathcal{A}_p^x k(x,x_i) + b : \quad b\in \mathds{R}^d, a_i \in \mathds{R}, \forall i = 1, \dotsc, n \}$

where the first expression is a family of scalar-valued functions and the second is a family of vector-valued functions. We would therefore like either an abundance of fixed points or a rich family $\mathcal{F}_{X^*}$, so that many expectations are estimated exactly. The existence of a fixed point is equivalent to the existence of a solution to the following equations for all $i = 1, \dotsc, n$:

$\begin{split} 0 &= \sum_{j=1}^n \mathcal{A}_p^{x_j}k(x_j,x_i)\\ &= \sum_{j=1}^n \left[ \nabla_{x_j} \log p(x_j)^\top k(x_j,x_i) + \nabla_{x_j} k(x_j,x_i)^\top \right]\\ &= \sum_k \lambda_k \sum_{j=1}^n \left[ \nabla_{x_j} \log p(x_j)^\top \phi_k (x_i)\phi_k (x_j) + \phi_k (x_i)\nabla_{x_j} \phi_k (x_j)^\top \right] \end{split}$

where $k(x_i, x_j) = \sum_k \lambda_k \phi_k(x_i)\phi_k(x_j)$ by Mercer’s theorem. In light of the equations above, the existence of a fixed point depends on both the target distribution p(x) and the eigenfunctions {φk(x)}, which complicates the analysis. However, some progress can be made by taking p(x) and {φk(x)} to be relatively simple. In the following discussion, we take p to be the normal distribution with mean vector μ and covariance matrix Σ and explore what $\mathcal{F}_{X^*}$ looks like.
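As a small illustration of the Gaussian case, the sketch below works in one dimension, where the score of $\mathcal{N}(\mu, \sigma^2)$ is $\nabla \log p(x) = -(x - \mu)/\sigma^2$. The linear kernel $k(x, y) = xy + 1$, the parameter values, and the coefficients `coeffs` and `b` are assumptions chosen for illustration. Under them, the two particles $\mu \pm \sigma$ form a fixed point, match the first two moments of the target, and average any $f$ of the scalar form above to its constant term $b$, which equals $\mathbb{E}_p[f]$ by Stein's identity.

```python
MU, SIGMA = 2.0, 0.5  # assumed parameters of the 1-D Gaussian target

def score(x):
    # grad log p(x) for N(MU, SIGMA^2)
    return -(x - MU) / SIGMA ** 2

def stein_linear(x, xi):
    # A_p^x k(x, xi) for the assumed linear kernel k(x, y) = x * y + 1:
    # score(x) * k(x, xi) + d/dx k(x, xi), with d/dx k(x, xi) = xi.
    return score(x) * (x * xi + 1.0) + xi

def svgd_direction(particles):
    # phi*(x_i) = (1/n) * sum_j A_p^{x_j} k(x_j, x_i)
    n = len(particles)
    return [sum(stein_linear(xj, xi) for xj in particles) / n for xi in particles]

# Candidate fixed point: two particles at MU - SIGMA and MU + SIGMA.
particles = [MU - SIGMA, MU + SIGMA]
n = len(particles)
direction = svgd_direction(particles)

# Empirical first and second moments of the particle set.
emp_mean = sum(particles) / n
emp_second = sum(x * x for x in particles) / n

# A member of the scalar family: f(x) = sum_i a_i * A_p^x k(x, x_i) + b,
# with arbitrary (hypothetical) coefficients a_i and constant b.
coeffs, b = [0.7, -1.3], 4.0

def f(x):
    return sum(ai * stein_linear(x, xi) for ai, xi in zip(coeffs, particles)) + b

emp_f = sum(f(x) for x in particles) / n

print(direction)             # both entries vanish up to rounding
print(emp_mean, emp_second)  # compare with MU and MU**2 + SIGMA**2
print(emp_f)                 # compare with b
```

With this kernel, each $\mathcal{A}_p^x k(x, x_i)$ is a quadratic polynomial in $x$. This is one way to see why a fixed point reproduces the mean and second moment of the Gaussian target.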