Reading: Stein method and its applications (Part 1)

Short notes on five papers I read recently about applications of Stein's method, including wild variational inference, reinforcement learning and sampling. Most of the papers are from the project Stein's method for practical machine learning, which I am quite interested in.

Approximate Inference with Amortised MCMC

Amortised MCMC is proposed as a framework for approximating a posterior distribution $p$ of interest. With the following three main ingredients:

  • a parametric family $\mathcal{Q} = \{q_\phi\}$ of sampler distributions
  • a transition kernel $K(z_t \mid z_{t-1})$ of the MCMC dynamics
  • a divergence $D(\cdot \,\|\, \cdot)$ and an update rule for $\phi$

the process of approximating $p$ works as follows: we update the distribution in $\mathcal{Q}$ by minimizing the discrepancy between samples taken before and after running the MCMC. First, take samples $\{z_0^k\}$ from the distribution $q_{\phi_{t-1}}$ at iteration $t-1$ and apply $T$ steps of the transition kernel to obtain $\{z_T^k\}$, where $z_T^k \sim K^T(\cdot \mid z_0^k)$. To reduce the discrepancy between $\{z_0^k\}$ and $\{z_T^k\}$, $\phi$ is updated as follows

$$\phi_t \leftarrow \phi_{t-1} - \epsilon \,\nabla_\phi\, D\!\left(\{z_0^k\} \,\middle\|\, \{z_T^k\}\right).$$

Alternatives for the update include minimizing the KL divergence, adversarially estimated divergences, and energy matching.
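
As a rough illustration, here is a minimal sketch of one outer iteration, assuming PyTorch and hypothetical components: a sampler network `sampler` playing the role of $q_\phi$, a one-step MCMC kernel `mcmc_step` whose stationary distribution is $p$, and a sample-based divergence estimator `divergence`.

```python
# Minimal sketch of one amortised MCMC iteration; `sampler`, `mcmc_step`,
# `divergence` and `noise_dim` are assumed/hypothetical components.
import torch

def amortised_mcmc_step(sampler, mcmc_step, divergence, optimizer,
                        noise_dim, n_samples=64, T=5):
    """Sample from q_phi, improve the samples with T MCMC steps targeting p,
    then move q_phi towards the improved samples."""
    xi = torch.randn(n_samples, noise_dim)
    z0 = sampler(xi)                     # {z_0^k} ~ q_{phi_{t-1}}
    zT = z0.detach()
    for _ in range(T):                   # z_T^k ~ K^T(. | z_0^k)
        zT = mcmc_step(zT)
    loss = divergence(z0, zT.detach())   # D({z_0^k} || {z_T^k})
    optimizer.zero_grad()
    loss.backward()                      # gradient with respect to phi only
    optimizer.step()
    return loss.item()
```

The choice of `divergence` corresponds to the alternatives above (KL, adversarially estimated divergences, or energy matching).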

Two Methods for Wild Variational Inference

Amortized SVGD

Train an inference network $f(\eta; \xi)$ by iteratively adjusting $\eta$ so that the outputs of $f(\eta; \xi)$ move towards their SVGD-updated counterparts. The process is similar to the amortised MCMC of the last section, differing in how the samples produced by $f(\eta; \xi)$ with random inputs $\{\xi_i\}$ are moved in each iteration. Specifically, each iteration computes the Stein variational gradient $\Delta z_i$ for $z_i = f(\eta; \xi_i)$ and updates $\eta$ as follows

$$\eta \leftarrow \eta + \epsilon \sum_i \frac{\partial f(\eta; \xi_i)}{\partial \eta}\, \Delta z_i.$$
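
A minimal sketch of this update, assuming PyTorch, an inference network `f` with parameters $\eta$, and a hypothetical `grad_log_p` that returns the score $\nabla_z \log p(z)$ for a batch of samples:

```python
# Minimal sketch of an amortized SVGD step; `f`, `grad_log_p` and `noise_dim`
# are assumed/hypothetical.
import torch

def rbf_kernel(z):
    """RBF kernel matrix and summed kernel gradients (median-heuristic bandwidth)."""
    sq_dist = torch.cdist(z, z) ** 2
    h = sq_dist.median() / torch.log(torch.tensor(z.shape[0] + 1.0))
    K = torch.exp(-sq_dist / h)
    grad_K = 2.0 / h * (K.sum(1, keepdim=True) * z - K @ z)  # sum_j grad_{z_j} k(z_j, z_i)
    return K, grad_K

def amortized_svgd_step(f, optimizer, grad_log_p, noise_dim, n=64, eps=1e-2):
    xi = torch.randn(n, noise_dim)
    z = f(xi)                                         # z_i = f(eta; xi_i)
    with torch.no_grad():
        K, grad_K = rbf_kernel(z)
        delta_z = (K @ grad_log_p(z) + grad_K) / n    # Stein variational gradient
        target = z + eps * delta_z                    # SVGD-updated counterpart
    # Regress the outputs onto their SVGD-updated targets; one gradient step on
    # this loss is proportional to the chain-rule update on eta shown above.
    loss = ((z - target) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```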

KSD Variational Inference

Optimize $\eta$ with standard gradient descent, where the gradient is taken so as to minimize the kernelized Stein discrepancy (KSD) between $q_\eta$ and $p$, approximated by a U-statistic as follows

$$\widehat{\mathrm{KSD}}^2(q_\eta, p) = \frac{1}{n(n-1)}\sum_{i\neq j} u_p(z_i, z_j), \qquad z_i = f(\eta;\xi_i),$$

where $u_p(z, z') = s_p(z)^\top k(z, z')\, s_p(z') + s_p(z)^\top \nabla_{z'} k(z, z') + s_p(z')^\top \nabla_{z} k(z, z') + \operatorname{tr}\!\big(\nabla_z \nabla_{z'} k(z, z')\big)$ and $s_p(z) = \nabla_z \log p(z)$.
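
A minimal sketch of this U-statistic with an RBF kernel, assuming a hypothetical `score` function returning $\nabla_z \log p(z)$ for a batch of samples:

```python
# Minimal sketch of the KSD^2 U-statistic with an RBF kernel k(z, z') = exp(-|z - z'|^2 / h);
# `score` is an assumed/hypothetical function returning grad_z log p(z) row-wise.
import torch

def ksd_u_statistic(z, score, h=1.0):
    n, d = z.shape
    s = score(z)                                    # s_p(z_i)
    diff = z.unsqueeze(1) - z.unsqueeze(0)          # z_i - z_j, shape (n, n, d)
    sqd = (diff ** 2).sum(-1)
    K = torch.exp(-sqd / h)                         # k(z_i, z_j)
    gradK_j = 2.0 / h * diff * K.unsqueeze(-1)      # grad_{z_j} k(z_i, z_j)
    term1 = (s @ s.T) * K                           # s_i^T s_j k
    term2 = (s.unsqueeze(1) * gradK_j).sum(-1)      # s_i^T grad_{z_j} k
    term3 = -(s.unsqueeze(0) * gradK_j).sum(-1)     # s_j^T grad_{z_i} k
    term4 = (2.0 * d / h - 4.0 / h ** 2 * sqd) * K  # trace(grad_{z_i} grad_{z_j} k)
    u = term1 + term2 + term3 + term4
    u = u - torch.diag(torch.diag(u))               # drop the i == j terms
    return u.sum() / (n * (n - 1))
```

Backpropagating this estimate through $z_i = f(\eta; \xi_i)$ (with reparameterized noise $\xi_i$) gives the gradient used to update $\eta$.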

Stein Variational Policy Gradient

In the context of reinforcement learning, an agent takes an action $a$ in the environment and the environment returns an instant scalar reward $r$ to the agent. The agent needs to learn a policy $\pi(a \mid s; \theta)$ to maximize its expected return

$$J(\theta) = \mathbb{E}\!\left[\sum_{t} \gamma^{t}\, r(s_t, a_t)\right], \qquad a_t \sim \pi(\cdot \mid s_t; \theta), \quad s_{t+1} \sim P(\cdot \mid s_t, a_t),$$

with the initial state distribution and the transition dynamics $P$ determined by the environment.

Taking policy parameters $\{\theta_i\}$ to be particles and $q(\theta)$ to be a target distribution over policies that favours high expected return, SVGD is then applied to draw samples from $q$. To obtain the target distribution $q(\theta)$, we optimize the regularized expected return as follows

$$\tilde{J}(q) = \mathbb{E}_{q(\theta)}\big[J(\theta)\big] - \alpha\, D_{\mathrm{KL}}\big(q \,\|\, q_0\big),$$

where the first term plays the role of exploitation and the second the role of exploration, so that $q$ maximizes the expected return while staying not too far from a prior distribution $q_0$. Treating $q(\theta)$ as a generic density, take the derivative of this objective w.r.t. $q(\theta)$ and set it to zero, which gives

$$J(\theta) - \alpha\big(\log q(\theta) - \log q_0(\theta) + 1\big) = 0,$$

from which we obtain the optimal target distribution

$$q(\theta) \propto q_0(\theta)\, \exp\!\big(J(\theta)/\alpha\big),$$

and in each iteration of SVGD we update the particles $\{\theta_i\}$ with

$$\theta_i \leftarrow \theta_i + \frac{\epsilon}{n}\sum_{j=1}^{n}\Big[\nabla_{\theta_j}\!\Big(\tfrac{1}{\alpha}J(\theta_j)+\log q_0(\theta_j)\Big)\, k(\theta_j,\theta_i)+\nabla_{\theta_j}k(\theta_j,\theta_i)\Big].$$
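
A minimal sketch of this particle update, assuming PyTorch and hypothetical helpers `grad_J` (a policy-gradient estimate of $\nabla_\theta J(\theta)$ for each particle) and `grad_log_q0` (the score of the prior $q_0$):

```python
# Minimal sketch of one SVPG particle update; `grad_J` and `grad_log_q0` are
# assumed/hypothetical helpers, `thetas` is an (n, d) tensor of policy parameters.
import torch

def svpg_update(thetas, grad_J, grad_log_q0, alpha=1.0, eps=1e-3):
    n = thetas.shape[0]
    # Gradient of the log target: (1/alpha) grad J(theta_j) + grad log q0(theta_j)
    grad_log_target = grad_J(thetas) / alpha + grad_log_q0(thetas)
    # RBF kernel with the median heuristic
    sq_dist = torch.cdist(thetas, thetas) ** 2
    h = sq_dist.median() / torch.log(torch.tensor(n + 1.0))
    K = torch.exp(-sq_dist / h)
    grad_K = 2.0 / h * (K.sum(1, keepdim=True) * thetas - K @ thetas)  # sum_j grad_{theta_j} k
    # SVGD direction and particle update
    phi = (K @ grad_log_target + grad_K) / n
    return thetas + eps * phi
```

The kernel term pushes the particles apart, encouraging diverse policies, while the driving-force term moves each particle towards high return and high prior density.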