# Reading: Stein method and its applications (Part 1)


Short notes on five papers I read recently on applications of Stein's method, including wild variational inference, reinforcement learning, and sampling. Most of the papers come from the project Stein's method for practical machine learning, which I am quite interested in.

## Approximate Inference with Amortised MCMC

Amortised MCMC is proposed as a framework for approximating a posterior distribution $p$ of interest. It has three main ingredients:

• a parametric set $Q = \{q_\phi\}$ of sampler distributions
• a transition kernel $K(z_t \mid z_{t-1})$ from some MCMC dynamics
• a divergence $D(\cdot \,\|\, \cdot)$ and an update rule for $\phi$

The process of approximating $p$ then goes as follows: the distribution chosen from the parametric set $Q$ is updated by minimizing the discrepancy between samples before and after the MCMC transitions. First, given samples $\{z_k^0\}$ from the distribution $q_{\phi_{t-1}}$ of the $(t-1)$-th iteration, apply the $T$-step transition to obtain $\{z_k^T\}$, where $z_k^T \sim K^T(\cdot \mid z_k^0)$. To minimize the discrepancy between $\{z_k^0\}$ and $\{z_k^T\}$, $\phi$ is updated as follows

$$\phi_t \leftarrow \phi_{t-1} - \epsilon\, \nabla_\phi D\big(q_T \,\|\, q_\phi\big)\Big|_{\phi = \phi_{t-1}},$$

where $q_T$ denotes the distribution of the samples $\{z_k^T\}$ after $T$ transitions.

Alternative update rules include minimizing the KL divergence, adversarially estimated divergences, and energy matching.
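As a concrete illustration, here is a minimal numpy sketch of the loop. The specific choices — a 1-D Gaussian target, an unadjusted Langevin kernel for $K$, a Gaussian family for $q_\phi$, and the KL-style update (fit $q_\phi$ to the post-transition samples by maximum likelihood) — are mine for the toy example, not prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target p(z) = N(3, 1); its score drives the Langevin kernel.
def score_p(z):
    return -(z - 3.0)

def langevin_step(z, eps=0.1):
    # One MCMC transition K(z_t | z_{t-1}): unadjusted Langevin dynamics.
    return z + eps * score_p(z) + np.sqrt(2 * eps) * rng.normal(size=z.shape)

# Sampler family Q = {q_phi}: q_phi = N(mu, sigma^2), phi = (mu, log_sigma).
mu, log_sigma = 0.0, 0.0
lr = 0.05
for t in range(2000):
    sigma = np.exp(log_sigma)
    z0 = mu + sigma * rng.normal(size=100)   # samples {z^0} from q_phi
    zT = z0
    for _ in range(5):                       # T-step transition, T = 5
        zT = langevin_step(zT)
    # KL-style update: ascend the mean log-likelihood of {z^T} under q_phi
    # (the "minimizing KL divergence" variant; others are possible).
    mu += lr * np.mean((zT - mu) / sigma**2)
    log_sigma += lr * np.mean((zT - mu) ** 2 / sigma**2 - 1.0)
```

At the fixed point $q_\phi$ matches the distribution of its own $T$-step-transitioned samples, so $(\mu, \sigma)$ settles near the target's $(3, 1)$, up to the discretization bias of the Langevin kernel.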

## Two Methods for Wild Variational Inference

### Amortized SVGD

Amortized SVGD trains an inference network $f(\eta; \xi)$ by iteratively adjusting $\eta$ so that the outputs of $f(\eta; \xi)$ move towards their SVGD-updated counterparts. The process is similar to the amortised MCMC of the previous section, differing in how the samples produced by $f(\eta; \xi)$ with random inputs $\{\xi_i\}$ are moved in each iteration. Each iteration computes the Stein variational gradient $\Delta z_i$ for $z_i = f(\eta; \xi_i)$ and then updates $\eta$ through the chain rule,

$$\eta \leftarrow \eta + \epsilon \sum_i \partial_\eta f(\eta; \xi_i)\, \Delta z_i.$$
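A toy numpy sketch of this procedure, assuming a 1-D Gaussian target, an RBF kernel with fixed bandwidth, and a hypothetical two-parameter linear "network" $f(\eta; \xi) = \eta_0 + \eta_1 \xi$ (all illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def score_p(z):                # score of a toy target p = N(3, 1)
    return -(z - 3.0)

def svgd_direction(z, h=1.0):
    # Stein variational gradient for each particle z_i:
    # dz_i = (1/n) sum_j [ k(z_j, z_i) score(z_j) + grad_{z_j} k(z_j, z_i) ]
    diff = z[:, None] - z[None, :]      # diff[j, i] = z_j - z_i
    K = np.exp(-diff**2 / (2 * h))      # RBF kernel k(z_j, z_i)
    gradK = -diff / h * K               # d k(z_j, z_i) / d z_j  (repulsion)
    return (K.T @ score_p(z) + gradK.sum(axis=0)) / len(z)

# Inference "network" f(eta; xi) = eta0 + eta1 * xi.
eta = np.array([0.0, 0.5])
lr = 0.1
for t in range(500):
    xi = rng.normal(size=50)            # fresh random inputs each iteration
    z = eta[0] + eta[1] * xi            # z_i = f(eta; xi_i)
    dz = svgd_direction(z)
    # Chain rule: eta <- eta + lr * mean_i [ (df/deta)(xi_i) * dz_i ]
    eta[0] += lr * dz.mean()
    eta[1] += lr * (xi * dz).mean()
```

Since the family here is a location-scale map of $\xi \sim N(0,1)$, $\eta_0$ should drift towards the target mean and $\eta_1$ towards roughly the target scale, with some bias from the fixed bandwidth and finite particle count.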

### KSD Variational Inference

KSD variational inference optimizes $\eta$ by standard gradient descent, where the gradient is chosen to minimize the kernelized Stein discrepancy (KSD) between the sampler distribution and $p$, approximated by a U-statistic

$$\hat{S}(q, p) = \frac{1}{n(n-1)} \sum_{i \neq j} \kappa_p(z_i, z_j), \qquad z_i = f(\eta; \xi_i),$$

where $\kappa_p$ is the Stein kernel induced by a base kernel $k$ and the score function of $p$.
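A minimal numpy sketch of the U-statistic itself, assuming a 1-D Gaussian target and an RBF base kernel; the Stein kernel $\kappa_p(x,y) = s_p(x)s_p(y)k(x,y) + s_p(x)\partial_y k + s_p(y)\partial_x k + \partial_x\partial_y k$ is the standard construction, with $s_p$ the score of $p$:

```python
import numpy as np

rng = np.random.default_rng(0)

def score_p(z):                 # score of the target p = N(3, 1)
    return -(z - 3.0)

def ksd_u_stat(z, h=1.0):
    """U-statistic estimate of the squared KSD with an RBF kernel (1-D)."""
    d = z[:, None] - z[None, :]               # d[i, j] = x - y = z_i - z_j
    k = np.exp(-d**2 / (2 * h))               # base kernel k(x, y)
    dk_dx = -d / h * k                        # d k / d x
    dk_dy = d / h * k                         # d k / d y
    dk_dxdy = (1.0 / h - d**2 / h**2) * k     # mixed second derivative
    s = score_p(z)
    kappa = (s[:, None] * s[None, :] * k      # Stein kernel kappa_p(z_i, z_j)
             + s[:, None] * dk_dy
             + s[None, :] * dk_dx
             + dk_dxdy)
    n = len(z)
    np.fill_diagonal(kappa, 0.0)              # U-statistic: exclude i == j
    return kappa.sum() / (n * (n - 1))

good = ksd_u_stat(rng.normal(3.0, 1.0, size=500))   # samples matching p
bad = ksd_u_stat(rng.normal(0.0, 1.0, size=500))    # mismatched samples
```

Samples drawn from $p$ give an estimate near zero, while mismatched samples score much higher; in KSD variational inference this quantity would be differentiated through $z_i = f(\eta; \xi_i)$ to update $\eta$.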