# Reading: Stein's method and its applications (Part 1)

**Published:**

Short notes on five papers I read recently on applications of Stein's method, including wild variational inference, reinforcement learning, and sampling. Most papers are from the project Stein's method for practical machine learning, which I am quite interested in.

## Approximate Inference with Amortised MCMC

Amortised MCMC is proposed as a framework for approximating a posterior distribution p of interest, with the following three main ingredients:

- a parametric set Q = {q_{φ}} of sampler distributions
- a transition kernel K(z_{t}|z_{t−1}) of an MCMC dynamics
- a divergence D(…||…) and an update rule for φ

The process of approximating p then runs as follows, updating the distribution within the parametric set Q to minimize the discrepancy between samples before and after MCMC. First, with samples {z_{0}^{k}} drawn from the distribution q_{φ_{t−1}} of the (t − 1)-th iteration, apply a T-step transition to obtain {z_{T}^{k}}, where z_{T}^{k} ∼ K_{T}(· | z_{0}^{k}). To minimize the discrepancy between {z_{0}^{k}} and {z_{T}^{k}}, φ is updated as follows
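The exact rule depends on the chosen divergence D; schematically, my reconstruction, with step size ε and D̂ a sample-based estimate of D (treating the improved samples {z_{T}^{k}} as fixed), is:

```latex
\phi_t \leftarrow \phi_{t-1} - \epsilon\, \nabla_{\phi}\, \hat{D}\!\left( \{z_T^k\} \,\big\|\, q_{\phi} \right)\Big|_{\phi = \phi_{t-1}}
```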

Alternatives for the update include minimizing the KL divergence, adversarially estimated divergences, and energy matching.
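The loop above can be made concrete with a toy instantiation (my own sketch, not code from the paper): the target p is N(3, 1), the sampler q_φ is N(φ, 1), the transition kernel is a Langevin step, and the update rule matches the first moments of {z_{0}^{k}} and {z_{T}^{k}}, a crude form of the energy-matching alternative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy amortised-MCMC loop: target p = N(3, 1), sampler q_phi = N(phi, 1),
# transition kernel = Langevin step, update rule = first-moment matching.

def grad_log_p(z, mu=3.0):
    return mu - z                      # d/dz log N(z; mu, 1)

def langevin_step(z, eps=0.1):
    noise = rng.normal(size=z.shape)
    return z + eps * grad_log_p(z) + np.sqrt(2.0 * eps) * noise

phi = 0.0                              # parameter of the sampler q_phi
for t in range(2000):
    z0 = phi + rng.normal(size=64)     # {z_0^k} ~ q_phi
    zT = z0
    for _ in range(5):                 # T = 5 transition steps
        zT = langevin_step(zT)
    # move q_phi towards the improved samples (crude energy matching)
    phi += 0.05 * (zT.mean() - z0.mean())

print(phi)                             # approaches the target mean 3
```

Even this crude moment-matching update converges, because the Langevin steps systematically drag the samples towards the target mean, and the fixed point of the update is reached only when q_φ is stationary under the kernel.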

## Two Methods for Wild Variational Inference

### Amortized SVGD

Train an inference network f(η; ξ) by iteratively adjusting η so that the outputs of f(η; ξ) move towards their SVGD-updated counterparts. This process is similar to the amortised MCMC of the last section, differing in how the samples produced by f(η; ξ) with random inputs {ξ_{i}} are moved in every iteration. Each iteration computes the Stein variational gradient ∆z_{i} for z_{i} = f(η; ξ_{i}), and then updates η as follows
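A reconstruction of these two steps, following the standard SVGD transform of Liu & Wang with kernel k, m samples, and step size ε:

```latex
\Delta z_i = \frac{1}{m} \sum_{j=1}^{m} \left[ k(z_j, z_i)\, \nabla_{z_j} \log p(z_j) + \nabla_{z_j} k(z_j, z_i) \right],
\qquad
\eta \leftarrow \eta + \epsilon \sum_{i=1}^{m} \frac{\partial f(\eta; \xi_i)}{\partial \eta}\, \Delta z_i
```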

### KSD Variational Inference

Optimize η with standard gradient descent, where the gradient is derived to minimize the KSD, approximated by a U-statistic as follows
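The U-statistic in question is, in my reconstruction, the standard unbiased KSD estimator: KSD²(q, p) ≈ (1/(n(n−1))) Σ_{i≠j} κ_p(z_i, z_j), where κ_p is the Stein kernel built from a base kernel k and the score ∇ log p. A minimal sketch in one dimension, with an RBF base kernel (my own code, not the paper's):

```python
import numpy as np

# U-statistic estimate of the squared KSD for 1-D samples, with an RBF
# kernel k(x, y) = exp(-(x - y)^2 / (2 h^2)) and score(z) = d/dz log p(z).

def ksd_u_stat(z, score, h=1.0):
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z[:, None] - z[None, :]                  # pairwise x - y
    k = np.exp(-d**2 / (2.0 * h**2))
    s = score(z)
    kappa = (s[:, None] * s[None, :] * k         # s(x) s(y) k(x, y)
             + s[:, None] * (d / h**2) * k       # s(x) d/dy k(x, y)
             - s[None, :] * (d / h**2) * k       # s(y) d/dx k(x, y)
             + (1.0 / h**2 - d**2 / h**4) * k)   # d2/dxdy k(x, y)
    np.fill_diagonal(kappa, 0.0)                 # U-statistic: i != j only
    return kappa.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
score = lambda z: -z                             # score of p = N(0, 1)
good = ksd_u_stat(rng.normal(size=500), score)           # samples from p
bad = ksd_u_stat(2.0 + rng.normal(size=500), score)      # shifted samples
print(good, bad)   # good is near zero, bad is clearly positive
```

Note that the U-statistic is unbiased, so `good` can come out slightly negative even though the population KSD² is zero.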

## Stein Variational Policy Gradient

In the context of reinforcement learning, an agent takes an action a in the environment, and the environment returns an instant scalar reward r to the agent. The agent needs to learn a policy π that maximizes its expected return
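In the standard discounted formulation (my reconstruction), this expected return reads:

```latex
J(\pi) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad s_0 \sim p_0, \quad a_t \sim \pi(\cdot \mid s_t), \quad s_{t+1} \sim P(\cdot \mid s_t, a_t),
```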

with the transition dynamics P determined by the environment.

Taking policy parameters θ_{i} to be particles and p(θ) to be a target distribution concentrating on policies with high expected return, SVGD can then be applied to draw samples from p. To obtain this target distribution, here denoted q(θ), we optimize a regularized expected return as follows
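A reconstruction of this objective, following the SVPG formulation, with temperature α > 0 and a prior q_0 over θ:

```latex
\max_{q}\; \mathbb{E}_{\theta \sim q}\left[J(\theta)\right] - \alpha\, D_{\mathrm{KL}}\!\left(q \,\|\, q_0\right)
= \max_{q}\; \mathbb{E}_{\theta \sim q}\left[J(\theta)\right]
  + \alpha\, \mathbb{E}_{\theta \sim q}\left[\log q_0(\theta) - \log q(\theta)\right]
```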

where the expected-return term plays the role of exploitation and the regularization term plays the role of exploration, so that q maximizes the expected return while staying not too far from a prior distribution q0. To find the optimal q, take the derivative of the objective w.r.t. q(θ) and set it to zero. Then we would have
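Writing the expectations as integrals and differentiating the integrand pointwise in q(θ), with the normalization constraint contributing only a constant, a sketch of the resulting condition is:

```latex
J(\theta) + \alpha \log q_0(\theta) - \alpha \log q(\theta) - \alpha = 0,
```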

from which we obtain the following results
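namely, in my reconstruction, the exponentially reweighted prior:

```latex
q(\theta) \;\propto\; q_0(\theta)\, \exp\!\left( J(\theta)/\alpha \right),
```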

and in each iteration of SVGD, we update {θ_{i}} with
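A reconstruction of this SVGD update, with kernel k, step size ε, and the target q(θ) ∝ q_0(θ) exp(J(θ)/α) plugged into the usual Stein variational gradient:

```latex
\theta_i \leftarrow \theta_i + \epsilon\, \hat{\phi}(\theta_i),
\qquad
\hat{\phi}(\theta) = \frac{1}{n} \sum_{j=1}^{n} \left[
  \nabla_{\theta_j}\!\left( \tfrac{1}{\alpha} J(\theta_j) + \log q_0(\theta_j) \right) k(\theta_j, \theta)
  + \nabla_{\theta_j} k(\theta_j, \theta) \right]
```

The kernel term ∇_{θ_j} k(θ_j, θ) acts as a repulsive force among the particles, which is what keeps the learned policies diverse rather than collapsing to a single local optimum.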