


Bayesian Statistics and Computing

Lecture 13: Independent Monte Carlo Methods

Yanfei Kang | BSC | 2021 Spring

1 / 22

Strategies to summarize posterior

  1. If we have familiar distributions, we can simply simulate samples in R. What if there isn't a built-in function for doing this?

  2. Grid approximation.

    • Compute values of the posterior on a grid of points.
    • Approximate the continuous posterior by a discrete posterior.
    • Applicable for one- and two-parameter problems.
  3. If there are more parameters, we consider deterministic tools.

    • MAP
    • Bayesian CLT
    • Numerical Integration
  4. Monte Carlo methods are more general.

    • Independent Monte Carlo
    • Markov Chain Monte Carlo
2 / 22

Direct sampling

3 / 22

Inverse CDF transformation

  • Suppose that we are given U ∼ U(0,1) and want to simulate a continuous r.v. X with cdf F_X. Put Y = F_X^{-1}(U); then F_Y(y) = P(Y ≤ y) = P(F_X^{-1}(U) ≤ y) = P(U ≤ F_X(y)) = F_X(y).

  • For any continuous r.v. X, F_X(X) ∼ U(0,1). Why?

4 / 22

Define Z = F_X(X); then F_Z(x) = P(F_X(X) ≤ x) = P(X ≤ F_X^{-1}(x)) = F_X(F_X^{-1}(x)) = x.
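A quick numerical check of this fact (a minimal sketch, not part of the slides; the standard normal is used purely as an illustration): apply F_X to simulated draws of X and the result should look uniform.

# Probability integral transform check: if X ~ N(0,1), then pnorm(X) should be U(0,1)
set.seed(1)
x <- rnorm(10000)
u <- pnorm(x)  # F_X evaluated at the simulated draws
hist(u, breaks = 20, main = "F_X(X) for X ~ N(0,1)", xlab = "u")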

Example: exponential distribution

  • If X ∼ exp(λ), then F_X(x) = 0 for x < 0, and F_X(x) = 1 − e^{−λx} for x ≥ 0.

  • How do we apply the inverse CDF transformation here? Solving u = 1 − e^{−λx} for x gives F_X^{-1}(u) = −log(1 − u)/λ.

5 / 22

Example: exponential distribution

# Inverse CDF method for Exp(1): x = -log(1 - u) / lambda
lambda <- 1
u <- runif(20)
x <- -1/lambda * log(1 - u)
curve(1 - exp(-x), 0, 5, ylab = "u", main = "Inverse CDF transformation for exp(1)")
# Dashed lines map each uniform draw u[i] on the vertical axis to x[i] on the horizontal axis
for (i in 1:length(x)) {
  lines(c(x[i], x[i]), c(u[i], 0), lty = "dashed")
  lines(c(x[i], 0), c(u[i], u[i]), lty = "dashed")
}

6 / 22

Example: exponential distribution

# Direct sampling: 1000 draws from Exp(1) via the inverse CDF
lambda <- 1
u <- runif(1000)
x <- -1/lambda * log(1 - u)
hist(x)

7 / 22

Rejection sampling

8 / 22

What if we cannot write down the inverse CDF?

Rejection sampling

  • Although we cannot easily sample from f(θ|y), suppose there exists another density g(θ) (called the proposal density), such as a Normal or a t-distribution, from which it is easy to sample.

  • Then we can sample from g(θ) directly and then "reject" the samples in a strategic way to make the resulting "non-rejected" samples look like they came from f(θ|y).

9 / 22

Rejection sampling

  1. Sample U ∼ Unif(0,1).

  2. Sample θ at random from the probability density proportional to g(θ).

  3. If U ≤ f(θ|y) / (M g(θ)), accept θ as a draw from f; otherwise reject θ and return to step 1. (That is, θ is accepted with probability f(θ|y) / (M g(θ)).)

The algorithm can be repeated until the desired number of samples from the target density f has been accepted.
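As an illustration, the three steps can be packaged into a small generic R function. This is only a sketch: the (normalized) target density f, the proposal density g, a sampler rg from g, and the bound M are assumed to be supplied by the user, and none of these names come from the slides.

# Generic rejection sampler (sketch)
# f : target density, evaluated pointwise
# g : proposal density; rg : function(n) returning n draws from g
# M : constant such that f(theta) <= M * g(theta) for all theta
rejection_sample <- function(n, f, g, rg, M) {
  out <- numeric(0)
  while (length(out) < n) {
    theta <- rg(n)                          # draw candidates from g
    u <- runif(n)                           # uniform draws
    keep <- u <= f(theta) / (M * g(theta))  # accept with probability f / (M g)
    out <- c(out, theta[keep])
  }
  out[1:n]
}

Drawing candidates in blocks and keeping only the accepted ones is simply a vectorized way of repeating steps 1-3 until enough draws have been accepted.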

10 / 22

Illustration

  • The top curve is an approximation function, Mg(θ), and the bottom curve is the target density, f(θ|y).
  • As required, Mg(θ) ≥ f(θ|y) for all θ.
  • The vertical line indicates a single random draw θ from the density proportional to g.
  • The probability that a sampled draw θ is accepted is the ratio of the height of the lower curve to the height of the higher curve at the value θ.
11 / 22

What should g(θ) look like?

  1. X_f ⊆ X_g, where X_f is the support of f(θ|y) and X_g is the support of g(θ).

  2. For any θ ∈ X_f, f(θ|y)/g(θ) ≤ M, that is, M = sup_θ f(θ|y)/g(θ) must be finite.

A simple choice is a g(θ) with heavier tails than f(θ|y).

12 / 22

Example

Suppose we want to draw samples from N(0,1). We could use the t_2 distribution as our proposal density, as it has heavier tails than the Normal.

# Compare the N(0,1) target density with the t_2 proposal density
curve(dnorm(x), -6, 6, xlab = "x", ylab = "Density", n = 200)
curve(dt(x, 2), -6, 6, add = TRUE, col = 4, n = 200)
legend("topright", c("Normal density", "t density"), col = c(1, 4), bty = "n", lty = 1)

13 / 22

Probability of θ accepted

\begin{aligned} \mathbb{P}(\theta \text { accepted }) &=\mathbb{P}\left(U \leq \frac{f(\theta|y)}{M g(\theta)}\right) \\ &=\int \mathbb{P}\left(U \leq \frac{f(\theta|y)}{M g(\theta)} | \theta\right) g(\theta) d \theta \\ &=\int \frac{f(\theta|y)}{M g(\theta)} g(\theta) d \theta \\ &=\frac{1}{M} \end{aligned}

  1. This has implications for how we choose the proposal density g.

  2. We want to choose g so that it matches f as closely as possible.

14 / 22

Back to the example

f/g is maximized at x = ±1, where M = f(1)/g(1) ≈ 1.257, so the acceptance probability is 1/M ≈ 0.8.
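A small sketch of how these numbers can be checked in R; optimize() locates the maximum of f/g numerically, and the search interval (0, 3) is an assumption based on the shape of the two densities.

f <- dnorm                     # target: N(0, 1)
g <- function(x) dt(x, 2)      # proposal: t_2
M <- optimize(function(x) f(x)/g(x), c(0, 3), maximum = TRUE)$objective
M                              # about 1.257, attained near x = 1

# Rejection sampling with this M: the acceptance rate should be close to 1/M
set.seed(1)
n <- 1e5
theta <- rt(n, 2)
u <- runif(n)
accepted <- theta[u <= f(theta)/(M * g(theta))]
length(accepted)/n             # roughly 0.8

The accepted draws can also be compared with the N(0,1) target (e.g. via a histogram or ks.test(accepted, "pnorm")), which is the point made on the next slide.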

15 / 22

Why does rejection sampling work?

It can be shown that the distribution function of the accepted values is equal to the distribution function corresponding to the target density: P(\Theta \leq \theta | \Theta \text{ accepted}) = F(\theta).

What is the problem with rejection sampling?

16 / 22

Rejection sampling

  • Suppose we are interested in \mathbb{E}_{f}[h(\theta|y)]. What do we do?

  • Note that in most cases, the candidates generated from g fall within the domain of f, so that they are in fact values that could plausibly come from f.

  • They are simply over- or under-represented in the frequency with which they appear.

  • For example, if g has heavier tails than f, then there will be too many extreme values generated from g. Rejection sampling simply thins out those extreme values to obtain the right proportion.

  • But what if we could take those rejected values and, instead of discarding them, simply downweight or upweight them in a specific way?

17 / 22

Lab 10

Use the rejection sampling algorithm to simulate from N(0, 1). Try the following proposal densities.

  1. Logistic Distribution.
  2. Cauchy distribution.
  3. Student t_5 distribution.
  4. N(1,2) distribution.

For each proposal record the percentage of iterates that are accepted. Which is the best proposal?

18 / 22

Importance sampling

19 / 22

Importance sampling

  • We can rewrite the target \mathbb{E}_{f}[h(\theta|y)]=\mathbb{E}_{g}\left[\frac{f(\theta|y)}{g(\theta)} h(\theta|y)\right].
  • If \theta_{1}, \dots, \theta_{n} \sim g, we get \tilde{\mu}_{n}=\frac{1}{n} \sum_{i=1}^{n} \frac{f\left(\theta_{i}|y\right)}{g\left(\theta_{i}\right)} h\left(\theta_{i}|y\right)=\frac{1}{n} \sum_{i=1}^{n} w_{i} h\left(\theta_{i}|y\right) \approx \mathbb{E}_{f}[h(\theta|y)].
  • The w_i are referred to as the importance weights; a small numerical sketch follows below.
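A minimal numerical sketch (the choices f = N(0,1), g = t_2 and h(θ) = θ², so that the true value E_f[θ²] = 1, are illustrative assumptions, not from the slides):

# Importance sampling estimate of E_f[theta^2] with f = N(0,1) and g = t_2
set.seed(1)
n <- 1e5
theta <- rt(n, 2)                 # draws from the proposal g
w <- dnorm(theta)/dt(theta, 2)    # importance weights f/g
h <- theta^2                      # quantity of interest
mean(w * h)                       # should be close to the true value 1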
20 / 22

Summary

  • Direct sampling
  • Rejection sampling
  • Importance sampling
21 / 22

References and further readings

References

  • Chapter 10 of the BDA book by Gelman et al.

Further readings

  • Chapter 6 of Advanced Statistical Computing by Roger Peng.
22 / 22
