Bayesian Statistics and Computing

Lecture 13: Independent Monte Carlo Methods

Yanfei Kang | BSC | Beihang University

1 / 23

Strategies to summarize posterior

  1. If we have familiar distributions, we can simply simulate samples in R. What if there isn’t a built-in function for doing this?

  2. Grid approximation.

    • Compute values of the posterior on a grid of points.
    • Approximate the continuous posterior by a discrete posterior.
    • Applicable for one- and two-parameter problems.
  3. If there are more parameters, we consider deterministic tools.

    • MAP
    • Bayesian CLT
    • Numerical Integration
  4. Monte Carlo methods are more general.

    • Independent Monte Carlo
    • Markov Chain Monte Carlo
2 / 23

Direct sampling

3 / 23

Inverse CDF transformation

  • Suppose that we are given $U \sim U(0,1)$ and want to simulate a continuous r.v. $X$ with cdf $F_X$. Put $Y = F_X^{-1}(U)$; then $$F_Y(y) = P(Y \le y) = P(F_X^{-1}(U) \le y) = P(U \le F_X(y)) = F_X(y).$$

  • For any continuous r.v. $X$, $F_X(X) \sim U(0,1)$. Why?

4 / 23

Define $Z = F_X(X)$; then $$F_Z(x) = P(F_X(X) \le x) = P(X \le F_X^{-1}(x)) = F_X(F_X^{-1}(x)) = x.$$
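As a quick sanity check of this probability integral transform (a sketch, not from the original slides), we can push draws from any continuous distribution through their own cdf and inspect the result for uniformity:

# If X ~ N(0, 1), then F_X(X) = pnorm(X) should be Uniform(0, 1).
set.seed(42)
x <- rnorm(10000)
u <- pnorm(x)
hist(u, freq = FALSE, main = "F_X(X) for X ~ N(0, 1)")  # should look flat
ks.test(u, "punif")  # Kolmogorov-Smirnov test against Uniform(0, 1)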

Example: exponential distribution

  • If $X \sim \mathrm{Exp}(\lambda)$, then $$F_X(x) = \begin{cases} 0 & \text{for } x < 0, \\ 1 - e^{-\lambda x} & \text{for } x \ge 0. \end{cases}$$

  • How do we apply the inverse CDF transformation here? Setting $u = 1 - e^{-\lambda x}$ and solving for $x$ gives $F_X^{-1}(u) = -\frac{1}{\lambda}\log(1 - u)$.

5 / 23

Example: exponential distribution

lambda <- 1
u <- runif(20)                   # uniform draws on (0, 1)
x <- -1/lambda * log(1 - u)      # inverse CDF: F^{-1}(u) = -log(1 - u)/lambda
curve(1 - exp(-x), 0, 5, ylab = "u", main = "Inverse CDF transformation for exp(1)")
for (i in seq_along(x)) {
  lines(c(x[i], x[i]), c(u[i], 0), lty = "dashed")   # drop from the curve to the x-axis
  lines(c(x[i], 0), c(u[i], u[i]), lty = "dashed")   # trace back to the u-axis
}

6 / 23

Example: exponential distribution

# Direct sampling
lambda <- 1
u <- runif(1000)             # uniform draws
x <- -1/lambda * log(1 - u)  # transformed to exp(lambda) draws
hist(x)
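As a quick check (not in the original slides), these inverse-CDF draws should agree with R's built-in exponential sampler:

# Compare the inverse-CDF samples with rexp() via a QQ-plot.
qqplot(x, rexp(1000, rate = lambda), xlab = "Inverse-CDF samples", ylab = "rexp() samples")
abline(0, 1, col = 2)  # points should hug this line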

7 / 23

Rejection sampling

8 / 23

What if we cannot write down the inverse?

Rejection sampling

  • Although we cannot easily sample from $f(\theta|y)$, there exists another density $g(\theta)$ (called the proposal density), like a Normal distribution or perhaps a t-distribution, from which it is easy for us to sample.

  • Then we can sample from g(θ) directly and then "reject" the samples in a strategic way to make the resulting "non-rejected" samples look like they came from f(θ|y).

9 / 23

Rejection sampling

  1. Sample $U \sim \mathrm{Unif}(0,1)$.

  2. Sample $\theta$ at random from the probability density proportional to $g(\theta)$.

  3. If $U \le \frac{f(\theta|y)}{M g(\theta)}$, accept $\theta$ as a draw from $f$; otherwise reject it and return to step 1. (That is, each drawn $\theta$ is accepted with probability $\frac{f(\theta|y)}{M g(\theta)}$.)

The algorithm can be repeated until the desired number of samples from the target density f has been accepted.
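A minimal R sketch of this loop (not from the slides), using the standard normal target and $t_2$ proposal from the example below:

# Rejection sampling: target f = N(0, 1), proposal g = t with 2 df.
# M bounds f/g; the ratio dnorm(x)/dt(x, 2) is maximized at x = +/- 1.
set.seed(1)
M <- dnorm(1) / dt(1, 2)
n <- 1000
samples <- numeric(n)
accepted <- 0
tries <- 0
while (accepted < n) {
  u <- runif(1)                        # step 1: U ~ Unif(0, 1)
  theta <- rt(1, 2)                    # step 2: draw from the proposal g
  tries <- tries + 1
  if (u <= dnorm(theta) / (M * dt(theta, 2))) {  # step 3: accept/reject
    accepted <- accepted + 1
    samples[accepted] <- theta
  }
}
accepted / tries  # empirical acceptance rate, close to 1/M (about 0.8)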

10 / 23

Illustration

  • The top curve is an approximation function, $Mg(\theta)$, and the bottom curve is the target density, $f(\theta|y)$.
  • As required, $Mg(\theta) \ge f(\theta|y)$ for all $\theta$.
  • The vertical line indicates a single random draw $\theta$ from the density proportional to $g$.
  • The probability that a sampled draw $\theta$ is accepted is the ratio of the height of the lower curve to the height of the upper curve at the value $\theta$.
11 / 23

What should g(θ) look like?

  1. $\mathcal{X}_f \subseteq \mathcal{X}_g$, where $\mathcal{X}_f$ is the support of $f(\theta|y)$ and $\mathcal{X}_g$ is the support of $g(\theta)$.

  2. For any $\theta \in \mathcal{X}_f$, $\frac{f(\theta|y)}{g(\theta)} \le M$; that is, $M = \sup_{\theta \in \mathcal{X}_f} \frac{f(\theta|y)}{g(\theta)} < \infty$.

A simple choice is a $g(\theta)$ with heavier tails than the target.
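For instance (an illustrative check, not from the slides), a proposal with lighter tails than the target fails condition 2, because the ratio $f/g$ is unbounded:

# A t_2 target with a N(0, 1) proposal: f/g diverges in the tails,
# so no finite M exists and rejection sampling cannot be applied.
ratio <- function(x) dt(x, 2) / dnorm(x)
ratio(c(2, 4, 6, 8))  # grows without bound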

12 / 23

Example

Suppose we want samples from $N(0,1)$. We could use the $t_2$ distribution as our proposal density, as it has heavier tails than the Normal.

curve(dnorm(x), -6, 6, xlab = expression(theta), ylab = "Density", n = 200)
curve(dt(x, 2), -6, 6, add = TRUE, col = 4, n = 200)
legend("topright", c("Normal density", "t density"), col = c(1, 4), bty = "n", lty = 1)
x <- rt(200, 2)
rug(x, col = 4)
y <- rnorm(200)
rug(y, col = 1)

13 / 23


Probability of θ accepted

$$P(\theta \text{ accepted}) = P\left(U \le \frac{f(\theta|y)}{Mg(\theta)}\right) = \int P\left(U \le \frac{f(\theta|y)}{Mg(\theta)} \,\middle|\, \theta\right) g(\theta)\, d\theta = \int \frac{f(\theta|y)}{Mg(\theta)}\, g(\theta)\, d\theta = \frac{1}{M}$$

  1. This has implications for how we choose the proposal density g.

  2. We want to choose g so that it matches f as closely as possible.

14 / 23

Back to the example

$f/g$ is maximized at $x = \pm 1$, so $M = 1.257$. The acceptance probability is therefore $1/M \approx 0.8$.
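These numbers are easy to verify in R (a quick check, not part of the original slides):

# The ratio dnorm(x)/dt(x, 2) is maximized at x = +/- 1.
ratio <- function(x) dnorm(x) / dt(x, 2)
optimize(ratio, c(0, 6), maximum = TRUE)  # maximum near x = 1
M <- ratio(1)
M      # about 1.257
1 / M  # acceptance probability, about 0.8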

15 / 23

Changing df

16 / 23

Why does rejection sampling work?

It can be shown that the distribution function of the accepted values is equal to the distribution function corresponding to the target density: $P(\Theta \le \theta \mid \Theta \text{ accepted}) = F(\theta)$.
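A short derivation of this fact, filling in the "can be shown" step with the acceptance probability $1/M$ from the previous slide:

$$P(\Theta \le \theta \mid \Theta \text{ accepted}) = \frac{P(\Theta \le \theta,\ \Theta \text{ accepted})}{P(\Theta \text{ accepted})} = \frac{\int_{-\infty}^{\theta} \frac{f(t|y)}{Mg(t)}\, g(t)\, dt}{1/M} = \int_{-\infty}^{\theta} f(t|y)\, dt = F(\theta).$$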

What is the problem with rejection sampling?

17 / 23

Rejection sampling

  • Suppose we are interested in $E_f[h(\theta|y)]$; what do we do?

  • Note that in most cases, the candidates generated from g fall within the domain of f, so that they are in fact values that could plausibly come from f.

  • They are simply over- or under-represented in the frequency with which they appear.

  • For example, if g has heavier tails than f, then there will be too many extreme values generated from g. Rejection sampling simply thins out those extreme values to obtain the right proportion.

  • But what if we could take those rejected values and, instead of discarding them, simply downweight or upweight them in a specific way?

18 / 23

Lab 10

Use the rejection sampling algorithm to simulate from N(0,1). Try the following proposal densities.

  1. Logistic distribution.
  2. Cauchy distribution.
  3. Student $t_5$ distribution.
  4. $N(1,2)$ distribution.

For each proposal record the percentage of iterates that are accepted. Which is the best proposal?

19 / 23

Importance sampling

20 / 23

Importance sampling

  • We can rewrite the target: $E_f[h(\theta|y)] = E_g\left[\frac{f(\theta|y)}{g(\theta)}\, h(\theta|y)\right]$.
  • If $\theta_1, \ldots, \theta_n \sim g$, we get $$\tilde{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} \frac{f(\theta_i|y)}{g(\theta_i)}\, h(\theta_i|y) = \frac{1}{n}\sum_{i=1}^{n} w_i\, h(\theta_i|y) \approx E_f[h(\theta|y)].$$
  • The $w_i = f(\theta_i|y)/g(\theta_i)$ are referred to as the importance weights.
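A minimal R sketch of this estimator (not from the slides), taking $f = N(0,1)$, a $t_2$ proposal $g$, and $h(\theta) = \theta^2$, so the true value of $E_f[h]$ is 1:

# Importance sampling estimate of E_f[theta^2] under f = N(0, 1).
set.seed(1)
n <- 10000
theta <- rt(n, 2)                  # draws from the proposal g
w <- dnorm(theta) / dt(theta, 2)   # importance weights w_i = f / g
mean(w * theta^2)                  # estimate, close to the true value 1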
21 / 23

Summary

  • Direct sampling
  • Rejection sampling
  • Importance sampling
22 / 23

References and further readings

References

  • Chapter 10 of the BDA book by Gelman et al.

Further readings

  • Chapter 6 of Advanced Statistical Computing by Roger Peng.
23 / 23
