Bayesian Statistics

class: left, bottom, inverse, title-slide

# Bayesian Statistics
## Lecture 2: Introduction to Bayesian Thinking
### Yanfei Kang
### 2019/09/03 (updated: 2019-09-16)

---

# Introduction

- We will consider a basic problem of learning about a population proportion.

- Remind you our Bayesian inference idea: `$$\text{prior} + \text{data} \rightarrow \text{posterior}.$$`

- We will illustrate the use of different functional forms for the **prior**.

- From the posterior, one summarizes the probability distribution to perform **inferences** and one can also **predict** the likely outcomes of a new sample.

---
class: inverse, center, middle
# Example

---

# Learning About the Proportion of Heavy Sleepers

- Suppose a person is interested in learning about the sleeping habits of American college students.

- She hears that doctors recommend eight hours of sleep for an average adult.

- **What proportion of college students get at least eight hours of sleep?**

---

# Learning About the Proportion of Heavy Sleepers

Now we reorganize the problem.

1. **Population: ** all American college students.

2. `$p$`: the proportion of this population who sleep at least eight hours.

3. Our aim is to learn about the location of `$p$`, which is the unknown parameter.

---

# Bayesian viewpoint

- Use **probability distribution** to describe a person’s beliefs about the uncertainty in `$p$`.

- This distribution reflects the person’s **subjective prior** opinion about `$p$`.
- A random sample will be taken.

- Before that, research on the sleeping habits to build a prior distribution (e.g. from internet articles or newspaper).

- Prior beliefs: `$p < 0.5$`, can be any value between 0 and 0.5.

---

# Bayesian viewpoint

- Suppose our prior density for `$p$` is `$g(p)$`.

- Regard "success" as "sleeping at least 8 hours".

- We take a random sample with `$s$` successes and `$f$` failures.

- Now we have the likelihood function `$$L(p) \propto p^{s}(1-p)^{f}, 0<p<1.$$`

- The posterior for `$p$` can be obtained by Bayes' rule: `$$g(p|\text{data}) \propto g(p)L(p).$$`

- Let's try three different choices of the prior density.

---
class: inverse, center, middle

# Using a Discrete Prior

---

# Discrete Prior

- List all possible values for `$p$` and assign weights for them.

- `$p$` could be 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95.

- Their weights are 1, 5.2, 8, 7.2, 4.6, 2.1, 0.7, 0.1, 0, 0.

---
# Discrete Prior
.pull-left[

```r
p = seq(0.05, 0.95, by = 0.1)
prior = c(1, 5.2, 8, 7.2, 4.6, 2.1, 0.7, 0.1, 0, 0)
prior = prior/sum(prior)
plot(p, prior, type = "h", ylab="Prior Probability")
```
]

.pull-right[
![](BS-L2-thinking_files/figure-html/plot-label-out-1.png)
]

---
# Likelihood

- In our example, 11 of 27 students sleep a sufficient number of hours.

- `$s = 11$` and `$f = 16.$`

- Likelihood function: `$$L(p) \propto p^{11}(1-p)^{16}.$$`

---
# Posterior

.pull-left[
In R, `pdisc()` from the **LearnBayes** package computes the posterior probabilities.

```r
library(LearnBayes); library(lattice)
data = c(11, 16)
post = pdisc(p, prior, data)
# round(cbind(p, prior, post),2)
PRIOR=data.frame("prior",p,prior)
POST=data.frame("posterior",p,post)
names(PRIOR)=c("Type","P","Probability")
names(POST)=c("Type","P","Probability")
data=rbind(PRIOR,POST)
xyplot(Probability~P|Type,data=data,layout=c(1,2),
       type="h",lwd=3,col="black")
```
]

.pull-right[
![](BS-L2-thinking_files/figure-html/post-out-1.png)
]

---
class: inverse, center, middle

# Using a Beta Prior

---
# Beta Prior

- `$p$` is a continuous parameter.

- Alternative is to construct a density `$g(p)$` on [0, 1].

- Suppose she believes that the proportion is equally likely to be smaller or larger than `$p = 0.3$`.

- And she is 90% confident that `$p$` is less than 0.5.

- A convenient family of densities for a proportion is `$$g(p) \propto p^{a-1}(1-p)^{b-1}, 0<p<1.$$`

- **How to select `$a$` and `$b$`?**

---
# Beta Prior

- Here the person believes that the median and 90th percentiles of the proportion are given, respectively, by 0.3 and 0.5.

- ` beta.select()` in R can find the shape parameters of the beta density that match this prior knowledge.

```r
quantile2=list(p=.9,x=.5)
quantile1=list(p=.5,x=.3)
beta.select(quantile1,quantile2)
```

```
## [1] 3.26 7.19
```

---
# Posterior

- Combining this beta prior with the likelihood function, one can show that the posterior density is also of the beta form with updated parameters `$a + s$` and `$b + f$`.

`$$g(p | \text { data }) \propto p^{a+s-1}(1-p)^{b+f-1}, 0<p<1.$$`

- This is an example of a **conjugate** analysis, where the prior and posterior densities have the same functional form.

-  the prior, likelihood, and posterior are all in the beta family.

---
# Posterior

.pull-left[

```r
# Plot beta prior, likelihood and posterior
a = 3.26; b = 7.19; s = 11; f = 16
*curve(dbeta(x,a+s,b+f), from=0, to=1,
*     xlab="p",ylab="Density",lty=1,lwd=4)
curve(dbeta(x,s+1,f+1),add=TRUE,lty=2,lwd=4)
curve(dbeta(x,a,b),add=TRUE,lty=3,lwd=4)
legend(.7,4,c("Prior","Likelihood","Posterior"),
       lty=c(3,2,1),lwd=c(3,3,3))
```
]

.pull-right[
![](BS-L2-thinking_files/figure-html/post-out-1.png)
]

---
# Making Inferences via Posterior

**1. Is it likely that the proportion of heavy sleepers is greater than 0.5?**

**2. What is the 90% interval estimate for `$p$`?**

???

```r
# Exact inferences
1 - pbeta(0.5, a + s, b + f)
```

```
## [1] 0.0690226
```

```r
qbeta(c(0.05, 0.95), a + s, b + f)
```

```
## [1] 0.2555267 0.5133608
```

```r
# Inferences based on simulation
ps = rbeta(1000, a + s, b + f)
hist(ps,xlab="p",main="")
```

![](BS-L2-thinking_files/figure-html/unnamed-chunk-2-1.png)

```r
sum(ps >= 0.5)/1000
```

```
## [1] 0.061
```

```r
quantile(ps, c(0.05, 0.95))
```

```
##        5%       95% 
## 0.2628391 0.5088270
```

---
class: inverse, center, middle

# Using a Histogram Prior

---
# Posterior Computation

### A "brute-force" method of summarizing posterior computations for an prior `$g(p)$`.

1. Choose a grid of values of `$p$` over an interval that covers the posterior density.

2. Compute the product of the likelihood `$L(p)$` and the prior `$g(p)$` on the grid.

3. Normalize by dividing each product by the sum of the products. In this step, we are approximating the posterior density by a discrete probability distribution on the grid.

4. Take a random sample with replacement from the discrete distribution.

---
# Histogram Prior

Convenient to state the prior beliefs about `$p$` by dividing it into ten subintervals (0, 0.1), (0.1, 0.2), ..., (0.9, 1), and assign probabilities for them.

.pull-left[

```r
midpt = seq(0.05, 0.95, by = 0.1)
prior = c(1, 5.2, 8, 7.2, 4.6, 2.1, 0.7, 0.1, 0, 0)
prior = prior/sum(prior)
curve(histprior(x,midpt,prior), from=0, to=1,
      ylab="Prior density",ylim=c(0,.3))
```
]

.pull-right[
![](BS-L2-thinking_files/figure-html/hist-out-1.png)
]

---
# Posterior

.pull-left[

```r
curve(histprior(x,midpt,prior) * dbeta(x,s+1,f+1),
      from=0, to=1, ylab="Posterior density")
```

![](BS-L2-thinking_files/figure-html/hist-post-1.png)
]

.pull-right[

```r
p = seq(0, 1, length=500)
post = histprior(p, midpt, prior) * dbeta(p, s+1, f+1)
post = post/sum(post)
ps = sample(p, replace = TRUE, prob = post)
hist(ps, xlab="p", main="")
```

![](BS-L2-thinking_files/figure-html/hist-post-sampling-1.png)
]

---
class: inverse, center, middle

# Prediction

---
# Prediction

- Learnt about the population proportion of heavy sleepers `$p$`.

- **What is the number of heavy sleepers `$\tilde{y}$` in a future sample of `$m = 20$` students?**

- If the current beliefs about `$p$` are contained in the density `$g(p)$`, then the predictive density of `$\tilde{y}$` is given by `$$f(\tilde{y})=\int f(\tilde{y} | p) g(p) d p.$$`
    - If `$g$` is a prior density, then we refer to this as the **prior predictive density**.
    - If `$g$` is a posterior, then `$f$` is a **posterior predictive density**.
    
---
# Predictive Density with Discrete Prior

- Let `$f_B(y|n,p)$` denote the binomial sampling density given values of the sample size `$n$` and proportion `$p$`: `$$f_{B}(y | n, p)=\left(\begin{array}{l}{n} \\ {y}\end{array}\right) p^{y}(1-p)^{n-y}, y=0, \ldots, n.$$`

- Then the predictive probability of `$\tilde{y}$` successes in a future sample of size `$m$` is given by `$$f(\tilde{y})=\sum f_{B}\left(\tilde{y} | m, p_{i}\right) g\left(p_{i}\right).$$`

---
# Predictive Density with Discrete Prior

**What is the number of heavy sleepers `$\tilde{y}$` in a future sample of `$m = 20$` students?**

```r
p=seq(0.05, 0.95, by=.1)
prior = c(1, 5.2, 8, 7.2, 4.6, 2.1, 0.7, 0.1, 0, 0)
prior=prior/sum(prior)
m=20; ys=0:20
pred=pdiscp(p, prior, m, ys)
round(cbind(0:20,pred),3)
```

```
##           pred
##  [1,]  0 0.020
##  [2,]  1 0.044
##  [3,]  2 0.069
##  [4,]  3 0.092
##  [5,]  4 0.106
##  [6,]  5 0.112
##  [7,]  6 0.110
##  [8,]  7 0.102
##  [9,]  8 0.089
## [10,]  9 0.074
## [11,] 10 0.059
## [12,] 11 0.044
## [13,] 12 0.031
## [14,] 13 0.021
## [15,] 14 0.013
## [16,] 15 0.007
## [17,] 16 0.004
## [18,] 17 0.002
## [19,] 18 0.001
## [20,] 19 0.000
## [21,] 20 0.000
```

---
# Predictive Density with Beta Prior

- Suppose instead that we model our beliefs about `$p$` using a `$\text{beta}(a, b)$` prior.

- We can analytically integrate out `$p$` to get a closed-form expression for the predictive density: `$$\begin{aligned} f(\tilde{y}) &=\int f_{B}(\tilde{y} | m, p) g(p) d p \\ &=\left(\begin{array}{c}{m} \\ {\tilde{y}}\end{array}\right) \frac{B(a+\tilde{y}, b+m-\tilde{y})}{B(a, b)}, \tilde{y}=0, \ldots, m \end{aligned}$$`

---
# Predictive Density with Beta Prior

**What is the number of heavy sleepers `$\tilde{y}$` in a future sample of `$m = 20$` students?**

```r
ab=c(3.26, 7.19)
m=20; ys=0:20
pred=pbetap(ab, m, ys)
pred
```

```
##  [1] 1.812205e-02 4.511485e-02 7.248106e-02 9.456396e-02 1.084896e-01
##  [6] 1.135841e-01 1.106895e-01 1.015339e-01 8.821389e-02 7.280839e-02
## [11] 5.712007e-02 4.252974e-02 2.994441e-02 1.981684e-02 1.221463e-02
## [16] 6.917948e-03 3.527752e-03 1.568885e-03 5.764529e-04 1.575142e-04
## [21] 2.438293e-05
```

---
# Predictive Density with Beta Prior

- Bear in mind that one convenient way of computing a predictive density for *any* prior is by **simulation**.

- To obtain `$\tilde{y}$`, we first simulate, say, `$p^∗$` from `$g(p)$`, and then simulate `$\tilde{y}$` from the binomial distribution `$f_B(\tilde{y}|p)$`.

.pull-left[

```r
p=rbeta(1000, 3.26, 7.19)
y = rbinom(1000, 20, p)
freq=table(y)
ys=as.integer(names(freq))
predprob=freq/sum(freq)
plot(ys,predprob,type="h",xlab="y",ylab="Predictive Probability")
```
]

.pull-right[
![](BS-L2-thinking_files/figure-html/pred-out-1.png)
]

---
# Predictive Density with Beta Prior

- The probability that `$\tilde{y}$` falls in the interval {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} is 91.8%.

- The probability this sample proportion falls in the interval [1/20, 11/20] is 91.8%.

- Compare with the 91.8% probability interval for the population proportion p.

- In predicting a future sample proportion, there are **two sources of uncertainty**:
    1. the uncertainty in the value of `$p$`.
    2. the uncertainty in the binomial uncertainty in the value of `$\tilde{y}$`.

---
class: inverse, center, middle

# Summary

---
# Summary

- Different priors.

- Posterior distribution.

- Inference.

- Prediction.