layout: true

---

class: inverse, center, middle
background-image: url(../figs/titlepage16-9.png)
background-size: cover

<br>
<br>

# Bayesian Statistics and Computing

## Lecture 0: Course Introduction

<img src="../figs/slides.png" width="150px"/>

#### *Yanfei Kang | BSC | Beihang University*

---

# General information

- **Credits:** 2 credits
- **Lecturer:** Yanfei Kang. For now, you can get to know me via my personal webpage: http://yanfei.site.
- **Tutor:** Bohan Zhang
- **Language:** Taught in Chinese. Materials are in English.
- **Computer language:** R
- **Office hours:** Questions about this course can be asked via email: yanfeikang@buaa.edu.cn.
- **Lecture notes:** Available at *https://yanfei.site/teaching/bsc*.

---

# References

1. [Advanced statistical computing](https://bookdown.org/rdpeng/advstatcomp/). Roger Peng. In progress. 2022.
2. [Statistical computing (Chinese version)](https://yanfei.site/docs/statscompbook/index.html). Yanfei Kang and Feng Li. In progress. 2023.
3. [Introduction to scientific programming and simulation using R](https://yanfei.site/docs/sc/spuRs.pdf). Owen Jones, Robert Maillardet, Andrew Robinson. 2nd Edition. CRC Press. 2014. ISBN: 9781466569997.
4. [R programming for data science](https://bookdown.org/rdpeng/rprogdatascience/). Roger Peng. 2022.

---

# Unit objectives

![The process of statistical modeling.](./sc.png)

---

# Example: linear models

Consider the simple linear model:

`$$y = \beta_0 + \beta_1 x + \epsilon.$$`

| Component           | Implementation             |
| :------------------ | :------------------------- |
| Model               | Linear regression          |
| Principle/Technique | Likelihood principle       |
| Algorithm           | Maximization (analytical)  |
| Statistic           | `\(\beta_0\)`, `\(\beta_1\)` |

---

# Two statements

#### *Computer numbers are not the same as real numbers, and the arithmetic operations on computer numbers are not exactly the same as those of ordinary arithmetic.*

#### *The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.*

---

# Key issues on the computer

1. Overflow
  - `NA`s or `Inf`s for very big numbers.
2. Underflow
  - Similarly, very small positive numbers are rounded down to zero.
3. Near linear dependence in matrix computation.

*Finite precision nature of computer arithmetic*: computers can represent only finitely many numbers.

```r
.Machine$double.xmin
#> [1] 2.225074e-308
```

*Note that on most platforms smaller positive values than `.Machine$double.xmin` can occur. On a typical R platform the smallest positive double is about 5e-324.*

---

# Textbooks vs. computers

- There is a disconnect between textbook formulas and what *should* be implemented on the computer.
- Textbooks: convenient mathematical formulas.
- Computers: directly translating these formulas into computer code is usually not advisable.

---

# Using logarithms

Consider the univariate normal density.

- We compute its value at a given point using `dnorm()` in R.

```r
dnorm(1)
#> [1] 0.2419707
```

- How about `dnorm(40)`? The true value, about `\(1.5 \times 10^{-348}\)`, is smaller than the smallest positive double, so it underflows to 0.
- Usually you can get away with computing the log of this value (with `dnorm()` you can set the option `log = TRUE`).
- Computing densities with logarithms is much more numerically stable.
- `\(\frac{f(x)}{g(x)}\)` `\(\Rightarrow\)` `\(\exp(\log f(x) - \log g(x))\)`. See the example on the next slide.
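---

# Using logarithms: an example

A minimal illustration of the trick above (the evaluation points 40 and 41 are arbitrary): the ratio of two normal densities underflows on the natural scale, but is easily recovered on the log scale.

```r
dnorm(40) / dnorm(41)  # both densities underflow to 0, giving 0/0
#> [1] NaN
exp(dnorm(40, log = TRUE) - dnorm(41, log = TRUE))  # stable on the log scale
#> [1] 3.880847e+17
```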
---

# Solution to linear regression

Linear regression model in matrix form:

`$$y = X\beta + \varepsilon.$$`

- The solution is written as

`$$\hat{\beta} = (X^\prime X)^{-1}X^\prime y.$$`

- In R, this could be translated literally as

```r
betahat <- solve(t(X) %*% X) %*% t(X) %*% y
```

- The formula above is how `\(\hat{\beta}\)` is *written* in textbooks, not how it should be computed.
- Taking the inverse of `\(X^\prime X\)` directly is computationally very expensive.

---

# Simpler approaches

- Take the normal equations,

`$$X^\prime X\beta = X^\prime y,$$`

and solve them directly.

- In R, we could write

```r
solve(crossprod(X), crossprod(X, y))
```

- Rather than computing the inverse of `\(X^\prime X\)`, we directly compute `\(\hat{\beta}\)` via [Gaussian elimination](https://en.wikipedia.org/wiki/LU_decomposition) `\(\Rightarrow\)` more numerically stable and *much* faster.
- R uses the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition). Its thin version also deals well with collinearity (discussed later in this course).

---

# Example

- For example, here's a fake dataset of `\(n=5000\)` observations with `\(p=100\)` predictors.

```r
set.seed(2023 - 9 - 8)
n <- 5000
p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
```

```r
library(microbenchmark)
microbenchmark(solve(t(X) %*% X) %*% t(X) %*% y,
               solve(crossprod(X), crossprod(X, y)),
               qr.coef(qr(X), y))
#> Unit: milliseconds
#>                                  expr      min       lq     mean   median       uq       max neval
#>      solve(t(X) %*% X) %*% t(X) %*% y 58.37165 63.81981 71.50734 68.02168 73.59466 160.72915   100
#>  solve(crossprod(X), crossprod(X, y)) 27.60068 29.86148 30.90535 30.30662 31.13763  51.23527   100
#>                     qr.coef(qr(X), y) 38.34346 42.60331 48.53194 46.12665 51.58434  98.57493   100
```

---

# Collinearity

```r
W <- cbind(X, X[, 1] + rnorm(n, sd = 1e-14))
solve(crossprod(W), crossprod(W, y))
#> Error in solve.default(crossprod(W), crossprod(W, y)): system is computationally singular: reciprocal condition number = 1.392e-16
```

---

# QR decomposition

```r
qr.coef(qr(W), y)
#> [1] 0.0207049604 -0.0107798798 -0.0053346446 -0.0139042768 -0.0165625816 -0.0247368954 -0.0055936520 -0.0050670925
#> [9] -0.0141895452 0.0014650371 0.0105871852 0.0236855005 -0.0009123881 0.0395699043 -0.0045459668 0.0156426008
#> [17] 0.0368986437 -0.0062241571 0.0003719454 -0.0001045447 0.0105437196 0.0032069072 -0.0099112962 0.0092091722
#> [25] -0.0056283760 -0.0101836666 0.0004409933 0.0079016686 0.0101626882 -0.0208409039 -0.0054924536 0.0064271925
#> [33] -0.0065590754 -0.0037863800 0.0072126283 -0.0013686652 -0.0261588414 -0.0045260288 -0.0204600332 -0.0108791075
#> [41] 0.0100509988 0.0021846634 0.0096076628 -0.0256744385 -0.0002072185 -0.0203092434 0.0199490920 0.0245456994
#> [49] 0.0141368132 -0.0075821183 -0.0331999816 0.0070728928 -0.0251681102 -0.0145501245 0.0124959304 0.0101724169
#> [57] -0.0041669575 0.0118368027 0.0058702645 0.0017322817 0.0008573064 -0.0068535688 -0.0101274630 -0.0093643459
#> [65] 0.0045194364 -0.0224489607 0.0175468581 0.0135508696 0.0048772551 0.0168445216 -0.0254587733 -0.0012700923
#> [73] -0.0106302388 -0.0017762102 0.0077490915 0.0110904920 -0.0144777066 -0.0079745076 0.0112005231 -0.0012697611
#> [81] -0.0024046193 -0.0094265032 0.0234028545 -0.0087827166 0.0097232715 -0.0188332734 -0.0151177605 0.0129435368
#> [89] -0.0086064316 0.0048073761 -0.0209634872 -0.0133873152 0.0203494525 0.0031885641 -0.0088269877 0.0064179865
#> [97] -0.0016902431 -0.0149829401 -0.0018118949 -0.0161594580 NA
```

The QR approach detects the rank deficiency and flags the redundant last column with `NA` instead of failing.

---

# Multivariate normal density

Now, can you do the following?

1. Write the `\(p\)`-dimensional multivariate normal density.
2. Discuss: how should it be evaluated on a computer?
3. Try to implement it in R.

*Hints:*

1. Use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition): `\(\Sigma = R^TR\)`.
2. With it, you can also easily calculate the determinant of `\(\Sigma\)`.

Try it yourself first; one possible sketch follows on the next slide.
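---

# Multivariate normal density: a sketch

A minimal sketch following the hints (`ldmvnorm` is just an illustrative name, and input checks are omitted): factor `\(\Sigma = R^TR\)` once, then reuse `\(R\)` for both the determinant and the quadratic form, working entirely on the log scale.

```r
ldmvnorm <- function(x, mu, Sigma) {
  p <- length(mu)
  R <- chol(Sigma)                             # upper triangular, Sigma = t(R) %*% R
  z <- backsolve(R, x - mu, transpose = TRUE)  # solves t(R) %*% z = x - mu
  # log|Sigma| = 2 * sum(log(diag(R))); (x - mu)' Sigma^{-1} (x - mu) = sum(z^2)
  -0.5 * p * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}
```

No explicit inverse of `\(\Sigma\)` is ever formed; you can check the result against `mvtnorm::dmvnorm(x, mu, Sigma, log = TRUE)`.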
---

# Unit objectives

1. Master the solution of nonlinear equations and optimization tools.
2. Generate random numbers and run simulations.
3. Understand Bayesian methods and computing.
4. Learn about numerical linear algebra.

---

# Top ten algorithms

In January 2000, the magazine *Computing in Science and Engineering* published a list of 10 algorithms which (in their words) had the "greatest influence on the development and practice of science and engineering in the 20th century".

![Top ten algorithms of the 20th century.](./ten.png)

---

# About assignments

1. Please use **SPOC** for assignment submission.
2. Only a single pdf or html file is acceptable. I recommend you write your assignments using [R Markdown](https://rmarkdown.rstudio.com/), a productive notebook interface to weave together narrative text and code to produce elegantly formatted output; a minimal skeleton follows on the next slide.
3. Use meaningful file names: `Name_StudentNO_n.pdf`.
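---

# An R Markdown skeleton

A minimal sketch of what an assignment source file might look like (the title, author, and chunk contents are placeholders); knitting it in RStudio produces the single pdf or html file to submit.

````markdown
---
title: "BSC Assignment n"
author: "Name (StudentNO)"
output: pdf_document
---

A sentence or two of narrative text explaining your solution.

```{r}
# R code and its printed output are woven into the document
x <- rnorm(100)
mean(x)
```
````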