layout: true

---
class: inverse, center, middle
background-image: url(../figs/titlepage16-9.png)
background-size: cover

<br>
<br>
# Bayesian Statistics and Computing
## Lecture 8: Quasi-Newton Methods
<img src="../figs/slides.png" width="150px"/>

#### *Yanfei Kang | BSC | Beihang University*

---
# Quasi-Newton Methods

- One of the most difficult parts of Newton's method is working out the derivatives, especially the Hessian.
- However, there are methods that approximate the Hessian (and also the gradient).
- These are known as quasi-Newton methods.
- In general they converge more slowly than pure Newton methods.

---
# The BFGS Algorithm

Introduced over several papers by Broyden, Fletcher, Goldfarb and Shanno, it is the most popular quasi-Newton algorithm.

- Recall the Newton iteration:
`$$x_{n+1} = x_n - f^{''}(x_n)^{-1}f^{'}(x_n).$$`
- Is there some matrix we can use to replace `\(f^{''}(x_n)\)` or `\(f^{''}(x_n)^{-1}\)`?
- Can we use a revised iteration `\(x_{n+1} = x_n - B_n^{-1}f^{'}(x_n),\)` where `\(B_n\)` is simpler to compute but still allows the algorithm to converge quickly?

---
# The BFGS Algorithm

- The idea behind quasi-Newton methods is to find a matrix `\(B_n\)` that solves
`$$f^{'}(x_n) - f^{'}(x_{n-1}) = B_n(x_n - x_{n-1}).$$`
- Let `\(y_n = f^{'}(x_n) - f^{'}(x_{n-1})\)` and `\(s_n = x_n - x_{n-1}\)`. One updating procedure for `\(B_n\)` is
`$$B_n = B_{n-1} + \frac{y_ny_n^{'}}{y_n^{'}s_n} - \frac{B_{n-1}s_n s_n^{'}B_{n-1}^{'}}{s_n^{'}B_{n-1}s_n}.$$`
- A minimal R sketch of this iteration is given at the end of these slides.

---
# The L-BFGS-B Algorithm

- The R function `optim()` also offers a variant of BFGS called L-BFGS-B.
- L-BFGS-B uses less computer memory than BFGS and allows for box constraints.

---
# Box Constraints

- Box constraints have the form
`$$l_i \leq x_i \leq u_i,~ \forall i.$$`
- In statistics this can be very useful. Parameters are often constrained:
    - A variance must be greater than 0.
    - For a stationary AR(1), the coefficient must be between -1 and 1.
    - Weights in a portfolio must be between 0 and 1.

---
# `optim()` in R

- `optim()` requires at least two inputs:
    - Initial values
    - The function that needs to be optimized
- By default it minimises a function.
- A function that computes the gradient vector can also be provided.
- The optimization method can be set (choices include BFGS, L-BFGS-B and Nelder-Mead).
- Lower and upper bounds can be set through the arguments `lower` and `upper` if the L-BFGS-B method is used.
- Worked examples are sketched at the end of these slides.

---
# `optim()` in R

- Further arguments can be passed in an argument called `control`.
- Some things that can be included in this list are
    - Maximum number of iterations (`maxit`)
    - How much information about the algorithm to display (`trace`)
    - How often to display that information (`REPORT`)

---
# `optim()` in R

- The result of `optim()` can be saved in an object that is a list containing
    - The value of the function at the optimum (`value`)
    - The optimal parameters (`par`)
    - Information about whether the algorithm has converged (`convergence`)
- For all methods, `convergence = 0` if the algorithm has converged (slightly confusing).

---
# Exercise

Use `optim()` to carry out maximum likelihood estimation for the linear regression model from our last lecture.

---
# References

Section 3.3 of the book ["Advanced Statistical Computing"](https://bookdown.org/rdpeng/advstatcomp/).
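
---
# Sketch: A Bare-Bones BFGS Iteration in R

As a complement to the update formula on the earlier BFGS slide, here is a minimal sketch of the quasi-Newton iteration `\(x_{n+1} = x_n - B_n^{-1}f^{'}(x_n)\)` with the BFGS update of `\(B_n\)`. It assumes the gradient function `grad` is supplied by the user, starts from `\(B_0 = I\)`, and omits the line search and safeguards used by production implementations such as `optim()`.

```r
# Minimal quasi-Newton loop with the BFGS update of the Hessian
# approximation B. Illustrative only: no line search, no checks
# on the curvature condition.
bfgs_sketch <- function(x0, grad, maxit = 100, tol = 1e-8) {
  x <- x0
  B <- diag(length(x0))            # start from the identity matrix
  g <- grad(x)
  for (i in seq_len(maxit)) {
    s <- -solve(B, g)              # quasi-Newton step: solve B s = -g
    x_new <- x + s
    g_new <- grad(x_new)
    y <- g_new - g
    # BFGS update of B using y_n and s_n
    B <- B + (y %*% t(y)) / c(t(y) %*% s) -
      (B %*% s %*% t(s) %*% B) / c(t(s) %*% B %*% s)
    x <- x_new
    g <- g_new
    if (sqrt(sum(g^2)) < tol) break  # stop when the gradient is ~0
  }
  list(par = x, grad = g, iterations = i)
}

# Minimise f(x) = (x1 - 1)^2 + 2 (x2 + 3)^2 from its gradient;
# the iterates should approach (1, -3).
grad_f <- function(x) c(2 * (x[1] - 1), 4 * (x[2] + 3))
bfgs_sketch(c(0, 0), grad_f)$par
```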
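
---
# Sketch: Calling `optim()` with BFGS

The following sketch illustrates the `optim()` arguments discussed earlier (initial values, the objective function, an optional gradient, the method, and the `control` list) on a toy quadratic. The objective function and starting values are made up for illustration.

```r
# Objective and its gradient (illustrative toy problem)
f  <- function(x) (x[1] - 1)^2 + 2 * (x[2] + 3)^2
gr <- function(x) c(2 * (x[1] - 1), 4 * (x[2] + 3))

fit <- optim(par = c(0, 0),            # initial values
             fn  = f,                  # function to be minimised
             gr  = gr,                 # optional gradient
             method  = "BFGS",
             control = list(maxit = 200, trace = 1, REPORT = 5))

fit$par           # optimal parameters, close to (1, -3)
fit$value         # value of f at the optimum
fit$convergence   # 0 means the algorithm converged
```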
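
---
# Sketch: Box Constraints with L-BFGS-B

To illustrate box constraints, the sketch below maximises a normal log-likelihood (by minimising its negative) while keeping the standard deviation positive through the `lower` argument. The simulated data and the bounds are made up for illustration.

```r
set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)     # simulated data for illustration

# Negative log-likelihood of an N(mu, sigma^2) sample; theta = (mu, sigma)
negloglik <- function(theta) {
  -sum(dnorm(y, mean = theta[1], sd = theta[2], log = TRUE))
}

fit <- optim(par    = c(0, 1),
             fn     = negloglik,
             method = "L-BFGS-B",
             lower  = c(-Inf, 1e-6),  # keep sigma > 0
             upper  = c(Inf, Inf))

fit$par           # estimates of (mu, sigma), close to (5, 2)
fit$convergence   # 0 means the algorithm converged
```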