School of Economics and Management
Beihang University
http://yanfei.site

## Quasi-Newton Methods

• One of the most difficult parts of Newton's method is working out the derivatives, especially the Hessian.
• However, there are methods that approximate the Hessian (and sometimes the gradient as well).
• These are known as Quasi-Newton methods.
• In general they converge more slowly than the pure Newton method.

## The BFGS Algorithm

The BFGS algorithm was introduced over several papers by Broyden, Fletcher, Goldfarb and Shanno. It is the most popular Quasi-Newton algorithm.

• Recall the Newton iteration: $x_{n+1} = x_n - f^{''}(x_n)^{-1}f^{'}(x_n).$
• Is there some matrix to replace $$f^{''}(x_n)$$ or $$f^{''}(x_n)^{-1}$$?
• Can we use a revised iteration: $$x_{n+1} = x_n - B_n^{-1}f^{'}(x_n),$$ where $$B_n$$ is simpler to compute but still allows the algorithm to converge quickly?

## The BFGS Algorithm

• The idea behind Quasi-Newton methods is to find a matrix $$B_n$$ satisfying the secant condition $f^{'}(x_n) - f^{'}(x_{n-1}) = B_n(x_n - x_{n-1}).$
• Letting $$y_n = f^{'}(x_n) - f^{'}(x_{n-1})$$ and $$s_n = x_n - x_{n-1}$$, one updating procedure for $$B_n$$ is $B_n = B_{n-1} + \frac{y_ny_n^{'}}{y_n^{'}s_n} - \frac{B_{n-1}s_n s_n^{'}B_{n-1}^{'}}{s_n^{'}B_{n-1}s_n}.$
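To make the update concrete, here is a minimal sketch of BFGS with unit step sizes on a simple quadratic test function. The objective, starting point, iteration cap and tolerance are all chosen purely for illustration; a production implementation would add a line search.

```r
# Minimal BFGS sketch for f(x) = x1^2 + 2*x2^2 (illustrative example;
# the gradient is known in closed form).
grad <- function(x) c(2 * x[1], 4 * x[2])

x_old <- c(3, 2)    # arbitrary starting point
B <- diag(2)        # initial Hessian approximation: the identity
x_new <- x_old - solve(B, grad(x_old))

for (i in 1:50) {
  s <- x_new - x_old                  # s_n = x_n - x_{n-1}
  y <- grad(x_new) - grad(x_old)      # y_n = f'(x_n) - f'(x_{n-1})
  # BFGS update: B_n = B + y y' / (y' s) - B s s' B / (s' B s)
  B <- B + tcrossprod(y) / sum(y * s) -
    tcrossprod(B %*% s) / drop(t(s) %*% B %*% s)
  x_old <- x_new
  x_new <- x_old - solve(B, grad(x_new))   # quasi-Newton step
  if (sqrt(sum(grad(x_new)^2)) < 1e-8) break
}
x_new   # should be close to the minimiser c(0, 0)
```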

## The L-BFGS-B Algorithm

• The R function optim() also offers a variant called L-BFGS-B.
• L-BFGS-B uses less computer memory than BFGS and allows for box constraints.

## Box Constraints

• Box constraints have the form $l_i \leq x_i \leq u_i,~ \forall i.$
• In statistics this is very useful, since parameters are often constrained:
  • A variance must be greater than 0.
  • For a stationary AR(1) model, the coefficient must lie strictly between -1 and 1.
  • Weights in a portfolio must be between 0 and 1.
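As an illustration, the sketch below estimates a normal variance by maximum likelihood, using L-BFGS-B to enforce positivity. The simulated data, the starting value, and the lower bound 1e-6 are all assumptions made for this example.

```r
# Estimate sigma^2 for N(0, sigma^2) data by maximum likelihood,
# with the box constraint sigma^2 > 0 enforced via L-BFGS-B.
set.seed(1)
y <- rnorm(100, mean = 0, sd = 2)   # simulated data, true sigma^2 = 4

negloglik <- function(sigma2) {
  -sum(dnorm(y, mean = 0, sd = sqrt(sigma2), log = TRUE))
}

optim(par = 1, fn = negloglik, method = "L-BFGS-B",
      lower = 1e-6, upper = Inf)    # box constraint keeps sigma^2 positive
```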

## optim() in R

• optim() requires at least two inputs:
  • Initial values for the parameters.
  • The function to be optimized.
• By default it minimises the function.
• A function that computes the gradient vector can also be provided.
• The optimization method can be set (choices include BFGS, L-BFGS-B and Nelder-Mead).
• Lower and upper bounds can be set through the arguments lower and upper if the L-BFGS-B method is used.
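A minimal call might look like the following sketch, which minimises the Rosenbrock test function (a standard example from the optim() help page); the starting values are arbitrary.

```r
fr <- function(x) {   # the function to be minimised
  100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
}
grr <- function(x) {  # its gradient vector
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
    200 * (x[2] - x[1]^2))
}
optim(par = c(-1.2, 1),    # initial values
      fn = fr, gr = grr,   # objective and gradient
      method = "BFGS")     # choice of algorithm
```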

## optim() in R

• Further arguments can be passed to optim() in a list argument called control.
• Some options that can be included in this list are:
  • Maximum number of iterations (maxit).
  • Whether to print information about the algorithm's progress (trace).
  • How often to print that information (REPORT).
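For example, reusing fr and grr from the earlier sketch, control options can be supplied as follows.

```r
optim(par = c(-1.2, 1), fn = fr, gr = grr, method = "BFGS",
      control = list(maxit  = 500,  # maximum number of iterations
                     trace  = 1,    # print progress information
                     REPORT = 10))  # report every 10 iterations
```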

## optim() in R

• The result of optim() can be saved in an object, which is a list containing:
  • The value of the function at the optimum (value).
  • The optimal parameters (par).
  • Information about whether the algorithm has converged (convergence).
• For all methods, convergence = 0 indicates that the algorithm has converged (slightly confusing).
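Continuing the earlier sketch (again assuming fr and grr are defined), the returned list can be inspected as follows.

```r
res <- optim(par = c(-1.2, 1), fn = fr, gr = grr, method = "BFGS")
res$par          # the optimal parameters
res$value        # the function value at the optimum
res$convergence  # 0 indicates successful convergence
```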

## Lab Session 8

Use optim() to carry out maximum likelihood estimation for the logistic regression model.
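As a possible starting point, the sketch below simulates data and minimises the negative Bernoulli log-likelihood with optim(); the sample size, true coefficients and starting values are all illustrative, and the estimates can be checked against glm().

```r
set.seed(123)
n <- 200
X <- cbind(1, rnorm(n))                 # design matrix with an intercept
beta_true <- c(-0.5, 1.5)               # illustrative true coefficients
p <- drop(1 / (1 + exp(-X %*% beta_true)))
ybin <- rbinom(n, size = 1, prob = p)   # simulated binary responses

negloglik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(ybin * eta - log(1 + exp(eta)))  # negative Bernoulli log-likelihood
}

fit <- optim(par = c(0, 0), fn = negloglik, method = "BFGS")
fit$par   # compare with coef(glm(ybin ~ X[, 2], family = binomial))
```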