School of Economics and Management
Beihang University
http://yanfei.site

Quasi-Newton Methods

  • One of the most difficult parts of Newton's method is working out the derivatives, especially the Hessian.
  • However, there are methods that approximate the Hessian, and sometimes also the gradient.
  • These are known as Quasi-Newton Methods.
  • In general they converge more slowly than pure Newton methods.

The BFGS Algorithm

Introduced in separate papers by Broyden, Fletcher, Goldfarb and Shanno, BFGS is the most popular quasi-Newton algorithm.

  • Recall Newton iteration: \[x_{n+1} = x_n - f^{''}(x_n)^{-1}f^{'}(x_n).\]
  • Is there some matrix to replace \(f^{''}(x_n)\) or \(f^{''}(x_n)^{-1}\)?
  • Can we use a revised iteration: \(x_{n+1} = x_n - B_n^{-1}f^{'}(x_n),\) where \(B_n\) is simpler to compute but still allows the algorithm to converge quickly?

The BFGS Algorithm

  • The idea with Quasi-Newton methods is to find a matrix \(B_n\) satisfying the secant condition \[f^{'}(x_n) - f^{'}(x_{n-1}) = B_n(x_n - x_{n-1}).\]
  • Letting \(y_n = f^{'}(x_n) - f^{'}(x_{n-1})\) and \(s_n = x_n - x_{n-1}\), one updating procedure for \(B_n\) is \[B_n = B_{n-1} + \frac{y_ny_n^{'}}{y_n^{'}s_n} - \frac{B_{n-1}s_n s_n^{'}B_{n-1}^{'}}{s_n^{'}B_{n-1}s_n}.\]
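As a concrete illustration, here is a minimal sketch in R that combines the revised iteration \(x_{n+1} = x_n - B_n^{-1}f^{'}(x_n)\) with the update above. The quadratic test function, starting point, iteration cap and tolerance are all illustrative assumptions, and no line search is performed:

    # Illustrative quadratic test function and its gradient
    f  <- function(x) (x[1] - 1)^2 + 2 * (x[2] + 3)^2
    gr <- function(x) c(2 * (x[1] - 1), 4 * (x[2] + 3))

    x_old <- c(0, 0)       # starting point (assumed)
    B <- diag(2)           # initial Hessian approximation B_0 = I
    for (i in 1:50) {
      x_new <- x_old - solve(B, gr(x_old))  # x_{n+1} = x_n - B_n^{-1} f'(x_n)
      s <- x_new - x_old                    # s_n
      y <- gr(x_new) - gr(x_old)            # y_n
      if (sqrt(sum(s^2)) < 1e-8) break      # stop when steps become tiny
      # BFGS update of the Hessian approximation (B is symmetric, so B' = B)
      B <- B + y %*% t(y) / drop(t(y) %*% s) -
           B %*% s %*% t(s) %*% B / drop(t(s) %*% B %*% s)
      x_old <- x_new
    }
    x_new                  # should approach the minimiser c(1, -3)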

The L-BFGS-B Algorithm

  • The R function optim() also offers a variant called L-BFGS-B.
  • L-BFGS-B is a limited-memory version of BFGS: it uses less computer memory and also allows for box constraints.

Box Constraints

  • Box constraints have the form \[l_i \leq x_i \leq u_i,~ \forall i.\]
  • In statistics this is very useful, since parameters are often constrained:
    • A variance must be greater than 0.
    • For a stationary AR(1), the coefficient must lie strictly between -1 and 1.
    • Weights in a portfolio must be between 0 and 1.
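As a sketch of how such a constraint can be imposed in practice, L-BFGS-B can keep a variance estimate positive. The simulated data, starting values and the small lower bound 1e-8 are illustrative assumptions:

    # Simulated data (assumption for illustration)
    set.seed(1)
    x <- rnorm(100, mean = 5, sd = 2)

    # Negative log-likelihood of N(mu, sigma2); theta = c(mu, sigma2)
    negloglik <- function(theta) {
      -sum(dnorm(x, mean = theta[1], sd = sqrt(theta[2]), log = TRUE))
    }

    # Box constraint: sigma2 bounded away from 0
    optim(c(0, 1), negloglik, method = "L-BFGS-B",
          lower = c(-Inf, 1e-8), upper = c(Inf, Inf))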

optim() in R

  • optim() requires at least two inputs.
    • Initial values
    • The function that needs to be optimized
  • By default it minimises a function.
  • A function that computes the gradient vector can also be provided.
  • The optimization method can be set (choices include BFGS, L-BFGS-B and Nelder-Mead).
  • Lower and upper bounds can be set through the arguments lower and upper if the L-BFGS-B method is used.
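A minimal sketch of such a call; the objective and gradient are the illustrative quadratic used earlier, not anything supplied by optim() itself:

    # Objective function and (optional) gradient
    fn <- function(x) (x[1] - 1)^2 + 2 * (x[2] + 3)^2
    gr <- function(x) c(2 * (x[1] - 1), 4 * (x[2] + 3))

    # Initial values, objective, gradient and method
    optim(par = c(0, 0), fn = fn, gr = gr, method = "BFGS")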

optim() in R

  • Further arguments can be passed in an argument called control.
  • Some things that can be included in this list are
    • Maximum number of iterations (maxit)
    • Information about the algorithm (trace)
    • How often to display information about the algorithm (REPORT)
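For example, a sketch passing these options via control, with fn as defined in the previous sketch:

    optim(par = c(0, 0), fn = fn, method = "BFGS",
          control = list(maxit  = 500,  # at most 500 iterations
                         trace  = 1,    # print progress information
                         REPORT = 10))  # report every 10 iterations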

optim() in R

  • The result of optim() can be saved as an object, which is a list containing
    • The value of the function at the optimum found (value)
    • The optimal parameters (par)
    • Useful information about whether the algorithm has converged (convergence)
  • For all methods, convergence = 0 if the algorithm has converged (slightly confusing: zero means success).
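A sketch of inspecting this list, reusing the illustrative fn from the earlier sketch:

    res <- optim(c(0, 0), fn, method = "BFGS")
    res$value        # function value at the optimum found
    res$par          # optimal parameters
    res$convergence  # 0 means the algorithm converged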

Lab Session 8

Use optim() to carry out maximum likelihood estimation for the logistic regression model.
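A possible starting point (a sketch with simulated data and illustrative names, not a complete lab solution):

    # Simulate data for a logistic regression (assumption for illustration)
    set.seed(1)
    n <- 200
    X <- cbind(1, rnorm(n))                     # design matrix with intercept
    beta_true <- c(-0.5, 1.5)
    y <- rbinom(n, 1, plogis(X %*% beta_true))  # simulated binary responses

    # Negative Bernoulli log-likelihood
    negloglik <- function(beta) {
      eta <- X %*% beta
      -sum(y * eta - log(1 + exp(eta)))
    }

    optim(c(0, 0), negloglik, method = "BFGS")$par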
