Many problems in business require something to be minimized or maximized.
- Maximizing Revenue
- Minimizing Costs
- Minimizing Delivery Time
- Maximizing Financial Returns
The method solves an equation of the form \(f(x) = 0\) through the iterative procedure \[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.\]
To see why this works, think about the Taylor expansion of \(f\) around \(x_n\).
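The update rule follows from the first-order Taylor expansion of \(f\) around the current iterate: \[f(x) \approx f(x_n) + f'(x_n)(x - x_n).\] Setting this linear approximation to zero and solving for \(x\) gives \(x = x_n - f(x_n)/f'(x_n)\), which is exactly the iteration above.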
Evaluate \(|f(x_{n+1})|\).
What did you learn from your mistakes?
R's D() function computes the symbolic derivative of an expression with respect to a named variable, so it can supply \(f'\) automatically:
newton.raphson <- function(f, init.value, df = NULL, tol = 1e-5, maxiter = 1000) {
  # f is an expression in x, e.g. expression(x^2 - 5)
  # if no derivative is supplied, obtain it symbolically with D()
  if (is.null(df)) {
    df <- D(f, 'x')
  }
  niter <- 0
  diff <- tol + 1
  x <- init.value
  while (diff >= tol && niter <= maxiter) {
    niter <- niter + 1
    fx <- eval(f)
    dfx <- eval(df)
    if (dfx == 0) {
      warning("Slope is zero: no further improvement possible.")
      break
    }
    # Newton update: x_{n+1} = x_n - f(x_n) / f'(x_n)
    diff <- -fx / dfx
    x <- x + diff
    diff <- abs(diff)
  }
  if (niter > maxiter) {
    warning("Maximum number of iterations 'maxiter' was reached.")
  }
  return(list(root = x, f.root = fx, niter = niter, estim.prec = diff))
}
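A quick illustration of how it might be called, assuming the function above has been defined (the positive root of \(x^2 - 5 = 0\) is \(\sqrt{5}\)):

## Find the root of x^2 - 5 = 0 starting from x = 1
newton.raphson(expression(x^2 - 5), init.value = 1)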
R's deriv() function returns an expression (or, with func = TRUE, a function) that evaluates both the function value and its gradient, which fits Newton's method naturally:
newton.raphson <- function(ftn, x0, tol = 1e-9, max.iter = 100) {
  # Newton-Raphson algorithm for solving ftn(x)[1] == 0
  # we assume that ftn is a function of a single variable that returns
  # the function value and the first derivative as a vector of length 2
  #
  # x0 is the initial guess at the root
  # the algorithm terminates when the function value is within distance
  # tol of 0, or the number of iterations exceeds max.iter

  # initialise
  x <- x0
  fx <- ftn(x)
  iter <- 0

  # continue iterating until stopping conditions are met
  while ((abs(fx[1]) > tol) && (iter < max.iter)) {
    x <- x - fx[1] / fx[2]
    fx <- ftn(x)
    iter <- iter + 1
    cat("At iteration", iter, "value of x is:", x, "\n")
  }

  # output depends on success of algorithm
  if (abs(fx[1]) > tol) {
    cat("Algorithm failed to converge\n")
    return(NULL)
  } else {
    cat("Algorithm converged\n")
    return(x)
  }
}

f <- expression(x^2 - 5)
df <- deriv(f, 'x', func = TRUE)
ftn <- function(x) {
  dfx <- df(x)
  f <- dfx[1]
  gradf <- attr(dfx, 'gradient')[1, ]
  return(c(f, gradf))
}
newton.raphson(ftn, 1)
Now try these functions:
Evaluate \(|f^{'}(x_{n+1})|\).
Three stopping rules can be used.
Simply take the partial derivatives and put them in a vector \[ \frac{\partial y}{\partial{\mathbf{x}}}= \left( \begin{array}{c} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_d} \end{array} \right)\] This is called the gradient vector.
The function \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\) has gradient vector
\[\frac{\partial y}{\partial{\mathbf{x}}}=\left(\begin{array}{c} 2x_1-x_2\\ -x_1+2x_2+e^{x_2} \end{array} \right).\]
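As a quick check (an illustration, not part of the original example), R's deriv() can produce this gradient symbolically:

## Symbolic gradient of y = x1^2 - x1*x2 + x2^2 + exp(x2)
gr <- deriv(expression(x1^2 - x1 * x2 + x2^2 + exp(x2)), c('x1', 'x2'), func = TRUE)
attr(gr(x1 = 1, x2 = 0), 'gradient')  # 2*1 - 0 = 2 and -1 + 0 + exp(0) = 0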
Simply take the second-order partial derivatives. This will give a matrix \[ \frac{\partial^2 y}{\partial{\mathbf{x}}\partial{\mathbf{x}}'}= \left( \begin{array}{cccc} \frac{\partial^2 y}{\partial x_1^2}&\frac{\partial^2 y}{\partial x_1\partial x_2}&\cdots&\frac{\partial^2 y}{\partial x_1\partial x_d}\\ \frac{\partial^2 y}{\partial x_2\partial x_1}&\frac{\partial^2 y}{\partial x_2^2}&\cdots&\frac{\partial^2 y}{\partial x_2\partial x_d}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^2 y}{\partial x_d\partial x_1}&\frac{\partial^2 y}{\partial x_d\partial x_2}&\cdots&\frac{\partial^2 y}{\partial x_d^2} \end{array} \right).\]
This is called the Hessian matrix.
The function \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\) has Hessian matrix \[\frac{\partial^2 y}{\partial{\mathbf{x}}\partial{\mathbf{x}}'}=\left(\begin{array}{cc} 2 & -1\\ -1 & 2 + e^{x_2} \end{array} \right).\]
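Similarly, deriv3() also returns second derivatives; a small sketch to verify the matrix above:

## Symbolic Hessian of y = x1^2 - x1*x2 + x2^2 + exp(x2)
h <- deriv3(expression(x1^2 - x1 * x2 + x2^2 + exp(x2)), c('x1', 'x2'), func = TRUE)
attr(h(x1 = 1, x2 = 0), 'hessian')  # a 1 x 2 x 2 array: 2, -1, -1, 2 + exp(0)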
We can now generalise the update step in Newton's method: \[ \mathbf{x}_{n+1}=\mathbf{x}_n-\left(\frac{\partial^2 f(\mathbf{x}_n)}{\partial {\mathbf{x}}\partial{\mathbf{x}}'}\right)^{-1} \frac{\partial f(\mathbf{x}_n)}{\partial {\mathbf{x}}},\] where both the Hessian and the gradient are evaluated at the current iterate \(\mathbf{x}_n\).
Now write code to minimise \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\).
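A minimal sketch of one possible solution: a generic multivariate Newton routine that takes a function returning the objective value, gradient and Hessian as a list, applied to \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\). The same interface is assumed for the newton() sourced from ./RCode/newton.R in the regression example below.

## Objective, gradient and Hessian of y = x1^2 - x1*x2 + x2^2 + exp(x2)
f <- function(x) x[1]^2 - x[1] * x[2] + x[2]^2 + exp(x[2])
grad <- function(x) c(2 * x[1] - x[2], -x[1] + 2 * x[2] + exp(x[2]))
hess <- function(x) matrix(c(2, -1, -1, 2 + exp(x[2])), 2, 2)

## Generic Newton routine: f3(x) must return list(value, gradient, Hessian);
## iterate x_{n+1} = x_n - H^{-1} g until the gradient is (near) zero
newton <- function(f3, x0, tol = 1e-9, max.iter = 100) {
  x <- x0
  f3x <- f3(x)
  iter <- 0
  while (sqrt(sum(f3x[[2]]^2)) > tol && iter < max.iter) {
    x <- x - solve(f3x[[3]], f3x[[2]])
    f3x <- f3(x)
    iter <- iter + 1
  }
  if (sqrt(sum(f3x[[2]]^2)) > tol) warning("Algorithm failed to converge")
  return(x)
}

newton(function(x) list(f(x), grad(x), hess(x)), c(1, 1))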
Assume you want to fit the regression model \[ y_i = \beta_0 + \beta_1x_{i} + \epsilon_i,~\text{where}~\epsilon_i \sim N(0, 1). \]
## Generate some data
beta0 <- 1
beta1 <- 3
sigma <- 1
n <- 1000
x <- rnorm(n, 3, 1)
y <- beta0 + x * beta1 + rnorm(n, mean = 0, sd = sigma)
# plot(x, y, col = "blue", pch = 20)

## Objective: sum of squared errors (equivalent to minimising the
## normal negative log-likelihood when sigma = 1)
func = function(beta){
  sum((y - beta[1] - beta[2] * x)^2)
}
grad = function(beta){
  matrix(c(sum(-2 * (y - beta[1] - beta[2] * x)),
           sum(-2 * x * (y - beta[1] - beta[2] * x))), 2, 1)
}
hess = function(beta){
  matrix(c(2 * length(x), 2 * sum(x), 2 * sum(x), 2 * sum(x^2)), 2, 2)
}

## The optimization
source('./RCode/newton.R')
optimOut <- newton(function(beta){list(func(beta), grad(beta), hess(beta))}, c(1.1, 1.3))
beta0Hat <- optimOut[1]
beta1Hat <- optimOut[2]
yHat <- beta0Hat + beta1Hat * x

## Plot
plot(x, y, pch = 20, col = "blue")
points(sort(x), yHat[order(x)], type = "l", col = "red", lwd = 2)
## Compare the Newton estimates with ordinary least squares via lm()
myLM <- lm(y ~ x)
myLMCoef <- myLM$coefficients
yHatOLS <- myLMCoef[1] + myLMCoef[2] * x
plot(x, y, pch = 20, col = "blue")
points(sort(x), yHat[order(x)], type = "l", col = "red", lwd = 10)
points(sort(x), yHatOLS[order(x)], type = "l", col = "blue",
       lty = "dashed", lwd = 2, pch = 20)
Use Newton's method to find the maximum likelihood estimate for the coefficients in a logistic regression. The steps are:
- Simulate (or load) data with a binary response.
- Write down the log-likelihood and derive its gradient vector and Hessian matrix.
- Iterate the Newton update from a starting value until a stopping rule is met.
- Compare your estimates with the output of glm(), as in the sketch below.
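A minimal sketch of how these steps might be carried out in R, using simulated data (the variable names and true coefficient values below are illustrative assumptions, not from the lecture):

## Simulate data for a logistic regression (illustrative values)
set.seed(2019)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)                         # design matrix with intercept
y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2 * x))))

## Newton's method on the binomial log-likelihood
beta <- c(0, 0)                          # starting values
for (iter in 1:25) {
  p <- 1 / (1 + exp(-X %*% beta))        # fitted probabilities
  g <- t(X) %*% (y - p)                  # gradient of the log-likelihood
  H <- -t(X) %*% (X * as.numeric(p * (1 - p)))  # Hessian
  step <- solve(H, g)
  beta <- beta - step                    # Newton update
  if (sqrt(sum(step^2)) < 1e-9) break
}
beta
coef(glm(y ~ x, family = binomial))      # benchmark with glm()

The update has exactly the same form as the multivariate step above; glm() serves only as a benchmark for the hand-coded estimates.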
Chapters 10 and 12 of the book "Introduction to Scientific Programming and Simulation Using R".