Many problems in business require something to be minimized or maximized, for example:
- Maximizing Revenue
- Minimizing Costs
- Minimizing Delivery Time
- Maximizing Financial Returns
The method solves an equation of the form \(f(x) = 0\) through the iterative procedure \[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.\]
A natural check for convergence is to evaluate \(|f(x_{n+1})|\) and stop once it is close enough to zero.
To see where the update itself comes from, think about the Taylor expansion of \(f\) around \(x_n\).
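Making the Taylor argument explicit (a standard derivation, written out here for reference): linearise \(f\) at the current iterate \(x_n\),
\[ f(x) \approx f(x_n) + f'(x_n)(x - x_n), \]
then set the right-hand side to zero and solve for \(x\); the solution is the next iterate
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]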
What did you learn from mistakes?
An implementation based on D(), which computes the symbolic derivative of the expression f when df is not supplied:

newton.raphson <- function(f, init.value, df = NULL, tol = 1e-5, maxiter = 1000) {
  # f: an expression in the variable x; df: its derivative (obtained via D() if NULL)
  if (is.null(df)) {
    df <- D(f, 'x')
  }
  niter <- 0
  diff <- tol + 1
  x <- init.value
  while (diff >= tol && niter <= maxiter) {
    niter <- niter + 1
    fx <- eval(f)
    dfx <- eval(df)
    if (dfx == 0) {
      warning("Slope is zero: no further improvement possible.")
      break
    }
    diff <- -fx / dfx
    x <- x + diff
    diff <- abs(diff)
  }
  if (niter > maxiter) {
    warning("Maximum number of iterations 'maxiter' was reached.")
  }
  return(list(root = x, f.root = fx, niter = niter, estim.prec = diff))
}
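As a quick, illustrative check (assuming the function above has been defined), the positive root of \(x^2 - 5\) is \(\sqrt{5} \approx 2.236\):

newton.raphson(expression(x^2 - 5), init.value = 1)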
An alternative implementation, in which the user supplies a function (built, for example, with deriv()) that returns both the function value and its first derivative:

newton.raphson <- function(ftn, x0, tol = 1e-9, max.iter = 100) {
  # Newton-Raphson algorithm for solving ftn(x)[1] == 0
  # we assume that ftn is a function of a single variable that returns
  # the function value and the first derivative as a vector of length 2
  #
  # x0 is the initial guess at the root
  # the algorithm terminates when the function value is within distance
  # tol of 0, or the number of iterations exceeds max.iter

  # initialise
  x <- x0
  fx <- ftn(x)
  iter <- 0

  # continue iterating until stopping conditions are met
  while ((abs(fx[1]) > tol) && (iter < max.iter)) {
    x <- x - fx[1] / fx[2]
    fx <- ftn(x)
    iter <- iter + 1
    cat("At iteration", iter, "value of x is:", x, "\n")
  }

  # output depends on success of algorithm
  if (abs(fx[1]) > tol) {
    cat("Algorithm failed to converge\n")
    return(NULL)
  } else {
    cat("Algorithm converged\n")
    return(x)
  }
}
f <- expression(x^2 - 5)
df <- deriv(f, 'x', func = TRUE)  # returns the value with a 'gradient' attribute
ftn <- function(x) {
  dfx <- df(x)
  fval <- dfx[1]
  gradf <- attr(dfx, 'gradient')[1, ]
  return(c(fval, gradf))
}
newton.raphson(ftn, 1)
Now try these functions:
Three stopping rules can be used; one of them is to evaluate \(|f'(x_{n+1})|\) and stop once it is close enough to zero.
Simply take the partial derivatives and put them in a vector \[ \frac{\partial y}{\partial{\mathbf{x}}}= \left( \begin{array}{c} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_d} \end{array} \right)\] This is called the gradient vector.
The function \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\) has gradient vector
\[\frac{\partial y}{\partial{\mathbf{x}}}=\left(\begin{array}{c} 2x_1-x_2\\ -x_1+2x_2+e^{x_2} \end{array} \right).\]
Simply take the second-order partial derivatives. This will give a matrix \[ \frac{\partial^2 y}{\partial{\mathbf{x}}\partial{\mathbf{x}}'}= \left( \begin{array}{cccc} \frac{\partial^2 y}{\partial x_1^2}&\frac{\partial^2 y}{\partial x_1\partial x_2}&\cdots&\frac{\partial^2 y}{\partial x_1\partial x_d}\\ \frac{\partial^2 y}{\partial x_2\partial x_1}&\frac{\partial^2 y}{\partial x_2^2}&\cdots&\frac{\partial^2 y}{\partial x_2\partial x_d}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^2 y}{\partial x_d\partial x_1}&\frac{\partial^2 y}{\partial x_d\partial x_2}&\cdots&\frac{\partial^2 y}{\partial x_d^2}\\ \end{array} \right).\]
This is called the Hessian matrix.
The function \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\) has Hessian matrix \[\frac{\partial^2 y}{\partial{\mathbf{x}}\partial{\mathbf{x}}'}=\left(\begin{array}{cc} 2 & -1\\ -1 & 2 + e^{x_2} \end{array} \right).\]
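These derivatives can be checked in R with deriv(), which can also return the Hessian; a small illustrative sketch (base R only):

dy <- deriv(expression(x1^2 - x1*x2 + x2^2 + exp(x2)),
            c('x1', 'x2'), func = TRUE, hessian = TRUE)
out <- dy(x1 = 1, x2 = 0)
attr(out, 'gradient')  # (2*x1 - x2, -x1 + 2*x2 + exp(x2)) evaluated at (1, 0), i.e. (2, 0)
attr(out, 'hessian')   # a 1 x 2 x 2 array matching the Hessian above at x2 = 0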
We can now generalise the update step in Newton's method to the multivariate case: \[ \mathbf{x}_{n+1}=\mathbf{x}_n-\left(\frac{\partial^2 f(\mathbf{x}_n)}{\partial {\mathbf{x}}\partial{\mathbf{x}}'}\right)^{-1} \frac{\partial f(\mathbf{x}_n)}{\partial {\mathbf{x}}},\] where the gradient and Hessian are evaluated at the current iterate \(\mathbf{x}_n\).
Now write code to minimise \(y=x_1^2-x_1x_2+x_2^2+e^{x_2}\).
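One possible sketch of a solution (the function names here are illustrative, and this is not necessarily the intended implementation): hard-code the gradient and Hessian derived above and iterate the Newton update until the gradient is close to zero.

grad_y <- function(x) {
  c(2 * x[1] - x[2],
    -x[1] + 2 * x[2] + exp(x[2]))
}
hess_y <- function(x) {
  matrix(c(2, -1,
           -1, 2 + exp(x[2])), 2, 2)
}
x <- c(0, 0)                               # starting value
for (i in 1:100) {
  x <- x - solve(hess_y(x), grad_y(x))     # Newton update: x - H^{-1} g
  if (max(abs(grad_y(x))) < 1e-9) break    # stop when the gradient is ~ 0
}
x  # converges to roughly (-0.22, -0.43)
# cross-check: nlm(function(x) x[1]^2 - x[1]*x[2] + x[2]^2 + exp(x[2]), c(0, 0))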
Assume you want to fit the regression model \[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i,~\text{where}~\epsilon_i \sim N(0, 1). \]
## Generate some data
beta0 <- 1
beta1 <- 3
sigma <- 1
n <- 1000
x <- rnorm(n, 3, 1)
y <- beta0 + x * beta1 + rnorm(n, mean = 0, sd = sigma)
# plot(x, y, col = "blue", pch = 20)
## The objective function: with normal errors, maximising the log-likelihood
## is equivalent to minimising the sum of squared errors
func <- function(beta) {
  sum((y - beta[1] - beta[2] * x)^2)
}
grad <- function(beta) {
  matrix(c(sum(-2 * (y - beta[1] - beta[2] * x)),
           sum(-2 * x * (y - beta[1] - beta[2] * x))), 2, 1)
}
hess <- function(beta) {
  matrix(c(2 * length(x), 2 * sum(x),
           2 * sum(x), 2 * sum(x^2)), 2, 2)
}
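The helper newton() sourced below lives in a separate file that is not reproduced here; a minimal sketch consistent with how it is called (an assumption; the actual ./RCode/newton.R may differ) is:

newton <- function(f3, x0, tol = 1e-9, max.iter = 100) {
  # f3(x) must return a list: function value, gradient vector, Hessian matrix
  x <- x0
  f3.x <- f3(x)
  iter <- 0
  while ((max(abs(f3.x[[2]])) > tol) && (iter < max.iter)) {
    x <- x - drop(solve(f3.x[[3]], f3.x[[2]]))  # Newton step: x - H^{-1} g
    f3.x <- f3(x)
    iter <- iter + 1
  }
  if (max(abs(f3.x[[2]])) > tol) {
    warning("Algorithm failed to converge")
  }
  return(x)
}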
## The optimization
source('./RCode/newton.R')
optimOut <- newton(function(beta){list(func(beta), grad(beta), hess(beta))}, c(1.1, 1.3))
beta0Hat <- optimOut[1]
beta1Hat <- optimOut[2]
yHat <- beta0Hat + beta1Hat * x
## Plot
plot(x, y, pch = 20, col = "blue")
points(sort(x), yHat[order(x)], type = "l", col = "red", lwd = 2)
myLM <- lm(y~x)
myLMCoef <- myLM$coefficients
yHatOLS <- myLMCoef[1] + myLMCoef[2] * x
plot(x, y, pch = 20, col = "blue")
points(sort(x), yHat[order(x)], type = "l", col = "red", lwd = 10)
points(sort(x), yHatOLS[order(x)], type = "l",
       col = "blue", lty = "dashed", lwd = 2, pch = 20)
Use Newton’s method to find the maximum likelihood estimate for the coefficients in a logistic regression. The steps are:
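A rough sketch of one possible implementation (the simulated data, variable names, and settings below are illustrative assumptions, not part of the exercise statement):

set.seed(1)
n <- 500
x1 <- rnorm(n)
X <- cbind(1, x1)                             # design matrix with an intercept
beta.true <- c(-1, 2)
y1 <- rbinom(n, 1, 1 / (1 + exp(-X %*% beta.true)))

beta <- c(0, 0)                               # starting value
for (i in 1:25) {
  p <- as.vector(1 / (1 + exp(-X %*% beta)))  # fitted probabilities
  g <- t(X) %*% (y1 - p)                      # gradient of the log-likelihood
  H <- -t(X) %*% (X * (p * (1 - p)))          # Hessian of the log-likelihood
  beta <- beta - drop(solve(H, g))            # Newton update
  if (max(abs(g)) < 1e-9) break               # gradient at the previous beta is ~ 0
}
beta
# compare with glm(y1 ~ x1, family = binomial)$coefficients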
See Chapters 10 and 12 of Jones, Maillardet and Robinson, "Introduction to Scientific Programming and Simulation Using R".