Nelder-Mead
This is based on the pure Python implementation by François Chollet, also found in the supplemental section. This is mostly just an academic exercise on my part, as I’m not sure how much one would use the basic Nelder-Mead algorithm in many situations. In my experience, BFGS and other approaches are faster, more accurate, and less sensitive to starting values for the types of problems I’ve played around with. Others who actually spend their time researching such things seem to agree.
There were two issues regarding the original code on GitHub, and I’ve implemented the suggested corrections with notes. The initial R function is not very R-like, as the goal was to stay close to the original Python, which used a list-based approach, for comparison. I also provide a cleaner, more R-like version that uses matrices instead of lists, but which still sticks to the same approach for the most part. For both functions, comparisons are made using the optimx package, but feel free to use base R’s optim instead.
Functions
First Version
- f: function to optimize; it must return a scalar score and operate over an array of the same dimensions as x_start
- x_start: initial position
- step: look-around radius in the initial step
- no_improve_thr: see no_improv_break
- no_improv_break: break after no_improv_break iterations with an improvement lower than no_improve_thr
- max_iter: always break after this number of iterations; set it to 0 to loop indefinitely
- alpha: reflection parameter of the algorithm (see the Wikipedia page for reference)
- gamma: expansion parameter
- rho: contraction parameter
- sigma: shrink (reduction) parameter
- verbose: print iterations?
This function returns the best parameter array and best score.
nelder_mead <- function(
  f,
  x_start,
  step = 0.1,
  no_improve_thr = 1e-12,
  no_improv_break = 10,
  max_iter = 0,
  alpha = 1,
  gamma = 2,
  rho = 0.5,
  sigma = 0.5,
  verbose = FALSE
) {
  # init
  dim       = length(x_start)
  prev_best = f(x_start)
  no_improv = 0
  res       = list(list(x_start = x_start, prev_best = prev_best))

  for (i in 1:dim) {
    x     = x_start
    x[i]  = x[i] + step
    score = f(x)
    res   = append(res, list(list(x_start = x, prev_best = score)))
  }

  # simplex iter
  iters = 0

  while (TRUE) {
    # order
    idx  = sapply(res, `[[`, 2)
    res  = res[order(idx)]    # ascending order
    best = res[[1]][[2]]

    # break after max_iter
    if (max_iter > 0 & iters >= max_iter) return(res[[1]])
    iters = iters + 1

    # break after no_improv_break iterations with no improvement
    if (verbose) message(paste('...best so far:', best))

    if (best < (prev_best - no_improve_thr)) {
      no_improv = 0
      prev_best = best
    } else {
      no_improv = no_improv + 1
    }

    if (no_improv >= no_improv_break) return(res[[1]])

    # centroid
    x0 = rep(0, dim)

    for (tup in 1:(length(res) - 1)) {
      for (i in 1:dim) {
        x0[i] = x0[i] + res[[tup]][[1]][i] / (length(res) - 1)
      }
    }

    # reflection
    xr     = x0 + alpha * (x0 - res[[length(res)]][[1]])
    rscore = f(xr)

    if (res[[1]][[2]] <= rscore & rscore < res[[length(res) - 1]][[2]]) {
      res[[length(res)]] = list(xr, rscore)
      next
    }

    # expansion
    if (rscore < res[[1]][[2]]) {
      # xe = x0 + gamma*(x0 - res[[length(res)]][[1]])  # issue with this
      xe     = x0 + gamma * (xr - x0)
      escore = f(xe)

      if (escore < rscore) {
        res[[length(res)]] = list(xe, escore)
        next
      } else {
        res[[length(res)]] = list(xr, rscore)
        next
      }
    }

    # contraction
    # xc = x0 + rho*(x0 - res[[length(res)]][[1]])  # issue with wiki consistency for rho values (and optim)
    xc     = x0 + rho * (res[[length(res)]][[1]] - x0)
    cscore = f(xc)

    if (cscore < res[[length(res)]][[2]]) {
      res[[length(res)]] = list(xc, cscore)
      next
    }

    # reduction
    x1   = res[[1]][[1]]
    nres = list()

    for (tup in res) {
      redx  = x1 + sigma * (tup[[1]] - x1)
      score = f(redx)
      nres  = append(nres, list(list(redx, score)))
    }

    res = nres
  }

  res
}
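As a quick sanity check before the main example, here is a minimal run on a throwaway convex quadratic (quad is just an illustrative function of my own; its minimum is at c(1, 2), so the returned parameters and score should be near c(1, 2) and 0):

# hypothetical test function: a convex quadratic with minimum at c(1, 2)
quad = function(x) sum((x - c(1, 2))^2)

nelder_mead(quad, c(0, 0), max_iter = 500)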
Example
The function to minimize.
f = function(x) {
  sin(x[1]) * cos(x[2]) * (1 / (abs(x[3]) + 1))
}
Estimate.
nelder_mead(
f, c(0, 0, 0),
max_iter = 1000,
no_improve_thr = 1e-12
)
[[1]]
[1] -1.570797e+00 -2.235577e-07 1.637460e-14
[[2]]
[1] -1
Compare to optimx; you may see warnings. Note that optim-style implementations call the contraction coefficient beta, which corresponds to rho here.
optimx::optimx(
  par = c(0, 0, 0),
  fn = f,
  method = "Nelder-Mead",
  control = list(
    alpha  = 1,
    gamma  = 2,
    beta   = 0.5,
    maxit  = 1000,
    reltol = 1e-12
  )
)
p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtime
Nelder-Mead -1.570796 1.394018e-08 1.088215e-16 -1 861 NA NA 0 TRUE TRUE 0
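As mentioned, base R's optim can be used instead; a roughly equivalent call would be the following (a sketch; optim's Nelder-Mead control uses alpha, beta, and gamma for the reflection, contraction, and expansion factors):

optim(
  par = c(0, 0, 0),
  fn  = f,
  method  = "Nelder-Mead",
  control = list(
    alpha  = 1,
    beta   = 0.5,
    gamma  = 2,
    maxit  = 1000,
    reltol = 1e-12
  )
)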
A Regression Model
I find a regression model to be more applicable/intuitive for my needs, so I provide an example for that case.
Data Setup
library(tidyverse)
set.seed(8675309)
N       = 500
n_preds = 5    # number of predictors
X       = cbind(1, matrix(rnorm(N * n_preds), ncol = n_preds))
beta    = runif(ncol(X), -1, 1)
y       = X %*% beta + rnorm(nrow(X))
Least squares loss function to minimize.
f = function(b) {
  crossprod(y - X %*% b)[, 1]   # if using optimx, need a scalar
}
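Since this is ordinary least squares, the solution the simplex search should approach is available in closed form; as a quick reference point:

# analytic least squares solution via the normal equations
solve(crossprod(X), crossprod(X, y))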
fit_nm = nelder_mead(
  f,
  runif(ncol(X)),
  max_iter = 2000,
  no_improve_thr = 1e-12,
  verbose = FALSE
)
Comparison
Compare to optimx.
fit_nm_optimx = optimx::optimx(
  runif(ncol(X)),
  fn = f,    # model function
  method = 'Nelder-Mead',
  control = list(
    alpha  = 1,
    gamma  = 2,
    beta   = 0.5,  # rho
    maxit  = 2000,
    reltol = 1e-12
  )
)
We’ll add base R lm estimates.
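These are obtained with lm.fit, as in the later example:

fit_lm = lm.fit(X, y)$coef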
|           | p1     | p2    | p3    | p4    | p5    | p6     | value   |
|-----------|--------|-------|-------|-------|-------|--------|---------|
| nm_func   | -0.962 | 0.594 | 0.049 | 0.276 | 0.975 | -0.075 | 501.315 |
| nm_optimx | -0.962 | 0.594 | 0.049 | 0.276 | 0.975 | -0.075 | 501.315 |
| lm        | -0.962 | 0.594 | 0.049 | 0.276 | 0.975 | -0.075 | 501.315 |
Second Version
This is a more natural R approach in my opinion.
nelder_mead2 = function(
  f,
  x_start,
  step = 0.1,
  no_improve_thr = 1e-12,
  no_improv_break = 10,
  max_iter = 0,
  alpha = 1,
  gamma = 2,
  rho = 0.5,
  sigma = 0.5,
  verbose = FALSE
) {
  # init
  npar = length(x_start)
  nc   = npar + 1
  prev_best = f(x_start)
  no_improv = 0
  res  = matrix(c(x_start, prev_best), ncol = nc)
  colnames(res) = c(paste('par', 1:npar, sep = '_'), 'score')

  for (i in 1:npar) {
    x     = x_start
    x[i]  = x[i] + step
    score = f(x)
    res   = rbind(res, c(x, score))
  }

  # simplex iter
  iters = 0

  while (TRUE) {
    # order
    res  = res[order(res[, nc]), ]   # ascending order
    best = res[1, nc]

    # break after max_iter
    if (max_iter & iters >= max_iter) return(res[1, ])
    iters = iters + 1

    # break after no_improv_break iterations with no improvement
    if (verbose) message(paste('...best so far:', best))

    if (best < (prev_best - no_improve_thr)) {
      no_improv = 0
      prev_best = best
    } else {
      no_improv = no_improv + 1
    }

    if (no_improv >= no_improv_break)
      return(res[1, ])

    nr = nrow(res)

    # centroid: more efficient than the previous double loop
    x0 = colMeans(res[1:npar, -nc])

    # reflection
    xr     = x0 + alpha * (x0 - res[nr, -nc])
    rscore = f(xr)

    if (res[1, 'score'] <= rscore & rscore < res[npar, 'score']) {
      res[nr, ] = c(xr, rscore)
      next
    }

    # expansion
    if (rscore < res[1, 'score']) {
      xe     = x0 + gamma * (xr - x0)
      escore = f(xe)

      if (escore < rscore) {
        res[nr, ] = c(xe, escore)
        next
      } else {
        res[nr, ] = c(xr, rscore)
        next
      }
    }

    # contraction
    xc     = x0 + rho * (res[nr, -nc] - x0)
    cscore = f(xc)

    if (cscore < res[nr, 'score']) {
      res[nr, ] = c(xc, cscore)
      next
    }

    # reduction
    x1   = res[1, -nc]
    nres = res

    for (i in 1:nr) {
      redx      = x1 + sigma * (res[i, -nc] - x1)
      score     = f(redx)
      nres[i, ] = c(redx, score)
    }

    res = nres
  }
}
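Note that this version returns a named vector of parameters plus score rather than a nested list. A quick check on the same throwaway quadratic used earlier:

# same hypothetical quadratic as before; minimum at c(1, 2), score near 0
nelder_mead2(function(x) sum((x - c(1, 2))^2), c(0, 0), max_iter = 500)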
Example
The function to minimize.
f = function(x) {
  sin(x[1]) * cos(x[2]) * (1 / (abs(x[3]) + 1))
}
nelder_mead2(
f, c(0, 0, 0),
max_iter = 1000,
no_improve_thr = 1e-12
)
par_1 par_2 par_3 score
-1.570797e+00 -2.235577e-07 1.622809e-14 -1.000000e+00
optimx::optimx(
  par = c(0, 0, 0),
  fn = f,
  method = "Nelder-Mead",
  control = list(
    alpha  = 1,
    gamma  = 2,
    beta   = 0.5,
    maxit  = 1000,
    reltol = 1e-12
  )
)
p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtime
Nelder-Mead -1.570796 1.394018e-08 1.088215e-16 -1 861 NA NA 0 TRUE TRUE 0.001
A Regression Model
set.seed(8675309)
N       = 500
n_preds = 5
X       = cbind(1, matrix(rnorm(N * n_preds), ncol = n_preds))
beta    = runif(ncol(X), -1, 1)
y       = X %*% beta + rnorm(nrow(X))
Least squares loss function to minimize.
f = function(b) {
  crossprod(y - X %*% b)[, 1]   # if using optimx, need a scalar
}
fit_nm = nelder_mead2(
  f,
  runif(ncol(X)),
  max_iter = 2000,
  no_improve_thr = 1e-12
)
Comparison
Compare to optimx and lm as before.
fit_nm_optimx = optimx::optimx(
  runif(ncol(X)),
  fn = f,
  method = 'Nelder-Mead',
  control = list(
    alpha  = 1,
    gamma  = 2,
    beta   = 0.5,  # rho
    maxit  = 2000,
    reltol = 1e-12
  )
)[1:(n_preds + 1)]

fit_lm = lm.fit(X, y)$coef
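The comparison table below can be put together along these lines (a sketch; fit_nm contains the parameter estimates followed by the final score):

# hypothetical assembly of the comparison table
round(
  cbind(
    nm_func   = fit_nm[1:(n_preds + 1)],
    nm_optimx = unlist(fit_nm_optimx),
    lm        = fit_lm,
    truth     = beta
  ),
  3
)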
|    | nm_func | nm_optimx | lm     | truth  |
|----|---------|-----------|--------|--------|
| p1 | -0.962  | -0.962    | -0.962 | -0.909 |
| p2 | 0.594   | 0.594     | 0.594  | 0.620  |
| p3 | 0.049   | 0.049     | 0.049  | 0.074  |
| p4 | 0.276   | 0.276     | 0.276  | 0.320  |
| p5 | 0.975   | 0.975     | 0.975  | 0.956  |
| p6 | -0.075  | -0.075    | -0.075 | -0.080 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/nelder_mead.R