Extreme Learning Machine
A very simple implementation of an extreme learning machine for regression. It can be seen as a quick way to estimate a 'good enough' neural net, one that is often nearly as performant but with far less computational overhead. See elmNN and ELMR for some R package implementations. I also add a comparison to generalized additive models (ELMs/neural networks and GAMs are both adaptive basis function models). A brief sketch of the estimation idea follows the references below.
- http://www.extreme-learning-machines.org
- G.B. Huang, Q.Y. Zhu and C.K. Siew, Extreme Learning Machine: Theory and Applications.
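Conceptually, the only parameters actually estimated are the hidden-to-output weights; the input-to-hidden weights are drawn at random and left fixed. A minimal sketch of the idea, in the notation of the elm() function defined below:

$$
H = g\big([\mathbf{1},\ \tilde{X}]\,W_0\big), \qquad \hat{B} = H^{+}y, \qquad \hat{y} = H\hat{B}
$$

Here \(\tilde{X}\) is the (scaled) feature matrix, \(W_0\) holds the random input-to-hidden weights, \(g\) is the activation function, and \(H^{+}\) is the Moore–Penrose pseudoinverse of the hidden-layer matrix, so \(\hat{B}\) is a minimum-norm least squares solution.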
Data Setup
One variable, complex function.
library(tidyverse)
library(mgcv)
set.seed(123)
n = 5000
x = runif(n)
# x = rnorm(n)
mu = sin(2*(4*x - 2)) + 2 * exp(-(16^2) * ((x - .5)^2))
y  = rnorm(n, mu, .3)

d = data.frame(x, y)
qplot(x, y, color = I('#ff55001A'))
Motorcycle accident data.
data('mcycle', package = 'MASS')
times = matrix(mcycle$times, ncol = 1)
accel = mcycle$accel
Function
elm <- function(X, y, n_hidden = NULL, active_fun = tanh) {
  # X: an N observations x p features matrix
  # y: the target
  # n_hidden: the number of hidden nodes
  # active_fun: activation function

  pp1 = ncol(X) + 1
  w0  = matrix(rnorm(pp1 * n_hidden), pp1, n_hidden)  # random weights
  h   = active_fun(cbind(1, scale(X)) %*% w0)         # compute hidden layer
  B   = MASS::ginv(h) %*% y                           # find weights for hidden layer

  fit = h %*% B                                       # fitted values

  list(
    fit  = fit,
    loss = crossprod(y - fit),
    B    = B,
    w0   = w0
  )
}
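The function only returns fitted values for the training data. For new data, the same transformation is applied with the stored weights, as is done by hand for the test data in the supplemental example at the end. Below is a minimal sketch of a helper for that step; elm_predict is hypothetical and not part of the original script.

elm_predict <- function(object, X_new, active_fun = tanh) {
  # hypothetical helper (not in the original script)
  # note: scale() standardizes X_new by its own means/sds, mirroring how the test
  # data is handled later; with small new samples you may prefer to reuse the
  # training means/sds instead
  h_new = active_fun(cbind(1, scale(X_new)) %*% object$w0)
  h_new %*% object$B
}

Applied to the training X, this reproduces the fitted values, since the same scaling and activation are used.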
Estimation
X_mat = as.matrix(x, ncol = 1)

fit_elm = elm(X_mat, y, n_hidden = 100)

str(fit_elm)
List of 4
$ fit : num [1:5000, 1] -1.0239 0.7311 -0.413 0.0806 -0.4112 ...
$ loss: num [1, 1] 442
$ B : num [1:100, 1] 217 -608 1408 -1433 -4575 ...
$ w0 : num [1:2, 1:100] 0.35 0.814 -0.517 -2.692 -1.097 ...
ggplot(aes(x, y), data = d) +
  geom_point(color = '#ff55001A') +
  geom_line(aes(y = fit_elm$fit), color = '#00aaff')
cor(fit_elm$fit[,1], y)^2
[1] 0.8862518
fit_elm_mcycle = elm(times, accel, n_hidden = 100)

cor(fit_elm_mcycle$fit[, 1], accel)^2
[1] 0.8122349
Comparison
We'll compare to a generalized additive model with a Gaussian process approximation.
fit_gam = gam(y ~ s(x, bs = 'gp', k = 20), data = d)

summary(fit_gam)$r.sq
[1] 0.8856188
d %>%
  mutate(
    fit_elm = fit_elm$fit,
    fit_gam = fitted(fit_gam)
  ) %>%
  ggplot() +
  geom_point(aes(x, y), color = '#ff55001A') +
  geom_line(aes(x, y = fit_elm), color = '#1e90ff') +
  geom_line(aes(x, y = fit_gam), color = '#990024')
fit_gam_mcycle = gam(accel ~ s(times), data = mcycle)

summary(fit_gam_mcycle)$r.sq
[1] 0.7832988
mcycle %>%
  ggplot(aes(times, accel)) +
  geom_point(color = '#ff55001A') +
  geom_line(aes(y = fit_elm_mcycle$fit), color = '#1e90ff') +
  geom_line(aes(y = fitted(fit_gam_mcycle)), color = '#990024')
Supplemental Example
Yet another example with additional covariates.
d = gamSim(eg = 7, n = 10000)
Gu & Wahba 4 term additive model, correlated predictors
X = as.matrix(d[, 2:5])
y = d[, 1]

n_nodes = c(10, 25, 100, 250, 500, 1000)
The following estimation over multiple models will take several seconds.
fit_elm = purrr::map(n_nodes, function(n) elm(X, y, n_hidden = n))
Now find the best fitting model.
# estimate
best_loss = which.min(map_dbl(fit_elm, function(x) x$loss))
fit_best  = fit_elm[[best_loss]]
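For an illustrative look (not in the original code) at how the training loss changes with the number of hidden nodes:

# illustrative check: training loss for each setting of n_hidden
data.frame(
  n_hidden = n_nodes,
  loss     = map_dbl(fit_elm, function(x) x$loss)
)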
A quick check of the fit.
# str(fit_best)
# qplot(fit_best$fit[, 1], y, alpha = .2)
cor(fit_best$fit[, 1], y)^2
[1] 0.7241967
And compare again to mgcv. In this case, we’re comparing fit on test data of the same form.
fit_gam = gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = d)

gam.check(fit_gam)
Method: GCV Optimizer: magic
Smoothing parameter selection converged after 15 iterations.
The RMS GCV score gradient at convergence was 9.309879e-07 .
The Hessian was positive definite.
Model rank = 37 / 37
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(x0) 9.00 4.71 1.00 0.48
s(x1) 9.00 4.89 1.00 0.35
s(x2) 9.00 8.96 0.99 0.20
s(x3) 9.00 1.00 1.00 0.47
summary(fit_gam)$r.sq
[1] 0.6952978
test_data0 = gamSim(eg = 7)  # default n = 400
Gu & Wahba 4 term additive model, correlated predictors
test_data = cbind(1, scale(test_data0[, 2:5]))
# remember to use your specific activation function here
elm_prediction = tanh(test_data %*% fit_best$w0) %*% fit_best$B
gam_prediction = predict(fit_gam, newdata = test_data0)
cor(data.frame(elm_prediction, gam_prediction), test_data0$y)^2
[,1]
elm_prediction 0.6873090
gam_prediction 0.7185687
Source
Original code available at: https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/elm.R