# Linear Regression
We start our demonstrations with a standard regression model estimated via maximum likelihood or a least squares loss. Also included are examples using QR decomposition and the normal equations. This can serve as an entry point for those starting out in the wider world of computational statistics, as maximum likelihood is the fundamental approach in most of applied statistics and is also a key aspect of the Bayesian approach. Least squares loss is not confined to the standard regression setting, but is widely used in more predictive/‘algorithmic’ approaches, e.g. in machine learning and elsewhere. You can find a Python version of the code in the supplemental section.
## Data Setup
We will simulate some data to make our results known and easier to manipulate.
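In equation form, the simulation below draws from

$$
y_i = -0.5 + 0.2\,x_{i1} + 0.1\,x_{i2} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0,\, 0.5^2)
$$

with the two predictors drawn as standard normal.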
```r
library(tidyverse)

set.seed(123)  # ensures replication

# predictors and target
N = 100  # sample size
k = 2    # number of desired predictors

X = matrix(rnorm(N * k), ncol = k)
y = -.5 + .2*X[, 1] + .1*X[, 2] + rnorm(N, sd = .5)  # increasing N will get estimated values closer to these

dfXy = data.frame(X, y)
```
## Functions
A maximum likelihood approach.
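The function minimizes the negative Gaussian log likelihood, which for this model is

$$
\ell(\beta, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - x_i^\top\beta\bigr)^2
$$

Dropping the constant term gives the alternate form noted in the code comments.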
```r
lm_ml <- function(par, X, y) {
  # par: parameters to be estimated
  # X:   predictor matrix with intercept column
  # y:   target
  
  # setup
  beta   = par[-1]      # coefficients
  sigma2 = par[1]       # error variance
  sigma  = sqrt(sigma2)
  N = nrow(X)
  
  # linear predictor
  LP = X %*% beta       # linear predictor
  mu = LP               # identity link in the glm sense
  
  # calculate likelihood
  L = dnorm(y, mean = mu, sd = sigma, log = TRUE)  # log likelihood
  # L = -.5*N*log(sigma2) - .5*(1/sigma2)*crossprod(y - mu)  # alternate log likelihood form
  
  -sum(L)  # optim by default minimizes, and we want to maximize the likelihood
           # (see also fnscale in optim's control list)
}
```
An approach via a least squares loss function.
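Here the objective is the residual sum of squares

$$
L(\beta) = (y - X\beta)^\top(y - X\beta) = \sum_{i=1}^{N}\bigl(y_i - x_i^\top\beta\bigr)^2
$$

which does not involve sigma2 at all.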
```r
lm_ls <- function(par, X, y) {
  # arguments
  # par: parameters to be estimated
  # X:   predictor matrix with intercept column
  # y:   target
  
  # setup
  beta = par  # coefficients
  
  # linear predictor
  LP = X %*% beta  # linear predictor
  mu = LP          # identity link
  
  # calculate least squares loss function
  crossprod(y - mu)  # sum of squared residuals, returned to the optimizer
}
```
## Estimation
Setup for use with optim.
```r
X = cbind(1, X)  # add an intercept column
```
Initial values. Note that we'd normally want to handle sigma2 differently, since it is bounded below by zero, but we'll ignore that for this demonstration; a sketch of one workaround follows. Also, sigma2 is not required for the LS approach, as the least squares loss itself is the objective function.
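As a minimal sketch of handling the bound (the lm_ml_log variant below is hypothetical, not part of the original demo), one could estimate log(sigma2), so the optimizer works over an unconstrained parameter:

```r
# sketch: optimize log(sigma2) instead of sigma2, so the implied
# variance is positive by construction
lm_ml_log <- function(par, X, y) {
  beta   = par[-1]
  sigma2 = exp(par[1])  # par[1] is log(sigma2), so sigma2 > 0 always
  -sum(dnorm(y, mean = X %*% beta, sd = sqrt(sigma2), log = TRUE))
}
```

For the demonstration we proceed with the simpler setup: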
```r
init = c(1, rep(0, ncol(X)))
names(init) = c('sigma2', 'intercept', 'b1', 'b2')
```
```r
fit_ML = optim(
  par = init,
  fn  = lm_ml,
  X   = X,
  y   = y,
  control = list(reltol = 1e-8)
)

fit_LS = optim(
  par = init[-1],
  fn  = lm_ls,
  X   = X,
  y   = y,
  control = list(reltol = 1e-8)
)

pars_ML = fit_ML$par
pars_LS = c(sigma2 = fit_LS$value / (N - k - 1), fit_LS$par)  # calculate sigma2 and add
```
## Comparison
Compare to lm, which uses QR decomposition.
```r
fit_lm = lm(y ~ ., dfXy)
```
Example of QR. Not shown.
```r
# QRX  = qr(X)
# Q    = qr.Q(QRX)
# R    = qr.R(QRX)
# Bhat = solve(R) %*% crossprod(Q, y)
# alternate: qr.coef(QRX, y)
```
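A runnable sketch of the comments above follows; backsolve is a small substitution for solve that exploits the upper-triangular structure of R.

```r
# runnable version of the commented QR approach; solves the
# upper-triangular system R %*% Bhat = Q'y without a full inverse
QRX  = qr(X)
Bhat = backsolve(qr.R(QRX), crossprod(qr.Q(QRX), y))
```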
|        | sigma2 | intercept | b1    | b2    |
|--------|--------|-----------|-------|-------|
| fit_ML | 0.219  | -0.432    | 0.133 | 0.112 |
| fit_LS | 0.226  | -0.432    | 0.133 | 0.112 |
| fit_lm | 0.226  | -0.432    | 0.133 | 0.112 |
The slight difference in sigma2 reflects dividing by N (maximum likelihood) vs. N - k - 1 (the traditional least squares approach). It diminishes with increasing N, as both tend toward the square of the sd you specify when creating the y target above (here 0.5^2 = 0.25).
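As a quick arithmetic check of that claim, rescaling the ML variance estimate by N/(N - k - 1) should approximately recover the LS value (the parameter names come from the init vector above):

```r
# 0.219 * 100/97 is approximately 0.226, matching pars_LS['sigma2']
pars_ML['sigma2'] * N / (N - k - 1)
```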
Compare to glm, which by default assumes a gaussian family with identity link and uses lm.fit.
```r
fit_glm = glm(y ~ ., data = dfXy)
summary(fit_glm)
```
```
Call:
glm(formula = y ~ ., data = dfXy)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.93651  -0.33037  -0.06222   0.31068   1.03991  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.43247    0.04807  -8.997 1.97e-14 ***
X1           0.13341    0.05243   2.544   0.0125 *  
X2           0.11191    0.04950   2.261   0.0260 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.2262419)

    Null deviance: 24.444  on 99  degrees of freedom
Residual deviance: 21.945  on 97  degrees of freedom
AIC: 140.13

Number of Fisher Scoring iterations: 2
```
Via normal equations.
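That is, the closed-form solution

$$
\hat\beta = (X^\top X)^{-1} X^\top y
$$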
```r
coefs = solve(t(X) %*% X) %*% t(X) %*% y  # coefficients
```
Compare.
```r
sqrt(crossprod(y - X %*% coefs) / (N - k - 1))
summary(fit_lm)$sigma
sqrt(fit_glm$deviance / fit_glm$df.residual)
c(sqrt(pars_ML[1]), sqrt(pars_LS[1]))
# rerun by adding 3-4 zeros to the N
```
```
          [,1]
[1,] 0.4756489
[1] 0.4756489
[1] 0.4756489
   sigma2    sigma2 
0.4684616 0.4756490 
```
## Python
The above is available as a Python demo in the supplemental section.
## Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/standard_lm.R