Multinomial Regression
For more detail on these types of models, see my document. In general we can use multinomial models for multi-category target variables, or more generally, multi-count data.
Standard (Categorical) Model
Data Setup
First, lets get some data. 200 entering high school students make program choices: general program, vocational program, and academic program. We will be interested in their choice, using their writing score as a proxy for scholastic ability and their socioeconomic status, a categorical variable of low, middle, and high values.
library(haven)
library(tidyverse)
library(mlogit)
program = read_dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta") %>%
as_factor() %>%
mutate(prog = relevel(prog, ref = "academic"))
head(program[, 1:5])# A tibble: 6 x 5
id female ses schtyp prog
<dbl> <fct> <fct> <fct> <fct>
1 45 female low public vocation
2 108 male middle public general
3 15 male high public vocation
4 67 male low public vocation
5 153 male middle public vocation
6 51 female high public general
# convert to long form for mlogit
program_long = program %>%
select(id, prog, ses, write) %>%
mlogit.data(
data = ,
shape = 'wide',
choice = 'prog',
id.var = 'id'
)
head(program_long)~~~~~~~
first 10 observations out of 600
~~~~~~~
id prog ses write chid alt idx
1 1 FALSE low 44 11 academic 11:emic
2 1 FALSE low 44 11 general 11:eral
3 1 TRUE low 44 11 vocation 11:tion
4 2 FALSE middle 41 9 academic 9:emic
5 2 FALSE middle 41 9 general 9:eral
6 2 TRUE middle 41 9 vocation 9:tion
7 3 TRUE low 65 159 academic 159:emic
8 3 FALSE low 65 159 general 159:eral
9 3 FALSE low 65 159 vocation 159:tion
10 4 TRUE low 50 30 academic 30:emic
~~~ indexes ~~~~
chid id alt
1 11 1 academic
2 11 1 general
3 11 1 vocation
4 9 2 academic
5 9 2 general
6 9 2 vocation
7 159 3 academic
8 159 3 general
9 159 3 vocation
10 30 4 academic
indexes: 1, 1, 2
We go ahead and run a model via mlogit for later comparison.
fit_mlogit = mlogit(prog ~ 1| write + ses, data = program_long)
mlogit_coefs = coef(fit_mlogit)[c(1,5,7,3,2,6,8,4)] # reorderFunction
Multinomial model via maximum likelihood
multinom_ml <- function(par, X, y) {
levs = levels(y)
ref = levs[1] # reference level (category label 1)
y0 = y == ref
y1 = y == levs[2] # category 2
y2 = y == levs[3] # category 3
beta = matrix(par, ncol = 2)
# more like mnlogit package depiction in its function
# V1 = X %*% beta[ ,1]
# V2 = X %*% beta[ ,2]
# ll = sum(-log(1 + exp(V1) + exp(V2))) + sum(V1[y1], V2[y2])
V = X %*% beta # a vectorized approach
baseProbVec = 1 / (1 + rowSums(exp(V))) # reference group probabilities
loglik = sum(log(baseProbVec)) + crossprod(c(V), c(y1, y2))
loglik
}fit = optim(
runif(8,-.1, .1),
multinom_ml,
X = model.matrix(prog ~ ses + write, data = program),
y = program$prog,
control = list(
maxit = 1000,
reltol = 1e-12,
ndeps = rep(1e-8, 8),
trace = TRUE,
fnscale = -1,
type = 3
),
method = 'BFGS'
)initial value 638.963532
iter 10 value 180.008322
final value 179.981726
converged
# fit$parComparison
An initial comparison.
| fit_coefs | mlogit_coefs | |
|---|---|---|
| (Intercept):general | 2.8522 | 2.8522 |
| sesmiddle:general | -0.5333 | -0.5333 |
| seshigh:general | -1.1628 | -1.1628 |
| write:general | -0.0579 | -0.0579 |
| (Intercept):vocation | 5.2182 | 5.2182 |
| sesmiddle:vocation | 0.2914 | 0.2914 |
| seshigh:vocation | -0.9827 | -0.9827 |
| write:vocation | -0.1136 | -0.1136 |
The following uses dmultinom for the likelihood, similar to other modeling demonstrations in this document.
X = model.matrix(prog ~ ses + write, data = program)
y = program$prog
pars = matrix(fit$par, ncol = 2)
V = X %*% pars
acadprob = 1 / (1+rowSums(exp(V)))
fitnonacad = exp(V) * matrix(rep(acadprob, 2), ncol = 2)
fits = cbind(acadprob, fitnonacad)
yind = model.matrix( ~ -1 + prog, data = program)# because dmultinom can't take matrix for prob
ll = 0
for (i in 1:200){
ll = ll + dmultinom(yind[i, ], size = 1, prob = fits[i, ], log = TRUE)
}
ll[1] -179.9817
fit$value[1] -179.9817
logLik(fit_mlogit)'log Lik.' -179.9817 (df=8)
Alternative specific and constant variables
Now we add alternative specific and alternative constant variables to the
previous individual specific covariates.. In this example, price is
alternative invariant (Z) income is individual/alternative specific (X),
and catch is alternative specific (Y).
We can use the fish data from the mnlogit
package.
library(mnlogit) # note we are now using mnlogit
data(Fish)
head(Fish) mode income alt price catch chid
1.beach FALSE 7083.332 beach 157.930 0.0678 1
1.boat FALSE 7083.332 boat 157.930 0.2601 1
1.charter TRUE 7083.332 charter 182.930 0.5391 1
1.pier FALSE 7083.332 pier 157.930 0.0503 1
2.beach FALSE 1250.000 beach 15.114 0.1049 2
2.boat FALSE 1250.000 boat 10.534 0.1574 2
fm = formula(mode ~ price | income | catch)
fit_mnlogit = mnlogit(fm, Fish)
# fit_mnlogit = mlogit(fm, Fish)
# summary(fit_mnlogit)The likelihood function.
multinom_ml <- function(par, X, Y, Z, respVec, choice) {
# Args-
# X dim nrow(Fish)/K x p + 1 (intercept)
# Z, Y nrow(N); Y has alt specific coefs; then for Z ref group dropped so nrow = nrow*(K-1)/K
# for ll everything through previous X the same
# then calc probmat for Y and Z, add to X probmat, and add to base
N = sum(choice)
K = length(unique(respVec))
levs = levels(respVec)
xpar = matrix(par[1:6], ncol = K-1)
ypar = matrix(par[7:10], ncol = K)
zpar = matrix(par[length(par)], ncol = 1)
# Calc X
Vx = X %*% xpar
# Calc Y (mnlogit finds N x 1 results by going through 1:N, N+1:N*2 etc; then
# makes 1 vector, then subtracts the first 1:N from whole vector, then makes
# Nxk-1 matrix with N+1:end values (as 1:N are just zero)); creating the
# vector and rebuilding the matrix is unnecessary though
Vy = sapply(1:K, function(alt)
Y[respVec == levs[alt], , drop = FALSE] %*% ypar[alt])
Vy = Vy[,-1] - Vy[,1]
# Calc Z
Vz = Z %*% zpar
Vz = matrix(Vz, ncol = 3)
# all Vs must fit into N x K -1 matrix where N is nobs (i.e. individuals)
V = Vx + Vy + Vz
ll0 = crossprod(c(V), choice[-(1:N)])
baseProbVec <- 1 / (1 + rowSums(exp(V)))
loglik = sum(log(baseProbVec)) + ll0
loglik
# note fitted values via
# fitnonref = exp(V) * matrix(rep(baseProbVec, K-1), ncol = K-1)
# fitref = 1-rowSums(fitnonref)
# fits = cbind(fitref, fitnonref)
}inits = runif(11, -.1, .1)
mdat = mnlogit(fm, Fish)$model # this data already ordered!As X has a constant value across alternatives, the coefficients regard the selection of the alternative relative to reference.
X = cbind(1, mdat[mdat$`_Alt_Indx_` == 'beach', 'income'])
dim(X)[1] 1182 2
head(X) [,1] [,2]
[1,] 1 7083.332
[2,] 1 1250.000
[3,] 1 3750.000
[4,] 1 2083.333
[5,] 1 4583.332
[6,] 1 4583.332
Y will use the complete data to start. Coefficients will be differences from the reference alternative coefficient.
Y = as.matrix(mdat[, 'catch', drop = FALSE])
dim(Y)[1] 4728 1
Z are difference scores from reference group.
Z = as.matrix(mdat[mdat$`_Alt_Indx_` != 'beach', 'price', drop = FALSE])
Z = Z - mdat[mdat$`_Alt_Indx_` == 'beach', 'price']
dim(Z)[1] 3546 1
respVec = mdat$`_Alt_Indx_` # first 10 should be 0 0 1 0 1 0 0 0 1 1 after beach droppedmultinom_ml(inits, X, Y, Z, respVec, choice = mdat$mode) [,1]
[1,] -162384.5
fit = optim(
par = rep(0, 11),
multinom_ml,
X = X,
Y = Y,
Z = Z,
respVec = respVec,
choice = mdat$mode,
control = list(
maxit = 1000,
reltol = 1e-12,
ndeps = rep(1e-8, 11),
trace = TRUE,
fnscale = -1,
type = 3
),
method = 'BFGS'
)initial value 1638.599935
iter 10 value 1253.603448
iter 20 value 1199.143447
final value 1199.143445
converged
Comparison
Compare fits.
| fit_coefs | mnlogit_coefs |
|---|---|
| 0.842 | 0.842 |
| 0.000 | 0.000 |
| 2.155 | 2.155 |
| 0.000 | 0.000 |
| 1.043 | 1.043 |
| 0.000 | 0.000 |
| 3.118 | 3.118 |
| 2.542 | 2.542 |
| 0.759 | 0.759 |
| 2.851 | 2.851 |
| -0.025 | -0.025 |
| fit_ll | mnlogit_ll |
|---|---|
| -1199.143 | -1199.143 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/multinomial.R