Multinomial Regression
For more detail on these types of models, see my document. In general we can use multinomial models for multi-category target variables, or more generally, multi-count data.
Standard (Categorical) Model
Data Setup
First, lets get some data. 200 entering high school students make program choices: general program, vocational program, and academic program. We will be interested in their choice, using their writing score as a proxy for scholastic ability and their socioeconomic status, a categorical variable of low, middle, and high values.
library(haven)
library(tidyverse)
library(mlogit)
= read_dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta") %>%
program as_factor() %>%
mutate(prog = relevel(prog, ref = "academic"))
head(program[, 1:5])
# A tibble: 6 x 5
id female ses schtyp prog
<dbl> <fct> <fct> <fct> <fct>
1 45 female low public vocation
2 108 male middle public general
3 15 male high public vocation
4 67 male low public vocation
5 153 male middle public vocation
6 51 female high public general
# convert to long form for mlogit
= program %>%
program_long select(id, prog, ses, write) %>%
mlogit.data(
data = ,
shape = 'wide',
choice = 'prog',
id.var = 'id'
)
head(program_long)
~~~~~~~
first 10 observations out of 600
~~~~~~~
id prog ses write chid alt idx
1 1 FALSE low 44 11 academic 11:emic
2 1 FALSE low 44 11 general 11:eral
3 1 TRUE low 44 11 vocation 11:tion
4 2 FALSE middle 41 9 academic 9:emic
5 2 FALSE middle 41 9 general 9:eral
6 2 TRUE middle 41 9 vocation 9:tion
7 3 TRUE low 65 159 academic 159:emic
8 3 FALSE low 65 159 general 159:eral
9 3 FALSE low 65 159 vocation 159:tion
10 4 TRUE low 50 30 academic 30:emic
~~~ indexes ~~~~
chid id alt
1 11 1 academic
2 11 1 general
3 11 1 vocation
4 9 2 academic
5 9 2 general
6 9 2 vocation
7 159 3 academic
8 159 3 general
9 159 3 vocation
10 30 4 academic
indexes: 1, 1, 2
We go ahead and run a model via mlogit for later comparison.
= mlogit(prog ~ 1| write + ses, data = program_long)
fit_mlogit = coef(fit_mlogit)[c(1,5,7,3,2,6,8,4)] # reorder mlogit_coefs
Function
Multinomial model via maximum likelihood
<- function(par, X, y) {
multinom_ml = levels(y)
levs = levs[1] # reference level (category label 1)
ref
= y == ref
y0 = y == levs[2] # category 2
y1 = y == levs[3] # category 3
y2
= matrix(par, ncol = 2)
beta
# more like mnlogit package depiction in its function
# V1 = X %*% beta[ ,1]
# V2 = X %*% beta[ ,2]
# ll = sum(-log(1 + exp(V1) + exp(V2))) + sum(V1[y1], V2[y2])
= X %*% beta # a vectorized approach
V = 1 / (1 + rowSums(exp(V))) # reference group probabilities
baseProbVec
= sum(log(baseProbVec)) + crossprod(c(V), c(y1, y2))
loglik
loglik }
= optim(
fit runif(8,-.1, .1),
multinom_ml,X = model.matrix(prog ~ ses + write, data = program),
y = program$prog,
control = list(
maxit = 1000,
reltol = 1e-12,
ndeps = rep(1e-8, 8),
trace = TRUE,
fnscale = -1,
type = 3
),method = 'BFGS'
)
initial value 638.963532
iter 10 value 180.008322
final value 179.981726
converged
# fit$par
Comparison
An initial comparison.
fit_coefs | mlogit_coefs | |
---|---|---|
(Intercept):general | 2.8522 | 2.8522 |
sesmiddle:general | -0.5333 | -0.5333 |
seshigh:general | -1.1628 | -1.1628 |
write:general | -0.0579 | -0.0579 |
(Intercept):vocation | 5.2182 | 5.2182 |
sesmiddle:vocation | 0.2914 | 0.2914 |
seshigh:vocation | -0.9827 | -0.9827 |
write:vocation | -0.1136 | -0.1136 |
The following uses dmultinom for the likelihood, similar to other modeling demonstrations in this document.
= model.matrix(prog ~ ses + write, data = program)
X = program$prog
y = matrix(fit$par, ncol = 2)
pars = X %*% pars
V = 1 / (1+rowSums(exp(V)))
acadprob = exp(V) * matrix(rep(acadprob, 2), ncol = 2)
fitnonacad = cbind(acadprob, fitnonacad)
fits = model.matrix( ~ -1 + prog, data = program) yind
# because dmultinom can't take matrix for prob
= 0
ll
for (i in 1:200){
= ll + dmultinom(yind[i, ], size = 1, prob = fits[i, ], log = TRUE)
ll
}
ll
[1] -179.9817
$value fit
[1] -179.9817
logLik(fit_mlogit)
'log Lik.' -179.9817 (df=8)
Alternative specific and constant variables
Now we add alternative specific and alternative constant variables to the
previous individual specific covariates.. In this example, price
is
alternative invariant (Z
) income
is individual/alternative specific (X
),
and catch
is alternative specific (Y
).
We can use the fish
data from the mnlogit
package.
library(mnlogit) # note we are now using mnlogit
data(Fish)
head(Fish)
mode income alt price catch chid
1.beach FALSE 7083.332 beach 157.930 0.0678 1
1.boat FALSE 7083.332 boat 157.930 0.2601 1
1.charter TRUE 7083.332 charter 182.930 0.5391 1
1.pier FALSE 7083.332 pier 157.930 0.0503 1
2.beach FALSE 1250.000 beach 15.114 0.1049 2
2.boat FALSE 1250.000 boat 10.534 0.1574 2
= formula(mode ~ price | income | catch)
fm
= mnlogit(fm, Fish)
fit_mnlogit # fit_mnlogit = mlogit(fm, Fish)
# summary(fit_mnlogit)
The likelihood function.
<- function(par, X, Y, Z, respVec, choice) {
multinom_ml
# Args-
# X dim nrow(Fish)/K x p + 1 (intercept)
# Z, Y nrow(N); Y has alt specific coefs; then for Z ref group dropped so nrow = nrow*(K-1)/K
# for ll everything through previous X the same
# then calc probmat for Y and Z, add to X probmat, and add to base
= sum(choice)
N = length(unique(respVec))
K = levels(respVec)
levs
= matrix(par[1:6], ncol = K-1)
xpar = matrix(par[7:10], ncol = K)
ypar = matrix(par[length(par)], ncol = 1)
zpar
# Calc X
= X %*% xpar
Vx
# Calc Y (mnlogit finds N x 1 results by going through 1:N, N+1:N*2 etc; then
# makes 1 vector, then subtracts the first 1:N from whole vector, then makes
# Nxk-1 matrix with N+1:end values (as 1:N are just zero)); creating the
# vector and rebuilding the matrix is unnecessary though
= sapply(1:K, function(alt)
Vy == levs[alt], , drop = FALSE] %*% ypar[alt])
Y[respVec
= Vy[,-1] - Vy[,1]
Vy
# Calc Z
= Z %*% zpar
Vz = matrix(Vz, ncol = 3)
Vz
# all Vs must fit into N x K -1 matrix where N is nobs (i.e. individuals)
= Vx + Vy + Vz
V
= crossprod(c(V), choice[-(1:N)])
ll0 <- 1 / (1 + rowSums(exp(V)))
baseProbVec = sum(log(baseProbVec)) + ll0
loglik
loglik
# note fitted values via
# fitnonref = exp(V) * matrix(rep(baseProbVec, K-1), ncol = K-1)
# fitref = 1-rowSums(fitnonref)
# fits = cbind(fitref, fitnonref)
}
= runif(11, -.1, .1)
inits = mnlogit(fm, Fish)$model # this data already ordered! mdat
As X
has a constant value across alternatives, the coefficients regard the selection of the alternative relative to reference.
= cbind(1, mdat[mdat$`_Alt_Indx_` == 'beach', 'income'])
X dim(X)
[1] 1182 2
head(X)
[,1] [,2]
[1,] 1 7083.332
[2,] 1 1250.000
[3,] 1 3750.000
[4,] 1 2083.333
[5,] 1 4583.332
[6,] 1 4583.332
Y
will use the complete data to start. Coefficients will be differences from the reference alternative coefficient.
= as.matrix(mdat[, 'catch', drop = FALSE])
Y dim(Y)
[1] 4728 1
Z
are difference scores from reference group.
= as.matrix(mdat[mdat$`_Alt_Indx_` != 'beach', 'price', drop = FALSE])
Z = Z - mdat[mdat$`_Alt_Indx_` == 'beach', 'price']
Z dim(Z)
[1] 3546 1
= mdat$`_Alt_Indx_` # first 10 should be 0 0 1 0 1 0 0 0 1 1 after beach dropped respVec
multinom_ml(inits, X, Y, Z, respVec, choice = mdat$mode)
[,1]
[1,] -162384.5
= optim(
fit par = rep(0, 11),
multinom_ml,X = X,
Y = Y,
Z = Z,
respVec = respVec,
choice = mdat$mode,
control = list(
maxit = 1000,
reltol = 1e-12,
ndeps = rep(1e-8, 11),
trace = TRUE,
fnscale = -1,
type = 3
),method = 'BFGS'
)
initial value 1638.599935
iter 10 value 1253.603448
iter 20 value 1199.143447
final value 1199.143445
converged
Comparison
Compare fits.
fit_coefs | mnlogit_coefs |
---|---|
0.842 | 0.842 |
0.000 | 0.000 |
2.155 | 2.155 |
0.000 | 0.000 |
1.043 | 1.043 |
0.000 | 0.000 |
3.118 | 3.118 |
2.542 | 2.542 |
0.759 | 0.759 |
2.851 | 2.851 |
-0.025 | -0.025 |
fit_ll | mnlogit_ll |
---|---|
-1199.143 | -1199.143 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/multinomial.R