# Multinomial Regression

For more detail on these types of models, see my document. In general we can use multinomial models for multi-category target variables, or more generally, multi-count data.

## Standard (Categorical) Model

### Data Setup

First, lets get some data. 200 entering high school students make program choices: general program, vocational program, and academic program. We will be interested in their choice, using their writing score as a proxy for scholastic ability and their socioeconomic status, a categorical variable of low, middle, and high values.

library(haven)
library(tidyverse)
library(mlogit)

as_factor() %>%
mutate(prog = relevel(prog, ref = "academic"))

head(program[, 1:5])
# A tibble: 6 x 5
id female ses    schtyp prog
<dbl> <fct>  <fct>  <fct>  <fct>
1    45 female low    public vocation
2   108 male   middle public general
3    15 male   high   public vocation
4    67 male   low    public vocation
5   153 male   middle public vocation
6    51 female high   public general 
# convert to long form for mlogit
program_long = program %>%
select(id, prog, ses, write) %>%
mlogit.data(
data = ,
shape = 'wide',
choice = 'prog',
id.var = 'id'
)

head(program_long)
~~~~~~~
first 10 observations out of 600
~~~~~~~
id  prog    ses write chid      alt      idx
1   1 FALSE    low    44   11 academic  11:emic
2   1 FALSE    low    44   11  general  11:eral
3   1  TRUE    low    44   11 vocation  11:tion
4   2 FALSE middle    41    9 academic   9:emic
5   2 FALSE middle    41    9  general   9:eral
6   2  TRUE middle    41    9 vocation   9:tion
7   3  TRUE    low    65  159 academic 159:emic
8   3 FALSE    low    65  159  general 159:eral
9   3 FALSE    low    65  159 vocation 159:tion
10  4  TRUE    low    50   30 academic  30:emic

~~~ indexes ~~~~
chid id      alt
2    11  1  general
3    11  1 vocation
5     9  2  general
6     9  2 vocation
8   159  3  general
9   159  3 vocation
indexes:  1, 1, 2 

We go ahead and run a model via mlogit for later comparison.

fit_mlogit   = mlogit(prog ~ 1| write + ses, data = program_long)
mlogit_coefs = coef(fit_mlogit)[c(1,5,7,3,2,6,8,4)]  # reorder

### Function

Multinomial model via maximum likelihood

multinom_ml <- function(par, X, y) {
levs = levels(y)
ref  = levs                # reference level (category label 1)

y0 = y == ref
y1 = y == levs             # category 2
y2 = y == levs             # category 3

beta = matrix(par, ncol = 2)

# more like mnlogit package depiction in its function
# V1 = X %*% beta[ ,1]
# V2 = X %*% beta[ ,2]
# ll = sum(-log(1 + exp(V1) + exp(V2))) + sum(V1[y1], V2[y2])

V = X %*% beta                           # a vectorized approach
baseProbVec = 1 / (1 + rowSums(exp(V)))  # reference group probabilities

loglik = sum(log(baseProbVec))  + crossprod(c(V), c(y1, y2))

loglik
}
fit = optim(
runif(8,-.1, .1),
multinom_ml,
X = model.matrix(prog ~ ses + write, data = program),
y = program$prog, control = list( maxit = 1000, reltol = 1e-12, ndeps = rep(1e-8, 8), trace = TRUE, fnscale = -1, type = 3 ), method = 'BFGS' ) initial value 638.963532 iter 10 value 180.008322 final value 179.981726 converged # fit$par

### Comparison

An initial comparison.

fit_coefs mlogit_coefs
(Intercept):general 2.8522 2.8522
sesmiddle:general -0.5333 -0.5333
seshigh:general -1.1628 -1.1628
write:general -0.0579 -0.0579
(Intercept):vocation 5.2182 5.2182
sesmiddle:vocation 0.2914 0.2914
seshigh:vocation -0.9827 -0.9827
write:vocation -0.1136 -0.1136

The following uses dmultinom for the likelihood, similar to other modeling demonstrations in this document.

X = model.matrix(prog ~ ses + write, data = program)
y = program$prog pars = matrix(fit$par, ncol = 2)
V = X %*% pars
yind = model.matrix( ~ -1 + prog, data = program)
# because dmultinom can't take matrix for prob
ll = 0

for (i in 1:200){
ll = ll + dmultinom(yind[i, ], size = 1, prob = fits[i, ], log  = TRUE)
}

ll
 -179.9817
fit$value  -179.9817 logLik(fit_mlogit) 'log Lik.' -179.9817 (df=8) ## Alternative specific and constant variables Now we add alternative specific and alternative constant variables to the previous individual specific covariates.. In this example, price is alternative invariant (Z) income is individual/alternative specific (X), and catch is alternative specific (Y). We can use the fish data from the mnlogit package. library(mnlogit) # note we are now using mnlogit data(Fish) head(Fish)  mode income alt price catch chid 1.beach FALSE 7083.332 beach 157.930 0.0678 1 1.boat FALSE 7083.332 boat 157.930 0.2601 1 1.charter TRUE 7083.332 charter 182.930 0.5391 1 1.pier FALSE 7083.332 pier 157.930 0.0503 1 2.beach FALSE 1250.000 beach 15.114 0.1049 2 2.boat FALSE 1250.000 boat 10.534 0.1574 2 fm = formula(mode ~ price | income | catch) fit_mnlogit = mnlogit(fm, Fish) # fit_mnlogit = mlogit(fm, Fish) # summary(fit_mnlogit) The likelihood function. multinom_ml <- function(par, X, Y, Z, respVec, choice) { # Args- # X dim nrow(Fish)/K x p + 1 (intercept) # Z, Y nrow(N); Y has alt specific coefs; then for Z ref group dropped so nrow = nrow*(K-1)/K # for ll everything through previous X the same # then calc probmat for Y and Z, add to X probmat, and add to base N = sum(choice) K = length(unique(respVec)) levs = levels(respVec) xpar = matrix(par[1:6], ncol = K-1) ypar = matrix(par[7:10], ncol = K) zpar = matrix(par[length(par)], ncol = 1) # Calc X Vx = X %*% xpar # Calc Y (mnlogit finds N x 1 results by going through 1:N, N+1:N*2 etc; then # makes 1 vector, then subtracts the first 1:N from whole vector, then makes # Nxk-1 matrix with N+1:end values (as 1:N are just zero)); creating the # vector and rebuilding the matrix is unnecessary though Vy = sapply(1:K, function(alt) Y[respVec == levs[alt], , drop = FALSE] %*% ypar[alt]) Vy = Vy[,-1] - Vy[,1] # Calc Z Vz = Z %*% zpar Vz = matrix(Vz, ncol = 3) # all Vs must fit into N x K -1 matrix where N is nobs (i.e. individuals) V = Vx + Vy + Vz ll0 = crossprod(c(V), choice[-(1:N)]) baseProbVec <- 1 / (1 + rowSums(exp(V))) loglik = sum(log(baseProbVec)) + ll0 loglik # note fitted values via # fitnonref = exp(V) * matrix(rep(baseProbVec, K-1), ncol = K-1) # fitref = 1-rowSums(fitnonref) # fits = cbind(fitref, fitnonref) } inits = runif(11, -.1, .1) mdat = mnlogit(fm, Fish)$model  # this data already ordered!

As X has a constant value across alternatives, the coefficients regard the selection of the alternative relative to reference.

X = cbind(1, mdat[mdat$_Alt_Indx_ == 'beach', 'income']) dim(X)  1182 2 head(X)  [,1] [,2] [1,] 1 7083.332 [2,] 1 1250.000 [3,] 1 3750.000 [4,] 1 2083.333 [5,] 1 4583.332 [6,] 1 4583.332 Y will use the complete data to start. Coefficients will be differences from the reference alternative coefficient. Y = as.matrix(mdat[, 'catch', drop = FALSE]) dim(Y)  4728 1 Z are difference scores from reference group. Z = as.matrix(mdat[mdat$_Alt_Indx_ != 'beach', 'price', drop = FALSE])
Z = Z - mdat[mdat$_Alt_Indx_ == 'beach', 'price'] dim(Z)  3546 1 respVec = mdat$_Alt_Indx_ # first 10 should be 0 0 1 0 1 0 0 0 1 1 after beach dropped
multinom_ml(inits, X, Y, Z, respVec, choice = mdat$mode)  [,1] [1,] -162384.5 fit = optim( par = rep(0, 11), multinom_ml, X = X, Y = Y, Z = Z, respVec = respVec, choice = mdat$mode,
control = list(
maxit   = 1000,
reltol  = 1e-12,
ndeps   = rep(1e-8, 11),
trace   = TRUE,
fnscale = -1,
type    = 3
),
method = 'BFGS'
)
initial  value 1638.599935
iter  10 value 1253.603448
iter  20 value 1199.143447
final  value 1199.143445
converged

### Comparison

Compare fits.

fit_coefs mnlogit_coefs
0.842 0.842
0.000 0.000
2.155 2.155
0.000 0.000
1.043 1.043
0.000 0.000
3.118 3.118
2.542 2.542
0.759 0.759
2.851 2.851
-0.025 -0.025
fit_ll mnlogit_ll
-1199.143 -1199.143

## Source

Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/multinomial.R