Bayesian IRT
The following demonstrates code for one- through four-parameter IRT models, though only the first two are explored extensively. You can learn more about IRT models in general in my structural equation modeling document.
One Parameter IRT
Data Setup
This data set contains the responses of 316 participants to 24 items of a questionnaire on verbal aggression, along with other covariates. For simplicity I will focus on the four 'shout' items for the 'self' situation.
library(tidyverse)
data("VerbAgg", package = "lme4")
glimpse(VerbAgg)
Rows: 7,584
Columns: 9
$ Anger <int> 20, 11, 17, 21, 17, 21, 39, 21, 24, 16, 15, 18, 36, 22, 16, 18, 23, 16, 21, 25, 22, 15, 26, 13, 33, 17, 17, 22, 21, 17, 19, 18, 33, 19, 25, 17, 12, 14, 25, 22, 20, 25, 12, 16, 23, 19, 22, 15, 25, 35, 24…
$ Gender <fct> M, M, F, F, F, F, F, F, F, F, F, F, M, M, F, F, F, F, F, F, F, F, F, F, F, F, F, F, M, F, F, M, M, F, F, F, F, F, M, M, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, M, F, F, F, F, F, F, F, F, F…
$ item <fct> S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantCurse, S1WantC…
$ resp <ord> no, no, perhaps, perhaps, perhaps, yes, yes, no, no, yes, perhaps, yes, yes, yes, perhaps, perhaps, perhaps, perhaps, no, perhaps, perhaps, yes, yes, perhaps, no, no, yes, yes, perhaps, yes, yes, perhap…
$ id <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,…
$ btype <fct> curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse, curse,…
$ situ <fct> other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other, other,…
$ mode <fct> want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want, want…
$ r2 <fct> N, N, Y, Y, Y, Y, Y, N, N, Y, Y, Y, Y, Y, Y, Y, Y, Y, N, Y, Y, Y, Y, Y, N, N, Y, Y, Y, Y, Y, Y, N, Y, Y, Y, N, N, N, N, Y, N, Y, N, N, Y, Y, N, N, Y, N, N, Y, Y, N, Y, Y, Y, N, N, Y, Y, Y, Y, Y, N, Y, Y…
verbagg_items = VerbAgg %>%
  filter(btype == 'shout', situ == 'self') %>%
  select(id, item, r2)

head(verbagg_items)
id item r2
1 1 S3WantShout Y
2 2 S3WantShout N
3 3 S3WantShout N
4 4 S3WantShout N
5 5 S3WantShout N
6 6 S3WantShout Y
verbagg_items_wide = verbagg_items %>%
  pivot_wider(id_cols = id, names_from = item, names_prefix = 'item_', values_from = r2)

head(verbagg_items_wide)
# A tibble: 6 x 5
id item_S3WantShout item_S4WantShout item_S3DoShout item_S4DoShout
<fct> <fct> <fct> <fct> <fct>
1 1 Y N N Y
2 2 N N N N
3 3 N N N N
4 4 N N N N
5 5 N N N N
6 6 Y N N N
While we often think of the data in 'wide form', with one row per person and a column for each item, and the subsequent Stan code will use that form, modeling is generally tidier and more straightforward with the long format, where one can use standard mixed model approaches. In that case, r2 is the target variable of interest. A brief illustration follows.
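As a quick aside (not part of the original demo), here is what that long-format mixed model looks like with lme4, which is already available since the data come from it. It mirrors the brms model used later, just without priors.

library(lme4)

# Rasch-style model: fixed effects per item plus a random person effect
glmer_1pm = glmer(
  r2 ~ 0 + item + (1 | id),
  data   = verbagg_items,
  family = binomial
)

fixef(glmer_1pm)  # item coefficients, on the scale of the second parameterization below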
In the long format, the model for a single person is as follows, where \(Z_p\) is the latent score for person \(p\), and \(\beta_i\) is the difficulty of the \(i^{th}\) item.
\[\textrm{logit}(\pi) = \textrm{disc} (Z_p - \beta_i)\]
Another formulation is the following, which corresponds to what brms will use.
\[\textrm{logit}(\pi) = \beta_i + \textrm{disc}\cdot Z_p\]
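The two parameterizations are related by simple algebra. Expanding the first form gives

\[\textrm{disc}(Z_p - \beta_i) = -\textrm{disc}\cdot\beta_i + \textrm{disc}\cdot Z_p\]

so the item coefficient in the second form is just \(-\textrm{disc}\cdot\beta_i\) of the first. We use this correspondence when comparing results later.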
Model Code
data {
  int N;            // Number of people
  int J;            // Number of items
  int Y[N, J];      // Binary Target
}
parameters {
  vector[J] difficulty;     // Item difficulty
  real<lower = 0> discrim;  // Item discrimination (constant)
  vector[N] Z;              // Latent person ability
}
model {
  matrix[N, J] lmat;

  // priors
  Z ~ normal(0, 1);
  discrim ~ student_t(3, 0, 5);
  difficulty ~ student_t(3, 0, 5);

  for (j in 1:J) {
    lmat[, j] = discrim * (Z - difficulty[j]);
  }

  // likelihood
  for (j in 1:J) Y[, j] ~ bernoulli_logit(lmat[, j]);
}
Estimation
First we create a Stan-friendly data list and then estimate the model. The following assumes a character string or file (bayes_irt1_model) of the previous model code; one way to set that up is sketched below.
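For example, something along these lines, where irt1_code is a hypothetical string holding the Stan code shown above (the names here are my assumption):

library(rstan)

# compile once; `irt1_code` is assumed to contain the model code above
bayes_irt1_model = stan_model(model_code = irt1_code)

# or, if the code is saved to a file:
# bayes_irt1_model = stan_model('bayes_irt1.stan')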
verbagg_items_wide_mat = apply(
  as.matrix(verbagg_items_wide[, -1]) == 'Y',
  2,
  as.integer
)

stan_data = list(
  N = nrow(verbagg_items_wide_mat),
  J = ncol(verbagg_items_wide_mat),
  Y = verbagg_items_wide_mat
)
library(rstan)
fit_1pm = sampling(
  bayes_irt1_model,
  data = stan_data,
  thin = 4
)
Comparison
Now we compare to brms. I use the package author's IRT article as a guide for this model; note again that it follows the second parameterization depicted above.
library(brms)
# half normal for variance parameter, full for coefficients
prior_1pm <-
  prior("normal(0, 3)", class = "sd", group = "id") +
  prior("normal(0, 3)", class = "b")

brms_1pm = brm(
  r2 ~ 0 + item + (1 | id),
  data   = verbagg_items,
  family = bernoulli,
  prior  = prior_1pm,
  thin   = 4,
  cores  = 4
)
If you want to compare to standard IRT in either parameterization, you can use the ltm package.
library(ltm)
irt_rasch_par1 = rasch(verbagg_items_wide_mat, IRT.param = FALSE)
irt_rasch_par2 = rasch(verbagg_items_wide_mat, IRT.param = TRUE)
print(
  fit_1pm,
  digits = 3,
  par    = c('discrim', 'difficulty'),
  probs  = c(.025, .5, .975)
)
Inference for Stan model: 0715d0cee215483765f01043abb55ea9.
4 chains, each with iter=2000; warmup=1000; thin=4;
post-warmup draws per chain=250, total post-warmup draws=1000.
mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
discrim 1.859 0.007 0.197 1.502 1.852 2.250 812 1.001
difficulty[1] 0.968 0.004 0.121 0.750 0.965 1.215 913 1.001
difficulty[2] 0.665 0.004 0.106 0.460 0.663 0.874 788 1.000
difficulty[3] 1.854 0.006 0.176 1.534 1.846 2.237 844 1.005
difficulty[4] 1.255 0.005 0.137 1.003 1.250 1.524 804 0.998
Samples were drawn using NUTS(diag_e) at Wed Nov 25 17:39:05 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
summary(brms_1pm)
Family: bernoulli
Links: mu = logit
Formula: r2 ~ 0 + item + (1 | id)
Data: verbagg_items (Number of observations: 1264)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 4;
total post-warmup samples = 1000
Group-Level Effects:
~id (Number of levels: 316)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.90 0.21 1.53 2.32 1.00 832 890
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
itemS3WantShout -1.79 0.22 -2.25 -1.37 1.00 947 883
itemS4WantShout -1.23 0.21 -1.66 -0.83 1.01 860 809
itemS3DoShout -3.43 0.31 -4.08 -2.86 1.00 907 914
itemS4DoShout -2.35 0.25 -2.86 -1.90 1.00 812 852
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
brms_diff    = fixef(brms_1pm)[, 'Estimate']
brms_discrim = VarCorr(brms_1pm)$id$sd[1]
fit_params   = summary(fit_1pm, digits = 3, par = c('discrim', 'difficulty'))$summary[, 'mean']
After extracting, we can show either parameterization for either model. For example, the brms item coefficients equal our model's -discrim * difficulty, per the algebra shown earlier.
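The table below could be assembled along these lines. This reconstruction is my own (the original code for it is not shown), with the conversions following the algebra above:

comparison_1pm = tibble(
  param      = c('discrim', paste0('difficulty[', 1:4, ']')),
  model      = fit_params,                  # our Stan model, parameterization 1
  brms       = c(brms_discrim, brms_diff),  # brms, parameterization 2
  model_par2 = c(fit_params[1], -fit_params[1] * fit_params[-1]),
  brms_par1  = c(brms_discrim, -brms_diff / brms_discrim)
)

comparison_1pm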
# A tibble: 5 x 5
param model brms model_par2 brms_par1
<chr> <dbl> <dbl> <dbl> <dbl>
1 discrim 1.86 1.90 1.86 1.90
2 difficulty[1] 0.968 -1.79 -1.80 0.942
3 difficulty[2] 0.665 -1.23 -1.24 0.647
4 difficulty[3] 1.85 -3.43 -3.45 1.81
5 difficulty[4] 1.26 -2.35 -2.33 1.23
Two Parameter IRT
Now we can try a two parameter model. Data setup is the same as before.
Model Code
data {
  int N;
  int J;
  int Y[N, J];
}
parameters {
  vector[J] difficulty;
  vector<lower = 0>[J] discrim;  // Now per-item discrimination
  vector[N] Z;
}
model {
  matrix[N, J] lmat;

  // priors
  Z ~ normal(0, 1);
  discrim ~ student_t(3, 0, 5);
  difficulty ~ student_t(3, 0, 5);

  for (j in 1:J) {
    lmat[, j] = discrim[j] * (Z - difficulty[j]);
  }

  // likelihood
  for (j in 1:J) Y[, j] ~ bernoulli_logit(lmat[, j]);
}
Estimation
First, our custom Stan model. The following assumes a character string or file (bayes_irt2_model) of the previous model code, compiled as before.
library(rstan)
fit_2pm = sampling(
  bayes_irt2_model,
  data = stan_data,
  thin = 4,
  iter = 4000,
  warmup = 3000,
  cores = 4,
  control = list(adapt_delta = .99)
)
Comparison
Now we compare to brms. I again use the package author's IRT article as a guide for this model, which follows the second parameterization. It took a little over 30 seconds on my machine, though your timing may differ.
library(brms)
# half normal for variance parameter, full for coefficients
prior_2pm <-
  prior("normal(0, 5)", class = "b", nlpar = "Z") +
  prior("normal(0, 5)", class = "b", nlpar = "logdiscr") +
  prior("constant(1)", class = "sd", group = "id", nlpar = "Z") +
  prior("normal(0, 3)", class = "sd", group = "item", nlpar = "Z") +
  prior("normal(0, 3)", class = "sd", group = "item", nlpar = "logdiscr")

formula_2pm = bf(
  r2 ~ exp(logdiscr) * Z,
  Z ~ 1 + (1 |i| item) + (1 | id),
  logdiscr ~ 1 + (1 |i| item),
  nl = TRUE
)

brms_2pm = brm(
  formula_2pm,
  data    = verbagg_items,
  family  = bernoulli,
  prior   = prior_2pm,
  thin    = 4,
  iter    = 4000,
  warmup  = 3000,
  cores   = 4,
  control = list(adapt_delta = .99, max_treedepth = 15)
)
print(
  fit_2pm,
  digits = 3,
  par    = c('discrim', 'difficulty'),
  probs  = c(.025, .5, .975)
)
Inference for Stan model: 503e7ed2dc96cbd6b316fe17abc2a86f.
4 chains, each with iter=4000; warmup=3000; thin=4;
post-warmup draws per chain=250, total post-warmup draws=1000.
mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
discrim[1] 1.033 0.008 0.249 0.599 1.003 1.567 891 1.003
discrim[2] 3.026 0.098 1.630 1.603 2.632 7.436 279 1.000
discrim[3] 1.462 0.014 0.389 0.803 1.425 2.321 784 0.998
discrim[4] 4.461 0.158 3.688 1.882 3.609 12.287 544 1.002
difficulty[1] 1.420 0.011 0.314 0.946 1.376 2.207 866 0.998
difficulty[2] 0.595 0.004 0.106 0.401 0.589 0.815 738 1.000
difficulty[3] 2.196 0.017 0.464 1.595 2.115 3.144 703 0.998
difficulty[4] 1.034 0.006 0.129 0.820 1.021 1.311 513 1.003
Samples were drawn using NUTS(diag_e) at Wed Nov 25 17:40:25 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
summary(brms_2pm)
Family: bernoulli
Links: mu = logit
Formula: r2 ~ exp(logdiscr) * Z
Z ~ 1 + (1 | i | item) + (1 | id)
logdiscr ~ 1 + (1 | i | item)
Data: verbagg_items (Number of observations: 1264)
Samples: 4 chains, each with iter = 4000; warmup = 3000; thin = 4;
total post-warmup samples = 1000
Group-Level Effects:
~id (Number of levels: 316)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Z_Intercept) 1.00 0.00 1.00 1.00 1.00 1000 1000
~item (Number of levels: 4)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Z_Intercept) 1.09 0.73 0.33 3.07 1.00 706 816
sd(logdiscr_Intercept) 0.94 0.79 0.05 2.87 1.00 515 648
cor(Z_Intercept,logdiscr_Intercept) 0.18 0.50 -0.87 0.91 1.00 781 758
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Z_Intercept -1.22 0.67 -2.49 0.11 1.00 733 710
logdiscr_Intercept 0.66 0.56 -0.36 2.04 1.00 896 654
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
brms_diff    = coef(brms_2pm)$item[, , 'Z_Intercept'][, 'Estimate']
brms_discrim = exp(coef(brms_2pm)$item[, , 'logdiscr_Intercept'][, 'Estimate'])

fit_diff    = summary(fit_2pm, digits = 3, par = 'difficulty')$summary[, 'mean']
fit_discrim = summary(fit_2pm, digits = 3, par = 'discrim')$summary[, 'mean']
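As before, a sketch of how the comparison below might be put together (my reconstruction; the original code is not shown):

comparison_2pm = tibble(
  param = c(paste0('discrim[', 1:4, ']'), paste0('difficulty[', 1:4, ']')),
  model = c(fit_discrim, fit_diff),
  brms  = c(brms_discrim, brms_diff)
)

comparison_2pm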
# A tibble: 8 x 3
param model brms
<chr> <dbl> <dbl>
1 discrim[1] 1.03 1.15
2 discrim[2] 3.03 2.39
3 discrim[3] 1.46 1.62
4 discrim[4] 4.46 3.24
5 difficulty[1] 1.42 -1.30
6 difficulty[2] 0.595 -0.622
7 difficulty[3] 2.20 -1.98
8 difficulty[4] 1.03 -1.09
Here is the non-Bayesian demo if interested.
library(ltm)
irt_2pm_par1 = ltm(verbagg_items_wide_mat ~ z1, IRT.param = FALSE)
irt_2pm_par2 = ltm(verbagg_items_wide_mat ~ z1, IRT.param = TRUE)

coef(irt_2pm_par1)
coef(irt_2pm_par2)
Three Parameter IRT
For the three parameter model I only show the Stan code. This model adds a per-item guessing parameter, which serves as a lower bound, to the two parameter model.
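In equation form, the probability the code below implements is:

\[\pi_i = \textrm{guess}_i + (1 - \textrm{guess}_i)\,\textrm{logit}^{-1}\big(\textrm{disc}_i (Z_p - \beta_i)\big)\]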
data {
  int N;
  int J;
  int Y[N, J];
}
parameters {
  vector[J] difficulty;
  vector<lower = 0>[J] discrim;
  vector<lower = 0, upper = .25>[J] guess;  // Per-item guessing (lower bound)
  vector[N] Z;
}
model {
  matrix[N, J] pmat;

  // priors
  Z ~ normal(0, 1);
  discrim ~ student_t(3, 0, 5);
  difficulty ~ student_t(3, 0, 5);
  guess ~ beta(1, 19);

  for (j in 1:J) {
    pmat[, j] = guess[j] + (1 - guess[j]) * inv_logit(discrim[j] * (Z - difficulty[j]));
  }

  // likelihood; now on the probability scale rather than the logit
  for (j in 1:J) Y[, j] ~ bernoulli(pmat[, j]);
}
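Estimation would mirror the earlier models. For example, assuming the code above is available as bayes_irt3_model (the name is my assumption), something like:

library(rstan)

fit_3pm = sampling(
  bayes_irt3_model,
  data = stan_data,
  thin = 4,
  cores = 4,
  control = list(adapt_delta = .99)
)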
Four Parameter IRT
For the four parameter model I only show the Stan code. This model adds a per-item ceiling parameter, which serves as an upper bound, to the three parameter model.
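In equation form, the success probability is now bounded on both sides:

\[\pi_i = \textrm{guess}_i + (\textrm{ceiling}_i - \textrm{guess}_i)\,\textrm{logit}^{-1}\big(\textrm{disc}_i (Z_p - \beta_i)\big)\]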
data {
  int N;
  int J;
  int Y[N, J];
}
parameters {
  vector[J] difficulty;
  vector<lower = 0>[J] discrim;
  vector<lower = 0, upper = .25>[J] guess;    // Per-item guessing (lower bound)
  vector<lower = .95, upper = 1>[J] ceiling;  // Per-item ceiling (upper bound)
  vector[N] Z;
}
model {
  matrix[N, J] pmat;

  // priors
  Z ~ normal(0, 1);
  discrim ~ student_t(3, 0, 5);
  difficulty ~ student_t(3, 0, 5);
  guess ~ beta(1, 19);
  ceiling ~ beta(49, 1);

  for (j in 1:J) {
    pmat[, j] = guess[j] + (ceiling[j] - guess[j]) * inv_logit(discrim[j] * (Z - difficulty[j]));
  }

  // likelihood
  for (j in 1:J) Y[, j] ~ bernoulli(pmat[, j]);
}
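To see what the bounds do, here is a small, purely illustrative R version of the 4PL curve (the function name and default values are mine, not from the original):

# 4PL success probability; plogis is R's inverse logit
p_4pl = function(z, discrim, difficulty, guess = .1, ceiling = .98) {
  guess + (ceiling - guess) * plogis(discrim * (z - difficulty))
}

p_4pl(z = c(-3, 0, 3), discrim = 2, difficulty = 1)  # never below .1 or above .98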
Source
Original code available at: https://github.com/m-clark/Miscellaneous-R-Code/tree/master/ModelFitting/Bayesian/StanBugsJags/IRT_models