This vignette is intended to show the wide variety of
lme4::lmer
models that can be handled by {equatiomatic}.
The output uses the notation from Gelman and Hill. If
you notice any errors in the notation, please file an
issue. Similarly, if you try to use {equatiomatic} with an lmer
model and end up with an error (either in notation or code) I would
really appreciate if you could file an issue with a
reproducible example.
This vignette displays many of the models that are covered by the package tests. It also illustrates many of the features of {equatiomatic}, including
A basic two-level unconditional model:
mathi∼N(αj[i],σ2)αj∼N(μαj,σ2αj), for sch.id j = 1,…,J
And a model with multiple levels:
um_long3 <- lmer(score ~ 1 + (1 | sid) + (1 | school) + (1 | district),
data = sim_longitudinal
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(um_long3)
scorei∼N(αj[i],k[i],l[i],σ2)αj∼N(μαj,σ2αj), for sid j = 1,…,Jαk∼N(μαk,σ2αk), for school k = 1,…,Kαl∼N(μαl,σ2αl), for district l = 1,…,L
mathi∼N(μ,σ2)μ=αj[i]+β1(female)+β2(ses)+β3(minority)αj∼N(μαj,σ2αj), for sch.id j = 1,…,J
Note that in the above, the mean structure at level 1 is broken out
into a separate line. You can override this with the
mean_separate
argument.
mathi∼N(αj[i]+β1(female)+β2(ses)+β3(minority),σ2)αj∼N(μαj,σ2αj), for sch.id j = 1,…,J
Similarly, just like with standard lm
models, you can
specify wrapping, and how many terms per line
mathi∼N(μ,σ2)μ=αj[i]+β1(female) +β2(ses)+β3(minority)αj∼N(μαj,σ2αj), for sch.id j = 1,…,J
And one more example with multiple levels
lev1_long <- lmer(score ~ wave + (1 | sid) + (1 | school) + (1 | district),
data = sim_longitudinal
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(lev1_long)
scorei∼N(αj[i],k[i],l[i]+β1(wave),σ2)αj∼N(μαj,σ2αj), for sid j = 1,…,Jαk∼N(μαk,σ2αk), for school k = 1,…,Kαl∼N(μαl,σ2αl), for district l = 1,…,L
This should generally work regardless of the complexity.
mathi∼N(μ,σ2)μ=αj[i]+β1(female)+β2(ses)+β3j[i](minority)(αjβ3j)∼N((μαjμβ3j),(σ2αjραjβ3jρβ3jαjσ2β3j)), for sch.id j = 1,…,J
Notice that it correctly parses which variable is randomly varying here. We can also make it more complicated.
mathi∼N(μ,σ2)μ=αj[i]+β1j[i](female)+β2j[i](ses)+β3(minority)(αjβ1jβ2j)∼N((μαjμβ1jμβ2j),(σ2αjραjβ1jραjβ2jρβ1jαjσ2β1jρβ1jβ2jρβ2jαjρβ2jβ1jσ2β2j)), for sch.id j = 1,…,J
And even really complicated. Note the model below gives a warning.
hsb3 <- lmer(
math ~ female * ses + minority +
(ses * female + minority | sch.id),
hsb
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(hsb3)
mathi∼N(μ,σ2)μ=αj[i]+β1j[i](female)+β2j[i](ses)+β3j[i](minority)+β4j[i](female×ses)(αjβ1jβ2jβ3jβ4j)∼N((μαjμβ1jμβ2jμβ3jμβ4j),(σ2αjραjβ1jραjβ2jραjβ3jραjβ4jρβ1jαjσ2β1jρβ1jβ2jρβ1jβ3jρβ1jβ4jρβ2jαjρβ2jβ1jσ2β2jρβ2jβ3jρβ2jβ4jρβ3jαjρβ3jβ1jρβ3jβ2jσ2β3jρβ3jβ4jρβ4jαjρβ4jβ1jρβ4jβ2jρβ4jβ3jσ2β4j)), for sch.id j = 1,…,J
In the sim_longitudinal
data that comes with the
package, the only level 1 predictor is wave
. The
group
and treatement
variables are at the
student level (level 2) and prop_low
is at the school
level. Let’s also add a district level variable (just the average score
for each district).
# calculate district means
dist_mean <- tapply(
sim_longitudinal$score,
sim_longitudinal$district,
mean
)
# put them in a df to merge
dist_mean <- data.frame(
district = names(dist_mean),
dist_mean = dist_mean
)
# create a new df with dist_mean added
d <- merge(sim_longitudinal, dist_mean, by = "district")
Now we can fit a model that should have predictors at every level.
We’ll allow wave
to vary randomly at each level too.
group_preds_m1 <- lmer(score ~ wave + group + treatment + prop_low + dist_mean +
(wave | sid) + (wave | school) + (wave | district),
data = d
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(group_preds_m1)
scorei∼N(αj[i],k[i],l[i]+β1j[i],k[i],l[i](wave),σ2)(αjβ1j)∼N((γα0+γα1(grouplow)+γα2(groupmedium)+γα3(treatment1)μβ1j),(σ2αjραjβ1jρβ1jαjσ2β1j)), for sid j = 1,…,J(αkβ1k)∼N((γα0+γα1(prop_low)μβ1k),(σ2αkραkβ1kρβ1kαkσ2β1k)), for school k = 1,…,K(αlβ1l)∼N((γα0+γα1(dist_mean)μβ1l),(σ2αlραlβ1lρβ1lαlσ2β1l)), for district l = 1,…,L
Interactions with multilevel models can be tricky because they can be within or across levels, and the notation changes depending on whether the random effect for the lower level (in the interaction term) is specified as varying randomly within the given level. Luckily, {equatiomatic} handles all of this for you.
First, let’s look at a model with interactions only at level 1.
mathi∼N(μ,σ2)μ=αj[i]+β1(minority)+β2(female)+β3(female×minority)αj∼N(μαj,σ2αj), for sch.id j = 1,…,J
And now an interaction at only level 2
mathi∼N(αj[i],σ2)αj∼N(γα0+γα1(meanses)+γα2(sector)+γα3(meanses×sector),σ2αj), for sch.id j = 1,…,J
But more complicated are cross level interactions. Here’s a quick example.
cl_long1 <- lmer(score ~ wave * treatment + (wave | sid) + (1 | school) +
(1 | district),
data = sim_longitudinal
)
extract_eq(cl_long1)
scorei∼N(αj[i],k[i],l[i]+β1j[i](wave),σ2)(αjβ1j)∼N((γα0+γα1(treatment1)γβ10+γβ11(treatment1)),(σ2αjραjβ1jρβ1jαjσ2β1j)), for sid j = 1,…,Jαk∼N(μαk,σ2αk), for school k = 1,…,Kαl∼N(μαl,σ2αl), for district l = 1,…,L
Note that the treatement
variable is shown as a
predictor of the level 1 intercept and the level 1 slope. But if the
slope is not randomly varying at this level, then the notation has to
change.
cl_long2 <- lmer(score ~ wave * treatment + (1 | sid) + (1 | school) +
(1 | district),
data = sim_longitudinal
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(cl_long2)
scorei∼N(αj[i],k[i],l[i]+β1(wave),σ2)αj∼N(γα0+γα1(treatment1)+γα2(treatment1×wave),σ2αj), for sid j = 1,…,Jαk∼N(μαk,σ2αk), for school k = 1,…,Kαl∼N(μαl,σ2αl), for district l = 1,…,L
This works even for really complicated models, including three-way interactions that contain a cross-level interaction. For example
cl_long3 <- lmer(
score ~ wave * group * treatment + wave * prop_low * treatment +
(wave | sid) + (wave | school) +
(wave + treatment | district),
sim_longitudinal
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(cl_long3)
scorei∼N(αj[i],k[i],l[i]+β1j[i],k[i],l[i](wave),σ2)(αjβ1j)∼N((γα0+γα1(grouplow)+γα2(groupmedium)+γα3l[i](treatment1)+γα4(grouplow×treatment1)+γα5(groupmedium×treatment1)γβ10+γβ11(grouplow)+γβ12(groupmedium)+γβ13(treatment1)+γβ14(grouplow×treatment1)+γβ15(groupmedium×treatment1)),(σ2αjραjβ1jρβ1jαjσ2β1j)), for sid j = 1,…,J(αkβ1k)∼N((γα0+γα1(prop_low)+γα2(prop_low×treatment1)γβ10+γβ11(prop_low)+γβ11(prop_low×treatment1)),(σ2αkραkβ1kρβ1kαkσ2β1k)), for school k = 1,…,K(αlβ1lγ3l)∼N((μαlμβ1lμγ3l),(σ2αlραlβ1lραlγ3lρβ1lαlσ2β1lρβ1lγ3lργ3lαlργ3lβ1lσ2γ3l)), for district l = 1,…,L
Finally, there is some support for alternative variance-covariance specifications. For example, you may want to specify a model with only the variance terms estimated, and not the covariances.
hsb_varsonly <- lmer(math ~ minority * female + (minority * female || sch.id),
data = hsb
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(hsb_varsonly)
mathi∼N(μ,σ2)μ=αj[i]+β1j[i](minority)+β2j[i](female)+β3j[i](female×minority)(αjβ1jβ2jβ3j)∼N((μαjμβ1jμβ2jμβ3j),(σ2αj0000σ2β1j0000σ2β2j0000σ2β3j)), for sch.id j = 1,…,J
Or maybe you want to group by the same thing multiple times. In this exact model I don’t think this makes any sense, but there are cases where it can. Note that, again, this model produces a warning.
hsb_doublegroup <- lmer(math ~ minority * female +
(minority | sch.id) + (female | sch.id),
data = hsb
)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.00320004 (tol = 0.002, component 1)
extract_eq(hsb_doublegroup)
mathi∼N(μ,σ2)μ=αj[i],k[i]+β1j[i](minority)+β2k[i](female)+β3(female×minority)(αjβ1j)∼N((μαjμβ1j),(σ2αjραjβ1jρβ1jαjσ2β1j)), for sch.id j = 1,…,J(αkβ2k)∼N((μαkμβ2k),(σ2αkραkβ2kρβ2kαkσ2β2k)), for sch.id.1 k = 1,…,K
And finally, you can have a mix of different things and it should generally still work.
long_mixed_ranef <- lmer(
score ~ wave +
(wave || sid) + (wave | school) + (1 | school) + (wave || district),
sim_longitudinal
)
#> boundary (singular) fit: see help('isSingular')
extract_eq(long_mixed_ranef)
scorei∼N(αj[i],k[i],l[i],m[i]+β1j[i],k[i],m[i](wave),σ2)(αjβ1j)∼N((μαjμβ1j),(σ2αj00σ2β1j)), for sid j = 1,…,J(αkβ1k)∼N((μαkμβ1k),(σ2αkραkβ1kρβ1kαkσ2β1k)), for school k = 1,…,Kαl∼N(μαl,σ2αl), for school.1 l = 1,…,L(αmβ1m)∼N((μαmμβ1m),(σ2αm00σ2β1m)), for district m = 1,…,M
With that said, this is the part of the code that I would consider the most “experimental” at this point, so please do reach out if you have issues or run into use cases that are not supported.
Because of the multilevel nature of the notation used by {equatiomatic}, tracking the estimated coefficients can at times be a little difficult. To help with this, we keep the Greek notation with the estimated coefficients as subscripts. For example, let’s extract the equation from our model with group-level predictors.
^scorei∼N(−20.6αj[i],k[i],l[i]+0.17β1j[i],k[i],l[i](wave),σ2)(αjβ1j)∼N((−5.23γα1(grouplow)−4.03γα2(groupmedium)−1.97γα3(treatment1)0),(9.29−0.2−0.20.29)), for sid j = 1,…,J(αkβ1k)∼N((12.81γα1(prop_low)0),(2.84110)), for school k = 1,…,K(αlβ1l)∼N((1.21γα1(dist_mean)0),(0−1−10)), for district l = 1,…,L
There are still a few things I’m planning to implement that I haven’t quite gotten around to yet. These include
lm
extraction, including intercept
, and
raw_tex
Additionally, the Greek notation as subscripts may not always be desirable, particularly if the model is fairly simple. Future developments will likely make this optional.
The range of models that you can fit with lme4::lmer
is
huge, but I hope this will handle a wide range of models. I also
recognize that some users may want a different notation for the models.
At this point, I’m not planning to implement other notations, but that
could change if there’s enough demand for it. I’m open to any/all
suggestions for how to improve the package, and would love pull
requests to help support the package, big or small.