This vignette is aimed to give you a high-level overview of the types of models supported by the galamm package, and to point you to relevant vignettes where you can find more information.
Generalized additive latent and mixed models (GALAMMs) (Sørensen, Fjell, and Walhovd 2023) is an extension of generalized linear latent and mixed models (GLLAMMs) (Rabe-Hesketh, Skrondal, and Pickles 2004; Skrondal and Rabe-Hesketh 2004) which allows both observed responses and latent variables to depend smoothly on observed variables. Smoothly here means that the relationship is not assumed to follow a particular parametric form, e.g., as specified by a linear model. Instead, an a priori assumption is made that the relationship is smooth, and the model then attempts to learn the relationship from the data. GALAMM uses smoothing splines to obtain this, identically to how generalized additive models (GAMs) (Wood 2017) are estimated.
The GLLAMM framework contains many elements which are currently not implemented in the galamm package. This includes both nonparametric random effects and a large number of model families, e.g., for censored responses. If you need any of this, but not semiparametric estimation, the Stata based GLLAMM package is likely the place you should go. Conversely, galamm incorporates crossed random effects easily and efficiently, while these are hard to specify using GLLAMM.
GALAMMs are specified using three building blocks. First, \(n\) responses \(y_{1}, \dots, y_{n}\) are assumed independently distributed according to an exponential family with density
\[ f\left(y | \theta, \phi\right) = \exp \left( \frac{y\theta(\mu) - b\left(\theta(\mu)\right)}{\phi} + c\left(y, \phi\right) \right) \]
here \(\mu = g^{-1}(\nu)\) is the mean, \(g^{-1}(\cdot)\) is the inverse of link function \(g(\cdot)\), \(\nu\) is a “nonlinear predictor”, \(\phi\) is a dispersion parameter, and \(b(\cdot)\) and \(c(\cdot)\) are known functions. In contrast to what is assumed, e.g., by lme4 (Bates et al. 2015), the functions \(b(\cdot)\), \(c(\cdot)\), and \(g(\cdot)\) are allowed to vary between observations. That is, the observations can come from different members of the exponential family. The vignette on models with mixed response types describes this in detail.
Using canonical link functions, the response model simplifies to
\[ f\left(y | \nu, \phi\right) = \exp \left( \frac{y\nu - b\left(\nu\right)}{\phi} + c\left(y, \phi\right) \right) \]
Next, the nonlinear predictor, which corresponds to the measurement model in a classical structural equation model, is defined by
\[ \nu = \sum_{s=1}^{S} f_{s}\left(\mathbf{x}\right) + \sum_{l=2}^{L}\sum_{m=1}^{M_{l}} \eta_{m}^{(l)} \mathbf{z}^{(l)}_{m}{}^{'}\boldsymbol{\lambda}_{m}^{(l)}, \]
where \(\mathbf{x}\) are explanatory variables, \(f_{s}(\mathbf{x})\), \(s=1,\dots,S\) are smooth functions, \(\eta_{m}^{(l)}\) are latent variables varying at level \(l\), and \(\boldsymbol{\lambda}_{m}^{(l)}{}^{T} \mathbf{z}_{m}^{(l)}\) is the weighted sum of a vector of explanatory variables \(\mathbf{z}_{m}^{(l)}\) varying at level \(l\) and parameters \(\boldsymbol{\lambda}_{m}^{(l)}\). Let
\[\boldsymbol{\eta}^{(l)} = [\eta_{1}^{(l)}, \dots, \eta_{M_{l}}^{(l)}]^{T} \in \mathbb{R}^{M_{l}}\]
be the vector of all latent variables at level \(l\), and
\[\boldsymbol{\eta} = [\boldsymbol{\eta}^{(2)}, \dots, \boldsymbol{\eta}^{(L)}]^{T} \in \mathbb{R}^{M}\]
the vector of all latent variables belonging to a given level-2 unit, where \(M = \sum_{l=2}^{L} M_{l}\). The word “level” is here used to denote a grouping level; they are not necessarily hierarchical.
The structural model specifies how the latent variables are related to each other and to observed variables, and is given by
\[ \boldsymbol{\eta} = \mathbf{B}\boldsymbol{\eta} + \mathbf{h}\left(\mathbf{w}\right) + \boldsymbol{\zeta} \]
where \(\mathbf{B}\) is an \(M \times M\) matrix of regression coefficients for regression among latent variables and \(\mathbf{w} \in \mathbb{R}^{Q}\) is a vector of \(Q\) predictors for the latent variables. \(\mathbf{h}(\mathbf{w}) = [\mathbf{h}_{2}(\mathbf{w}), \dots, \mathbf{h}_{L}(\mathbf{w})] \in \mathbb{R}^{M}\) is a vector of smooth functions whose components \(\mathbf{h}_{l}(\mathbf{w}) \in \mathbb{R}^{M_{l}}\) are vectors of functions predicting the latent variables varying at level \(l\), and depending on a subset of the elements \(\mathbf{w}\). \(\boldsymbol{\zeta}\) is a vector of normally distributed random effects, \(\boldsymbol{\zeta}^{(l)} \sim N(\mathbf{0}, \boldsymbol{\Psi}^{(l)})\) for \(l=2,\dots,L\), where \(\boldsymbol{\Psi}^{(l)} \in \mathbb{R}^{M_{l} \times M_{l}}\) is the covariance matrix of random effects at level \(l\). Defining the \(M \times M\) covariance matrix \(\boldsymbol{\Psi} = \text{diag}(\boldsymbol{\Psi}^{(2)}, \dots, \boldsymbol{\Psi}^{(L)})\), we also have \(\boldsymbol{\zeta} \sim N(\mathbf{0}, \boldsymbol{\Psi})\).
In Sørensen, Fjell, and Walhovd (2023) we show that any model specified as above can be transformed to a GLLAMM, which is essentially a generalized nonlinear mixed model. This transformation is rather complex, so we won’t spell it out here, but the key steps are:
In galamm we use the same transformations as the gamm4 package does.
In mixed model representation, the nonlinear predictor can be written on the form
\[ \boldsymbol{\nu} = \mathbf{X}(\boldsymbol{\lambda}, \mathbf{B}) \boldsymbol{\beta} + \mathbf{Z}(\boldsymbol{\lambda}, \mathbf{B}) \boldsymbol{\zeta} \]
where \(\mathbf{X}(\boldsymbol{\lambda}, \mathbf{B})\) is the regression matrix for fixed effects \(\boldsymbol{\beta}\) and \(\mathbf{Z}(\boldsymbol{\lambda}, \mathbf{B})\) is the regression matrix for random effects \(\boldsymbol{\zeta}\). In contrast to with generalized linear mixed models, however, both matrices will in general depend on factor loadings \(\boldsymbol{\lambda}\) and regression coefficients between latent variables \(\mathbf{B}\). Both of these are parameters that need to be estimated, and hence \(\mathbf{X}(\boldsymbol{\lambda}, \mathbf{B})\) and \(\boldsymbol{\beta}\) and \(\mathbf{Z}(\boldsymbol{\lambda}, \mathbf{B})\) need to be updated throughout the estimation process.
Plugging the nonlinear predictor into the structural model, we obtain the joint likelihood for the model. We then obtain the marginal likelihood by integrating over the random effects, yielding a marginal likelihood function of the form
\[ L\left(\boldsymbol{\beta}, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\lambda}, \mathbf{B}, \boldsymbol{\phi}\right) = \left(2 \pi \phi_{1}\right)^{-r/2} \int_{\mathbb{R}^{r}} \exp\left( g\left(\boldsymbol{\beta}, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\lambda}, \mathbf{B}, \boldsymbol{\phi}, \mathbf{u}\right) \right) \text{d} \mathbf{u} \]
where \(\mathbf{u}\) is a standardized version of \(\boldsymbol{\zeta}\). In order to evaluate the marginal likelihood at a given set of parameter values, we use the Laplace approximation combined with sparse matrix operations, extending Bates et al. (2015)’s algorithm for linear mixed models.
We obtain maximum marginal likelihood estimates by maximizing \(L\left(\boldsymbol{\beta}, \boldsymbol{\Lambda},
\boldsymbol{\Gamma}, \boldsymbol{\lambda}, \mathbf{B},
\boldsymbol{\phi}\right)\), subject to possible constraints,
e.g., that variances are non-negative. For this, we use the L-BFGS-B
algorithm implement in stats::optim
. The predicted values
of random effects, \(\widehat{\mathbf{u}}\) are obtained as
posterior modes at the final estimates.
To see how galamm is used in practice, take a look at the vignettes describing models with different components.