%\VignetteIndexEntry{Examples of focused model comparison: skew-normal models} %\VignetteEngine{knitr::knitr} %\VignetteDepends{fic,sn} \documentclass[nojss,nofooter]{jss} \author{Christopher Jackson\\\email{chris.jackson@mrc-bsu.cam.ac.uk}} \title{Examples of focused model comparison: skew-normal models} \Plainauthor{Christopher Jackson, MRC Biostatistics Unit} \Abstract{ The following example (from \citet{claeskens:hjort:book}) illustrates how focused model comparison can be performed using the \pkg{fic} package in a situation where: \begin{itemize} \item a novel class of models is defined and fitted by custom \proglang{R} functions \item a simple model is extended in two different directions to define the models being compared \end{itemize} } \Keywords{models} \begin{document} \section{Skew-normal models} An outcome $y_i$ and a covariate $x_i$ are observed for individuals $i=1,\ldots,n$. Four different models are compared: two normal models with a constant variance, one without (1) and one with (2) a linear regression term, and two ``skew-normal'' models without (3) and with (4) the linear regression term. The skew-normal model is defined by an error term $\sigma\epsilon_i$, where the $\epsilon_i$ are independently distributed with a density $f(u|\lambda) = \lambda \Phi(u) ^{\lambda - 1} \phi(u)$. All four models are nested in the ``wide'' model (4). \begin{enumerate} \item[(1)] $y_i \sim N(\beta_0, \sigma^2) $ \item[(2)] $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2) $ \item[(3)] $y_i = \beta_0 + \sigma\epsilon_i$ \item[(4)] $y_i = \beta_0 + \beta_1 x_i + \sigma\epsilon_i $ \end{enumerate} %%https://www-sciencedirect-com.ezp.lib.cam.ac.uk/science/article/pii/S0047259X08000341#b3 %% explains relation to standard skew normal param %% dsn also provided with sn package, in different parameterisation \section{Fitting the models in R} To implement this class of models in \proglang{R}, firstly we define the log density function of the general skew-normal model with mean, scale and skewness parameters $\mu,\sigma,\lambda$ indicated by arguments \code{mean}, \code{sigma} and \code{lambda}. This defines the distribution of $y_i$ in model (3) with mean $\mu_i = \beta_0$, and in (4) with $\mu_i = \beta_0 + \beta_1x_i$. Models (1) and (2) are defined by setting $\lambda=1$ in models (3) and (4) respectively. <<>>= ldsnorm <- function(x, mean, sd, lambda){ log(lambda) + (lambda-1)*pnorm(x, mean, sd, log.p=TRUE) + dnorm(x, mean, sd, log=TRUE) } @ The models are fitted to data from the Australian Institute of Sports \citep{cook1994introduction}, available as \code{ais} from the \pkg{sn} package \citep{sn}. The outcome $y_i$ is haematocrit level \code{Hc}, and the covariate is body mass index \code{BMI}. Observe the skewed distribution of the outcome and a mild association between the variables. <>= if (!require("sn")) stop("The `sn` package should be installed to run code in this vignette") data(ais) par(mfrow=c(1,2)) plot(density(ais$Hc), xlab="Haematocrit level", main="") plot(ais$BMI, ais$Hc, pch=19, xlab="Body mass index", ylab="Haematocrit level") @ The following defines the minus log likelihood for these data as a function of four parameters $\beta_0, \beta_1, \sigma$ and $\lambda$. <<>>= mloglik <- function(b0, b1, sd, lambda){ -sum(ldsnorm(ais$Hc, b0 + b1*ais$BMI, sd, lambda)) } @ Then to obtain the maximum likelihood estimates under models 1 to 4, \code{mloglik} is rewritten as a function of a single vector \code{par}, containing either 2, 3 or 4 parameters to be minimised over, depending on the model. Note here that models 1 and 3 have $\beta_2$ fixed at 0, and models 1 and 2 have $\lambda$ fixed at 1. The positive parameters $\sigma$ and $\lambda$ will be estimated by unconstrained maximisation on the log scale. These functions can be passed to the \code{nlm} function for optimisation. <<>>= fn1 <- function(par) mloglik(par[1], 0, exp(par[2]), 1) fn2 <- function(par) mloglik(par[1], par[2], exp(par[3]), 1) fn3 <- function(par) mloglik(par[1], 0, exp(par[2]), exp(par[3])) fn4 <- function(par) mloglik(par[1], par[2], exp(par[3]), exp(par[4])) @ \code{nlm} also requires plausible initial values for the parameters. A vector of these (\code{ini}) is obtained as follows by fitting model (2) with \code{lm} and extracting the coefficients with \code{coef} (for $\beta_0$ and $\beta_1$) and the residual standard deviation for $\log(\sigma)$. An initial value of 0 is used for $\log(\lambda)$. \footnote{Note these are the exact maximum likelihood estimates for model 2. The added value of \code{nlm} in fitting models 1 and 2, compared to simply using \code{lm}, is to conveniently provide the covariance matrix at the maximum likelihood estimates in the same form as for the skew normal models, making focused model comparison more convenient here. } <<>>= lm2 <- lm(Hc ~ BMI, data=ais) cf <- unname(coef(lm2)) ini <- c(beta0=cf[1], beta1=cf[2], logsigma=log(summary(lm2)$sigma), loglambda=0) @ The appropriate objective function (\code{fn1}--\code{fn4}) is then optimised for each model, starting from the given initial values\footnote{A warning message of \code{NA/Inf replaced by maximum positive value} can be ignored and is the result of \code{nlm} trying out extreme and implausible values on the way to finding the maximum likelihood.}. <>= opt1 <- nlm(fn1, ini[c("beta0","logsigma")], hessian=TRUE) opt2 <- nlm(fn2, ini[c("beta0","beta1","logsigma")], hessian=TRUE) opt3 <- nlm(fn3, ini[c("beta0","logsigma","loglambda")], hessian=TRUE) opt4 <- nlm(fn4, ini, hessian=TRUE) @ Finally, the information required by the \code{fic} function (the estimates and covariance matrices) is extracted from the \code{nlm} results and arranged into a list, for each of the four models. <<>>= mod1 <- list(est=opt1$estimate, vcov=solve(opt1$hessian) ) mod2 <- list(est=opt2$estimate, vcov=solve(opt2$hessian) ) mod3 <- list(est=opt3$estimate, vcov=solve(opt3$hessian) ) mod4 <- list(est=opt4$estimate, vcov=solve(opt4$hessian) ) @ \section{Focused model comparison} We now perform a focused comparison of the four models. Two alternative focuses are investigated: the mean and median outcome at a covariate value of interest. Expressions for the mean and median of the skew normal, in the parameterisation used here, are given by \citet{claeskens:hjort:book} and implemented in the following \proglang{R} functions: <<>>= mean_snorm <- function(mu, sigma, lambda){ f <- function(u){u*exp(ldsnorm(u, 0, 1, lambda))} mu + sigma * integrate(f, -Inf, Inf)$value } median_snorm <- function(mu, sigma, lambda){ mu + sigma * qnorm(0.5^(1/lambda)) } @ As described in the main \code{fic} package vignette, the focus function supplied to \code{fic} should have arguments defined by a vector \code{par} of parameters of the biggest model (in this case the four-parameter model 4), and optionally also a matrix of covariate values \code{X}, and should return the corresponding focus quantity. In this example, the two focus functions are <<>>= focus1 <- function(par, X){ mean_snorm(mu = par[1] + X %*% par[2], sigma = exp(par[3]), lambda = exp(par[4])) } focus2 <- function(par, X){ median_snorm(mu = par[1] + X %*% par[2], sigma = exp(par[3]), lambda = exp(par[4])) } @ The \code{inds} matrix, required by \code{fic}, is now constructed. Recall this indicates which parameters (columns) are included in each of the models (rows) being compared. The narrow model is in the first row, and the wide model in the last. The rows are given names to describe the models, so the output is easier to read. The functions \code{fns} required to extract the estimates and covariance matrix from the fitted model objects are then defined. Finally \code{fic} is called to compare the four models for focuses defined by the mean and median for average men and women, with covariate values defined by \code{med.bmi}. The \code{sub} argument is supplied to ensure the focus estimates are returned too -- note that \code{fic} can only automatically fit the submodels for standard \proglang{R} model classes such as \code{glm}. <<>>= inds <- rbind("intcpt" =c(1,0,1,0), "cov" =c(1,1,1,0), "intcpt_skew"=c(1,0,1,1), "cov_skew" =c(1,1,1,1)) fns <- list(coef=function(x)x$est, vcov=function(x)x$vcov, nobs=function(x)nrow(ais)) med.bmi <- rbind(male=23.56, female=21.82) library(fic) fmean <- fic(mod4, inds=inds, fns=fns, focus=focus1, X=med.bmi, FIC=TRUE, sub=list(mod1, mod2, mod3, mod4)) fmean fmed <- fic(mod4, inds=inds, fns=fns, focus=focus2, X=med.bmi, FIC=TRUE, sub=list(mod1, mod2, mod3, mod4)) fmed @ <>= ggplot_fic(fmean) ggplot_fic(fmed) @ For both the mean and the median, including the covariate is judged to be essential. Including the skewness is pointless for estimating the mean, since, as discussed by \citet{claeskens:hjort:book}, the skewness term provides no extra information. The utility of including the skewness is also doubtful for estimating the median too --- given that the covariate is included, the focus estimates and RMSE are very similar whether or not the skewness is also included. % for median: % can't reproduce calculation in book - looks strange on p165 % delta2 value given as 45.031 in book, but sqrt(n)(lamwide - 1) = % sqrt(nrow(ais))*(exp(mod4$est[4]) - 1) = 14.2*(1.95 - 1) = 13.5 \bibliography{fic} \end{document}