% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stb.R
\name{stb.VCA}
\alias{stb.VCA}
\title{Simultaneous Tolerance Bounds on Residuals and Random Effects for 'VCA' Objects.}
\usage{
\method{stb}{VCA}(obj, term = NULL, mode = c("raw", "student", "standard",
  "pearson"), N = 5000, alpha = 0.05, algo = c("rank", "R", "C"),
  q.type = 2L, plot = TRUE, legend = TRUE, orient = 1, main1 = NULL,
  main2 = NULL, seed = NULL, type = 1, pb = TRUE, parallel = TRUE,
  Ncpu = 2, ...)
}
\arguments{
\item{obj}{(VCA) object}

\item{term}{(character, integer) specifying a type of residuals if one of c("conditional",
"marginal"), or, the name of a random term (one of obj$re.assign$terms). If 'term'
is a integer, it is interpreted as the i-th random term in 'obj$re.assign$terms'.}

\item{mode}{(character) string specifying a possible transformation of random effects or 
residuals (see \code{\link{residuals.VCA}} and \code{\link{ranef.VCA}}for details)}

\item{N}{(integer) specifying the number of simulated datasets \eqn{y_sim}}

\item{alpha}{(numeric) value 0 < alpha < 1 specifying the min. 100(1-alpha)\% coverage of the
simultaneous tolerance band (STB)}

\item{algo}{(character) (string) specifying the method to be used for constructing a 100(1-alpha)\% STB,
choose "rank" for the rank-based, "C" for a C-implementation of the quantile-based, and
"R" for an R-implentation of the quantile-based algorithm (see details).}

\item{q.type}{(integer) value specifying the quantile type to be used as offered in \code{\link{quantile}}
in case 'algo="R"'.    Whenever 'algo="C"', quantiles are computed according to SAS PCTLDEF5, 
which is identical to type 2. The rank-based algorithm does not employ quantiles.}

\item{plot}{(logical) TRUE = create 'stbVCA' object and plot it, FALSE = only create the 'stbVCA' object}

\item{legend}{(logical) TRUE = add legend to the plot(s)}

\item{orient}{(integer) in QQ-plot, specifying whether to plot expected values vs. observed values (1)
or observed vs. expected (2)}

\item{main1}{(character) string specifying an user-defined main-title of the 'type=1' plot (STB)}

\item{main2}{(character) string specifying an user-defined main-title of the 'type=2' plot (STI)}

\item{seed}{(integer) value used as seed for the RNG}

\item{type}{(integer) 1 = QQ-plot with simultaneous tolerance band (STB), 2 = residual plot with simultaneous 
tolerance interval (STI), 3 = both plot at once}

\item{pb}{(logical) TRUE = a text-based progress bar will display the simulation progress}

\item{parallel}{(logical) TRUE = parallel processing will be attempted on 'Ncpu' cores of the local machine.
   FALSE = no parallel processing applied, only in this case, 'pb' will have an effect, since this 
is only available for non-parallel processing.}

\item{Ncpu}{(integer) specifying the number of CPUs on which the parallelization will be carried out.
   In case that 'Ncup' is larger than the number of existing CPUs, the max. number of CPUs will be used
   instead. Note, that setting 'Ncpu' to the max. number available may not result in the min. time 
spent on computing.}

\item{...}{additional arguments passed to other methods}
}
\value{
(stbVCA) object, a list with all information needed to create the QQ-plot with ~100(1-alpha)\% STB.
}
\description{
Simulate \eqn{N}-times data incorporating the estimated variance-covariance
matrix of observations \eqn{y} and construct a 100(1-alpha)\% simultaneous tolerance band.
}
\details{
A Linear Mixed Models, noted in standard matrix notation, can be written as \eqn{y = Xb + Zg + e}, where
\eqn{y} is the column vector of observations, \eqn{X} and \eqn{Z} are design matrices assigning fixed (\eqn{b}),
respectively, random (\eqn{g}) effects to observations, and \eqn{e} is the column vector of residual errors.

Here, simulation is performed incorporating the variance-covariance matrix \eqn{V = ZGZ^{T}+R}{V = ZGZ'+R} of observations \eqn{y}. 
There are two types of random variates in a mixed model, random effects \eqn{g} and residual errors \eqn{e}. These
follow a multivariate normal distribution with expectation zero and covariance matrices \eqn{G} and \eqn{R}. See the 1st 
reference for a detailed description of the properties. 
Following Schuetzenmeister and Piepho (2012), a Cholesky decomposition \eqn{V = CC'} is applied to \eqn{V}, 
yielding the upper triangular matrix \eqn{C}, which can be used to simulate a new set of observations 
\eqn{y_{sim}=Cz}{y_sim = Cz}, where \eqn{z} is a vector of independent standard normal deviates of the same size as \eqn{y}.
Actually, \eqn{y_sim = C'z} is used here, because the Cholesky decomposition in \code{R} is defined as \eqn{V=C'C}. 
For each simulated dataset, random variates of interest ('term') are extracted, possibly transformed ('mode') and
stored in ordered form (order statistics) in a \eqn{N x n} matrix, \eqn{N} being the number of simulated datasets and
\eqn{n} being the number of random variates of interest. For each column of this matrix tolerance intervals 
are computed iteratively untill the joint coverage is as close to but >= 100(1-alpha)/% or the max. number of 
iterations is reached. This quantile-based algorithm is exact for \eqn{N --> Infinity}.

SAS-quantile definition PCTLDEF=5 is used in the fast C-implementation of the STB-algorithm (\code{\link{SASquantile}}),
i.e. in case \code{algo="C"}. One can compute and plot two types of plots (see argument 'type'). Simultaneous tolerance
bands (STB) are helpful in assessing the general distribution of a random variate, i.e. checking for departure from
the normality assumption. Outlying observations may also be detected using STB. Simultaneous tolerance intervals (STI)
are taylored for identification of extreme values (possible outliers). STI are a simplification of STB, where simultaneous
coverage is only required for extreme values of each simulation, i.e. an STB is constructed from the min and max values 
from all N simulations. This results in lower and upper bounds, which can be used in residuals plots for assessing outliers.

One can choose between 3 methods for constructing the 100(1-alpha)\% STB. The fastest one is the rank-based algorithm ("rank"), which
should only be applied for reasonably large number of simulations (rule of thumb: N>5000). For fewer simulations,
the quantile-based algorithm is recommended. It exists in two flavours, a native R-implementation ("R") and a pure C-implementation ("C").
Both can be applied using parallel-processing (see arguments 'parallel' and 'Ncpu'). Only the R-implementation allows to specify
a specific quantile-definition other than \code{type=2} of function \code{\link{quantile}}.
}
\examples{
\dontrun{
library(VCA)
data(dataEP05A2_1)
fit <- anovaVCA(y~day/run, dataEP05A2_1)
fit

# use studentized conditional residuals
stb.obj1 <- stb(fit, term="cond", mode="student", N=1000)

# plot it again
plot(stb.obj1)

# now request also plotting the corresponding residual plot
# capture additional computation results which are invisibly 
# returned
stb.obj1 <- plot(stb.obj1, type=3)

# use other type of legend in QQ-plot
plot(stb.obj1, stb.lpos="topleft")

# use random effects "day" and apply standardization
stb.obj2 <- stb(fit, term="day", mode="stand", N=1000)
# plot it again
plot(stb.obj2)

# more complex example
data(Orthodont)
Ortho <- Orthodont
Ortho$age2 <- Ortho$age - 11
Ortho$Subject <- factor(as.character(Ortho$Subject))
fit.Ortho <- anovaMM(distance~Sex+Sex:age2+(Subject)+(Subject):age2-1, Ortho)

# studentized conditional residuals
stb.cr.stud <- stb(fit.Ortho, term="cond", mode="stud", N=1000)

# same model fitted via REML (same covariance structure of random effects by
# constraining it to be diagonal)
fit.Ortho.reml1 <- remlMM(distance~Sex*age2+(Subject)*age2, Ortho, cov=FALSE)

# allow block-diagonal covariance structure of random effects due to non-zero
# correlation between intercept and slope of random regression part,
# not 'cov=TRUE' is the default
fit.Ortho.reml2 <- remlMM(distance~Sex*age2+(Subject)*age2, Ortho)
fit.Ortho.reml1
fit.Ortho.reml2

# covariance matrices of random effects 'G' differ
getMat(fit.Ortho.reml1, "G")[1:10, 1:10]
getMat(fit.Ortho.reml2, "G")[1:10, 1:10]

# therefore, (conditional) residuals differ
resid(fit.Ortho.reml1)
resid(fit.Ortho.reml2)

# therefore, STB differ

# studentized conditional residuals
system.time({
stb.cr.stud.reml1 <- stb(fit.Ortho.reml1, term="cond", mode="stud", 
                         N=5000, Ncpu=2, seed=11) })
system.time({
stb.cr.stud.reml2 <- stb(fit.Ortho.reml2, term="cond", mode="stud", 
                         N=5000, Ncpu=4, seed=11) })

# same seed-value should yield identical results
system.time({
stb.cr.stud.reml3 <- stb(fit.Ortho.reml2, term="cond", mode="stud", 
                         N=5000, Ncpu=4, seed=11) })

par(mfrow=c(1,2))
plot(stb.cr.stud.reml2)
plot(stb.cr.stud.reml3)

# both type of plots side-by-side
plot(stb.cr.stud.reml2, type=3)

# and enabling identification of points
# identified elements in the 1st plot will
# be automatically added to the 2nd one
plot(stb.cr.stud.reml2, type=3, pick=TRUE)

# raw "day" random effects
stb.re.subj <- stb(fit.Ortho, term="Subject", N=1000)

# identify points using the mouse
stb.re.subj <- plot(stb.re.subj, pick=TRUE, type=3)

# now click on points
}

}
\author{
Andre Schuetzenmeister \email{andre.schuetzenmeister@roche.com}
}
\references{
Schuetzenmeister, A. and Piepho, H.P. (2012). Residual analysis of linear mixed models using a simulation approach.
Computational Statistics and Data Analysis, 56, 1405-1416

Schuetzenmeister, A., Jensen, U., Piepho, H.P., 2012. Checking the assumptions of normality and homoscedasticity in 
the general linear model using diagnostic plots. Communications in Statistics-Simulation and Computation 41, 141-154.
}
\seealso{
\code{\link{getSTB}}, \code{\link{fastSTB}}, \code{\link{rankSTB}}

\code{\link{fastSTB}}
}

