---
title: "Plotting Reference Intervals with ``superb``"
bibliography: "../inst/REFERENCES.bib"
csl: "../inst/apa-6th.csl"
output: 
  rmarkdown::html_vignette
description: >
  This vignette shows how to illustrate reference intervals
  using superb.
vignette: >
  %\VignetteIndexEntry{Plotting Reference Intervals with ``superb``}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---


```{r, echo = FALSE, warning=FALSE, message = FALSE, results = 'hide'}
cat("this will be hidden; use for general initializations.\n")
library(superb)
library(ggplot2)
options(superb.feedback = c('design','warnings') )
```

In this vignette, we show how to illustrate reference intervals
using ``superb``. We also show how the precision of this interval 
limits could be illustrated. 
See @nccls00 for a definition of reference intervals. In what follows,
we use the package ``referenceIntervals`` from @f20.

## What is a reference interval?

The reference interval is a description of the whole population. This
is in contrast with other error bar functions who are associated
with a specific statistics. For example, there exists the standard 
error of the mean, or the confidence interval of the median. The 
reference interval on the other hand, is an interval of all the
individuals in the sample, and it aims at illustrating the normal 
range of individuals in the studied population. It is therefore
much wider than say, the confidence interval of the mean because 
the mean is not interested with the whole sample, only with the 
center of the sample.

As an example, in a  pharmacological cross-over study, five drug
concentrations A through E are tested for their effect on 
glucose level (in mg/dL). The authors might be interested in knowing 
whether the glucose levels are different. In that case, a 
confidence interval for the five mean glucose level might be
useful.

However, the research may also want to indentify the "normal range"
of values of glucose levels with either of the concentrations.
Note tha "normal range" is a misnomer creating confusions with the
"normal distribution". Instead, we stick to the expression
``Reference interval`` (as promoted by many, e.g., @nccls00).

This interval is meant to capture a given fraction of the population
individuals. These individuals are variable, but this variability
is ok (I don't want to say *normal*). This range could be
"associated with good health" for example [@nccls00, p. 3].

Herein, we show how to illustrate ``reference intervals``. To make
the relation with the individuals more evident, we will also 
present these intervals in conjunction with alternative representations.



## Displaying a reference interval

To accomplish the basic computations, we rely on the excellent
``referenceIntervals`` from @f20. This package offers many more
options that will not be explored herein. 

Before we begin, let's load the needed libraries...

```{r, eval=FALSE, message=FALSE, warning=FALSE}
library(ggplot2)            # for the graphing commands
library(superb)             # for superbPlot and GRD
library(referenceIntervals) # for computing reference intervals
```
```{r, eval=TRUE, message=FALSE, warning=FALSE, echo=FALSE}
# the above pretend that referenceInterval was installed, but it is not installed
# because its dependencies (gWidgets2tcltk, extremeValues) crashes travis-CI.com and shinyapps.io...
# I reproduce the relevant code here as is
library(ggplot2)            # for the graphing commands
library(superb)             # for superbPlot and GRD
library(boot)
#library(car)               # not working on Travis anymore; this is a nightmare...
library(stats)


### Power families:
basicPower <<- function(U,lambda, gamma=NULL) {
 if(!is.null(gamma)) basicPower(t(t(as.matrix(U) + gamma)), lambda) else{
 bp1 <- function(U,lambda){
  if(any(U[!is.na(U)] <= 0)) stop("First argument must be strictly positive.")
  if (abs(lambda) <= 1.e-6) log(U) else (U^lambda)
  }
  out <- U
  out <- if(is.matrix(out) | is.data.frame(out)){
    if(is.null(colnames(out))) colnames(out) <-
        paste("Z", 1:dim(out)[2],sep="")
    for (j in 1:ncol(out)) {out[, j] <- bp1(out[, j],lambda[j])
       colnames(out)[j] <- if(abs(lambda[j]) <= 1.e-6)
           paste("log(", colnames(out)[j],")", sep="") else
           paste(colnames(out)[j], round(lambda[j], 2), sep="^")}
    out}  else
    bp1(out, lambda)
  out}}

bcPower <<- function(U, lambda, jacobian.adjusted=FALSE, gamma=NULL) {
 if(!is.null(gamma)) bcPower(t(t(as.matrix(U) + gamma)), lambda, jacobian.adjusted) else{
 bc1 <- function(U, lambda){
  if(any(U[!is.na(U)] <= 0)) stop("First argument must be strictly positive.")
  z <- if (abs(lambda) <= 1.e-6) log(U) else ((U^lambda) - 1)/lambda
  if (jacobian.adjusted == TRUE) {
    z * (exp(mean(log(U), na.rm=TRUE)))^(1-lambda)} else z
  }
  out <- U
  out <- if(is.matrix(out) | is.data.frame(out)){
    if(is.null(colnames(out))) colnames(out) <-
        paste("Z", 1:dim(out)[2], sep="")
    for (j in 1:ncol(out)) {out[, j] <- bc1(out[, j], lambda[j]) }
    colnames(out) <- paste(colnames(out), round(lambda, 2), sep="^")
    out}  else
    bc1(out, lambda)
  out}}

yjPower <<- function(U, lambda, jacobian.adjusted=FALSE) {
 yj1 <- function(U, lambda){
  nonnegs <- U >= 0
  z <- rep(NA, length(U))
  z[which(nonnegs)] <- bcPower(U[which(nonnegs)]+1, lambda, jacobian.adjusted=FALSE)
  z[which(!nonnegs)] <- -bcPower(-U[which(!nonnegs)]+1, 2-lambda, jacobian.adjusted=FALSE)
  if (jacobian.adjusted == TRUE)
        z * (exp(mean(log((1 + abs(U))^(2 * nonnegs - 1)), na.rm=TRUE)))^(1 -
            lambda)
    else z
  }
  out <- U
  out <- if(is.matrix(out) | is.data.frame(out)){
    if(is.null(colnames(out))) colnames(out) <-
        paste("Z", 1:dim(out)[2], sep="")
    for (j in 1:ncol(out)) {out[, j] <- yj1(out[, j], lambda[j]) }
    colnames(out) <- paste(colnames(out), round(lambda, 2), sep="^")
    out}  else
    yj1(out, lambda)
  out}

powerTransform <<- function(object, ...) UseMethod("powerTransform")

powerTransform.default <<- function(object, family="bcPower", ...) {
   y <- object
   if(!inherits(y, "matrix") & !inherits(y, "data.frame")) {
       y <- matrix(y,ncol=1)
       colnames(y) <- c(paste(deparse(substitute(object))))}
   y <- na.omit(y)
   x <- rep(1, dim(y)[1])
   estimateTransform(x, y, NULL, family=family, ...)
   }

powerTransform.lm <<- function(object, family="bcPower", ...) {
    mf <- if(is.null(object$model))
            update(object, model=TRUE, method="model.frame")$model
            else object$model
    mt <- attr(mf, "terms")
        y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (is.null(w)) w <- rep(1, dim(mf)[1])
    if (is.empty.model(mt)) {
        x <- matrix(rep(1,dim(mf)[1]), ncol=1) }
    else {
        x <- model.matrix(mt, mf)   }
  estimateTransform(x, y, w, family=family, ...)
  }

powerTransform.formula <<- function(object, data, subset, weights,
              na.action, family="bcPower", ...) {
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("object", "data", "subset", "weights", "na.action"),
         names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- as.name("model.frame")
    names(mf)[which(names(mf)=="object")] <- "formula"
    mf <- eval(mf, parent.frame())
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (is.null(w)) w <- rep(1, dim(mf)[1])
    if (is.empty.model(mt)) {
        x <- matrix(rep(1, dim(mf)[1]), ncol=1) }
    else {
        x <- model.matrix(mt, mf)   }
  estimateTransform(x, y, w, family=family, ...)
  }

estimateTransform <<- function(X, Y, weights=NULL, family="bcPower", ...) {
  Y <- as.matrix(Y)
  switch(family,
         bcnPower = estimateTransform.bcnPower(X, Y, weights,  ...),
         estimateTransform.default(X, Y, weights, family, ...)
  )
}

# estimateTransform.default is renamed 'estimateTransform
estimateTransform.default <<- function(X, Y, weights=NULL,
                    family="bcPower", start=NULL, method="L-BFGS-B", ...) { 
  fam <- function (U, lambda, jacobian.adjusted = FALSE, gamma = NULL) {
    if (!is.null(gamma)) 
        bcPower(t(t(as.matrix(U) + gamma)), lambda, jacobian.adjusted)
    else {
        bc1 <- function(U, lambda) {
            if (any(U[!is.na(U)] <= 0)) 
                stop("First argument must be strictly positive.")
            z <- if (abs(lambda) <= 1e-06) 
                log(U)
            else ((U^lambda) - 1)/lambda
            if (jacobian.adjusted == TRUE) {
                z * (exp(mean(log(U), na.rm = TRUE)))^(1 - lambda)
            }
            else z
        }
        out <- U
        out <- if (is.matrix(out) | is.data.frame(out)) {
            if (is.null(colnames(out))) 
                colnames(out) <- paste("Z", 1:dim(out)[2], 
                  sep = "")
            for (j in 1:ncol(out)) {
                out[, j] <- bc1(out[, j], lambda[j])
            }
            colnames(out) <- paste(colnames(out), round(lambda, 
                2), sep = "^")
            out
        }
        else bc1(out, lambda)
        out
    }
  }
  Y <- as.matrix(Y) # coerces Y to be a matrix.
  X <- as.matrix(X) # coerces X to be a matrix.
  w <- if(is.null(weights)) 1 else sqrt(weights)
  nc <- dim(Y)[2]
  nr <- nrow(Y)
  xqr <- qr(w * X)
  llik <- function(lambda){
    (nr/2)*log(((nr - 1)/nr) *
                 det(var(qr.resid(xqr, w*fam(Y, lambda, j=TRUE, ...)))))
  }
  llik1d <- function(lambda,Y){
    (nr/2)*log(((nr - 1)/nr) * var(qr.resid(xqr, w*fam(Y, lambda, j=TRUE, ...))))
  }
  if (is.null(start)) {
    start <- rep(1, nc)
    for (j in 1:nc){
      res<- suppressWarnings(optimize(
        f = function(lambda) llik1d(lambda,Y[ , j, drop=FALSE]),
        lower=-3, upper=+3))
      start[j] <- res$minimum
    }
  }
  res <- optim(start, llik, hessian=TRUE, method=method,  ...)
  if(res$convergence != 0)
    warning(paste("Convergence failure: return code =", res$convergence))
  res$start<-start
  res$lambda <- res$par
  names(res$lambda) <-
    if (is.null(colnames(Y))) paste("Y", 1:dim(Y)[2], sep="")
  else colnames(Y)
  roundlam <- res$lambda
  stderr <- sqrt(diag(solve(res$hessian)))
  lamL <- roundlam - 1.96 * stderr
  lamU <- roundlam + 1.96 * stderr
  for (val in rev(c(1, 0, -1, .5, .33, -.5, -.33, 2, -2))) {
    sel <- lamL <= val & val <= lamU
    roundlam[sel] <- val
  }
  res$roundlam <- roundlam
  res$invHess <- solve(res$hessian)
  res$llik <- res$value
  res$par <- NULL
  res$family<-family
  res$xqr <- xqr
  res$y <- Y
  res$x <- as.matrix(X)
  res$weights <- weights
  res$family<-family
  class(res) <- "powerTransform"
  res
}

print.powerTransform <<- function(x, ...) {
   lambda <- x$lambda
   if (length(lambda) > 1) cat("Estimated transformation parameters \n")
   else cat("Estimated transformation parameter \n")
   print(x$lambda)
   invisible(x)}

summary.powerTransform <<- function(object,...){
    one <- 1==length(object$lambda)
    label <- paste(object$family,
       (if(one) "Transformation to Normality" else
                "Transformations to Multinormality"), "\n")
    lambda<-object$lambda
    roundlam <- round(object$roundlam, 2)
    stderr<-sqrt(diag(object$invHess))
    df<-length(lambda)
#    result <- cbind(lambda, roundlam, stderr, lambda - 1.96*stderr, lambda + 1.96*stderr)
    result <- cbind(lambda, roundlam, lambda - 1.96*stderr, lambda + 1.96*stderr)
    rownames(result)<-names(object$lambda)
#    colnames(result)<-c("Est Power", "Rnd Pwr", "Std Err", "Lwr bnd", "Upr Bnd")
    colnames(result)<-c("Est Power", "Rounded Pwr", "Wald Lwr Bnd", "Wald Upr Bnd")
    tests <- testTransform(object, 0)
    tests <- rbind(tests, testTransform(object, 1))
#    if ( !(all(object$roundlam==0) | all(object$roundlam==1) |
#        length(object$roundlam)==1 ))
#           tests <- rbind(tests, testTransform(object, object$roundlam))
    family<-object$family
    out <-  list(label=label, result=result, tests=tests,family=family)
    class(out) <- "summary.powerTransform"
    out
    }

print.summary.powerTransform <<- function(x, digits=4, ...) {
    n.trans <- nrow(x$result)
    cat(x$label)
    print(round(x$result, digits))
   if(!is.null(x$family)){
    if(x$family=="bcPower" || x$family=="bcnPower"){
      if (n.trans > 1) cat("\nLikelihood ratio test that transformation parameters are equal to 0\n (all log transformations)\n")
      else cat("\nLikelihood ratio test that transformation parameter is equal to 0\n (log transformation)\n")
      print(x$tests[1,])
      if (n.trans > 1) cat("\nLikelihood ratio test that no transformations are needed\n")
      else cat("\nLikelihood ratio test that no transformation is needed\n")
      print(x$tests[2,])
    }
     if(x$family=="yjPower"){
         if (n.trans > 1) cat("\n Likelihood ratio test that all transformation parameters are equal to 0\n")
         else cat("\n Likelihood ratio test that transformation parameter is equal to 0\n")
       print(x$tests[1,])
     }

   }else{
       if (n.trans > 1) cat("\nLikelihood ratio tests about transformation parameters \n")
       else cat("\nLikelihood ratio test about transformation parameter \n")
     print(x$tests)
   }
}

coef.powerTransform <<- function(object, round=FALSE, ...)
  if(round==TRUE) object$roundlam else object$lambda

vcov.powerTransform <<- function(object,...) {
  ans <- object$invHess
  rownames(ans) <- names(coef(object))
  colnames(ans) <- names(coef(object))
  ans}



horn.outliers <<- function (data)
{
#   This function implements Horn's algorithm for outlier detection using
#   Tukey's interquartile fences.

	boxcox = powerTransform(data);
	lambda = boxcox$lambda;
	transData = data^lambda;
    descriptives = summary(transData);
    Q1 = descriptives[[2]];
    Q3 = descriptives[[5]];
    IQR = Q3 - Q1;

	out = transData[transData <= (Q1 - 1.5*IQR) | transData >= (Q3 + 1.5*IQR)];
	sub = transData[transData > (Q1 - 1.5*IQR) & transData < (Q3 + 1.5*IQR)];

    return(list(outliers = out^(1/lambda), subset = sub^(1/lambda)));
}
refLimit <<- function(data, out.method = "horn", out.rm = FALSE, RI = "p", CI = "p",
					refConf = 0.95, limitConf = 0.90, bootStat = "basic"){

	cl = class(data);
	if(cl == "data.frame"){
		frameLabels = colnames(data);
		dname = deparse(substitute(data));
		result = lapply(data, singleRefLimit, dname, out.method, out.rm, RI, CI, refConf, limitConf, bootStat);
		for(i in 1:length(data)){
			result[[i]]$dname = frameLabels[i];
		}
		class(result) = "interval";
	}
	else{
		frameLabels = NULL;
		dname = deparse(substitute(data));
		result = singleRefLimit(data, dname, out.method, out.rm, RI, CI, refConf, limitConf, bootStat);
	}

	return(result);
}
singleRefLimit <<- function(data, dname = "default", out.method = "horn", out.rm = FALSE,
						RI = "p", CI = "p", refConf = 0.95, limitConf = 0.90, bootStat = "basic")
{
#	This function determines a reference interval from a vector of data samples.
#	The default is a parametric calculation, but other options include a non-parametric
#	calculation of reference interval with bootstrapped confidence intervals around the
#	limits, and also the robust algorithm for calculating the reference interval with
#	bootstrapped confidence intervals of the limits.

	if(out.method == "dixon"){
		output = dixon.outliers(data);
	}
	else if(out.method == "cook"){
		output = cook.outliers(data);
	}
	else if(out.method == "vanderLoo"){
		output = vanderLoo.outliers(data);
	}
	else{
		output = horn.outliers(data);
	}
	if(out.rm == TRUE){
		data = output$subset;
	}

  if(!bootStat %in% c("basic", "norm", "perc", "stud", "bca")) {
		bootStat = "basic";
	}

	outliers = output$outliers;
  n = length(data);
  mean = mean(data, na.rm = TRUE);
  sd = sd(data, na.rm = TRUE);
  norm = NULL;

#	Calculate a nonparametric reference interval.
    if(RI == "n"){

    	methodRI = "Reference Interval calculated nonparametrically";

        data = sort(data);
		holder = nonparRI(data, indices = 1:length(data), refConf);
        lowerRefLimit = holder[1];
        upperRefLimit = holder[2];
        if(CI == "p"){
        	CI = "n";
        }
    }

#	Calculate a reference interval using the robust algorithm method.
    if(RI == "r"){

    	methodRI = "Reference Interval calculated using Robust algorithm";

        holder = robust(data, 1:length(data), refConf);
        lowerRefLimit = holder[1];
        upperRefLimit = holder[2];
        CI = "boot";
    }

#	Calculate a reference interval parametrically, with parametric confidence interval
#	around the limits.
    if(RI == "p"){

#		http://www.statsdirect.com/help/parametric_methods/reference_range.htm
#		https://en.wikipedia.org/wiki/Reference_range#Confidence_interval_of_limit

		methodRI = "Reference Interval calculated parametrically";
		methodCI = "Confidence Intervals calculated parametrically";

		refZ = qnorm(1 - ((1 - refConf) / 2));
		limitZ = qnorm(1 - ((1 - limitConf) / 2));

		lowerRefLimit = mean - refZ * sd;
		upperRefLimit = mean + refZ * sd;
        se = sqrt(((sd^2)/n) + (((refZ^2)*(sd^2))/(2*n)));
        lowerRefLowLimit = lowerRefLimit - limitZ * se;
        lowerRefUpperLimit = lowerRefLimit + limitZ * se;
        upperRefLowLimit = upperRefLimit - limitZ * se;
        upperRefUpperLimit = upperRefLimit + limitZ * se;

        shap_normalcy = shapiro.test(data);
        shap_output = paste(c("Shapiro-Wilk: W = ", format(shap_normalcy$statistic,
        					digits = 6), ", p-value = ", format(shap_normalcy$p.value,
        					digits = 6)), collapse = "");
        ks_normalcy = suppressWarnings(ks.test(data, "pnorm", m = mean, sd = sd));
        ks_output = paste(c("Kolmorgorov-Smirnov: D = ", format(ks_normalcy$statistic,
        					digits = 6), ", p-value = ", format(ks_normalcy$p.value,
        					digits = 6)), collapse = "");
        if(shap_normalcy$p.value < 0.05 | ks_normalcy$p.value < 0.05){
        	norm = list(shap_output, ks_output);

        }
        else{
        	norm = list(shap_output, ks_output);
        }
    }

#	Calculate confidence interval around limits nonparametrically.
    if(CI == "n"){

    	if(n < 120){
    		cat("\nSample size too small for non-parametric confidence intervals,
    		bootstrapping instead\n");
    		CI = "boot";
    	}
    	else{

    		methodCI = "Confidence Intervals calculated nonparametrically";

    		ranks = nonparRanks[which(nonparRanks$SampleSize == n),];
  		  	lowerRefLowLimit = data[ranks$Lower];
    		lowerRefUpperLimit = data[ranks$Upper];
    		upperRefLowLimit = data[(n+1) - ranks$Upper];
    		upperRefUpperLimit = data[(n+1) - ranks$Lower];
		}
    }

#	Calculate bootstrapped confidence intervals around limits.
	if(CI == "boot" & (RI == "n" | RI == "r")){

		methodCI = "Confidence Intervals calculated by bootstrapping, R = 5000";

		if(RI == "n"){
			bootresult = boot::boot(data = data, statistic = nonparRI, refConf = refConf, R = 5000);
		}
		if(RI == "r"){
			bootresult = boot::boot(data = data, statistic = robust, refConf = refConf, R = 5000);
		}
    	bootresultlower = boot::boot.ci(bootresult, conf = limitConf, type=bootStat, index = c(1,2));
    	bootresultupper = boot::boot.ci(bootresult, conf = limitConf, type=bootStat, index = c(2,2));
			bootresultlength = length(bootresultlower[[4]]);
    	lowerRefLowLimit = bootresultlower[[4]][bootresultlength - 1];
    	lowerRefUpperLimit = bootresultlower[[4]][bootresultlength];
    	upperRefLowLimit = bootresultupper[[4]][bootresultlength - 1];
    	upperRefUpperLimit = bootresultupper[[4]][bootresultlength];
    }

    RVAL = list(size = n, dname = dname, out.method = out.method, out.rm = out.rm,
    			outliers = outliers, methodRI = methodRI, methodCI = methodCI,
    			norm = norm, refConf = refConf, limitConf = limitConf,
    			Ref_Int = c(lowerRefLimit = lowerRefLimit, upperRefLimit = upperRefLimit),
    			Conf_Int = c(lowerRefLowLimit = lowerRefLowLimit,
    						lowerRefUpperLimit = lowerRefUpperLimit,
        					upperRefLowLimit = upperRefLowLimit,
        					upperRefUpperLimit = upperRefUpperLimit));
    class(RVAL) = "interval";
    return(RVAL);
}

RI.mean <<- function(data, gamma = 0.95) {
    refLimit(data, refConf = gamma)$Ref_Int
}
ciloRI.mean <<- function(data, gamma = c(0.95, 0.90) ) {
    refLimit(data, refConf = gamma[1], limitConf = gamma[2] )$Conf_Int[1:2]
}
cihiRI.mean <<- function(data, gamma = c(0.95, 0.90) ) {
    refLimit(data, refConf = gamma[1], limitConf = gamma[2] )$Conf_Int[3:4]
}


```

(if some of these packages are not on your computer, first install it
with for example `install.packages("referenceIntervals")` )

... then let's create a ficticious data set on the fly:

```{r, message=FALSE, echo=TRUE}
glucoselevels <- GRD(BSFactors = "concentration(A,B,C,D,E)", 
                    SubjectsPerGroup = 100,
                    RenameDV = "gl",
                    Effects = list("concentration" = extent(10) ),
                    Population = list(mean = 100, stddev = 20) ) 
```
```{r, message=FALSE, echo=FALSE}
# as the package referenceIntervals cannot accommodate numbers below zero, lets remove them
glucoselevels$gl[glucoselevels$gl<10]<-10
```

This dataset will generate 100 individuals randomly 
for each of the five concentrations 
(column ``concentration``, with levels from A to E). The 
dependent variable, *glucose level*, is abbreviated to ``gl``.

Here is a snapshot of it:
```{r}
head(glucoselevels)
```

A simple plot could show the mean for each concentration along with
the 95% confidence interval of the means, with e.g., 

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 1**. Mean glucose level as a function of concentration."}
superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "CI",
            gamma     = 0.95,
            plotStyle = "line")
```

As seen, glucose level means are affected by the concentration (that 
effect was requested in the ``GRD()`` command above). Also note the 
vertical scale: it is very restricted around the means, yet, 
individuals are often much outside the visible part of the scale.

To convince yourself of that, ask for the smallest glucose level
and the highest:
```{r}
min(glucoselevels$gl)
max(glucoselevels$gl)
```

Instead of asking for an error bar representing a confidence 
interval (CI), let's ask for error bar representing the reference
intervals (RI).

Reference intervals are not (at this time) shipped with ``superb``.
Let's define a short-name version of this with 

```{r, eval=FALSE, message=FALSE, warning=FALSE, echo=TRUE}
RI.mean <- function(data, gamma = 0.95) {
    refLimit(data, refConf = gamma)$Ref_Int
}
```

We actually use the function ``refLimit()`` from the ``referenceIntervals``
package. There are many options in that function; please consult
the documentation for that package. We only preserve the reference
interval limits with ``$Ref_Int`` (that function outputs other
information, as we will see below).

The one argument that we use is the confidence level of the reference
interval, ``refConf`` (recommended is 95%, which is the default
if none is specified).

As you may note, the ``RI`` function is *attached* to the mean.
This is arbitrary: in `superb`, error bars must be error bars of
a summary statistic. You could have used any function, that won't
change the position of the interval.

This is all we need to make our first reference interval plots:

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 2**. Mean glucose level and 95% reference intervals as a function of concentration."}
superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", # mean is what RI is attached to
            errorbar  = "RI",   # RI calls the function above
            gamma     = 0.95,   # select the coverage desired
            plotStyle = "line" )
```

You can see that the vertical range is much wider now than in Figure 1.
It is ok, as these intervals cover most of the individuals in the
samples.

## Uncertainty in the reference interval limits

The plot from Figure 2 shows the estimated range of individuals
from a sample, and therefore has some uncertainty in the exact
location of the tips. It is possible to add some indication
of the width of uncertainty regarding these tips by estimating
a confidence interval for each extremity.

The exact computation is based on the uncertainty of quantiles
[e.g., the 2.5% quantile for the lower limit of the population
as infered from the 2.5% quantile of the sample, when 95% RI 
are plotted; see e.g., @nccls00; @htc14].

In the packge ``referenceIntervals``, the function ``refLimit()``
also produces this information by default. The novelty is that
the confidence level of the confidence interval for the tips of the RI
may be different from the reference interval level.

As an example, the RI coverage interval might be a 95%, so as to 
include a wide proportion of the population, and the certainty
on the tip positions of that RI might be low, e.g., 80%, because
we fear that the sample is not the most representative.

To acheive this, we wrap the ``refLimit()`` function in 
a shorter name function, ``ciloRI`` for the confidence intervals
of the lower part of the RI, and ``cihiRI`` for the confidence 
intervals of the upper part of the RI. This function requires
two gammas, one for the RI level, and the second for the CI
level. Thus, we get:

```{r, eval=FALSE, message=FALSE, warning=FALSE, echo=TRUE}        
ciloRI.mean <- function(data, gamma = c(0.95, 0.90) ) {
    refLimit(data, refConf = gamma[1], limitConf = gamma[2] )$Conf_Int[1:2]
}
cihiRI.mean <- function(data, gamma = c(0.95, 0.90) ) {
    refLimit(data, refConf = gamma[1], limitConf = gamma[2] )$Conf_Int[3:4]
}
```

Again, this is all we need. We can for example use the following 
to see the uncertainty in the top tips of all the RI:

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 3**. Mean glucose level and 90% confidence intervals of the upper RI tips."}
superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", errorbar = "cihiRI",
            gamma     = c(0.95, 0.90),
            plotStyle = "line" ) 
```

Well, that is weird because now, we only see an error bar way up 
there. The RI error bar is no longer visible because ``superbPlot()``
can only show one sort of error bar at a time.

Lets correct this. The gist of the following is to perform
three plots, one showing the RI intervals (as Figure 2 above), 
a second showing the 
confidence interval of the RI top tips (as Figure 3 just above),
and one last plot 
showing the confidence interval of the RI bottom tips. Plots 2 and
3 must be transparent so that they can be superimposed on top of
plot 1.

To that end, we use a set of graphic directives that makes
the plot transparent. It will be appplied to
plots 2 and 3 (it hides the grid (if any) and set 
the background of the whole plot and of the panel to transparent).
This operation is made with the function ``makeTransparent()``.

It is also necessary that all plots have the same range, so that
they align correctly when superimpose. To do so,
we set the vertical range from 0 to 200. We also add a description
in the title of the plot. These are all grouped in a list:

```{r}
ornate = list(
        labs(title =paste("(tick)     95% reference intervals (RI)",
                        "\n(red)      90% confidence intervals of upper 95% RI",
                        "\n(purple) 90% confidence intervals of lower 95% RI",
                        "\n(blue)    95% confidence intervals of the mean")),
        coord_cartesian( ylim = c(000,200) ),
        theme_light(base_size=10) # smaller font
)
```

Now that these are dealt with, let's do our three plots:

```{r, message=FALSE}
plt1 <- superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "RI",
            gamma     = 0.95,
            errorbarParams = list(width = 0.0, linewidth = 1.5,
                                  position = position_nudge( 0.0) ),
            plotStyle = "line" ) + ornate
plt2 <- superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "cihiRI",
            gamma     = c(0.95, 0.90),
            errorbarParams = list(width = 0.2, linewidth = 0.2, color = "red",
                                  direction = "left",
                                  position = position_nudge(-0.15) ),
            plotStyle = "line" ) + ornate + makeTransparent()
plt3 <- superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "ciloRI",
            gamma     = c(0.95, 0.90),
            errorbarParams = list(width = 0.2, linewidth = 0.2, color = "purple",
                                  direction = "left",
                                  position = position_nudge(-0.15) ),
            plotStyle = "line" ) + ornate + makeTransparent()
```

Things to note: (1) only the second and third uses the transparent
directives; (2) the confidence levels (``gamma``) have two numbers
in plots 2 and 3: the first for the RI, the second for the CI of 
the RI; (3) with ``errorbarParams``, we gave different attributes
to the various error bars (purple color for the lower CI, for example)
and moved them sligtly to the left (position).

Almost there. Lets turn these plots into graphic objects (``grob``), 
then  superimpose them onto an empty plot:

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 3a**. Mean glucose level and 95% reference intervals with 95% confidence intervals."}
# transform the three plots into visual objects
plt1 <- ggplotGrob(plt1)
plt2 <- ggplotGrob(plt2)
plt3 <- ggplotGrob(plt3)

# superimpose the grobs onto an empty ggplot 
ggplot() + 
    annotation_custom(grob=plt1) + 
    annotation_custom(grob=plt2) + 
    annotation_custom(grob=plt3)
```

## Is that it?

Well, we are just begining! The whole point of the reference 
intervals is to provide an indication of the individuals in 
the population. It would be great if we could see these, 
don't you think?

There are multiple ways to illustrate the individuals from a sample.
One is using jittered dots: each member of the sample is illustrated
with a small dot whose horizontal position can be jittered randomly
to avoid that many dots superimposes and no longer be visible.

These is a layout in ``superb`` that achieve exacly that, 
``pointjitter``. You can for example redo plot 1 but changing
the layout, then superimpose the plots again, as in

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 3b**. Jittered dots showing mean glucose level and 95% reference intervals with 95% confidence intervals."}
# redo plt1; the other 2 are still in memory
plt1 <- superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "RI",
            gamma     = 0.95,
            errorbarParams = list(width = 0.0, linewidth = 1.5,
                                  position = position_nudge( 0.0) ),
            plotStyle = "pointjitter" ) + ornate

# transform the new plot into a visual object
plt1 <- ggplotGrob(plt1)

# superimpose the grobs onto an empty ggplot 
ggplot() + 
    annotation_custom(grob=plt1) + 
    annotation_custom(grob=plt2) + 
    annotation_custom(grob=plt3)
```

As seen, it is now easy to see the individuals from the sample and
that most of them are indeed within the reference intervals.

A more elaborate layout, although maybe redundant with the RI, is
the ``pointjitterviolin`` as done in the next example.

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 3c**. Jittered dots and violins showing mean glucose level and 95% reference intervals with 95% confidence intervals of the tips' position."}
# redo plt1; the other 2 are still in memory
plt1 <- superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "RI",
            gamma     = 0.95,
            errorbarParams = list(width = 0.0, linewidth = 1.5,
                                  position = position_nudge( 0.0) ),
            plotStyle = "pointjitterviolin" ) + ornate

# transform the three plots into visual objects
plt1 <- ggplotGrob(plt1)

# you may superimpose the grobs onto an empty ggplot 
#ggplot() + 
#    annotation_custom(grob=plt1) + 
#    annotation_custom(grob=plt2) + 
#    annotation_custom(grob=plt3)
```

## Reference intervals vs. confidence intervals of the means

As one last example, we want to show the difference between the
reference intervals and the confidence intervals of a summary statistic, 
here the mean.

To that end, we can create a fourth plot showing the confidence interval
of the means with

```{r, message=FALSE}
plt4 <- superbPlot(glucoselevels, 
            BSFactors = "concentration",  
            variables = "gl", 
            statistic = "mean", 
            errorbar  = "CI",  # just the regular CI of the mean
            errorbarParams = list(width = 0.2, linewidth = 1.5, color = "blue",
                                  position = position_nudge( 0.00) ),
            gamma     = 0.95,
            plotStyle = "line" ) + ornate + makeTransparent()
```

This plot also has transparent theme as it will superimposed on the
previous three plots:

```{r, message=FALSE, echo=TRUE, fig.width = 4, fig.cap="**Figure 3d**. Jittered dots and violins showing mean glucose level +-95% confidence intervals of the mean, and 95% reference intervals with 95% confidence intervals."}
# transform that plot too into visual objects
plt4 <- ggplotGrob(plt4)

# superimpose the grobs onto an empty ggplot 
ggplot() + 
    annotation_custom(grob=plt1) + 
    annotation_custom(grob=plt2) + 
    annotation_custom(grob=plt3) +
    annotation_custom(grob=plt4)
```

As seen in blue, the confidence interval of the means have much 
shorter intervals than the reference interval. This is ok (or should
I say, *normal*): Both intervals represent very different things.
Also, the interval widths of the CI of the mean are shorter than
the interval widths of the CI or the RI tips for two reasons: (1) the 
RI tips are based on 90% confidence levels; (2) a central tendency
statistic is easier to estimate than an extreme quantile statistic.


## In summary

Reference intervals are useful to depict the population as a whole.
It is not an error bar in the sense that it does not represent
the error for the estimation of a statistic. Instead, it is an
illustration of the individuals' possible scores.

Note that Reference intervals **must not** be used in conjunction with
adjustments. These adjustments are used when conditions are compared
to other conditions. Reference intervals are not comparative statistics,
they show the extend of the sample, irrespective of what the other
samples might look like.



# References