\name{diff.resamples}
\alias{diff.resamples}
\alias{summary.diff.resamples}
\alias{compare_models}
\title{Inferential Assessments About Model Performance}
\description{Methods for making inferences about differences between models }
\usage{
\method{diff}{resamples}(x, models = x$models, metric = x$metrics, 
     test = t.test, 
     confLevel = 0.95, adjustment = "bonferroni",
     ...)

\method{summary}{diff.resamples}(object, digits = max(3, getOption("digits") - 3), ...)

compare_models(a, b, metric = a$metric[1])
}
\arguments{
  \item{x}{an object generated by \code{resamples}}
  \item{models}{a character vector naming the models to compare}
  \item{metric}{a character vector naming the metrics to compare}
  \item{test}{a function used to test the differences. The object returned by this function should contain scalar elements called \code{estimate} and \code{p.value}}
  \item{object}{an object generated by \code{diff.resamples}}
  \item{adjustment}{any p-value adjustment method to pass to \code{\link[stats]{p.adjust}}. }
  \item{confLevel}{confidence level to use for \code{\link{dotplot.diff.resamples}}. See Details below.}
  \item{digits}{the number of significant digits to display when printing}
  \item{a, b}{two objects of class \code{\link{train}}, \code{\link{sbf}} or \code{\link{rfe}}  with a common set of resampling indices in the \code{control} object.}
  \item{\dots}{further arguments to pass to \code{test}}
}
\details{
The ideas and methods here are based on Hothorn et al. (2005) and Eugster et al. (2008).

For each metric, all pair-wise differences are computed and tested to assess if the difference is equal to zero.

When a Bonferroni correction is used, the confidence level is changed from \code{confLevel} to \code{1-((1-confLevel)/p)}, where \code{p} is the number of pair-wise comparisons being made. For other correction methods, no such change is made.

\code{compare_models} is a shorthand function to compare two models using a single metric. It returns the results of \code{\link[stats]{t.test}} on the differences. 
}
\value{
An object of class \code{"diff.resamples"} with elements:
  \item{call }{the call}
  \item{difs }{a list with one element for each metric being compared. Each element is a matrix with differences in columns and resamples in rows }
  \item{statistics }{a list of results generated by \code{test}}
  \item{adjustment}{the p-value adjustment used}
  \item{models}{a character vector of the models that were compared}
  \item{metrics }{a character string of performance metrics that were used}

or...

An object of class \code{"summary.diff.resamples"} with elements:
  \item{call }{the call}
  \item{table }{a list of tables that show the differences and p-values }

...or (for \code{compare_models}) an object of class \code{htest} resulting from \code{\link[stats]{t.test}}.
}

\references{Hothorn et al. The design and analysis of benchmark experiments. Journal of Computational and Graphical Statistics (2005) vol. 14 (3) pp. 675-699

Eugster et al. Exploratory and inferential analysis of benchmark experiments. Ludwig-Maximilians-Universitat Munchen, Department of Statistics, Tech. Rep (2008) vol. 30}

\author{Max Kuhn}

\seealso{\code{\link{resamples}}, \code{\link{dotplot.diff.resamples}},  \code{\link{densityplot.diff.resamples}}, \code{\link{bwplot.diff.resamples}}, \code{\link{levelplot.diff.resamples}}}

\examples{
\dontrun{
#load(url("http://topepo.github.io/caret/exampleModels.RData"))

resamps <- resamples(list(CART = rpartFit,
                          CondInfTree = ctreeFit,
                          MARS = earthFit))

difs <- diff(resamps)

difs

summary(difs)
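
## A minimal sketch of the Bonferroni adjustment described in Details,
## using base R only. With the three models above there are
## choose(3, 2) pair-wise comparisons, so the confidence level is
## changed from confLevel to 1 - ((1 - confLevel)/p). The object names
## nComparisons and adjLevel are illustrative and are not part of the
## package.
confLevel <- 0.95
nComparisons <- choose(3, 2)
adjLevel <- 1 - ((1 - confLevel)/nComparisons)
adjLevel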

compare_models(rpartFit, ctreeFit)
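
## The kind of result that compare_models returns can be sketched with
## base R: a one-sample t-test on the per-resample differences between
## two models. The two vectors below are made-up values for
## illustration only. They are not results from the models above, and
## perfA, perfB and simTest are hypothetical names.
perfA <- c(0.71, 0.64, 0.69, 0.73, 0.66)
perfB <- c(0.68, 0.66, 0.65, 0.70, 0.61)
simTest <- t.test(perfA - perfB)
## an object of class htest with estimate and p.value elements
simTest$estimate
simTest$p.value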
}
}

\keyword{ models }

