% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/treat.R
\name{Treat.coin}
\alias{Treat.coin}
\title{Treat a data set in a coin for outliers}
\usage{
\method{Treat}{coin}(
  x,
  dset,
  global_specs = NULL,
  indiv_specs = NULL,
  combine_treat = FALSE,
  out2 = "coin",
  write_to = NULL,
  write2log = TRUE,
  ...
)
}
\arguments{
\item{x}{A coin}

\item{dset}{A named data set available in \code{.$Data}}

\item{global_specs}{A list specifying the treatment to apply to all columns, except any that are specified in the
\code{indiv_specs} argument. Alternatively, set to \code{"none"} to apply no treatment. See details.}

\item{indiv_specs}{A list specifying any individual treatment to apply to specific columns, overriding \code{global_specs}
for those columns. See details.}

\item{combine_treat}{By default, if \code{f1} fails to pass \code{f_pass}, then \code{f2} is applied to the original \code{x},
rather than the treated output of \code{f1}. If \code{combine_treat = TRUE}, \code{f2} will instead be applied to the output
of \code{f1}, so the two treatments will be combined.}

\item{out2}{The type of function output: either \code{"coin"} to return an updated coin, or \code{"list"} to return a
list with treated data and treatment details.}

\item{write_to}{If specified, writes the treated data to \code{.$Data[[write_to]]}. Default \code{write_to = "Treated"}.}

\item{write2log}{Logical: if \code{FALSE}, the arguments of this function are not written to the coin log, so this
function will not be invoked when regenerating. It is recommended to keep \code{TRUE} unless you have a good reason to do otherwise.}

\item{...}{arguments passed to or from other methods.}
}
\value{
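If \code{out2 = "coin"}, an updated coin with a new data set \code{.$Data$Treated} added (or \code{.$Data[[write_to]]} if
\code{write_to} is specified), plus analysis information in \code{.$Analysis$Treated}. If \code{out2 = "list"}, a list
with the treated data and treatment details.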
}
\description{
Applies a two-stage data treatment process to the data set specified by \code{dset}, based on two data treatment functions and a pass/fail
function which detects outliers. The treatment method can be specified by the \code{global_specs} argument (which applies
the same specifications to all indicators in the specified data set), and optionally by the \code{indiv_specs} argument, which allows different
methods to be applied to individual indicators. See details. For a simpler data treatment function, see the wrapper \code{\link[=qTreat]{qTreat()}}.
}
\section{Global specifications}{
If the same method of data treatment should be applied to all indicators, use the \code{global_specs} argument. This argument takes a structured
list which looks like this:

\if{html}{\out{<div class="sourceCode">}}\preformatted{global_specs = list(f1 = .,
                    f1_para = list(.),
                    f2 = .,
                    f2_para = list(.),
                    f_pass = .,
                    f_pass_para = list(.)
                    )
}\if{html}{\out{</div>}}

The entries in this list correspond to arguments in \code{\link[=Treat.numeric]{Treat.numeric()}}, and the meaning of each is described in more detail
below. In brief, \code{f1} is the name of a function to apply at the first round of data treatment, \code{f1_para} is a list of any additional
parameters to pass to \code{f1}, \code{f2} and \code{f2_para} are equivalently the function name and parameters of the second round of data treatment, and
\code{f_pass} and \code{f_pass_para} are the function and additional arguments to check for the existence of outliers.

The default values for \code{global_specs} are as follows:

\if{html}{\out{<div class="sourceCode">}}\preformatted{global_specs = list(f1 = "winsorise",
                     f1_para = list(na.rm = TRUE,
                                    winmax = 5,
                                    skew_thresh = 2,
                                    kurt_thresh = 3.5,
                                    force_win = FALSE),
                     f2 = "log_CT",
                     f2_para = list(na.rm = TRUE),
                     f_pass = "check_SkewKurt",
                     f_pass_para = list(na.rm = TRUE,
                                        skew_thresh = 2,
                                        kurt_thresh = 3.5))
}\if{html}{\out{</div>}}

This shows that by default (i.e. if \code{global_specs} is not specified), each indicator is checked for outliers by the \code{\link[=check_SkewKurt]{check_SkewKurt()}} function, which
uses skew and kurtosis thresholds as its parameters. Then, if outliers exist, the first function \code{\link[=winsorise]{winsorise()}} is applied, which also
uses skew and kurtosis parameters, as well as a maximum number of Winsorised points. If the Winsorised indicator still does not satisfy
\code{f_pass}, the \code{\link[=log_CT]{log_CT()}} function is invoked.

To change the global specifications, you don't have to supply the whole list. If, for example, you are happy with all the defaults but
want to simply change the maximum number of Winsorised points, you could specify e.g. \code{global_specs = list(f1_para = list(winmax = 3))}.
In other words, a subset of the list can be specified, as long as the structure of the list is correct.
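
As a minimal sketch of such a partial specification, assuming the package is attached and using the example coin built in the
Examples section, the call might look like this:

\if{html}{\out{<div class="sourceCode">}}\preformatted{# build the example coin (same call as in the Examples section)
coin <- build_example_coin(up_to = "new_coin")

# keep all defaults, but Winsorise at most 3 points per indicator
coin <- Treat(coin, dset = "Raw",
              global_specs = list(f1_para = list(winmax = 3)))
}\if{html}{\out{</div>}}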
}

\section{Individual specifications}{
The \code{indiv_specs} argument allows different specifications for each indicator. It takes a list of specification lists, each in the same format as
\code{global_specs}, named according to the indicator codes (column names) of the data set. For example, if the data set has indicators with codes
"x1", "x2" and "x3", we could specify individual treatment as follows:

\if{html}{\out{<div class="sourceCode">}}\preformatted{indiv_specs = list(x1 = list(.),
                   x2 = list(.),
                   x3 = list(.))
}\if{html}{\out{</div>}}

where each \code{list(.)} is a specifications list of the same format as \code{global_specs}. Any indicators that are \emph{not} named in \code{indiv_specs} are
treated using the specifications from \code{global_specs} (which will be the defaults if it is not specified). As with \code{global_specs},
a subset of the \code{global_specs} list may be specified for
each entry. Additionally, as a special case, specifying a list entry as e.g. \code{x1 = "none"} will apply no data treatment to the indicator "x1". See
\code{vignette("treat")} for examples of individual treatment.
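
A filled-in version of such a list might look as follows. This is only a sketch: the indicator codes "x1" and "x2" are the
hypothetical codes used above, so the final call is left commented out.

\if{html}{\out{<div class="sourceCode">}}\preformatted{indiv_specs <- list(
  # no treatment at all for indicator "x1"
  x1 = "none",
  # partial specification for "x2": only the Winsorisation limit is changed
  x2 = list(f1_para = list(winmax = 3))
)

# indicators not named here (e.g. "x3") are treated according to global_specs
# coin <- Treat(coin, dset = "Raw", indiv_specs = indiv_specs)
}\if{html}{\out{</div>}}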
}

\section{Function methodology}{
This function is set up to allow any functions to be passed as the
data treatment functions (\code{f1} and \code{f2}), as well as any function to be passed as the outlier detection
function \code{f_pass}, as specified in the \code{global_specs} and \code{indiv_specs} arguments.

The arrangement of this function is inspired by a fairly standard data treatment process applied to
indicators: skew and kurtosis are checked and, if the criteria are not met, Winsorisation is applied
up to a specified limit. If Winsorisation still does not bring skew and kurtosis
within limits, a nonlinear transformation such as log or Box-Cox is applied.

This function generalises this process by using the following general steps:
\enumerate{
\item Check if variable passes or fails using \code{f_pass}
\item If \code{f_pass} returns \code{FALSE}, apply \code{f1}, else return \code{x} unmodified
\item Check again using \code{f_pass}
\item If \code{f_pass} still returns \code{FALSE}, apply \code{f2}
\item Return the modified \code{x} as well as other information.
}
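
The steps above can be sketched in plain R as follows. This is an illustration only, not the package implementation: the
helper \code{two_stage_treat()} and the toy treatment functions are hypothetical, and the \code{combine_treat} behaviour follows
the argument description above.

\if{html}{\out{<div class="sourceCode">}}\preformatted{# hypothetical helper illustrating the two-stage logic
two_stage_treat <- function(x, f1, f2, f_pass, combine_treat = FALSE) {
  # step 1: no treatment needed if x already passes
  if (f_pass(x)) {
    return(list(x = x, treated_by = "none"))
  }
  # step 2: apply the first treatment
  x1 <- f1(x)
  # step 3: check again
  if (f_pass(x1)) {
    return(list(x = x1, treated_by = "f1"))
  }
  # step 4: apply the second treatment, to the original x
  # unless combine_treat = TRUE
  x2 <- if (combine_treat) f2(x1) else f2(x)
  # step 5: return the modified x plus some information
  list(x = x2, treated_by = if (combine_treat) "f1+f2" else "f2")
}

# toy data with one extreme value
x <- c(1, 2, 3, 4, 1000)
res <- two_stage_treat(
  x,
  f1 = function(v) pmin(v, 100),     # cap values at 100
  f2 = log,                          # log transformation
  f_pass = function(v) max(v) < 50   # toy pass/fail criterion
)
res$treated_by
}\if{html}{\out{</div>}}

In this toy case the capped data still fails the check, so \code{f2} is applied to the original \code{x}.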

For the "typical" case described above, \code{f1} is a Winsorisation function, \code{f2} is a nonlinear transformation
and \code{f_pass} is a skew and kurtosis check. Parameters can be passed to each of these three functions in
a named list, for example to specify a maximum number of points to Winsorise, or Box-Cox parameters, or anything
else. The constraints are that:
\itemize{
\item All of \code{f1}, \code{f2} and \code{f_pass} must follow the format \verb{function(x, f_para)}, where \code{x} is a
numerical vector, and \code{f_para} is a list of other function parameters to be passed to the function, which
is specified by \code{f1_para} for \code{f1} and similarly for the other functions. If the function has no parameters
other than \code{x}, then \code{f_para} can be omitted.
\item \code{f1} and \code{f2} should return either a list with \code{.$x} as the modified numerical vector, and any other information
to be attached to the list, OR, simply \code{x} as the only output.
\item \code{f_pass} must return a logical value, where \code{TRUE} indicates that \code{x} passes the criteria (and
therefore doesn't need any (more) treatment), and \code{FALSE} means that it fails to meet the criteria.
}
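
As a sketch of functions that respect these constraints, consider the hypothetical pair below. The names \code{cap_upper()}
and \code{check_sd()} are made up for illustration, and it is assumed (as the default specifications suggest) that the entries
of \code{f1_para} and \code{f_pass_para} are passed to the functions as named arguments.

\if{html}{\out{<div class="sourceCode">}}\preformatted{# hypothetical f1: cap values at an upper quantile
# returns a list with the modified vector in $x plus extra information
cap_upper <- function(x, prob = 0.95) {
  cap <- stats::quantile(x, probs = prob, na.rm = TRUE, names = FALSE)
  list(x = pmin(x, cap), cap = cap)
}

# hypothetical f_pass: TRUE if no point lies more than n_sd
# standard deviations from the mean
check_sd <- function(x, n_sd = 3) {
  z <- abs(x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
  all(z <= n_sd, na.rm = TRUE)
}

# a specification list referring to these functions by name
specs <- list(f1 = "cap_upper",
              f1_para = list(prob = 0.9),
              f_pass = "check_sd",
              f_pass_para = list(n_sd = 2))
}\if{html}{\out{</div>}}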

See also \code{vignette("treat")}.
}

\examples{
# build example coin
coin <- build_example_coin(up_to = "new_coin")

# treat raw data set
coin <- Treat(coin, dset = "Raw")

# summary of treatment for each indicator
head(coin$Analysis$Treated$Dets_Table)
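
# alternatively, return the results as a list rather than an updated coin
# (a sketch using only the out2 argument documented above)
l_treat <- Treat(coin, dset = "Raw", out2 = "list")

# inspect which elements the list contains
names(l_treat)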

}
