% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/changepoint.R
\name{KRDetect.outliers.changepoint}
\alias{KRDetect.outliers.changepoint}
\title{Identification of outliers using changepoint analysis}
\usage{
KRDetect.outliers.changepoint(x, perform.smoothing = TRUE,
  perform.cp.analysis = TRUE, bandwidth.type = "local",
  bandwidth.value = NULL, cp.analysis.type = "parametric",
  pen.value = "5*log(n)", alpha.edivisive = 0.3,
  min.segment.length = 30, segment.length.for.merge = 15,
  method = "auto", prefer.grubbs = TRUE, alpha.default = NULL,
  L.default = NULL)
}
\arguments{
\item{x}{a numeric vector of observations.}

\item{perform.smoothing}{a logical value specifying if data smoothing is performed. If \code{TRUE} (default), data are smoothed.}

\item{perform.cp.analysis}{a logical value specifying if changepoint analysis is performed. If \code{TRUE} (default), smoothing residuals are partitioned into homogeneous segments.}

\item{bandwidth.type}{a character string specifying the type of bandwidth, must be \code{"local"} (default) or \code{"global"}.}

\item{bandwidth.value}{a local bandwidth array (for \code{bandwidth.type = "local"}) or a global bandwidth value (for \code{bandwidth.type = "global"}) for kernel regression estimation. If \code{bandwidth.value = NULL} (default), a data-adaptive local plug-in bandwidth (Herrmann, 1997) (for \code{bandwidth.type = "local"}) or a data-adaptive global plug-in bandwidth (Gasser et al., 1991) (for \code{bandwidth.type = "global"}) is used instead.}

\item{cp.analysis.type}{a character string specifying the type of changepoint analysis, must be \code{"parametric"} (default) or \code{"nonparametric"}.
If \code{cp.analysis.type = "parametric"}, changepoint analysis is performed using the PELT algorithm (Killick et al., 2012), otherwise a nonparametric approach for multiple changepoints (Matteson and James, 2014) is used.}

\item{pen.value}{a character string giving the formula for the manual penalty used in the PELT algorithm.
Only required for \code{cp.analysis.type = "parametric"}. Default is \code{pen.value = "5*log(n)"}.}

\item{alpha.edivisive}{a numeric value giving the moment index used for determining the distance between and within segments in the nonparametric changepoint model. Only used for \code{cp.analysis.type = "nonparametric"}. Default is \code{alpha.edivisive = 0.3}.}

\item{min.segment.length}{a numeric value giving the minimal required number of observations on segments from changepoint analysis.
If a segment contains fewer than \code{min.segment.length} observations and the variances of the data on the segment and on the previous one can be assumed equal (based on Levene's test (Fox, 2016) for homogeneity of variances), the segment is merged with the previous one.
Analogously, the first segment can be merged with the second one. Default is \code{min.segment.length = 30}.}

\item{segment.length.for.merge}{a numeric value giving the minimal required number of observations on a segment for performing the homogeneity test within changepoint split control.
A segment with fewer observations than \code{segment.length.for.merge} is merged with the previous one without testing the homogeneity of variances (the first segment is merged with the second one). Default is \code{segment.length.for.merge = 15}.}

\item{method}{a character string specifying the method for identification of outlier residuals. Must be one of \code{"auto"} (automatic selection based on the structure of the residuals), \code{"grubbs.test"} (Grubbs test), \code{"normal.distribution"} (quantiles of the normal distribution) or \code{"chebyshev.inequality"} (Chebyshev inequality). Default is \code{method = "auto"}.}

\item{prefer.grubbs}{a logical value specifying whether the Grubbs test for identification of outlier residuals is preferred to quantiles of the normal distribution.
\code{TRUE} (default) means that the Grubbs test is preferred. Only required for \code{method = "auto"}.}

\item{alpha.default}{a numeric value from the interval (0,1) giving the alpha parameter that determines the criterion for (residual) outlier detection:
the limits for outlier residuals on individual segments are set as \eqn{+/- (alpha/2-quantile of normal distribution with parameters corresponding to residuals on studied segment) * (sample standard deviation of residuals on corresponding segment)}.
If \code{alpha.default = NULL} (default), its value on individual segments is estimated using Modified Algorithm A1 (Campulova et al., 2018).}

\item{L.default}{a numeric value of the \emph{L} parameter determining the criterion for (residual) outlier detection:
the limits for outlier residuals on individual segments are set as \eqn{+/- L * sample standard deviation of residuals on corresponding segment}.
If \code{L.default = NULL} (default), its value on individual segments is estimated using Algorithm A1 (Campulova et al., 2018).}
}
\value{
A list is returned with elements:
\item{method.type}{a character string giving the type of method used for outlier identification}
\item{x}{a numeric vector of observations}
\item{index}{a numeric vector of index design points assigned to individual observations}
\item{smoothed}{a numeric vector of estimates of the kernel regression function (smoothed data)}
\item{changepoints}{an integer membership vector for individual segments}
\item{normality.results}{a data.frame of normality results of residuals on individual segments}
\item{detection.method}{a character string giving the type of method used for identification of outlier residuals}
\item{alpha}{a numeric vector of alpha parameters used for outlier identification on individual segments}
\item{L}{a numeric vector of \emph{L} parameters used for outlier identification on individual segments}
\item{outlier}{a logical vector specifying the identified outliers, \code{TRUE} means that the corresponding observation from vector \code{x} is detected as an outlier}
}
\description{
Identification of outliers in environmental data using a method based on kernel smoothing, changepoint analysis of smoothing residuals and subsequent analysis of residuals on homogeneous segments (Campulova et al., 2018).
}
\details{
This function identifies outliers in time series using a procedure based on kernel smoothing, changepoint analysis of smoothing residuals and subsequent analysis of residuals on homogeneous segments (Campulova et al., 2018).
Three approaches can be used to detect outlier residuals: the Grubbs test, quantiles of the normal distribution, and the Chebyshev inequality. The approach can be selected automatically based on the structure of the data or specified by the user.
The choice of the parameters alpha (quantiles of the normal distribution) and \emph{L} (Chebyshev inequality), which define the criterion for outlier detection, is crucial for the method. These values can be specified by the user
or estimated automatically using data-driven algorithms (Campulova et al., 2018).
}
\examples{
data("mydata", package = "openair")
x = mydata$o3[format(mydata$date, "\%m \%Y") == "12 2002"]
result = KRDetect.outliers.changepoint(x)
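# A short look at the returned list, using only elements documented in the
# Value section. Which observations are flagged depends on the data.
result$detection.method
which(result$outlier)
x[which(result$outlier)]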
KRDetect.outliers.plot(result)
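
# A minimal base-R sketch of the +/- L * sd limits described for L.default.
# Illustration only: the residuals and the value L = 3 are made up, and this
# is not the internal code of the package.
res.demo = c(-0.4, 0.1, 0.3, -0.2, 0.0, 0.2, -0.1, 0.1, -0.3, 0.2, 0.1, 6.0)
L.demo = 3
limit.demo = L.demo * sd(res.demo)
outlier.demo = abs(res.demo) > limit.demo
which(outlier.demo)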
}
\references{
Campulova M, Michalek J, Mikuska P, Bokal D (2018). Nonparametric algorithm for identification of outliers in environmental data. Journal of Chemometrics, 32, 453-463.

Gasser T, Kneip A, Kohler W (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86, 643–652.

Herrmann E (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics, 6(1), 35–54.

Herrmann E, Maechler M (2016). lokern: Kernel Regression Smoothing with Local or Global Plug-in Bandwidth. R package version 1.1-8. https://CRAN.R-project.org/package=lokern.

Killick R, Fearnhead P, Eckley IA (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598.

Killick R, Haynes K, Eckley IA (2016). changepoint: An R package for changepoint analysis. R package version 2.2.2. https://CRAN.R-project.org/package=changepoint.

Matteson D, James N (2014). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data. Journal of the American Statistical Association, 109(505), 334–345.

James NA, Matteson DS (2014). ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data. Journal of Statistical Software, 62(7), 1–25. http://www.jstatsoft.org/v62/i07/.

Brys G, Hubert M, Struyf A (2008). Goodness-of-fit tests based on a robust measure of skewness. Computational Statistics, 23(3), 429–442.

Todorov V, Filzmoser P (2009). An Object-Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. http://www.jstatsoft.org/v32/i03/.

Box G, Cox D (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26, 211–234.

Venables WN, Ripley BD (2002). Modern Applied Statistics with S. New York, fourth edition. ISBN 0-387-95457-0, URL http://www.stats.ox.ac.uk/pub/MASS4.

Grubbs F (1950). Sample criteria for testing outlying observations. The Annals of Mathematical Statistics, 21(1), 27-58.

Fox J (2016). Applied regression analysis and generalized linear models. 3rd edition. Los Angeles: SAGE. ISBN 9781452205663.
}
