% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/PRISM.R
\name{PRISM}
\alias{PRISM}
\alias{PRISM.default}
\alias{PRISM.formula}
\title{PReprocessing Instances that Should be Misclassified}
\usage{
\method{PRISM}{formula}(formula, data, ...)

\method{PRISM}{default}(x, classColumn = ncol(x), ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{classColumn}{Positive integer indicating the column that contains the
class labels (a factor). By default, the last column is used.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indexes for
   removed instances (i.e. their row number with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indexes for
   repaired/relabelled instances (i.e. their row number with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character string with additional
   information not covered by the previous items.
}
}
\description{
Similarity-based filter for removing label noise from a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
\code{PRISM} identifies \emph{ISMs} (Instances that Should be Misclassified) and removes them from the dataset.
To do so, it combines five heuristics, each based on a different approach, into a single formula.
One heuristic relies on the class distribution among the nearest neighbors of an instance, two are based on
the class distribution in a leaf node of a C4.5 tree (either pruned or unpruned), and the other two are based on
the class likelihood of an instance, assuming a Gaussian distribution for continuous variables when necessary.
}
\examples{
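data(iris)
# Filter label noise using the formula interface
out <- PRISM(Species ~ ., data = iris)
print(out)
# The cleaned data should equal the original data minus the removed rows
identical(out$cleanData, iris[setdiff(seq_len(nrow(iris)), out$remIdx), ])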
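
# A minimal sketch of the default (non-formula) interface, shown here because
# the usage section lists it but the example above only covers the formula
# method. It assumes the class labels sit in column 5 of iris (Species),
# which is also the last column and therefore the default for classColumn.
out_default <- PRISM(iris, classColumn = 5)
# Number of instances identified as ISMs and removed
length(out_default$remIdx)
# The filtered data should have that many fewer rows than the original
nrow(out_default$cleanData) == nrow(iris) - length(out_default$remIdx)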
}
\references{
Smith M. R., Martinez T. (2011, July): Improving classification accuracy by identifying
and removing instances that should be misclassified.
In \emph{Neural Networks (IJCNN), The 2011 International Joint Conference on} (pp. 2690-2697). IEEE.
}

