% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/prepare_set.R
\name{prepare_set}
\alias{prepare_set}
\title{Preparation pipeline}
\usage{
prepare_set(data_set, final_form = "data.table", verbose = TRUE, ...)
}
\arguments{
\item{data_set}{Matrix, data.frame or data.table}

\item{final_form}{"data.table" or "numerical_matrix" (defaults to "data.table")}

\item{verbose}{Should the function print progress messages? (logical, defaults to TRUE)}

\item{...}{Additional parameters to tune the pipeline (see Details)}
}
\value{
A data.table or a numerical matrix (according to \code{final_form}). \cr
The pipeline performs the following steps:
\itemize{
  \item Correct set: unfactor factors with many values, identify dates and numerics that are hidden in
     character columns
  \item Transform set: compute differences between every date, transform dates into factors, generate
     features from character columns, and so on. If \code{key} is provided, aggregate according to this \code{key}
  \item Filter set: filter out constant, duplicated or bijection variables. If \code{digits} is provided,
     round numeric columns
  \item Handle NA: perform \code{\link{fast_handle_na}}
  \item Shape set: put the result in the requested shape (\code{final_form}) with acceptable column formats.
}
}
\description{
Full pipeline for preparing your data set.
}
\details{
Additional arguments are available to tune the pipeline:
\itemize{
  \item \code{key} Name of the column of data_set according to which data_set should be aggregated
     (character)
  \item \code{analysis_date} A reference date at which data_set should be aggregated
     (differences between every date and analysis_date will be computed) (Date)
  \item \code{n_unfactor} Maximum number of values in a factor; set it to -1 to disable the
  \code{\link{un_factor}} function (numeric, defaults to 53)
  \item \code{digits} The number of digits after the decimal separator (optional, numeric; if set,
     \code{\link{fast_round}} will be performed)
  \item \code{dateFormats} List of the date formats used in data_set (list of characters)
  \item \code{name_separator} Character used to separate the parts of new column names (character, defaults to ".")
  \item \code{functions} Aggregation functions for numeric columns, see \code{\link{aggregate_by_key}}
   (list of function names (character))
  \item \code{factor_date_type} Aggregation level used to factorize dates (see
     \code{\link{generate_factor_from_date}}) (character, defaults to "yearmonth")
  \item \code{target_col} A target column used to perform target encoding, see \code{\link{target_encode}}
  (character)
  \item \code{target_encoding_functions} Functions used to perform target encoding, see
  \code{\link{build_target_encoding}}; ignored if \code{target_col} is not given
  (list, defaults to \code{"mean"})
}
}
\examples{
# Load ugly set
\dontrun{
data(tiny_messy_adult)

# Have a look at the set
head(tiny_messy_adult)

# Compute full pipeline
clean_adult <- prepare_set(tiny_messy_adult)

# With a reference date
adult_agg <- prepare_set(tiny_messy_adult, analysis_date = as.Date("2017-01-01"))

# Add aggregation by country
adult_agg <- prepare_set(tiny_messy_adult, analysis_date = as.Date("2017-01-01"), key = "country")

# With some new aggregation functions
power <- function(x) {sum(x^2)}
adult_agg <- prepare_set(tiny_messy_adult, analysis_date = as.Date("2017-01-01"), key = "country",
                        functions = c("min", "max", "mean", "power"))
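
# A hedged sketch, based only on the arguments documented above: ask for a
# numerical matrix instead of a data.table, and round numeric columns.
# The value digits = 2 is an arbitrary choice for illustration.
adult_matrix <- prepare_set(tiny_messy_adult, final_form = "numerical_matrix", digits = 2)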
}
# "##NOT RUN:" means that this example is not run on CRAN because it takes a long time. But you can run it!
}
