% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/alluvial_model_response.R
\name{get_data_space}
\alias{get_data_space}
\title{Calculate data space}
\usage{
get_data_space(df, imp, degree = 4, bins = 5, max_levels = 10)
}
\arguments{
\item{df}{dataframe, training data}

\item{imp}{dataframe, with no more than two columns: one numeric column
containing importance measures and one character or factor column containing
the corresponding variable names as found in the training data.}

\item{degree}{integer, number of top important variables to select. For
plotting, more than 4 will result in too many flows and the alluvial plot
will not be very readable, Default: 4}

\item{bins}{integer, number of bins for numeric variables, and maximum number
of levels for factor variables. Increasing this number might result in too
many flows, Default: 5}

\item{max_levels}{integer, maximum number of levels per factor variable, Default: 10}
}
\value{
data frame
}
\description{
Calculates a data space based on the modelling dataframe and the
 importance of the explanatory variables. Only the most important variables,
 as defined by the \code{degree} parameter, are considered. For each numeric
 variable among them, a number (defined by \code{bins}) of sensible single
 values spread over its range is selected, and all possible value
 combinations among the most important variables are created. The values of
 the remaining variables are set to the mode (factors) or the median
 (numerics).
}
\details{
It selects the most important variables based on the \code{degree}
 parameter and bins the numeric variables using
 \code{\link[easyalluvial]{manip_bin_numerics}}, while leaving categoric
 variables unchanged. The number of bins for each numeric variable is set to
 \code{bins - 2}. Next, the median is picked for each of the bins, and the
 min and the max value are added, so that each numeric variable is
 represented by \code{bins - 2} bin medians plus its min and its max. Then
 all possible combinations between those values and the categoric factor
 levels are created. The total number of all possible combinations defines
 the range of the data space. The values of the remaining variables are set
 to the mode (factors) or the median (numerics).

This model visualisation approach follows the "visualising the model
 in the dataspace" principle as described in Wickham H, Cook D, Hofmann H
 (2015) Visualizing statistical models: Removing the blindfold. Statistical
 Analysis and Data Mining 8(4) <doi:10.1002/sam.11271>.
}
\examples{
df = mtcars2[, ! names(mtcars2) \%in\% 'ids' ]
m = randomForest::randomForest( disp ~ ., df)
imp = m$importance
dspace = get_data_space(df, imp)
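
# A minimal base-R sketch of the value grid described in Details. This is an
# illustration only, not the package's implementation. Assumption: equal-width
# cut() stands in for manip_bin_numerics(), whose binning may differ.
bins = 5
x = mtcars$mpg

# split the numeric variable into bins - 2 groups and take each group's median
grp = cut(x, breaks = bins - 2)
bin_medians = tapply(x, grp, median)

# add min and max, giving up to `bins` representative values
vals = sort(unique(c(min(x), bin_medians, max(x))))

# combine with the levels of a categoric variable (here: cyl)
sketch_grid = expand.grid(mpg = vals, cyl = sort(unique(mtcars$cyl)))
nrow(sketch_grid)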
}
\seealso{
\code{\link[easyalluvial]{alluvial_wide}},
 \code{\link[easyalluvial]{manip_bin_numerics}}
}
