% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/inspect_num.R
\name{inspect_num}
\alias{inspect_num}
\title{Summarise and compare the numeric variables within one or two dataframes}
\usage{
inspect_num(df1, df2 = NULL, breaks = 20, include_int = TRUE,
  show_plot = FALSE)
}
\arguments{
\item{df1}{A dataframe.}

\item{df2}{An optional second dataframe for comparing numeric columns with those of \code{df1}.
Defaults to \code{NULL}.}

\item{breaks}{Integer number of breaks used for histogram bins, passed to
\code{graphics::hist()} as \code{hist(..., breaks)}.  See \code{?hist} for more details.
Defaults to 20.}

\item{include_int}{Logical flag, whether to include integer columns in numeric summaries.
Defaults to \code{TRUE}.}

\item{show_plot}{(Deprecated) Logical flag indicating whether a plot should be shown.
Superseded by the function \code{show_plot()} and will be dropped in a future version.}
}
\value{
A \code{tibble} containing statistical summaries of the numeric
columns of \code{df1} or, if \code{df2} is supplied, comparing the histograms of the
numeric columns of \code{df1} and \code{df2}.
}
\description{
Summarise and compare the numeric variables within one or two dataframes
}
\details{
If only \code{df1} is specified, \code{inspect_num()} returns a tibble with columns
\itemize{
  \item \code{col_name}, a character vector containing the names of the numeric columns in \code{df1}
  \item \code{min}, \code{q1}, \code{median}, \code{mean}, \code{q3}, \code{max} and 
  \code{sd}: the minimum, lower quartile, median, mean, upper quartile, maximum and 
  standard deviation for each numeric column.
  \item \code{pcnt_na}, the percentage of each numeric feature that is missing
  \item \code{hist}, a named list of tibbles containing the relative frequency of values
  falling in bins determined by \code{breaks}.
}
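
The relative frequencies stored in the \code{hist} column can be illustrated with
\code{graphics::hist(plot = FALSE)}.  The sketch below is not the package's internal code: the
toy vector \code{x} and the names \code{h} and \code{rel_freq} are hypothetical.  Note that when
\code{hist()} receives a single number as \code{breaks}, it treats it as a suggestion and places
the breakpoints at \code{pretty()} values, so the actual number of bins may differ from
\code{breaks}.
\preformatted{
# hypothetical sketch, not the package's internal implementation
x <- c(1.2, 3.4, 2.2, NA, 5.9, 4.1, 2.8)
# hist() drops non-finite values; plot = FALSE returns the bins without drawing
h <- graphics::hist(x, breaks = 20, plot = FALSE)
rel_freq <- h$counts / sum(h$counts)  # relative frequency per bin
data.frame(lower = head(h$breaks, -1), upper = h$breaks[-1], prop = rel_freq)
}
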
If both \code{df1} and \code{df2} are specified, the tibble has columns
\itemize{
  \item \code{col_name}, a character vector containing the names of the numeric columns in \code{df1}
  and \code{df2}
  \item \code{hist_1}, \code{hist_2}, list columns containing the histograms of each column in \code{df1} and \code{df2}.
  Where a column appears in both dataframes, the bins used for \code{df1} are reused to
  calculate the histogram for \code{df2}.
  \item \code{jsd}, a numeric column containing the Jensen-Shannon divergence.  This measures the
  difference between the distributions of a pair of binned numeric columns.  Values near 0 indicate
  that the distributions agree, while values near 1 indicate strong disagreement.
  \item \code{fisher_p}, the p-value from Fisher's exact test.  A small p-value is
  evidence that the two histograms come from different distributions.
}
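
For reference, the standard Jensen-Shannon divergence between the binned relative frequencies
\eqn{p} and \eqn{q} of a column is given below.  The use of base-2 logarithms is an assumption
about the implementation, but it is the choice that bounds the divergence between 0 and 1, as
described above.  Bins where \eqn{p_i = 0} (or \eqn{q_i = 0}) contribute zero to the
corresponding sum.
\deqn{JSD(p, q) = \frac{1}{2} \sum_i p_i \log_2 \frac{p_i}{m_i} + \frac{1}{2} \sum_i q_i \log_2 \frac{q_i}{m_i}, \quad m_i = \frac{p_i + q_i}{2}}{JSD(p, q) = 0.5 * sum(p * log2(p / m)) + 0.5 * sum(q * log2(q / m)), where m = (p + q) / 2}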
}
\examples{
data("starwars", package = "dplyr")
# show summary statistics for starwars
inspect_num(starwars)
# with a visualisation too - try to limit the number of bins
show_plot(inspect_num(starwars, breaks = 10))
# compare two data frames
inspect_num(starwars, starwars[-c(1:10), ], breaks = 10)
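# exclude integer columns (such as height) from the summaries
inspect_num(starwars, include_int = FALSE)
# sketch of the idea behind fisher_p (an assumption, not the package's
# internal code): Fisher's exact test on the 2 x k table of bin counts of two
# samples that share the breakpoints computed from the first sample
x1 <- starwars$height
x2 <- starwars$height[-c(1:10)]
br <- graphics::hist(x1, breaks = 10, plot = FALSE)$breaks
tab <- rbind(graphics::hist(x1, breaks = br, plot = FALSE)$counts,
             graphics::hist(x2, breaks = br, plot = FALSE)$counts)
tab <- tab[, colSums(tab) > 0, drop = FALSE]  # drop bins empty in both samples
set.seed(1)  # the simulated p-value is random
stats::fisher.test(tab, simulate.p.value = TRUE)$p.value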
}
