% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/proc_freq.R
\encoding{UTF-8}
\name{proc_freq}
\alias{proc_freq}
\title{Generates Frequency Statistics}
\usage{
proc_freq(
  data,
  tables = NULL,
  output = NULL,
  by = NULL,
  weight = NULL,
  options = NULL,
  titles = NULL
)
}
\arguments{
\item{data}{The input data frame to perform frequency calculations on.
Input data as the first parameter makes this function pipe-friendly.}

\item{tables}{The variable or variables to perform frequency counts on.
The table specifications are passed as a vector of strings. For one-way
frequencies, simply pass the variable name.
For two-way tables, pass the desired combination of variables separated by a
star (*) operator.  The parameter does not accept SAS® style grouping syntax.
All cross combinations should be listed explicitly. If the
table request is named, the name will be used as the list item name on the
return list of tables. See "Example 3" for an illustration on how to name an
output table.}

\item{output}{Whether or not to return datasets from the function. Valid
values are "out", "none", and "report".  Default is "out". This parameter
also accepts the data shaping options "long", "stacked", and "wide". See
the \strong{Data Shaping} section for a description of these options. Multiple
output keywords may be passed in a character vector. For example,
to produce both a report dataset and a "long" output dataset,
use the parameter \code{output = c("report", "out", "long")}.}

\item{by}{An optional by group. Parameter accepts a vector of one or more
variable names. When this parameter is set, data
will be subset for each by group, and tables will be generated for
each subset.}

\item{weight}{An optional weight parameter.  This parameter is passed
as a variable name to use for the weight.  If a weight variable is
indicated, the values of that variable will be summed to calculate the
frequency counts.}

\item{options}{The options desired for the function.
Options are passed to the parameter as a vector of quoted strings. You may
also use the \code{v()} function to pass unquoted strings.
The following options are available:
"chisq", "crosstab", "fisher", "list", "missing",
"nlevels", "nocol",
"nocum", "nofreq", "nopercent", "noprint",
"nonobs", "norow", "nosparse", "notable", "outcum". See
the \strong{Options} section for a description of these options.}

\item{titles}{A vector of titles to assign to the interactive report.}
}
\value{
The function will return all requested datasets by default.  This is
equivalent to the \code{output = "out"} option.  To return the datasets
as created for the interactive report, pass the "report" output option.  If
no output datasets are desired, pass the "none" output option. If a
single dataset is requested, the function
will return a single dataset.  If multiple datasets are requested, the function
will return a list of datasets.  The type of data frame returned will
correspond to the type of data frame passed in on the \code{data} parameter.
If the input data is a tibble, the output data will be a
tibble.  If the input data is a Base R data frame, the output data will be
a Base R data frame.
}
\description{
The \code{proc_freq} function generates frequency statistics.
It can be used interactively for data exploration, and it can also
produce dataset output for further analysis.
The function can perform one-way and two-way frequencies.  On the
interactive report, two-way frequencies are produced as a cross-tabulation
by default.  There are many options to control the generated tables.
When multiple tables are requested, the function will return them in a
named list.
}
\details{
The \code{proc_freq} function generates frequency statistics
for one-way and two-way tables.  Data is passed in on the \code{data}
parameter.  The desired frequencies are specified on the \code{tables}
parameter.
}
\section{Report Output}{

By default, \code{proc_freq} results will
be immediately sent to the viewer as an HTML report.  This functionality
makes it easy to get a quick analysis of your data with very little
effort. To turn off the interactive report, pass the "noprint" keyword
to the \code{options} parameter or set \code{options("procs.print" = FALSE)}.

The \code{titles} parameter allows you to set one or more titles for your
report.  Pass these titles as a vector of strings.

If the frequency variables have a label assigned, that label
will be used in the report output. This feature gives you some control
over the column headers in the final report.

The exact datasets used for the interactive output can be returned as a list.
To return these datasets as a list, pass
the "report" keyword on the \code{output} parameter. This list may in
turn be passed to \code{\link{proc_print}} to write the report to a file.
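
As a minimal sketch of the report options described above, the call below
suppresses the viewer, assigns a title, and returns the report datasets.
It assumes the \code{HairEyeColor} sample data used in the Examples section;
the title text is only illustrative.

\preformatted{
library(procs)

# Sample data: one row per Hair/Eye/Sex combination, with counts in Freq
df <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# "noprint" turns off the viewer; "report" returns the report datasets
rpt <- proc_freq(df,
                 tables = "Hair",
                 weight = Freq,
                 titles = "Hair Color Frequencies",
                 options = "noprint",
                 output = "report")
}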
}

\section{Data Frame Output}{

The \code{proc_freq} function returns output datasets.
If you are requesting only one table, a single
data frame will be returned.  If you request multiple tables, a list of data
frames will be returned.

By default, the list items are named according to the
strings specified on the \code{tables} parameter. You may control
the names of the returned results by using a named vector on the
\code{tables} parameter.

The standard output datasets are optimized for data manipulation.
Column names have been standardized, and additional variables may
be present to help with data manipulation. For instance, the by variable will
always be named "BY", and the frequency category will always be named "CAT".
In addition, data values in the
output datasets are not rounded or formatted,
so that you get the most accurate statistical results.
}

\section{Frequency Weight}{

Normally the \code{proc_freq} function counts each row in the
input data equally. In some cases, however, each row in the data
can represent multiple observations, and rows should not be treated
equally.  In these cases, use the \code{weight} parameter.  The parameter
accepts a variable/column name to use as the weighted value.  If the
\code{weight} parameter is used, the function will sum the weighted values
instead of counting rows.
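
The difference between counting rows and summing weights can be sketched
as follows.  This is an illustration only, assuming the \code{HairEyeColor}
sample data from the Examples section, where each row is a summarized
cell and the \code{Freq} column holds the number of observations.

\preformatted{
library(procs)

# Summarized data: 32 rows, each carrying a count in the Freq column
df <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Unweighted: every row is counted once
unw <- proc_freq(df, tables = "Hair", options = "noprint")

# Weighted: the Freq values are summed within each category
wtd <- proc_freq(df, tables = "Hair", weight = Freq, options = "noprint")
}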
}

\section{By Groups}{

You may request that frequencies be separated into by groups using the
\code{by} parameter.  The parameter accepts one or more variable names
from the input dataset. When this parameter is assigned, the data
will be subset by the "by" variable(s) before frequency counts are
calculated.  On the interactive report, the by groups will appear in
separate tables.  On the output dataset, the by groups will be identified
by additional columns.
}

\section{Options}{

The \code{options} parameter accepts a vector of options.  Normally, these
options must be quoted.  But you may pass them unquoted using the \code{v()}
function.  For example, you can request the number of category levels
and the Chi-Square statistic like this: \code{options = v(nlevels, chisq)}.

Below are all the available options and a description of each:
\itemize{
\item{\strong{crosstab}: Two-way output tables are in list style by default.
If you want a crosstab style, pass the "crosstab" option.
}
\item{\strong{list}: Two-way interactive tables are in crosstab style
by default.  If you want a list style two-way table, pass the "list" option.
}
\item{\strong{missing}: Normally, missing values are not counted and not
shown on frequency tables.  The "missing" option allows you to treat
missing (NA) values as normal values, so that they are counted and
shown on the frequency table.  Missing levels will appear on the
table as a single dot (".").
}
\item{\strong{nlevels}: The "nlevels" option will display the number of unique
values for each variable in the frequency table. These levels are generated
as a separate table that appears on the report, and will also be output from
the function as a separate dataset.
}
\item{\strong{nocol}: Two-way cross tabulation tables include column percents
by default.  To turn them off, pass the "nocol" option.
}
\item{\strong{nocum}: Whether to include the cumulative frequency and percent
columns on one-way, interactive tables. These columns are included by default.
To turn them off, pass the "nocum" option.
}
\item{\strong{nofreq}: The "nofreq" option will remove the frequency column
from one-way and two-way tables.
}
\item{\strong{nopercent}: The "nopercent" option will remove the percent column
from one-way and two-way tables.
}
\item{\strong{noprint}: Whether to print the interactive report to the
viewer.  By default, the report is printed to the viewer. The "noprint"
option will inhibit printing.
}
\item{\strong{nonobs}: Whether to include the number of observations "N"
column on the output and interactive tables.  By default, the N column
will be included.  The "nonobs" option turns it off.
}
\item{\strong{norow}: Whether to include the row percentages on two-way
crosstab tables. The "norow" option will turn them off.
}
\item{\strong{nosparse/sparse}: Whether to include categories for which there are no
frequency counts.  Zero-count categories will be included by default, which
is the "sparse" option.  If the
"nosparse" option is present, zero-count categories will be removed.
}
\item{\strong{notable}: Whether to include the frequency table in the output
dataset list. Normally, the frequency table is included.  The "notable"
option will exclude it.  You may want to do this, for instance, if you only
want the Chi-Square statistic.
}
\item{\strong{outcum}: Whether to include the cumulative frequency and percent
on output frequency tables.  By default, these columns are not included.
The "outcum" option will include them.
}
}
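
The two ways of passing options can be sketched side by side.  Both calls
below are intended to request the same options, once as quoted strings
and once through \code{v()}.  The sketch assumes the \code{HairEyeColor}
sample data from the Examples section.

\preformatted{
library(procs)

df <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Options as a vector of quoted strings
res1 <- proc_freq(df, tables = "Eye", weight = Freq,
                  options = c("nlevels", "outcum", "noprint"))

# The same options passed unquoted with v()
res2 <- proc_freq(df, tables = "Eye", weight = Freq,
                  options = v(nlevels, outcum, noprint))
}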
}

\section{Statistics Options}{

In addition to the above options, the \code{options} parameter accepts
some statistics options.  The following keywords will generate
additional tables of specialized statistics. These statistics
options are only available on two-way tables:
\itemize{
\item{\strong{chisq}: Requests that the Chi-square statistics be produced.
}
\item{\strong{fisher}: Requests that the Fisher's exact statistics be produced.
}
}
}

\section{Using Factors}{

There are two occasions when you may want to define the \code{tables} variable
or \code{by} variables as a factor. One is for sorting/ordering,
and the other is for obtaining zero-counts on sparse data.

To order the frequency categories in the frequency output, define the
\code{tables} variable as a factor in the desired order. The function will
then retain that order for the frequency categories in the output dataset
and report.

You may also wish to
define the tables variable as a factor if you are dealing with sparse data
and some of the frequency categories are not present in the data. To ensure
these categories are displayed with zero-counts, define the \code{tables} variable
or \code{by} variable
as a factor and use the "sparse" option.  Note
that the "sparse" option is actually the default.

If you do not want to
show the zero-count categories on a variable that is defined as a factor,
pass the "nosparse" keyword on the \code{options} parameter.
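
Both uses of factors can be sketched in one example.  The code assumes the
\code{HairEyeColor} sample data from the Examples section.  The "Gray"
level is hypothetical and was added only to illustrate a category with
no rows in the data.

\preformatted{
library(procs)

df <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Custom category order, plus a hypothetical "Gray" level with no rows
df$Hair <- factor(df$Hair,
                  levels = c("Red", "Blond", "Brown", "Black", "Gray"))

# Default ("sparse"): factor order is retained and "Gray" is kept
# as a zero-count category
res1 <- proc_freq(df, tables = "Hair", weight = Freq,
                  options = "noprint")

# "nosparse": zero-count categories are removed
res2 <- proc_freq(df, tables = "Hair", weight = Freq,
                  options = v(nosparse, noprint))
}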
}

\section{Data Shaping}{

By default, the \code{proc_freq} function returns an output dataset of
frequency results.  If running interactively, the function also prints
the frequency results to the viewer.  As described above, the output
dataset can differ somewhat from the dataset sent to the viewer.
The \code{output} parameter allows you to choose which datasets to return.
There are three choices:
"out", "report", and "none".  The "out" keyword returns the default output
dataset.  The "report" keyword returns the dataset(s) sent to the viewer. You
may also pass "none" if you don't want any datasets returned from the function.

In addition, the output dataset produced by the "out" keyword can be shaped
in different ways. These shaping options allow you to decide whether the
data should be returned long and skinny, or short and wide. The shaping
options can reduce the amount of data manipulation necessary to get the
frequencies into the desired form. The
shaping options are as follows:
\itemize{
\item{\strong{long}: Transposes the output datasets
so that statistics are in rows and frequency categories are in columns.
}
\item{\strong{stacked}: Requests that output datasets
be returned in "stacked" form, such that both statistics and frequency
categories are in rows.
}
\item{\strong{wide}: Requests that output datasets
be returned in "wide" form, such that statistics are across the top in
columns, and frequency categories are in rows. This shaping option
is the default.
}
}
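
A minimal sketch of the three shaping options follows, assuming the
\code{HairEyeColor} sample data from the Examples section.  The same
one-way table is requested each time, and only the shaping keyword changes.

\preformatted{
library(procs)

df <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Default "wide" shape: statistics in columns, categories in rows
res_wide <- proc_freq(df, tables = "Hair", weight = Freq,
                      options = "noprint")

# "long" shape: statistics in rows, categories in columns
res_long <- proc_freq(df, tables = "Hair", weight = Freq,
                      options = "noprint",
                      output = c("out", "long"))

# "stacked" shape: both statistics and categories in rows
res_stacked <- proc_freq(df, tables = "Hair", weight = Freq,
                         options = "noprint",
                         output = c("out", "stacked"))
}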
}

\examples{
library(procs)

# Turn off printing for CRAN checks
options("procs.print" = FALSE)

# Create sample data
df <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Assign labels
labels(df) <- list(Hair = "Hair Color",
                   Eye = "Eye Color",
                   Sex = "Sex at Birth")

# Example #1: One way frequencies on Hair and Eye color with weight option.
res <- proc_freq(df,
                 tables = v(Hair, Eye),
                 options = outcum,
                 weight = Freq)

# View result data
res
# $Hair
#    VAR   CAT   N CNT      PCT CUMSUM    CUMPCT
# 1 Hair Black 592 108 18.24324    108  18.24324
# 2 Hair Blond 592 127 21.45270    235  39.69595
# 3 Hair Brown 592 286 48.31081    521  88.00676
# 4 Hair   Red 592  71 11.99324    592 100.00000
#
# $Eye
#   VAR   CAT   N CNT      PCT CUMSUM    CUMPCT
# 1 Eye  Blue 592 215 36.31757    215  36.31757
# 2 Eye Brown 592 220 37.16216    435  73.47973
# 3 Eye Green 592  64 10.81081    499  84.29054
# 4 Eye Hazel 592  93 15.70946    592 100.00000

# Example #2: Two-way crosstabulation table with Chi-Square statistic
res <- proc_freq(df, tables = Hair * Eye,
                     weight = Freq,
                     options = v(crosstab, chisq))

# View result data
res
#$`Hair * Eye`
#   Category Statistic       Blue      Brown      Green     Hazel     Total
#1     Black Frequency  20.000000  68.000000  5.0000000 15.000000 108.00000
#2     Black   Percent   3.378378  11.486486  0.8445946  2.533784  18.24324
#3     Black   Row Pct  18.518519  62.962963  4.6296296 13.888889        NA
#4     Black   Col Pct   9.302326  30.909091  7.8125000 16.129032        NA
#5     Blond Frequency  94.000000   7.000000 16.0000000 10.000000 127.00000
#6     Blond   Percent  15.878378   1.182432  2.7027027  1.689189  21.45270
#7     Blond   Row Pct  74.015748   5.511811 12.5984252  7.874016        NA
#8     Blond   Col Pct  43.720930   3.181818 25.0000000 10.752688        NA
#9     Brown Frequency  84.000000 119.000000 29.0000000 54.000000 286.00000
#10    Brown   Percent  14.189189  20.101351  4.8986486  9.121622  48.31081
#11    Brown   Row Pct  29.370629  41.608392 10.1398601 18.881119        NA
#12    Brown   Col Pct  39.069767  54.090909 45.3125000 58.064516        NA
#13      Red Frequency  17.000000  26.000000 14.0000000 14.000000  71.00000
#14      Red   Percent   2.871622   4.391892  2.3648649  2.364865  11.99324
#15      Red   Row Pct  23.943662  36.619718 19.7183099 19.718310        NA
#16      Red   Col Pct   7.906977  11.818182 21.8750000 15.053763        NA
#17    Total Frequency 215.000000 220.000000 64.0000000 93.000000 592.00000
#18    Total   Percent  36.317568  37.162162 10.8108108 15.709459 100.00000

# $`chisq:Hair * Eye`
#      CHISQ CHISQ.DF      CHISQ.P
# 1 138.2898        9 2.325287e-25

# Example #3: By variable with named table request
res <- proc_freq(df, tables = v(Hair, Eye, Cross = Hair * Eye),
                 by = Sex,
                 weight = Freq)

# View result data
res
# $Hair
#       BY  VAR   CAT   N CNT      PCT
# 1 Female Hair Black 313  52 16.61342
# 2 Female Hair Blond 313  81 25.87859
# 3 Female Hair Brown 313 143 45.68690
# 4 Female Hair   Red 313  37 11.82109
# 5   Male Hair Black 279  56 20.07168
# 6   Male Hair Blond 279  46 16.48746
# 7   Male Hair Brown 279 143 51.25448
# 8   Male Hair   Red 279  34 12.18638
#
# $Eye
#       BY VAR   CAT   N CNT       PCT
# 1 Female Eye  Blue 313 114 36.421725
# 2 Female Eye Brown 313 122 38.977636
# 3 Female Eye Green 313  31  9.904153
# 4 Female Eye Hazel 313  46 14.696486
# 5   Male Eye  Blue 279 101 36.200717
# 6   Male Eye Brown 279  98 35.125448
# 7   Male Eye Green 279  33 11.827957
# 8   Male Eye Hazel 279  47 16.845878
#
# $Cross
#        BY VAR1 VAR2  CAT1  CAT2   N CNT        PCT
# 1  Female Hair  Eye Black  Blue 313   9  2.8753994
# 2  Female Hair  Eye Black Brown 313  36 11.5015974
# 3  Female Hair  Eye Black Green 313   2  0.6389776
# 4  Female Hair  Eye Black Hazel 313   5  1.5974441
# 5  Female Hair  Eye Blond  Blue 313  64 20.4472843
# 6  Female Hair  Eye Blond Brown 313   4  1.2779553
# 7  Female Hair  Eye Blond Green 313   8  2.5559105
# 8  Female Hair  Eye Blond Hazel 313   5  1.5974441
# 9  Female Hair  Eye Brown  Blue 313  34 10.8626198
# 10 Female Hair  Eye Brown Brown 313  66 21.0862620
# 11 Female Hair  Eye Brown Green 313  14  4.4728435
# 12 Female Hair  Eye Brown Hazel 313  29  9.2651757
# 13 Female Hair  Eye   Red  Blue 313   7  2.2364217
# 14 Female Hair  Eye   Red Brown 313  16  5.1118211
# 15 Female Hair  Eye   Red Green 313   7  2.2364217
# 16 Female Hair  Eye   Red Hazel 313   7  2.2364217
# 17   Male Hair  Eye Black  Blue 279  11  3.9426523
# 18   Male Hair  Eye Black Brown 279  32 11.4695341
# 19   Male Hair  Eye Black Green 279   3  1.0752688
# 20   Male Hair  Eye Black Hazel 279  10  3.5842294
# 21   Male Hair  Eye Blond  Blue 279  30 10.7526882
# 22   Male Hair  Eye Blond Brown 279   3  1.0752688
# 23   Male Hair  Eye Blond Green 279   8  2.8673835
# 24   Male Hair  Eye Blond Hazel 279   5  1.7921147
# 25   Male Hair  Eye Brown  Blue 279  50 17.9211470
# 26   Male Hair  Eye Brown Brown 279  53 18.9964158
# 27   Male Hair  Eye Brown Green 279  15  5.3763441
# 28   Male Hair  Eye Brown Hazel 279  25  8.9605735
# 29   Male Hair  Eye   Red  Blue 279  10  3.5842294
# 30   Male Hair  Eye   Red Brown 279  10  3.5842294
# 31   Male Hair  Eye   Red Green 279   7  2.5089606
# 32   Male Hair  Eye   Red Hazel 279   7  2.5089606
}
\seealso{
For summary statistics, see \code{\link{proc_means}}.  To pivot
or transpose the data coming from \code{proc_freq},
see \code{\link{proc_transpose}}.
}
