% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/coding.R
\name{coding}
\alias{coding}
\title{Coding Categorical Variables}
\usage{
coding(..., data = NULL,
       type = c("dummy", "simple", "effect", "weffect", "repeat",
                "fhelm", "rhelm", "poly"), base = NULL,
       name = c("dum.", "sim.", "eff.", "weff.", "rep.", "fhelm.", "rhelm.", "poly."),
       append = TRUE, as.na = NULL, check = TRUE)
}
\arguments{
\item{...}{a numeric vector with integer values, character vector, or factor.
Alternatively, an expression indicating the variable name in
\code{data}. Note that the function can only deal with one
categorical variable.}

\item{data}{a data frame when specifying a variable in the argument \code{...}.
Note that the argument is \code{NULL} when specifying a numeric
vector with integer values, character vector, or factor for the
argument \code{...}.}

\item{type}{a character string indicating the type of coding, i.e.,
\code{dummy} (default) for dummy coding, \code{simple} for
simple coding, \code{effect} for unweighted effect coding,
\code{weffect} for weighted effect coding, \code{repeat}
for repeated coding, \code{fhelm} for forward Helmert coding,
\code{rhelm} for reverse Helmert coding, and \code{poly} for
orthogonal polynomial coding (see 'Details').}

\item{base}{a numeric value or character string indicating the baseline
group for dummy and simple coding and the omitted group in
effect coding. By default, the first group or factor level is
selected as baseline or omitted group.}

\item{name}{a character string or character vector indicating the names
of the coded variables. By default, variables are named
\code{"dum."}, \code{"sim."}, \code{"eff."}, \code{"weff."},
\code{"rep."}, \code{"fhelm."}, \code{"rhelm."}, or \code{"poly."}
depending on the \code{type} of coding with the category used
in the comparison (e.g., \code{"dum.2"} and \code{"dum.3"}).
Variable names can be specified using a character string (e.g.,
\code{name = "dummy_"} leads to \code{dummy_2} and \code{dummy_3})
or a character vector matching the number of coded variables
(e.g., \code{name = c("x1_2", "x1_3")}), which is the number of
unique categories minus one.}

\item{append}{logical: if \code{TRUE} (default), coded variables are appended
to the data frame specified in the argument \code{data}.}

\item{as.na}{a numeric vector indicating user-defined missing values,
i.e., these values are converted to \code{NA} before the
variable is coded.}

\item{check}{logical: if \code{TRUE} (default), argument specification is checked.}
}
\value{
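Returns a data frame with the \eqn{k - 1} coded variables, which has as many rows
as the length of the vector or the number of rows of the data frame specified in
\code{...}. If \code{append = TRUE}, the coded variables are appended to the data
frame specified in \code{data}.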
}
\description{
This function creates \eqn{k - 1} variables for a categorical variable with
\eqn{k} distinct levels. The coding systems available in this function are
dummy coding, simple coding, unweighted effect coding, weighted effect coding,
repeated coding, forward Helmert coding, reverse Helmert coding, and orthogonal
polynomial coding.
}
\details{
\describe{
\item{\strong{Dummy Coding}}{Dummy or treatment coding compares the mean of
each level of the categorical variable to the mean of a baseline group. By
default, the first group or factor level is selected as baseline group. The
intercept in the regression model represents the mean of the baseline group.
For example, dummy coding based on a categorical variable with four groups
\code{A}, \code{B}, \code{C}, \code{D} makes the following comparisons:
\code{B vs A}, \code{C vs A}, and \code{D vs A} with \code{A} being the
baseline group.}
\item{\strong{Simple Coding}}{Simple coding compares the mean of each level of
the categorical variable to the mean of a baseline level. By default, the first
group or factor level is selected as baseline group. The intercept in the
regression model represents the unweighted grand mean, i.e., mean of group
means. For example, simple coding based on a categorical variable with four
groups \code{A}, \code{B}, \code{C}, \code{D} makes the following comparisons:
\code{B vs A}, \code{C vs A}, and \code{D vs A} with \code{A} being the
baseline group.}
\item{\strong{Unweighted Effect Coding}}{Unweighted effect or sum coding
compares the mean of a given level to the unweighted grand mean, i.e., mean of
group means. By default, the first group or factor level is selected as
omitted group. For example, effect coding based on a categorical variable
with four groups \code{A}, \code{B}, \code{C}, \code{D} makes the following
comparisons: \code{B vs (A, B, C, D)}, \code{C vs (A, B, C, D)}, and
\code{D vs (A, B, C, D)} with \code{A} being the omitted group.}
\item{\strong{Weighted Effect Coding}}{Weighted effect or sum coding compares
the mean of a given level to the weighted grand mean, i.e., sample mean. By
default, the first group or factor level is selected as omitted group. For
example, weighted effect coding based on a categorical variable with four groups
\code{A}, \code{B}, \code{C}, \code{D} makes the following comparisons:
\code{B vs (A, B, C, D)}, \code{C vs (A, B, C, D)}, and \code{D vs (A, B, C, D)}
with \code{A} being the omitted group.}
\item{\strong{Repeated Coding}}{Repeated or difference coding compares the
mean of each level of the categorical variable to the mean of the previous
adjacent level. For example, repeated coding based on a categorical variable
with four groups \code{A}, \code{B}, \code{C}, \code{D} makes the following
comparisons: \code{B vs A}, \code{C vs B}, and \code{D vs C}.}
\item{\strong{Forward Helmert Coding}}{Forward Helmert coding compares the
mean of each level of the categorical variable to the unweighted mean of all
subsequent level(s) of the categorical variable. For example, forward Helmert
coding based on a categorical variable with four groups \code{A}, \code{B},
\code{C}, \code{D} makes the following comparisons: \code{(B, C, D) vs A},
\code{(C, D) vs B}, and \code{D vs C}.}
\item{\strong{Reverse Helmert Coding}}{Reverse Helmert coding compares the
mean of each level of the categorical variable to the unweighted mean of all
prior level(s) of the categorical variable. For example, reverse Helmert
coding based on a categorical variable with four groups \code{A}, \code{B},
\code{C}, \code{D} makes the following comparisons: \code{B vs A}, \code{C vs (A, B)},
and \code{D vs (A, B, C)}.}
\item{\strong{Orthogonal Polynomial Coding}}{Orthogonal polynomial coding is
a form of trend analysis based on polynomials of order \eqn{k - 1}, where
\eqn{k} is the number of levels of the categorical variable. This coding
scheme assumes an ordered-categorical variable with equally spaced levels.
For example, orthogonal polynomial coding based on a categorical variable with
four groups \code{A}, \code{B}, \code{C}, \code{D} investigates linear,
quadratic, and cubic trends in the categorical variable.}
}
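
As a rough illustration that relies only on the \pkg{stats} contrast functions
named in the 'Note' section, and not on \code{coding} itself, the contrast
matrices behind some of these schemes can be inspected for four groups. The
expression shown for simple coding is an assumption based on the usual textbook
definition and is not taken from the source of this function:

\preformatted{
# Dummy coding: rows are groups 1 to 4, columns compare groups 2, 3, and 4
# with group 1 (the baseline)
contr.treatment(4)

# Simple coding (assumed definition): dummy contrasts shifted by 1 / k
contr.treatment(4) - 1 / 4

# Unweighted effect coding: contr.sum() omits the last group, whereas
# coding() omits the first group by default
contr.sum(4)

# Orthogonal polynomial coding: linear, quadratic, and cubic columns
contr.poly(4)
}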
}
\note{
This function uses the \code{contr.treatment} function from the \pkg{stats}
package for dummy coding and simple coding, a modified copy of the
\code{contr.sum} function from the \pkg{stats} package for unweighted effect coding,
a modified copy of the \code{contr.wec} function from the \pkg{wec} package
for weighted effect coding, a modified copy of the \code{contr.sdif}
function from the \pkg{MASS} package for repeated coding, a modified copy
of the \code{code_helmert_forward} function from the \pkg{codingMatrices}
package for forward Helmert coding, a modified copy of the \code{contr_code_helmert}
function from the \pkg{faux} package for reverse Helmert coding, and the
\code{contr.poly} function from the \pkg{stats} package for orthogonal
polynomial coding.
}
\examples{
# Example 1a: Dummy coding for 'gear', baseline group = 3
coding(gear, data = mtcars)

# Example 1b: Alternative specification without using the 'data' argument
coding(mtcars$gear)

# Example 2: Dummy coding for 'gear', baseline group = 4
coding(gear, data = mtcars, base = 4)
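
# Illustration with base R only (a sketch, not a call to coding()): under
# dummy coding the intercept equals the mean of the baseline group. lm() uses
# contr.treatment() by default, so the baseline here is gear = 3.
fit.dummy <- lm(mpg ~ factor(gear), data = mtcars)

# Intercept of the dummy-coded regression model
coef(fit.dummy)[1]

# Mean of 'mpg' in the baseline group, which matches the intercept
mean(mtcars$mpg[mtcars$gear == 3])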

# Example 3a: Effect coding for 'gear', omitted group = 3
coding(gear, data = mtcars, type = "effect")

# Example 3b: Effect coding for 'gear', omitted group = 4
coding(gear, data = mtcars, type = "effect", base = 4)
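
# Illustration with base R only (a sketch, not a call to coding()): under
# unweighted effect coding the intercept equals the mean of the group means.
# Note that contr.sum() omits the last level, whereas coding() omits the
# first level by default. The object names 'mt' and 'fit.effect' are
# arbitrary.
mt <- transform(mtcars, gear = factor(gear))
fit.effect <- lm(mpg ~ gear, data = mt, contrasts = list(gear = "contr.sum"))

# Intercept of the effect-coded regression model
coef(fit.effect)[1]

# Unweighted grand mean, i.e., mean of the three group means
mean(tapply(mt$mpg, mt$gear, mean))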

# Example 4a: Dummy-coded variable names with prefix "gear3."
coding(gear, data = mtcars, name = "gear3.")

# Example 4b: Dummy-coded variables named "gear_4vs3" and "gear_5vs3"
coding(gear, data = mtcars, name = c("gear_4vs3", "gear_5vs3"))
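
# Example 5: Remaining coding types for 'gear', given as a brief sketch that
# uses the 'type' values listed under 'Arguments'
coding(gear, data = mtcars, type = "simple")
coding(gear, data = mtcars, type = "weffect")
coding(gear, data = mtcars, type = "repeat")
coding(gear, data = mtcars, type = "fhelm")
coding(gear, data = mtcars, type = "rhelm")
coding(gear, data = mtcars, type = "poly")

# Example 6: Polynomial coding without appending the coded variables to
# the data frame, assuming 'append = FALSE' behaves as described above
coding(gear, data = mtcars, type = "poly", append = FALSE)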
}
\seealso{
\code{\link{rec}}, \code{\link{item.reverse}}
}
\author{
Takuya Yanagida \email{takuya.yanagida@univie.ac.at}
}
