% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pulver.R
\name{pulverize}
\alias{pulverize}
\title{Computes p-values for interaction terms from linear regressions}
\usage{
pulverize(ymat, xmat, zmat, output_file = NULL, colnames = c("y", "x", "z"),
  pvalue_threshold = NULL, cores = 1L, overwrite = FALSE,
  suppress_return = FALSE, .fp = NULL)
}
\arguments{
\item{ymat, xmat, zmat}{Numeric matrices of type "double" with one row per
observation; all three matrices must have the same number of rows}

\item{output_file}{Path of the output file; if \code{NULL}, no output file
is created (default: \code{NULL})}

\item{colnames}{Names of the first three columns of the result table
(default: "y", "x", and "z")}

\item{pvalue_threshold}{Report only p-values strictly below this threshold
(default: \code{NULL}, i.e., report all p-values)}

\item{cores}{Number of cores to use for parallelization (default:
1)}

\item{overwrite}{If \code{TRUE}, overwrite an existing \code{output_file}
(default: \code{FALSE})}

\item{suppress_return}{If \code{TRUE}, return \code{NULL} instead of a data
frame with p-values for the interaction term (default: \code{FALSE})}

\item{.fp}{File pointer for internal use only. It allows
\code{\link{pulverize_all}} to write results from different input files
to the same output file (default: \code{NULL})}
}
\value{
If \code{suppress_return} is \code{FALSE}, a data frame with
    columns "y", "x", "z" (or the names given in \code{colnames}) and
    "pvalue", containing the p-values for the interaction term \eqn{xz}
    of the linear model given in the Description, is returned.
    Otherwise \code{NULL} is returned.
}
\description{
Given matrices \code{ymat}, \code{xmat}, and \code{zmat},
\code{pulverize} evaluates every linear regression
\deqn{y = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz + \epsilon}{y = b0 + b1 x + b2 z + b3 xz + e}
where \eqn{y}, \eqn{x}, and \eqn{z} are columns of \code{ymat},
\code{xmat}, and \code{zmat}, respectively, and returns the p-value
for the null hypothesis \eqn{\beta_3=0}{b3=0}.  For example, if
\code{ymat}, \code{xmat}, and \code{zmat} have 200, 300, and 400
columns, then \code{pulverize} evaluates 24 million models
(\eqn{200\times 300\times 400}{200 * 300 * 400}) and returns the
p-value for the interaction term for each model.
}
\details{
For reasons of computational efficiency, \code{pulverize} returns
only p-values and only for the interaction term \eqn{xz}.
Fast run time is achieved by using the correlation coefficient between
the outcome variable \eqn{y} and the interaction term \eqn{xz} to test
the null hypothesis, which avoids the costly computation of matrix
inversions. Further time is saved by reordering the loops over the three
matrices and by implementing the core algorithm in C++.

Once the p-values have narrowed the search down to a small number of
interesting models, additional model characteristics, e.g., effect
estimates and standard errors, can be obtained with traditional methods
such as R's \code{\link[stats]{lm}} function.

Matrices \code{ymat}, \code{xmat}, and \code{zmat} must
    have column names. Missing values are replaced by the mean of their
    column. If \code{pvalue_threshold} is supplied, only p-values strictly
    below the threshold (p < \code{pvalue_threshold}) are included in the
    returned data frame and written to \code{output_file}.

    If the resulting data frame would be too large to fit in memory, the
    results can be written to \code{output_file} without returning a data
    frame by setting \code{suppress_return} to \code{TRUE}.

    The column names of the results table are by default set to "y", "x", and
    "z", but can be changed using the \code{colnames} argument.

    An error is signaled if \code{output_file} already exists, unless
    \code{overwrite} is \code{TRUE}, in which case the file is silently
    overwritten.

    The computation can be parallelized by setting \code{cores} to a value
    greater than 1. By default, only a single core is used. Note that
    parallelization is supported only if the package was compiled with a
    C/C++ compiler that supports OpenMP.
}
\examples{
nobs <- 100
y <- matrix(rnorm(nobs * 2), ncol = 2, dimnames = list(paste0("row", 1:nobs),
paste0("column", 1:2)))
x <- matrix(rnorm(nobs * 3), ncol = 3, dimnames = list(paste0("row", 1:nobs),
paste0("column", 1:3)))
z <- matrix(rnorm(nobs * 4), ncol = 4, dimnames = list(paste0("row", 1:nobs),
paste0("column", 1:4)))
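## Keep only p-values below 0.05 and write them to a temporary file.
## With suppress_return = TRUE, pulverize() returns NULL instead of a
## data frame, which helps when the full result would not fit in memory.
## The tempfile() path is only an illustrative choice.
out <- tempfile()
res_null <- pulverize(y, x, z, output_file = out, pvalue_threshold = 0.05,
                      suppress_return = TRUE)
is.null(res_null)
## Writing to an existing output_file signals an error unless
## overwrite = TRUE is given.
pulverize(y, x, z, output_file = out, overwrite = TRUE,
          suppress_return = TRUE)
unlink(out)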
pulverize(y, x, z)
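## Cross-check one model with stats::lm(). This assumes there are no
## missing values; the two p-values should agree closely.
res <- pulverize(y, x, z)
nrow(res)  # one row per (y, x, z) column triple: 2 * 3 * 4 = 24
yv <- y[, "column1"]
xv <- x[, "column2"]
zv <- z[, "column3"]
p_pulver <- res$pvalue[res$y == "column1" & res$x == "column2" &
                       res$z == "column3"]
p_lm <- summary(lm(yv ~ xv * zv))$coefficients["xv:zv", "Pr(>|t|)"]
c(pulverize = p_pulver, lm = p_lm)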

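## A base R sketch of the correlation-based test (an illustration of the
## statistics, not the package's C++ implementation). First adjust both y
## and xz for an intercept, x, and z. The correlation r of the two
## residual vectors then gives t = r * sqrt(n - 4) / sqrt(1 - r^2) with
## n - 4 degrees of freedom, the same t statistic lm() reports for b3.
set.seed(1)
n <- 100
yv <- rnorm(n)
xv <- rnorm(n)
zv <- rnorm(n)
qr0 <- qr(cbind(1, xv, zv))           # design without the interaction
ry <- qr.resid(qr0, yv)               # y adjusted for 1, x, z
rxz <- qr.resid(qr0, xv * zv)         # xz adjusted for 1, x, z
r <- cor(ry, rxz)
tstat <- r * sqrt(n - 4) / sqrt(1 - r^2)
p_cor <- 2 * pt(-abs(tstat), df = n - 4)
p_lm <- summary(lm(yv ~ xv * zv))$coefficients["xv:zv", "Pr(>|t|)"]
all.equal(p_cor, p_lm)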
}
\references{
Shabalin, A. A. (2012). Matrix eQTL: ultra fast eQTL analysis via large
    matrix operations. \emph{Bioinformatics}, \bold{28}, 1353-1358.
}
