% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hbl_data.R
\name{hbl_data}
\alias{hbl_data}
\title{Standardize data}
\usage{
hbl_data(
  data,
  response,
  study,
  study_reference,
  group,
  group_reference,
  patient,
  rep,
  rep_reference,
  covariates
)
}
\arguments{
\item{data}{A tidy data frame or \code{tibble} with the data.}

\item{response}{Character of length 1,
name of the column in \code{data} with the response/outcome variable.
\code{data[[response]]} must be a continuous variable,
and it \emph{should} be the change from baseline of a
clinical endpoint of interest, as opposed to just
the raw response. Treatment differences
are computed directly from this scale, so please supply
change from baseline unless you are absolutely certain
that treatment differences computed directly from
this quantity are clinically meaningful.}

\item{study}{Character of length 1,
name of the column in \code{data} with the study ID.}

\item{study_reference}{Atomic of length 1,
element of the \code{study} column that indicates
the current study.
(The other studies are historical studies.)}

\item{group}{Character of length 1,
name of the column in \code{data} with the group ID.}

\item{group_reference}{Atomic of length 1,
element of the \code{group} column that indicates
the control group.
(The other groups may be treatment groups.)}

\item{patient}{Character of length 1,
name of the column in \code{data} with the patient ID.}

\item{rep}{Character of length 1,
name of the column in \code{data} with the rep ID.}

\item{rep_reference}{Atomic of length 1,
element of the \code{rep} column that indicates
baseline, i.e. the first rep chronologically.
(The other reps may be post-baseline study visits or time points.)}

\item{covariates}{Character vector of names of the columns
in \code{data} with baseline covariates.
These can be continuous, categorical, or binary.
Regardless, \code{historicalborrowlong} derives the appropriate
model matrix.

Each baseline covariate column must truly be a \emph{baseline} covariate:
elements must be equal for all time points within each patient
(after the steps in the "Data processing" section).
In other words, covariates must not be time-varying.

A large number of covariates, or a large number of levels in a
categorical covariate, can severely slow down the computation.
Please consider carefully if you really need to include
such complicated baseline covariates.}
}
\value{
A standardized tidy data frame with one row per patient and rep
and the following columns:
\itemize{
\item \code{response}: continuous response/outcome variable. (Should be
change from baseline of an outcome of interest.)
\item \code{study_label}: human-readable label of the study.
\item \code{study}: integer study index with the max index equal to the
current study (at \code{study_reference}).
\item \code{group_label}: human-readable group label (e.g. treatment arm name).
\item \code{group}: integer group index with an index of 1 equal to the control
group (at \code{group_reference}).
\item \code{patient_label}: original patient ID.
\item \code{patient}: integer patient index.
\item \code{rep_label}: original rep ID (e.g. time point or patient visit).
\item \code{rep}: integer rep index.
\item \verb{covariate_*}: baseline covariate columns.
}
}
\description{
Standardize a tidy input dataset.
}
\details{
Users do not normally need to call this function.
It mainly serves to expose the indexing behavior of
studies and group levels to aid in interpreting
summary tables.
}
\section{Data processing}{

Before running the MCMC, the dataset is pre-processed.
This includes expanding the rows of the data so every rep
of every patient gets an explicit row. So if your
original data has irregular rep IDs, e.g. unscheduled
visits in a clinical trial that few patients attend,
please remove them before the analysis. Only the most
common rep IDs should be included.

After expanding the rows, the function fills in missing
values for every column except the response. That includes
covariates. Missing covariate values are filled in,
first with last observation carried forward,
then with last observation carried backward.
If there are still missing values after this process,
the program throws an informative error.
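
As a rough illustration of the two steps above (a sketch, not the
package's internal code), the following uses \code{tidyr::complete()}
and \code{tidyr::fill()} on a small hypothetical dataset.
The column names \code{patient}, \code{rep}, \code{response},
and \code{age} are assumptions made for this example only.

\preformatted{
# Hypothetical toy data: patient 2 has no row for rep 2,
# and patient 1 is missing the covariate at rep 1.
toy <- data.frame(
  patient = c(1, 1, 2),
  rep = c(1, 2, 1),
  response = c(0.5, 1.2, -0.3),
  age = c(NA, 40, 55)
)

# Step 1: give every rep of every patient an explicit row.
expanded <- tidyr::complete(toy, patient, rep)

# Step 2: within each patient, carry the covariate forward,
# then backward. The response is not filled, so it stays missing
# in the newly created row.
filled <- dplyr::ungroup(
  tidyr::fill(
    dplyr::group_by(expanded, patient),
    age,
    .direction = "downup"
  )
)
filled
}

In this sketch, any covariate value still missing after both passes
would correspond to the case where the function throws an error.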
}

\examples{
set.seed(0)
data <- hbl_sim_independent(n_continuous = 1, n_study = 2)$data
data <- dplyr::select(
  data,
  study,
  group,
  rep,
  patient,
  response,
  tidyselect::everything()
)
data <- dplyr::rename(
  data,
  change = response,
  trial = study,
  arm = group,
  subject = patient,
  visit = rep,
  cov1 = covariate_study1_continuous1,
  cov2 = covariate_study2_continuous1
)
data$trial <- paste0("trial", data$trial)
data$arm <- paste0("arm", data$arm)
data$subject <- paste0("subject", data$subject)
data$visit <- paste0("visit", data$visit)
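# Optional pre-check (a sketch, not part of the package API):
# a baseline covariate should take a single value within each patient.
# Patients are identified here by trial and subject together,
# in case subject IDs repeat across trials.
id <- interaction(data$trial, data$subject, drop = TRUE)
n_values <- tapply(data$cov1, id, function(x) length(unique(x)))
# Any TRUE below would flag a covariate that changes across visits.
any(n_values > 1)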
hbl_data(
  data = data,
  response = "change",
  study = "trial",
  study_reference = "trial1",
  group = "arm",
  group_reference = "arm1",
  patient = "subject",
  rep = "visit",
  rep_reference = "visit1",
  covariates = c("cov1", "cov2")
)
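# To inspect the indexing behavior, store the output of the same call
# and pair each integer index with its label (an illustrative sketch).
out <- hbl_data(
  data = data,
  response = "change",
  study = "trial",
  study_reference = "trial1",
  group = "arm",
  group_reference = "arm1",
  patient = "subject",
  rep = "visit",
  rep_reference = "visit1",
  covariates = c("cov1", "cov2")
)
# Per the documented return value, the current study ("trial1") should
# have the largest study index and the control group ("arm1") index 1.
unique(out[, c("study", "study_label")])
unique(out[, c("group", "group_label")])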
}
\seealso{
Other data: 
\code{\link{hbl_s_tau}()}
}
\concept{data}
