% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nbc4va_help.R
\name{nbc4vaHelpData}
\alias{nbc4vaHelpData}
\title{Training and testing specifications in nbc4va}
\usage{
nbc4vaHelpData()
}
\description{
Training and testing specifications in nbc4va
}
\section{About}{

This documentation page provides details on the training and testing data formats to be used as inputs in the nbc4va package.
}

\section{Training and Testing Data}{

The training data (consisting of cases, causes of death for each case, and symptoms) is used as input for the Naive Bayes Classifier (NBC) algorithm, which learns the probabilities for
each cause of death to produce an NBC model. \cr \cr
The performance of this model can be evaluated by predicting causes of death for the testing data cases and comparing the predictions
to the known causes of death in the testing data. \cr \cr
The process of learning the probabilities to produce the NBC model is known as training, and the process of evaluating the predictive performance of the trained model is known as testing. \cr \cr
\strong{Key points}:
\itemize{
  \item The training data is used to build the NBC model
  \item The testing data is used to evaluate the NBC model's predictive performance
  \item Ideally, the testing data should not contain cases that are also in the training data
  \item Both the training and testing data must have the same symptoms
}
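
The last two key points could be checked before training with a minimal base R sketch such as the one below. It assumes two hypothetical data frames, \code{train} and \code{test}, that both hold a case identifier in the first column, a cause in the second column, and symptoms in the remaining columns. The helper \code{checkTrainTest} is illustrative only and is not part of the nbc4va package.
\preformatted{
    # Hypothetical helper (not part of nbc4va): compare training and testing data
    checkTrainTest <- function(train, test) {
      # Assumption: symptom columns are every column after ID and Cause
      trainSymptoms <- names(train)[-(1:2)]
      testSymptoms <- names(test)[-(1:2)]
      list(
        sameSymptoms = identical(trainSymptoms, testSymptoms),
        sharedCases = intersect(as.character(train[[1]]), as.character(test[[1]]))
      )
    }

    # Made-up example data for illustration
    train <- data.frame(id = c("t1", "t2", "t3"), cause = c("HIV", "Stroke", "HIV"),
                        fever = c(1, 0, 1), cough = c(0, 0, 1), stringsAsFactors = FALSE)
    test <- data.frame(id = c("v1", "v2"), cause = c("Stroke", "HIV"),
                       fever = c(0, 1), cough = c(1, 1), stringsAsFactors = FALSE)

    result <- checkTrainTest(train, test)
    result$sameSymptoms  # TRUE when symptom names and their order match
    result$sharedCases   # case IDs found in both data sets; ideally empty
}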
}

\section{Format}{

The format of the training and testing data is structured as a table, where each column holds a variable and each row
holds a death case. \cr \cr
The following format must be met in order to be used with the nbc4va package:
\itemize{
  \item \strong{Columns (in order)}: ID, Cause, Symptoms1..N
  \item \strong{ID}: column of case identifiers formatted as text
  \item \strong{Cause}: column of known causes of death formatted as text
  \item \strong{Symptoms1..N}: N columns representing symptoms, with each column containing 1 for presence of the symptom and 0 for absence of the symptom; any other values are treated as unknown
  \item If the testing causes are not known, the second column (\emph{Cause}) can be omitted
  \item Unknown symptoms are imputed randomly from the distribution of known 1s and 0s; a symptom column will be removed from training and testing if 1s or 0s do not exist
  \item Both the training and testing data must be consistent with each other (the same symptoms in the same order) to be meaningful
}
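
As an illustration of this layout, the base R sketch below builds a small table in the column order ID, Cause, Symptoms1..N and counts the values that would be treated as unknown. The data frame \code{cases}, its column names, and its values are made up for this example and are not taken from the nbc4va package.
\preformatted{
    # Hypothetical example data: ID, Cause, then one column per symptom
    cases <- data.frame(
      id = c("a1", "a2", "a3", "a4"),
      cause = c("Malaria", "Stroke", "Malaria", "Stroke"),
      fever = c(1, 0, 1, 99),
      paralysis = c(0, 1, 99, 1),
      stringsAsFactors = FALSE
    )

    # ID and Cause should be formatted as text
    is.character(cases[[1]])
    is.character(cases[[2]])

    # Any symptom value other than 1 or 0 (including NA) counts as unknown
    symptoms <- cases[, -(1:2), drop = FALSE]
    isUnknown <- is.na(symptoms) | !(symptoms == 1 | symptoms == 0)
    colSums(isUnknown)  # number of unknown values per symptom column
}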
}

\section{Examples}{

The image below shows an example of the training data. \cr \cr
\figure{nbcdatatrainex.png} \cr

The image below shows an example of the corresponding testing data. \cr \cr
\figure{nbcdatatestex.png}{options: width=400px;} \cr

The image below shows an example of the corresponding testing data without any causes. \cr \cr
\figure{nbcdatatestncex.png}
}

\section{Symptom Imputation Example}{

Given a symptom column containing the values of each case (1, 0, 0, 1, 99, 99):
\itemize{
  \item 1 represents presence of the symptom
  \item 0 represents absence of the symptom
  \item 99 is treated as unknown as to whether the symptom is present or absent
}
The imputation is applied as follows:
\enumerate{
  \item The unknown values (99, 99) are randomly imputed according to the known values (1, 0, 0, 1).
  \item Half (2/4) of the known values are 1s and half (2/4) are 0s.
  \item Thus, the imputation replaces half (1/2) of the unknown values with 1s and half (1/2) with 0s to match the distribution of the known values.
  \item The possible combinations for replacing the unknown values (99, 99) are then (1, 0) and (0, 1).
}
The symptom imputation method preserves the approximate distribution of the known values in an attempt to avoid dropping entire cases or symptoms.
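
The steps above can be approximated with the base R sketch below. It is not the nbc4va implementation: the helper \code{imputeSymptom} is hypothetical, and it assumes that the number of imputed 1s is the rounded share of 1s among the known values, with the imputed values then placed on the unknown positions in random order.
\preformatted{
    # Hypothetical helper (not the nbc4va implementation)
    imputeSymptom <- function(x) {
      # Known values are exactly 1 or 0; everything else (including NA) is unknown
      known <- !is.na(x) & (x == 1 | x == 0)
      nUnknown <- sum(!known)
      # Nothing to impute, or nothing known to impute from
      if (nUnknown == 0 || !any(known)) return(x)
      # Share of 1s among the known values, e.g. 2/4 for (1, 0, 0, 1)
      propOnes <- mean(x[known] == 1)
      # Assumption: round to a whole number of 1s to impute
      nOnes <- round(nUnknown * propOnes)
      fill <- c(rep(1, nOnes), rep(0, nUnknown - nOnes))
      # Shuffle by index so the 1s and 0s land on random unknown positions
      x[!known] <- fill[sample.int(nUnknown)]
      x
    }

    set.seed(1)  # fixed seed only so that the example is repeatable
    imputed <- imputeSymptom(c(1, 0, 0, 1, 99, 99))
    imputed[5:6]  # either (1, 0) or (0, 1)
}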
}

\section{Sample Code}{

Run the following code using \code{\link{nbc4vaData}} in the R console to view the example data included in the nbc4va package.
\preformatted{
--------------------------------------------------------------------------------------------------

    library(nbc4va)  # load the nbc4va package
    data(nbc4vaData)  # load the example data
    View(nbc4vaData)  # view the sample data in the nbc4va package
    data(nbc4vaDataRaw)  # load the example data with unknown symptom values
    View(nbc4vaDataRaw)  # view the sample data with unknown symptom values

--------------------------------------------------------------------------------------------------
}
}
\seealso{
Guide for package: \code{\link{nbc4va}}

Other help functions: \code{\link{nbc4vaHelpAdvanced}},
  \code{\link{nbc4vaHelpBasic}},
  \code{\link{nbc4vaHelpDev}},
  \code{\link{nbc4vaHelpFunctions}},
  \code{\link{nbc4vaHelpMethods}}, \code{\link{nbc4vaHelp}}
}

