% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/individualQC.R
\name{check_het_and_miss}
\alias{check_het_and_miss}
\title{Identification of individuals with outlying missing genotype or
heterozygosity rates}
\usage{
check_het_and_miss(indir, name, qcdir = indir, imissTh = 0.03,
  hetTh = 3, run.check_het_and_miss = TRUE, interactive = FALSE,
  verbose = FALSE, path2plink = NULL, showPlinkOutput = TRUE)
}
\arguments{
\item{indir}{[character] /path/to/directory containing the basic PLINK data
files name.bim, name.bed and name.fam.}

\item{name}{[character] Prefix of PLINK files, i.e. name.bed, name.bim,
name.fam, name.het and name.imiss.}

\item{qcdir}{[character] /path/to/directory where name.het as returned by
plink --het and name.imiss as returned by plink --missing will be saved. By
default, qcdir=indir. If run.check_het_and_miss is FALSE, it is assumed that
plink --missing and plink --het have already been run and that
qcdir/name.imiss and qcdir/name.het are present. The user needs write
permission for qcdir.}

\item{imissTh}{[double] Threshold for the acceptable missing genotype rate per
individual; must be a proportion in (0,1).}

\item{hetTh}{[double] Threshold for the acceptable deviation from the mean
heterozygosity per individual, expressed in multiples of the standard
deviation of heterozygosity (het), i.e. individuals outside mean(het) +/-
hetTh*sd(het) will be returned as failing the heterozygosity check; must be
larger than 0.}

\item{run.check_het_and_miss}{[logical] Should plink --missing and plink
--het be run to determine genotype missingness and heterozygosity rates? If
FALSE, it is assumed that plink --missing and plink --het have already been
run and that qcdir/name.imiss and qcdir/name.het are present;
\code{\link{check_het_and_miss}} will fail with a missing file error otherwise.}

\item{interactive}{[logical] Should plots be shown interactively? When
choosing this option, make sure you have X-forwarding or a graphical interface
available for interactive plotting. Alternatively, set interactive=FALSE and
save the returned plot object (p_het_imiss) via
\code{ggplot2::ggsave(filename, plot = p_het_imiss, ...)} or via
\code{pdf(outfile); print(p_het_imiss); dev.off()}.}

\item{verbose}{[logical] If TRUE, progress info is printed to standard out.}

\item{path2plink}{[character] Absolute path to the directory where the external
plink software \url{https://www.cog-genomics.org/plink/1.9/} can be found, i.e.
plink should be accessible as path2plink/plink -h. If not provided, it is
assumed that PATH is set up such that plink will be found by
system("plink").}

\item{showPlinkOutput}{[logical] If TRUE, plink log and error messages are
printed to standard out.}
}
\value{
Named [list] with i) fail_imiss [data.frame] containing FID (Family
ID), IID (Within-family ID), MISS_PHENO (Phenotype missing? (Y/N)), N_MISS
(Number of missing genotype call(s), not including obligatory missings),
N_GENO (Number of potentially valid call(s)) and F_MISS (Missing call rate) of
individuals failing the missing genotype check; ii) fail_het [data.frame]
containing FID (Family ID), IID (Within-family ID), O(HOM) (Observed number
of homozygotes), E(HOM) (Expected number of homozygotes), N(NM) (Number of
non-missing autosomal genotypes) and F (Method-of-moments F coefficient
estimate) of individuals failing the heterozygosity check; and iii)
p_het_imiss, a ggplot2 object containing a scatter plot with the samples'
missingness rates on the x-axis and their heterozygosity rates on the y-axis,
which can be shown by print(p_het_imiss).
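
For reference, the F column is not computed by this package but taken from
plink --het; following the method-of-moments definition given in the PLINK 1.9
documentation, it relates the observed and expected homozygote counts as
\deqn{F = \frac{O(HOM) - E(HOM)}{N(NM) - E(HOM)}}{F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM))}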
}
\description{
Runs and evaluates results from plink --missing (missing genotype rates
per individual) and plink --het (heterozygosity rates per individual).
Non-systematic genotyping failures and outlying heterozygosity (hz) rates
per individual are often proxies for poor DNA sample quality. Higher than
expected heterozygosity can indicate DNA contamination.
The heterozygosity rate of an individual is computed from the plink --het
output as hz = (N-O)/N, where N is the number of non-missing genotypes
(N(NM)) and O is the observed number of homozygous genotypes (O(HOM)) for
that individual.
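
For example, an individual with a hypothetical N = 10000 non-missing
genotypes, of which O = 6800 are homozygous, has a heterozygosity rate of
\deqn{\frac{N - O}{N} = \frac{10000 - 6800}{10000} = 0.32.}{(N - O)/N = (10000 - 6800)/10000 = 0.32.}
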
Mean heterozygosity can differ between populations and SNP genotyping panels.
Within a population and genotyping panel, a reduced heterozygosity rate can
indicate inbreeding; such individuals are then likely to be returned by
\code{\link{check_relatedness}} as failing the relatedness filters.
\code{check_het_and_miss} creates a scatter plot with the individuals'
missingness rates on the x-axis and their heterozygosity rates on the y-axis.
}
\details{
\code{\link{check_het_and_miss}} is a wrapper around
\code{\link{run_check_missingness}},
\code{\link{run_check_heterozygosity}} and
\code{\link{evaluate_check_het_and_miss}}.
If run.check_het_and_miss is TRUE, \code{\link{run_check_heterozygosity}} and
\code{\link{run_check_missingness}} are executed; otherwise it is assumed
that plink --missing and plink --het have been run externally and that
qcdir/name.het and qcdir/name.imiss exist. \code{\link{check_het_and_miss}}
will fail with a missing file error if they do not.
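
Based on the descriptions of imissTh and hetTh, the two checks can be
summarised as follows. This is a sketch of the criteria, not a transcription
of the code in \code{\link{evaluate_check_het_and_miss}}, and whether values
lying exactly on a threshold fail is not specified here. Writing \eqn{m_i}
for the missing call rate (F_MISS) and \eqn{hz_i} for the heterozygosity rate
of individual \eqn{i}, with mean and standard deviation taken over all
individuals, individual \eqn{i} fails the missingness check if
\deqn{m_i > \mathrm{imissTh}}{m_i > imissTh}
and fails the heterozygosity check if
\deqn{\left| hz_i - \mathrm{mean}(hz) \right| > \mathrm{hetTh} \cdot \mathrm{sd}(hz).}{|hz_i - mean(hz)| > hetTh * sd(hz).}
For instance, with the default hetTh = 3 and a hypothetical sample with
mean(hz) = 0.32 and sd(hz) = 0.01, individuals with hz below 0.29 or above
0.35 fail the heterozygosity check.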

For details on the columns of the output data.frames fail_imiss and fail_het,
see the PLINK 1.9 output format documentation:
\url{https://www.cog-genomics.org/plink/1.9/formats#imiss} and
\url{https://www.cog-genomics.org/plink/1.9/formats#het}.
}
\examples{
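# Locate the example data shipped with plinkQC
package.dir <- find.package('plinkQC')
qcdir <- file.path(package.dir, 'extdata')
name <- "data"
# Use the precomputed qcdir/data.het and qcdir/data.imiss instead of plink
fail_het_miss <- check_het_and_miss(indir=qcdir, name=name,
    run.check_het_and_miss=FALSE, interactive=FALSE)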
}
