% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/multivar_match.R
\name{multivar_match}
\alias{multivar_match}
\title{Matching by computing multivar_scores based on several variables}
\usage{
multivar_match(
  data1,
  data2,
  by = NULL,
  by.x = NULL,
  by.y = NULL,
  unique_key_1,
  unique_key_2,
  logit = NULL,
  missing = FALSE,
  wgts = NULL,
  compare_type = "diff",
  blocks = NULL,
  blocks.x = NULL,
  blocks.y = NULL,
  nthread = 1,
  top = 1,
  threshold = NULL,
  suffixes = c("_1", "_2")
)
}
\arguments{
\item{data1}{data.frame. First to-merge dataset.}

\item{data2}{data.frame. Second to-merge dataset.}

\item{by}{character vector. Variables to merge on (common to data1 and data2). See \code{merge}.}

\item{by.x}{character string. Variable to merge on in data1. See \code{merge}}

\item{by.y}{character string. Variable to merge on in data2. See \code{merge}}

\item{unique_key_1}{character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields)}

\item{unique_key_2}{character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields)}

\item{logit}{a glm or lm model fitted by a logit regression on a verified dataset. See details.}

\item{missing}{logical. Whether to treat missing (NA) observations as their own binary column for each column in \code{by}. See details.}

\item{wgts}{weights used to calculate the multivar_score, supplied instead of a fitted model. Can be weights from \code{calculate_weights}.}

\item{compare_type}{a vector with the same length as "by" that describes how to compare the variables. Options are "in", "indicator", "substr", "difference", "ratio", and "stringdist". See X for details.}

\item{blocks}{variable present in both data sets to "block" on before computing scores. multivar_scores will only be computed for observations that share a block. See details.}

\item{blocks.x}{name of blocking variables in data1. Cannot be supplied together with \code{blocks}.}

\item{blocks.y}{name of blocking variables in data2. Cannot be supplied together with \code{blocks}.}

\item{nthread}{integer. Number of cores to use when computing all combinations. See \code{parallel::makeCluster()}.}

\item{top}{integer. Number of matches to return for each observation.}

\item{threshold}{numeric. Minimum score for a match to be included in the result.}

\item{suffixes}{see \code{merge}}
}
\value{
a data.table, the resultant match, including columns from both data sets.
}
\description{
\code{multivar_match} computes a multivar_score for each pair of observations across
data1 and data2 using several variables, then executes a merge by picking the
pair with the highest multivar_score for each observation in data1.
}
\details{
The best way to understand this function is to see the vignette 'Multivar_matching'.

There are two ways of performing this match: with or without a pre-trained logit.
To use a logit, you must have a verified set of matches. The names of the variables
in this set must match the names of the variables in the data you pass into \code{multivar_match}.
Without a pre-trained logit, you must supply a weight for each variable that you
want in the comparison. These weights can either be chosen ahead of time or
computed from a verified set of matches with \code{calculate_weights}.
}
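\examples{
\dontrun{
# A minimal sketch of the weights-based path described in the details
# (no pre-trained logit). It has not been run against the package.
# Assumptions: the data, the column names (id_1, id_2, name, country) and
# the weight values are hypothetical, and wgts is assumed to take one
# numeric weight per variable in by.
data1 <- data.frame(
  id_1    = 1:3,
  name    = c("acme corp", "globex", "initech"),
  country = c("US", "US", "CA"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  id_2    = 1:3,
  name    = c("acme corporation", "globex inc", "initech"),
  country = c("US", "US", "CA"),
  stringsAsFactors = FALSE
)

# compare_type has one entry per variable in by: a string distance for
# name and an exact-match indicator for country.
result <- multivar_match(
  data1, data2,
  by = c("name", "country"),
  unique_key_1 = "id_1",
  unique_key_2 = "id_2",
  wgts = c(0.7, 0.3),
  compare_type = c("stringdist", "indicator"),
  top = 1
)
}
}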
