% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/validation.R
\name{validation}
\alias{validation}
\title{Perform Q-matrix validation methods}
\usage{
validation(
  Y,
  Q,
  CDM.obj = NULL,
  model = "GDINA",
  method = "GDI",
  search.method = "PAA",
  maxitr = 1,
  iter.level = "test",
  eps = 0.95,
  criter = "PVAF",
  verbose = TRUE
)
}
\arguments{
\item{Y}{A required \code{N} × \code{I} matrix or data.frame consisting of the responses of \code{N} individuals
to \code{I} items. Missing values need to be coded as \code{NA}.}

\item{Q}{A required binary \code{I} × \code{K} matrix indicating which attributes are required to
master each item. The \code{i}th row is a binary indicator vector in which 0 marks an attribute
that is not required and 1 marks an attribute that is required to master item \code{i}.}

\item{CDM.obj}{An object of class \code{CDM.obj}. When it is not \code{NULL}, it enables rapid validation
of the Q-matrix without an additional parameter-estimation step. See \code{\link[Qval]{CDM}}.}

\item{model}{Type of model to fit; can be \code{"GDINA"}, \code{"LCDM"}, \code{"DINA"}, \code{"DINO"},
\code{"ACDM"}, \code{"LLM"}, or \code{"rRUM"}. Default = \code{"GDINA"}.
See \code{\link[Qval]{CDM}}.}

\item{method}{The method used to validate the Q-matrix; can be \code{"GDI"}, \code{"Wald"}, \code{"Hull"}, or
\code{"MLR-B"}. The \code{model} must be \code{"GDINA"} when \code{method = "Wald"}.
Default = \code{"GDI"}. See details.}

\item{search.method}{Character string specifying the search method to use during validation.
\describe{
  \item{"SSA"}{for sequential search algorithm (see de la Torre, 2008; Terzi & de la Torre, 2018). This option can be used when the \code{method} is \code{"GDI"} or \code{"MLR-B"}.}
  \item{"ESA"}{for exhaustive search algorithm. This option can be used when the \code{method} is any of \code{"GDI"}, \code{"Wald"}, \code{"Hull"}, or \code{"MLR-B"}.}
  \item{"PAA"}{for priority attribute algorithm.
               This is the default option and can be used when the \code{method} is any of \code{"GDI"}, \code{"Wald"}, \code{"Hull"}, or \code{"MLR-B"}.}
}}

\item{maxitr}{Maximum number of iterations. Default = \code{1}.}

\item{iter.level}{Level at which the iteration is carried out; can be \code{"item"} or \code{"test"}. Default = \code{"test"}. See details.}

\item{eps}{Cut-off point for \eqn{PVAF}; used only when \code{method} is \code{"GDI"} or \code{"Wald"}.
Default = \code{0.95}. See details.}

\item{criter}{The fit index to use; can be \code{"R2"} for \eqn{R_{McFadden}^2} (see \code{\link[Qval]{get.R2}})
or \code{"PVAF"} for the proportion of variance accounted for (\eqn{PVAF}; see \code{\link[Qval]{get.PVAF}}).
Used only when \code{method = "Hull"}. Default = \code{"PVAF"}. See details.}

\item{verbose}{Logical; whether to print iteration information. Default = \code{TRUE}.}
}
\value{
An object of class \code{validation}, which is a \code{list} containing the following components:
\item{Q.orig}{The original Q-matrix, which may contain mis-specifications and needs to be validated.}
\item{Q.sug}{The Q-matrix suggested by the chosen validation method.}
\item{priority}{An \code{I} × \code{K} matrix that contains the priority of every attribute for
                each item. Available only when \code{search.method} is \code{"PAA"}. See details.}
\item{iter}{The number of iterations performed.}
\item{time.cost}{The CPU time taken to run the function.}
}
\description{
This function validates a Q-matrix using generalized Q-matrix validation methods,
including the commonly used GDI (de la Torre & Chiu, 2016; Najera, Sorrel,
& Abad, 2019; Najera et al., 2020), Wald (Ma & de la Torre, 2020), Hull (Najera et al.,
2021), and MLR-B (Tu et al., 2022) methods. It supports two iteration levels (test
level or item level; Najera et al., 2020; Najera et al., 2021; Tu et al., 2022) and
several attribute search methods (ESA, SSA, PAA; de la Torre, 2008; Terzi &
de la Torre, 2018). See details for more information.
}
\section{The GDI method}{

The GDI method (de la Torre & Chiu, 2016), as the first Q-matrix validation method
applicable to saturated models, serves as an important foundation for various mainstream
Q-matrix validation methods.

The method calculates the proportion of variance accounted for (\eqn{PVAF}; see \code{\link[Qval]{get.PVAF}})
for all possible q-vectors for each item and selects, as the correction result, the q-vector with a \eqn{PVAF} just
greater than the cut-off point (or Epsilon, EPS). The variance
\eqn{\zeta^2} is the generalized discriminating index (GDI; de la Torre & Chiu, 2016).
The GDI method is therefore regarded as a generalized extension of the \eqn{\delta}
method (de la Torre, 2008), which also takes maximizing discrimination as its basic idea.
In the GDI method, \eqn{\zeta^2} is defined as the weighted variance of the correct
response probabilities across all mastery patterns, that is:
\deqn{
 \zeta^2 =
 \sum_{l=1}^{2^K} \pi_{l} {(P(X_{pi}=1|\mathbf{\alpha}_{l}) - P_{i}^{mean})}^2
}
where \eqn{\pi_{l}} represents the prior probability of mastery pattern \eqn{l};
\eqn{P_{i}^{mean}=\sum_{l=1}^{2^K}\pi_{l}P(X_{pi}=1|\mathbf{\alpha}_{l})} is the weighted
average of the correct response probabilities across all attribute mastery patterns.
When the q-vector is correctly specified, the calculated \eqn{\zeta^2} should be maximized,
indicating the maximum discrimination of the item. However, in reality, \eqn{\zeta^2}
continues to increase when the q-vector is over-specified, and the more attributes that
are over-specified, the larger \eqn{\zeta^2} becomes. The q-vector with all attributes set
to 1 (i.e., \eqn{\mathbf{q}_{1:K}}) has the largest \eqn{\zeta^2} (de la Torre & Chiu, 2016).
This is because an increase in attributes in the q-vector leads to an increase in item
parameters, resulting in greater differences in correct response probabilities across
attribute patterns and, consequently, increased variance. However, this increase in
variance is spurious. Therefore, de la Torre and Chiu (2016) calculated \eqn{PVAF = \frac{\zeta^2}{\zeta_{1:K}^2}}
to describe the degree to which the discrimination of the current q-vector explains
the maximum discrimination. They selected an appropriate \eqn{PVAF} cut-off point to achieve
a balance between q-vector fit and parsimony. According to previous studies,
the \eqn{PVAF} cut-off point is typically set at 0.95 (Ma & de la Torre, 2020; Najera et al., 2021).
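
As a small worked illustration with hypothetical numbers (not taken from any of the cited studies), let \eqn{K = 2},
let the four mastery patterns \eqn{(00, 10, 01, 11)} be equally likely (\eqn{\pi_{l} = 0.25}), and suppose the correct
response probabilities of an item under the saturated q-vector are \eqn{(0.10, 0.80, 0.15, 0.85)}. This sketch assumes
that, for a reduced q-vector, the probabilities of patterns sharing the same required attributes are replaced by
their \eqn{\pi}-weighted average. Under these assumptions:
\deqn{
 P_{i}^{mean} = 0.25 \times (0.10 + 0.80 + 0.15 + 0.85) = 0.475
}
\deqn{
 \zeta_{1:K}^2 = 0.25 \times (0.375^2 + 0.325^2 + 0.325^2 + 0.375^2) = 0.123125
}
\deqn{
 \zeta_{(1,0)}^2 = 0.5 \times (0.125 - 0.475)^2 + 0.5 \times (0.825 - 0.475)^2 = 0.1225,
 \quad PVAF_{(1,0)} = \frac{0.1225}{0.123125} \approx 0.995
}
\deqn{
 \zeta_{(0,1)}^2 = 0.5 \times (0.45 - 0.475)^2 + 0.5 \times (0.50 - 0.475)^2 = 0.000625,
 \quad PVAF_{(0,1)} = \frac{0.000625}{0.123125} \approx 0.005
}
With a cut-off point of 0.95, the q-vector \eqn{(1, 0)} would be retained in this illustration: it already exceeds
the cut-off with a single attribute, whereas \eqn{(0, 1)} falls far short and the saturated q-vector adds almost nothing.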
}

\section{The Wald method}{

The Wald method (Ma & de la Torre, 2020) combines the Wald test with \eqn{PVAF} to correct
the Q-matrix at the item level. Its basic logic is as follows: when correcting item \eqn{i},
the single attribute that maximizes the \eqn{PVAF} value is added to a vector with all
attributes set to \eqn{\mathbf{0}} (i.e., \eqn{\mathbf{q} = (0, 0, \ldots, 0)}) as a starting point.
In subsequent iterations, attributes in this vector are continuously added or
removed through the Wald test. The correction process ends when the \eqn{PVAF} exceeds the
cut-off point or when no further attribute changes occur. The Wald statistic follows an
asymptotic \eqn{\chi^{2}} distribution with \eqn{2^{K^\ast} - 1} degrees of freedom.

The calculation method is as follows:
\deqn{
   Wald = {(\mathbf{R} \times P_{i}(\mathbf{\alpha}))}^{'}
   {(\mathbf{R} \times \mathbf{V}_{i} \times \mathbf{R}^{'})}^{-1}
   {(\mathbf{R} \times P_{i}(\mathbf{\alpha}))}
}
\eqn{\mathbf{R}} represents the restriction matrix; \eqn{P_{i}(\mathbf{\alpha})} denotes
the vector of correct response probabilities for item \eqn{i}; \eqn{\mathbf{V}_i} is the
variance-covariance matrix of the correct response probabilities for item \eqn{i}, which
can be obtained by multiplying the \eqn{\mathbf{M}_i} matrix (de la Torre, 2011) with the
variance-covariance matrix of item parameters \eqn{\mathbf{\Sigma}_i}, i.e.,
\eqn{\mathbf{V}_i = \mathbf{M}_i \times \mathbf{\Sigma}_i}. The \eqn{\mathbf{\Sigma}_i} can be
derived by inverting the information matrix. Here, the empirical cross-product information
matrix (de la Torre, 2011) is used to calculate \eqn{\mathbf{\Sigma}_i}.

\eqn{\mathbf{M}_i} is a \eqn{2^{K^\ast} \times 2^{K^\ast}} matrix that represents the relationship between
the parameters of item \eqn{i} and the attribute mastery patterns. The rows represent different mastery
patterns, while the columns represent different item parameters.
}

\section{The Hull method}{

The Hull method (Najera et al., 2021) addresses the issue of the cut-off point in the GDI
method and demonstrates good performance in simulation studies. Najera et al. adapted the
Hull method, originally used to determine the number of factors to retain in exploratory factor analysis
(Lorenzo-Seva et al., 2011), to select the number of attributes retained in the q-vector
for Q-matrix validation. The Hull method aligns with the GDI approach in its philosophy
of seeking a balance between fit and parsimony. While GDI relies on a preset, arbitrary
cut-off point to determine this balance, the Hull method utilizes the most pronounced elbow
in the Hull plot to make this judgment. The most pronounced elbow is determined using
the following formula:
\deqn{
   st = \frac{(f_k - f_{k-1}) / (np_k - np_{k-1})}{(f_{k+1} - f_k) / (np_{k+1} - np_k)}
}
where \eqn{f_k} represents the fit-index value (either \eqn{PVAF}, see \code{\link[Qval]{get.PVAF}}, or
\eqn{R^2}, see \code{\link[Qval]{get.R2}}) when the q-vector contains \eqn{k} attributes;
similarly, \eqn{f_{k-1}} and \eqn{f_{k+1}} represent the fit-index value when the q-vector contains \eqn{k-1}
and \eqn{k+1} attributes, respectively. \eqn{{np}_k} denotes the number of parameters when the
q-vector has \eqn{k} attributes, which is \eqn{2^k} for a saturated model. Likewise, \eqn{{np}_{k-1}}
and \eqn{{np}_{k+1}} represent the number of parameters when the q-vector has \eqn{k-1} and
\eqn{k+1} attributes, respectively. The Hull method calculates the \eqn{st} index for all possible q-vectors
and retains the q-vector with the maximum \eqn{st} index as the corrected result.
Najera et al. (2021) removed any concave points from the Hull plot, and when only the first and
last points remained in the plot, the saturated q-vector was selected.
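
As a worked illustration of the \eqn{st} index with hypothetical values (not results from any cited study), suppose
\eqn{K = 3} and the best q-vectors with \eqn{k = 1, 2, 3} attributes have \eqn{PVAF} values \eqn{f_1 = 0.60},
\eqn{f_2 = 0.96}, and \eqn{f_3 = 1.00}, with \eqn{{np}_1 = 2}, \eqn{{np}_2 = 4}, and \eqn{{np}_3 = 8} parameters under
a saturated model. This sketch also assumes an origin point \eqn{f_0 = 0}, \eqn{{np}_0 = 0} so that \eqn{st} can be
computed for \eqn{k = 1}:
\deqn{
   st_1 = \frac{(0.60 - 0) / (2 - 0)}{(0.96 - 0.60) / (4 - 2)} = \frac{0.30}{0.18} \approx 1.67
}
\deqn{
   st_2 = \frac{(0.96 - 0.60) / (4 - 2)}{(1.00 - 0.96) / (8 - 4)} = \frac{0.18}{0.01} = 18
}
Because \eqn{st_2} is the largest value, the q-vector with two attributes marks the most pronounced elbow and would
be retained in this illustration.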
}

\section{The MLR-B method}{

The MLR-B method proposed by Tu et al. (2022) differs from the GDI, Wald, and Hull methods in that
it does not employ \eqn{PVAF}. Instead, it directly uses the marginal probabilities of attribute mastery for
subjects to perform multiple logistic regression on their observed scores. This approach considers
all possible q-vectors and fits \eqn{2^K-1} regression models. After discarding regression equations
that contain any nonsignificant regression coefficients, it selects the q-vector corresponding to
the equation with the minimum AIC as the validation result. The performance of this method in both the
LCDM and GDM models even surpasses that of the Hull method, making it an efficient and reliable
approach for Q-matrix correction.
}

\section{Iterative procedure}{

An iterative procedure that makes one modification at a time is item-level iteration (\code{"item"}; Najera
et al., 2020, 2021), while an iterative procedure that modifies the entire Q-matrix at each iteration
is test-level iteration (\code{"test"}; Najera et al., 2020; Tu et al., 2022).

The steps of the \code{item} level iterative procedure algorithm are as follows:
\describe{
   \item{Step1}{Fit the \code{CDM} according to the item responses and the provisional Q-matrix (\eqn{\mathbf{Q}^0}).}
   \item{Step2}{Validate the provisional Q-matrix and obtain a suggested Q-matrix (\eqn{\mathbf{Q}^1}).}
   \item{Step3}{For each item, define \eqn{PVAF_{0i}} as the \eqn{PVAF} of the provisional q-vector specified in \eqn{\mathbf{Q}^0},
               and \eqn{PVAF_{1i}} as the \eqn{PVAF} of the suggested q-vector in \eqn{\mathbf{Q}^1}.}
   \item{Step4}{Calculate \eqn{\delta PVAF_{i}} for every item, defined as \eqn{\delta PVAF_{i} = |PVAF_{1i} - PVAF_{0i}|}.}
   \item{Step5}{Define the hit item as the item with the highest \eqn{\delta PVAF_{i}}.}
   \item{Step6}{Update \eqn{\mathbf{Q}^0} by replacing the provisional q-vector of the hit item with its suggested q-vector.}
   \item{Step7}{Iterate over Steps 1 to 6 until \eqn{\sum_{i=1}^{I} \delta PVAF_{i} = 0}.}
}
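
Steps 3 to 5 of the item-level procedure can be illustrated with hypothetical values for a three-item test
(the numbers are invented for illustration only). Suppose the provisional and suggested q-vectors give
\eqn{(PVAF_{01}, PVAF_{02}, PVAF_{03}) = (0.97, 0.62, 0.88)} and
\eqn{(PVAF_{11}, PVAF_{12}, PVAF_{13}) = (0.97, 0.96, 0.95)}. Then:
\deqn{
   \delta PVAF_{1} = |0.97 - 0.97| = 0, \quad
   \delta PVAF_{2} = |0.96 - 0.62| = 0.34, \quad
   \delta PVAF_{3} = |0.95 - 0.88| = 0.07
}
Item 2 has the highest \eqn{\delta PVAF_{i}} and is the hit item, so only its q-vector is replaced in
\eqn{\mathbf{Q}^0}. Because \eqn{\sum_{i=1}^{3} \delta PVAF_{i} = 0.41 \neq 0}, the procedure returns to Step 1.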

The steps of the \code{test} level iterative procedure algorithm are as follows:
\describe{
   \item{Step1}{Fit the \code{CDM} according to the item responses and the provisional Q-matrix (\eqn{\mathbf{Q}^0}).}
   \item{Step2}{Validate the provisional Q-matrix and obtain a suggested Q-matrix (\eqn{\mathbf{Q}^1}).}
   \item{Step3}{Check whether \eqn{\mathbf{Q}^1 = \mathbf{Q}^0}. If \code{TRUE}, terminate the iterative algorithm.
             If \code{FALSE}, update \eqn{\mathbf{Q}^0} as \eqn{\mathbf{Q}^1}.}
   \item{Step4}{Iterate over Steps 1 to 3 until one of the following conditions is satisfied: 1. \eqn{\mathbf{Q}^1 =
                 \mathbf{Q}^0}; 2. the maximum number of iterations (\code{maxitr}) is reached; 3. \eqn{\mathbf{Q}^1} does not satisfy
                the condition that each attribute is measured by at least one item.}
}
}

\examples{
################################################################
#                           Example 1                          #
#             The GDI method to validate Q-matrix              #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ,
                         model = "GDINA", distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

\donttest{
## using MMLE/EM to fit CDM model first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.GDI.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "GDI")


## also can validate the Q-matrix directly
Q.GDI.obj <- validation(example.data$dat, example.MQ)

## item level iteration
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        iter.level = "item", maxitr = 150)

## search method
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        search.method = "ESA")

## cut-off point
Q.GDI.obj <- validation(example.data$dat, example.MQ, method = "GDI",
                        eps = 0.90)

## check QRR
print(getQRR(example.Q, Q.GDI.obj$Q.sug))
}



################################################################
#                           Example 2                          #
#             The Wald method to validate Q-matrix             #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ, model = "GDINA",
                         distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

\donttest{
## using MMLE/EM to fit CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.Wald.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "Wald")


## also can validate the Q-matrix directly
Q.Wald.obj <- validation(example.data$dat, example.MQ, method = "Wald")

## check QRR
print(getQRR(example.Q, Q.Wald.obj$Q.sug))
}



################################################################
#                           Example 3                          #
#             The Hull method to validate Q-matrix             #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ, model = "GDINA",
                         distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

\donttest{
## using MMLE/EM to fit CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.Hull.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "Hull")


## also can validate the Q-matrix directly
Q.Hull.obj <- validation(example.data$dat, example.MQ, method = "Hull")

## change PVAF to R2 as fit-index
Q.Hull.obj <- validation(example.data$dat, example.MQ, method = "Hull", criter = "R2")

## check QRR
print(getQRR(example.Q, Q.Hull.obj$Q.sug))
}



################################################################
#                           Example 4                          #
#             The MLR-B method to validate Q-matrix            #
################################################################
set.seed(123)

library(Qval)

## generate Q-matrix and data
K <- 4
I <- 20
example.Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
example.data <- sim.data(Q = example.Q, N = 500, IQ = IQ, model = "GDINA",
                         distribute = "horder")

## simulate random mis-specifications
example.MQ <- sim.MQ(example.Q, 0.1)

\donttest{
## using MMLE/EM to fit CDM first
example.CDM.obj <- CDM(example.data$dat, example.MQ)

## using the fitted CDM.obj to avoid extra parameter estimation.
Q.MLR.obj <- validation(example.data$dat, example.MQ, example.CDM.obj, method = "MLR-B")


## also can validate the Q-matrix directly
Q.MLR.obj <- validation(example.data$dat, example.MQ, method  = "MLR-B")

## check QRR
print(getQRR(example.Q, Q.MLR.obj$Q.sug))
}

}
\references{
de la Torre, J., & Chiu, C. Y. (2016). A General Method of Empirical Q-matrix Validation. Psychometrika, 81(2), 253-273. https://doi.org/10.1007/s11336-015-9467-8.

de la Torre, J. (2008). An Empirically Based Method of Q-Matrix Validation for the DINA Model: Development and Applications. Journal of Educational Measurement, 45(4), 343-362. https://doi.org/10.1111/j.1745-3984.2008.00069.x.

Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. (2011). The Hull method for selecting the number of common factors. Multivariate Behavioral Research, 46, 340–364. https://doi.org/10.1080/00273171.2011.564527.

Ma, W., & de la Torre, J. (2020). An empirical Q-matrix validation method for the sequential generalized DINA model. British Journal of Mathematical and Statistical Psychology, 73(1), 142-163. https://doi.org/10.1111/bmsp.12156.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York, NY: Academic Press.

Najera, P., Sorrel, M. A., & Abad, F. J. (2019). Reconsidering Cutoff Points in the General Method of Empirical Q-Matrix Validation. Educational and Psychological Measurement, 79(4), 727-753. https://doi.org/10.1177/0013164418822700.

Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2020). Improving Robustness in Q-Matrix Validation Using an Iterative and Dynamic Procedure. Applied Psychological Measurement, 44(6), 431-446. https://doi.org/10.1177/0146621620909904.

Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2021). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology, 74 Suppl 1, 110-130. https://doi.org/10.1111/bmsp.12228.

Terzi, R., & de la Torre, J. (2018). An Iterative Method for Empirically-Based Q-Matrix Validation. International Journal of Assessment Tools in Education, 248-262. https://doi.org/10.21449/ijate.407193.

Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01880-x.
}
\author{
Haijiang Qin <Haijiang133@outlook.com>
}
