% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mvord.R
\name{mvord}
\alias{mvord}
\title{Multivariate Ordinal Regression Models.}
\usage{
mvord(formula, error.structure = cor_general(~1), link = mvprobit(), data,
  index = NULL, response.names = NULL, response.levels = NULL,
  coef.constraints = NULL, coef.values = NULL,
  threshold.constraints = NULL, threshold.values = NULL, weights = NULL,
  offset = NULL, scale = FALSE, se = TRUE, start.values = NULL,
  solver = "newuoa", PL.lag = NULL, control = list(maxit = 2e+05, trace =
  1, kkt = FALSE))
}
\arguments{
\item{formula}{an object of class \code{\link{formula}} of the form \code{y ~ X1 + ... + Xp}.}

\item{error.structure}{the error structure of the model: general correlation structure (default)\cr
\code{cor_general(~1)},
general covariance structure \code{cov_general(~1)}, factor dependent correlation structure \code{cor_general(~f)},
factor dependent covariance structure \code{cov_general(~f)}, covariate dependent equicorrelation structure \cr
\code{cor_equi(~S)},
AR(1) correlation structure \code{cor_ar1(~1)} or covariate dependent \cr
AR(1) correlation structure \code{cor_ar1(~S)}.
See \code{\link{error_struct}} or 'Details'.}

\item{link}{specifies the link function by \code{mvprobit()} (multivariate normally distributed errors)
or \code{mvlogit(df = 8)} (multivariate logistically distributed errors), where \code{df} specifies the degrees of freedom of the t copula.}

\item{data}{\code{\link{data.frame}} containing a subject index, an index for the multiple measurements,
an ordinal response \code{y} and covariates \code{X1, ..., Xp}.}

\item{index}{(optional) character vector of length two specifying the column names of the subject index
and the multiple measurement index in \code{data}, e.g. \cr
\code{c("subject", "multiple_measurement")}.
The default value of \code{index} is \code{NULL}, assuming that the first column of \code{data} contains
the subject index and the second column the multiple measurement index.}

\item{response.names}{(optional) \code{\link{vector}} of the labels of the multiple measurement index
specifying the ordering of the responses, which is essential when setting constraints on the model parameters.
The default value of \code{response.names} is \code{NULL}, giving the natural ordering of the levels of the factor variable
of multiple measurements.}

\item{response.levels}{(optional) \code{\link{list}} of length equal to the number of multiple measurements specifying the category labels
when the categories vary across multiple measurements.}

\item{coef.constraints}{\code{\link{vector}} or \code{\link{matrix}} of constraints on the regression coefficients. See 'Details'.}

\item{coef.values}{\code{\link{matrix}} setting fixed values on the regression coefficients. See 'Details'.}

\item{threshold.constraints}{\code{\link{vector}} of constraints on the threshold parameters. See 'Details'.}

\item{threshold.values}{\code{\link{list}} of (optional) fixed values for the threshold parameters. See 'Details'.}

\item{weights}{(optional) column name of subject-specific weights in \code{data} which need to be
constant across multiple measurements. Negative weights are not allowed.}

\item{offset}{this can be used to specify an a priori known component to be included in the linear predictor during fitting.
This should be \code{NULL} or a numeric vector of length equal to the number of cases. One or more offset terms can be included
in the formula instead or as well, and if more than one is specified their sum is used. See \code{\link{model.offset}}.}

\item{scale}{If \code{scale = TRUE}, the continuous covariates are standardized
by subtracting the mean and dividing by the standard deviation.
This operation is performed for each repeated measurement before fitting.}

\item{se}{logical, if \code{TRUE} standard errors are computed.}

\item{start.values}{vector of (optional) starting values.}

\item{solver}{character string containing the name of the applicable solver of \code{\link{optimx}} (default is \code{"newuoa"})
or a wrapper function for a user-defined solver.}

\item{PL.lag}{specifies the time lag of the pairs in the pairwise likelihood approach to be optimized.}

\item{control}{a list of control arguments. See \code{\link{optimx}}.}
}
\value{
The function \code{mvord} returns an object of \code{\link{class}} \code{"mvord"}.

The functions \code{summary} and \code{print} are used to display the results.
The function \code{coef} extracts the regression coefficients, the function \code{thresholds} the threshold coefficients
and the function \cr
\code{get_error_struct} returns the estimated parameters of the corresponding error structure.

An object of \code{\link{class}} \code{"mvord"} is a list containing the following components:

\describe{
  \item{\code{beta}}{a named \code{\link{matrix}} of regression coefficients}
  \item{\code{theta}}{a named \code{\link{list}} of threshold parameters}
  \item{\code{error.struct}}{an object of class \code{\link{error_struct}} containing the parameters of the error
  structure}
  \item{\code{sebeta}}{a named \code{\link{matrix}} of the standard errors of the regression coefficients}
  \item{\code{setheta}}{a named \code{\link{list}} of the standard errors of the threshold parameters}
  \item{\code{seerror.struct}}{a \code{vector} of standard errors for the parameters of the error structure}
  \item{\code{rho}}{a \code{\link{list}} of all objects that are used in \code{mvord()}}
}
}
\description{
\code{mvord} is used to estimate multivariate ordinal regression models. Different model types are implemented and can be chosen
through the argument \code{error.structure}. Constraints on the threshold as well as on the regression parameters can be imposed.
}
\details{
\describe{
  \item{\code{data}}{
We use the long format for the input of \code{data}, where each row contains a subject
index \eqn{i} (\code{firm_id}), a multiple measurement index \eqn{j} (\code{rater_id}), an ordinal response
(\code{rating}) and all the covariates (\code{X1, ..., Xp}). This long format data structure
is internally transformed into a matrix of responses \eqn{Y} and a list of covariate matrices
\eqn{X_j} for all \eqn{j \in J} by matching on the subject index \eqn{i} and the multiple
measurement index \eqn{j}, which are passed via the optional argument \code{index}. This is
usually a character vector of length two specifying the column names
of the subject index and the multiple measurement index in \code{data}. For the data set \code{\link{data_mvord}} we have:

   \code{index = c("firm_id", "rater_id")}

The default value of \code{index} is \code{NULL} assuming that the first column of \code{data}
contains the subject index and the second column the multiple measurement index.
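
If the data are stored in wide format (one response column per multiple measurement), they can be reshaped into this long format with base R, much like the conversion of \code{data_toy_example} in the Examples section. The following is a minimal sketch on a small hypothetical data set. The index columns \code{firm_id} and \code{rater_id} follow the naming used above, while \code{wide}, \code{rating1}, \code{rating2} and all values are made up for illustration.

\preformatted{# hypothetical wide data: one row per firm, one rating column per rater
wide <- data.frame(firm_id = 1:3,
                   rating1 = c("A", "B", "C"),
                   rating2 = c("B", "B", "A"),
                   X1 = c(0.5, -1.2, 0.3),
                   stringsAsFactors = FALSE)
# stack the rating columns; the covariate is repeated for every rater
long <- data.frame(firm_id  = rep(wide$firm_id, times = 2),
                   rater_id = rep(c("rater1", "rater2"), each = nrow(wide)),
                   rating   = c(wide$rating1, wide$rating2),
                   X1       = rep(wide$X1, times = 2),
                   stringsAsFactors = FALSE)
# the two index columns are then passed as index = c("firm_id", "rater_id")
long}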
(Note that if the covariates differ strongly in scale, the estimation is prone to numerical instabilities.
In such a case one could standardize the covariates \eqn{x_{ij}}, e.g. by setting \code{scale = TRUE}.)
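
The following is a minimal base R sketch of such a standardization. It assumes long-format data with hypothetical columns \code{rater_id} and \code{X1}. \code{X1} is centered and scaled separately within each multiple measurement, which is the operation described for the argument \code{scale}. The sketch is for illustration only and is not necessarily identical to what \code{mvord} does internally.

\preformatted{# hypothetical long-format data with two raters on very different scales
long <- data.frame(rater_id = rep(c("rater1", "rater2"), each = 4),
                   X1 = c(1, 2, 3, 4, 10, 20, 30, 40))
# subtract the mean and divide by the standard deviation within each rater
long$X1_std <- ave(long$X1, long$rater_id,
                   FUN = function(x) (x - mean(x)) / sd(x))
# each rater's covariate now has mean 0 and standard deviation 1
tapply(long$X1_std, long$rater_id, mean)
tapply(long$X1_std, long$rater_id, sd)}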

If specific constraints are imposed on the parameter set, a well-defined index \eqn{j \in J}
for the multiple measurements is needed. Therefore, a vector \code{response.names}
is used to define the index number of each multiple measurement:

    \code{response.names = c("rater1", "rater2", "rater3")}

The default value of \code{response.names} is \code{NULL} giving the natural ordering
of the levels of the factor variable for all the multiple measurements.
The ordering of \code{response.names} always specifies the index of the
multiple measurement unit \eqn{j \in J}. This ordering is essential when
putting constraints on the parameters and when setting \code{response.levels}.
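
The following base R sketch shows how an ordering such as \code{response.names} translates measurement labels into an index \eqn{j}. It maps hypothetical labels to integers with \code{factor()}. It only illustrates the ordering rule described above and is not the internal code of \code{mvord}.

\preformatted{# measurement labels as they might appear in a long-format data set
rater_id <- c("rater2", "rater1", "rater3", "rater1")
# the ordering given in response.names defines j = 1, 2, 3
response.names <- c("rater1", "rater2", "rater3")
j <- as.integer(factor(rater_id, levels = response.names))
j
# reversing the ordering changes the index, and hence which row of
# coef.constraints or which element of response.levels refers to a rater
j_rev <- as.integer(factor(rater_id, levels = rev(response.names)))
j_rev}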

If the categories differ across multiple measurements (either the number of categories or the category labels),
one needs to specify \code{response.levels} explicitly. This is done by a list
of length \eqn{J} (number of multiple measurements), where each element contains
the names of the levels of the ordered categories in ascending or descending order:

\preformatted{response.levels = list(c("G","F","E", "D", "C", "B", "A"),
                       c("G","F","E", "D", "C", "B", "A"),
                       c("O","N","M","L", "K", "J", "I", "H"))}}

\item{\code{formula}}{
The ordinal responses \eqn{Y} (\code{rating}) are passed by a \code{formula} object.
Intercepts can be included in or excluded from the model, depending on the model parameterization:
\itemize{
\item {Model without intercept:}

If the intercept should be removed the \code{formula} for a given response (\code{rating})
and covariates (\code{X1} to \code{Xp}) has the following form:

     \code{formula = rating ~ 0 + X1 + ... + Xp}.

\item {Model with intercept:}

If one wants to include an intercept in the model, there are two equivalent possibilities
to set the model \code{formula}. Either one includes the intercept explicitly by:

    \code{formula = rating ~ 1 + X1 + ... + Xp},

or by

  \code{formula = rating ~ X1 + ... + Xp}.
}
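
The intercept conventions above are those of standard R formulas. The following sketch applies base R's \code{model.matrix()} to hypothetical covariates. It shows that \code{~ 0 + X1 + X2} gives no intercept column, while \code{~ 1 + X1 + X2} and \code{~ X1 + X2} give the same design matrix. How \code{mvord} then parameterizes the intercept together with the thresholds is not shown here.

\preformatted{d <- data.frame(X1 = c(0.1, 0.5, -0.3), X2 = c(1, 2, 3))
mm0 <- model.matrix(~ 0 + X1 + X2, data = d)  # without intercept
mm1 <- model.matrix(~ 1 + X1 + X2, data = d)  # explicit intercept
mm2 <- model.matrix(~ X1 + X2, data = d)      # implicit intercept
colnames(mm0)
colnames(mm1)
all.equal(mm1, mm2)}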
  }
  \item{\code{error.structure}}{
 We allow for different error structures depending on the model parameterization:
\itemize{
  \item {Correlation:}
  \itemize{
  \item \code{cor_general}
The most common parameterization is the general correlation matrix.

 \code{error.structure = cor_general(~ 1)}

This parameterization can be extended by allowing a factor dependent
correlation structure, where the correlation of each subject \eqn{i} depends
on a given subject-specific factor \code{f}. This factor \code{f} is not allowed to vary
across multiple measurements \eqn{j} for the same subject \eqn{i}, and due to numerical
constraints at most 30 levels are allowed.

      \code{error.structure = cor_general(~ f)}

  \item \code{cor_equi}
A covariate dependent equicorrelation structure, where the correlations
are equal across all \eqn{J} dimensions and depend on subject-specific covariates \code{S1, ..., Sm}.
Note that these covariates \code{S1, ..., Sm} are not allowed to vary across
 multiple measurements \eqn{j} for the same subject \eqn{i}.

         \code{error.structure = cor_equi(~ S1 + ... + Sm)}

  \item \code{cor_ar1}
In order to account for some heterogeneity, the \eqn{AR(1)} error structure
is allowed to depend on subject-specific covariates \code{S1, ..., Sm} that are constant
over time for each subject \eqn{i}.

      \code{error.structure = cor_ar1(~ S1 + ... + Sm)}
}


\item {Covariance:}
\itemize{
\item \code{cov_general}

The standard parameterization with a full variance-covariance matrix is obtained by:

 \code{error.structure = cov_general(~ 1)}

 This parameterization can be extended to the factor dependent covariance structure,
  where the covariance of each subject depends on a given factor \code{f}:

 \code{error.structure = cov_general(~ f)}
  }
  }
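
For intuition about the correlation structures, the following base R sketch builds two \eqn{J \times J} correlation matrices for a single subject. One is implied by an AR(1) structure and the other by an equicorrelation structure, both with a hypothetical correlation parameter \code{r}. The sketch assumes the standard forms \eqn{\rho^{|j-k|}} for AR(1) and a constant off-diagonal \eqn{\rho} for equicorrelation. It does not reproduce how \code{mvord} links \eqn{\rho} to the covariates \code{S1, ..., Sm}.

\preformatted{J <- 5    # number of multiple measurements, e.g. five years
r <- 0.6  # hypothetical correlation parameter
# AR(1): correlation decays geometrically with the lag between measurements
ar1 <- r^abs(outer(seq_len(J), seq_len(J), "-"))
# equicorrelation: the same correlation for every pair of measurements
equi <- matrix(r, nrow = J, ncol = J)
diag(equi) <- 1
round(ar1, 3)
# both matrices are positive definite for this value of r
min(eigen(ar1)$values)
min(eigen(equi)$values)}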
  }

  \item{\code{coef.constraints}}{
  The package supports
  constraints on the regression coefficients. Firstly, the
  user can specify whether the regression coefficients should be equal
  across some or all response dimensions. Secondly, the values of some
  of the regression coefficients can be fixed.

  As there is no unanimous way to specify such constraints, we offer
  two options. The first option is similar to the specification of constraints on the thresholds.
   The constraints can be specified in this case as a vector or matrix of integers,
    where coefficients getting same integer value are set equal.
  Values of the regression coefficients can be fixed through a matrix.
  Alternatively, constraints on the regression coefficients can be specified
  by using the design employed by the \pkg{VGAM} package.
  The constraints in this setting are set through a named list,
  where each element of the list contains a matrix of full column rank.
  If the values of some regression coefficients should be fixed, offsets can be used.
  This design has the advantage that it supports
  constraints on outcome-specific as well as category-specific
  regression coefficients. While the first option has the advantage of requiring a more concise input,
   it does not support category-specific coefficients.
  The second option offers a more flexible design in this respect. For further information
  on the second option we refer to the vignette and to the documentation of \code{\link[VGAM]{vglm}}.

Using the first option, constraints can be specified either by a vector or by a matrix \cr
   \code{coef.constraints}.
    The simpler but less flexible way is a vector \cr
    \code{coef.constraints}
     of length \eqn{J}, where the ordering \eqn{j \in J} of the responses is given by \code{response.names}.
     This vector is filled in the following way:
the first element of \code{coef.constraints} gets the value 1. If the coefficients
 of the multiple measurement \eqn{j = 2} should be equal to the coefficients of the first dimension (\eqn{j=1}),
  the value 1 is set again; if they should differ from the coefficients of the first dimension,
  the value 2 is set. Analogously, if the coefficients of dimensions two and three
   should be equal, both get the value 2, and if they should differ,
    dimension three gets the value 3. Constraints on the regression coefficients of the remaining multiple measurements are set analogously.

 \code{coef.constraints <- c(1,1,2,3)}

 This vector \code{coef.constraints} sets the coefficients of the first two raters equal,
 \deqn{\beta_{1\cdot} = \beta_{2\cdot}}
 while the coefficients of the third and the fourth rater are estimated separately.
 A more flexible way to specify constraints on the regression coefficients is a matrix with \eqn{J} rows and \eqn{p} columns,
  where each column specifies constraints on one of the \eqn{p} coefficients in the same way as above.
   In addition, a value of \code{NA} excludes the corresponding coefficient (meaning it is fixed to zero).

   \preformatted{coef.constraints <- cbind(c(1,2,3,4), c(1,1,1,2), c(NA,NA,NA,1),
                          c(1,1,1,NA), c(1,2,3,4), c(1,2,3,4))}

       This matrix \code{coef.constraints} gives the following constraints:
\itemize{
 \item \eqn{\beta_{12} = \beta_{22} = \beta_{32}}
   \item \eqn{\beta_{13} = 0}
   \item \eqn{\beta_{23} = 0}
   \item \eqn{\beta_{33} = 0}
   \item \eqn{\beta_{44} = 0}
   \item \eqn{\beta_{14} = \beta_{24} = \beta_{34}}
}
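
Under the rule described above, equal integers within a column tie the coefficients and \code{NA} fixes a coefficient to zero. The number of freely estimated regression coefficients can therefore be counted column by column. The following base R sketch does this for the matrix example above. It also shows that a vector such as \code{c(1,1,2,3)} conceptually corresponds to using the same column for every covariate. The helper \code{n_free_coef()} is hypothetical and not part of \pkg{mvord}.

\preformatted{# hypothetical helper: count free coefficients implied by a constraint matrix
n_free_coef <- function(constraints) {
  # per column: one parameter for each distinct non-NA integer
  sum(apply(constraints, 2, function(col) length(unique(col[!is.na(col)]))))
}
cc <- cbind(c(1,2,3,4), c(1,1,1,2), c(NA,NA,NA,1),
            c(1,1,1,NA), c(1,2,3,4), c(1,2,3,4))
n_free_coef(cc)      # 4 + 2 + 1 + 1 + 4 + 4 = 16 instead of 4 * 6 = 24
# the vector form replicated over p = 6 covariates (filled column-wise)
cc_vec <- matrix(c(1,1,2,3), nrow = 4, ncol = 6)
n_free_coef(cc_vec)  # 3 distinct groups per covariate: 18}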
}


  \item{\code{coef.values}}{
  In addition, fixed values of regression coefficients can be set in the matrix \cr
  \code{coef.values}.
   A parameter is removed from the estimation if its value is fixed to zero (the default for \code{NA}'s in \cr
   \code{coef.constraints})
    or to some other value. If constraints are set on parameters, the constrained dimensions need to have
     the same value in \code{coef.values}. Again, each column corresponds to one regression coefficient.

 For example, for \eqn{J = 3} responses and five covariates we impose:

   \preformatted{coef.constraints <- cbind(c(1,2,2), c(1,1,2), c(NA,1,2),
                          c(NA,NA,NA), c(1,1,2))}

 \preformatted{coef.values <- cbind(c(NA,NA,NA), c(NA,NA,NA), c(0,NA,NA),
                     c(1,1,1), c(NA,NA,NA))}
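
The requirement that tied coefficients share the same entry in \code{coef.values} can be checked before fitting. The following base R sketch defines a hypothetical helper, \code{coef_values_consistent()}, which is not part of \pkg{mvord}. For each column, it verifies that all rows with the same integer in \code{coef.constraints} carry the same value (or all \code{NA}) in \code{coef.values}. It is applied to the two matrices above.

\preformatted{# hypothetical helper: do tied coefficients share the same fixed value?
coef_values_consistent <- function(constraints, values) {
  stopifnot(identical(dim(constraints), dim(values)))
  for (k in seq_len(ncol(constraints))) {
    groups <- constraints[, k]
    for (g in unique(groups[!is.na(groups)])) {
      v <- values[!is.na(groups) & groups == g, k]
      # unique(c(NA, NA)) has length 1, unique(c(NA, 0)) has length 2
      if (length(unique(v)) > 1) return(FALSE)
    }
  }
  TRUE
}
cc <- cbind(c(1,2,2), c(1,1,2), c(NA,1,2), c(NA,NA,NA), c(1,1,2))
cv <- cbind(c(NA,NA,NA), c(NA,NA,NA), c(0,NA,NA), c(1,1,1), c(NA,NA,NA))
coef_values_consistent(cc, cv)      # TRUE
cv_bad <- cv
cv_bad[2, 1] <- 0.5  # rows 2 and 3 are tied in column 1 but now differ
coef_values_consistent(cc, cv_bad)  # FALSE}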

Interaction terms

When constraints on the regression coefficients should be specified in models with interaction terms,
the \code{coef.constraints} matrix has to be expanded manually. In case of interaction terms
 (specified either by \code{X1 + X2 + X1:X2} or equivalently by \code{X1*X2}), one additional
  column at the end of \code{coef.constraints} has to be specified for an interaction between
   numerical variables. For interaction terms including factor variables, correspondingly more columns have
    to be added to the \code{coef.constraints} matrix.
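
One way to see how many columns are needed is to inspect the design matrix that base R builds from the right-hand side of the formula. The following sketch applies \code{model.matrix()} to hypothetical covariates \code{X1} and \code{X2} and a three-level factor \code{f}. It assumes that the columns of \code{coef.constraints} line up with the columns of such a design matrix. This is an assumption about how \pkg{mvord} expands the formula internally, not documented behaviour.

\preformatted{d <- data.frame(X1 = c(0.2, -1.1, 0.7, 1.5, -0.4, 0.9),
                X2 = c(3, 1, 4, 1, 5, 9),
                f  = factor(c("a", "b", "c", "a", "b", "c")))
# numeric interaction: X1, X2 and one extra column X1:X2 at the end
mm_num <- model.matrix(~ 0 + X1 * X2, data = d)
colnames(mm_num)
# interaction with a factor: several extra columns
mm_fac <- model.matrix(~ 0 + X1 * f, data = d)
colnames(mm_fac)}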
}


  \item{\code{threshold.constraints}}{
  Similarly, constraints on the threshold parameters can be imposed by a vector of positive integers,
   where dimensions with equal threshold parameters get the same integer. When restricting the thresholds of two
    outcome dimensions to be the same, note that the number of categories in
     the two outcome dimensions must be the same. In our example with \eqn{J=3} different outcomes we impose:

 \code{threshold.constraints <- c(1,1,2)}

   gives the following restrictions:
 \itemize{
 \item \eqn{\bm\theta_{1} = \bm\theta_{2}}
 \item \eqn{\bm\theta_{3}} arbitrary.
}
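
The requirement that dimensions sharing thresholds have the same number of categories can be checked up front. The following base R sketch defines a hypothetical helper, \code{thresholds_compatible()}, which is not part of \pkg{mvord}. It compares the number of categories in \code{response.levels} within each group of \code{threshold.constraints}, using the category labels of \code{\link{data_mvord}} shown above.

\preformatted{# hypothetical helper: equal number of categories within each threshold group?
thresholds_compatible <- function(threshold.constraints, response.levels) {
  n_cat <- lengths(response.levels)
  all(tapply(n_cat, threshold.constraints, function(n) length(unique(n)) == 1))
}
response.levels <- list(c("G","F","E","D","C","B","A"),
                        c("G","F","E","D","C","B","A"),
                        c("O","N","M","L","K","J","I","H"))
thresholds_compatible(c(1, 1, 2), response.levels)  # TRUE: 7, 7 and 8 categories
thresholds_compatible(c(1, 1, 1), response.levels)  # FALSE: 8 differs from 7}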
}

  \item{\code{threshold.values}}{
  In addition, threshold parameter values can be specified by \code{threshold.values}
   in accordance with identifiability constraints. For this purpose we use a \code{list}
    with \eqn{J} elements, where each element specifies the constraints of the particular
     dimension by a vector of length equal to the number of threshold parameters (number of categories - 1).
     A number fixes the corresponding threshold parameter to that value, and \code{NA} leaves the parameter free.
      For \code{\link{data_mvord}} we have
\preformatted{threshold.constraints <- NULL}

\preformatted{threshold.values <- list(c(-4,NA,NA,NA,NA,4.5),
                         c(-4,NA,NA,NA,NA,4.5),
                         c(-5,NA,NA,NA,NA,NA,4.5))}
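
Each element of \code{threshold.values} needs one entry per threshold, i.e. the number of categories minus one. The following base R sketch checks this for the example above and counts the thresholds left free (the \code{NA} entries). It reuses the category labels of \code{\link{data_mvord}} and is meant only as an illustration.

\preformatted{response.levels <- list(c("G","F","E","D","C","B","A"),
                        c("G","F","E","D","C","B","A"),
                        c("O","N","M","L","K","J","I","H"))
threshold.values <- list(c(-4,NA,NA,NA,NA,4.5),
                         c(-4,NA,NA,NA,NA,4.5),
                         c(-5,NA,NA,NA,NA,NA,4.5))
# one threshold less than the number of categories in every dimension
stopifnot(all(lengths(threshold.values) == lengths(response.levels) - 1))
# NA entries are estimated, numbers are fixed
n_free <- sum(vapply(threshold.values, function(x) sum(is.na(x)), integer(1)))
n_free  # 4 + 4 + 5 = 13}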
}
}
}
\examples{
library(mvord)

#toy example
data(data_toy_example)

# convert data_toy_example into long format
df <- cbind.data.frame("i" = rep(1:100,2), "j" = rep(1:2,each = 100),
                       "Y" = c(data_toy_example$Y1,data_toy_example$Y2),
                       "X1" = rep(data_toy_example$X1,2),
                       "X2" = rep(data_toy_example$X2,2))

res <- mvord(formula = Y ~ 0 + X1 + X2,
               data = df,
               index = c("i", "j"),
               link = mvprobit(),
               solver = "BFGS",
               se = TRUE,
               error.structure = cor_general(~1),
               threshold.constraints = c(1,1),
               coef.constraints = c(1,1))
print(res)
summary(res)
thresholds(res)
coefficients(res)
get_error_struct(res)

## examples
#load data
data(data_mvord)
head(data_mvord)

#-------------
# cor_general
#-------------
\donttest{
# approx 1 min
res_cor <- mvord(formula = rating ~ 0 + X1 + X2 + X3 + X4 + X5,
#formula ~ 0 ... without intercept
               index = c("firm_id", "rater_id"),
#not necessary if firm_id is first column and rater is second column in data
               data = data_mvord, #choose data
               response.levels = list(c("G","F","E", "D", "C", "B", "A"),
                                      c("G","F","E", "D", "C", "B", "A"),
                                      c("O","N","M","L", "K", "J", "I", "H")),
#list for each rater;
#need to be set if specific levels/labels are desired (not in natural ordering)
               response.names = c("rater1", "rater2", "rater3"),
# set if not all raters are used and specifies ordering
               link = mvprobit(), #mvprobit() or mvlogit()
               error.structure = cor_general(~1), #different error structures
               coef.constraints = cbind(c(1,2,2),
                                        c(1,1,2),
                                        c(NA,1,2),
                                        c(NA,NA,NA),
                                        c(1,1,2)),#either a vector or a matrix
               coef.values = cbind(c(NA,NA,NA),
                                   c(NA,NA,NA),
                                   c(0,NA,NA),
                                   c(1,1,1),
                                   c(NA,NA,NA)),
#matrix (possible if coef.constraints is a matrix)
               threshold.constraints = c(1,1,2),
               solver = "BFGS") #BFGS is faster
print(res_cor)
summary(res_cor)
thresholds(res_cor)
coefficients(res_cor)
get_error_struct(res_cor)

#-------------
# cov_general
#-------------
#approx 4 min
res_cov <- mvord(formula = rating ~ 1 + X1 + X2 + X3 + X4 + X5,
#formula ~ 0 ... without intercept
            index = c("firm_id", "rater_id"),
#not necessary if firm_id is first column and rater is second column in data
            data = data_mvord, #choose data
            response.levels = list(c("G","F","E", "D", "C", "B", "A"),
                                   c("G","F","E", "D", "C", "B", "A"),
                                   c("O","N","M","L", "K", "J", "I", "H")),
#list for each rater;
#need to be set if specific levels/labels are desired
            response.names = c("rater1", "rater2", "rater3"),
# set if not all raters are used and specifies ordering
            link = mvprobit(), #mvprobit() or mvlogit()
            error.structure = cov_general(~1), #different error structures
            threshold.constraints = NULL, #vector
            threshold.values = list(c(-4,NA,NA,NA,NA,4.5),
                                    c(-4,NA,NA,NA,NA,4),
                                    c(-5,NA,NA,NA,NA,NA,4.5)),
#list for each rater
            solver = "newuoa") #does not converge with BFGS
print(res_cov)
summary(res_cov)
thresholds(res_cov)
coefficients(res_cov)
get_error_struct(res_cov)


#-------------
# cor_ar1
#-------------
#approx 4min
data(data_mvord_panel)
head(data_mvord_panel)
mult.obs <- 5
res_AR1 <- mvord(formula = rating ~ 0 + X1 + X2 + X3 + X4 + X5,
#formula ~ 0 ... without intercept
           index = c("firm_id", "year"),
#not necessary if firm_id is first column and year is second column in data
           data = data_mvord_panel, #choose data
           response.levels = rep(list(c("G","F","E", "D", "C", "B", "A")), mult.obs),
#list for each year;
#need to be set if specific levels/labels are desired (not in natural ordering)
           response.names = c("year3", "year4", "year5", "year6", "year7"),
# set if not all years are used and specifies ordering
           link = mvprobit(), #mvprobit() or mvlogit()
           error.structure = cor_ar1(~1), #different error structures
           threshold.constraints = c(1,1,1,2,2),
           coef.constraints = c(1,1,1,2,2),
           solver = "BFGS")
print(res_AR1)
summary(res_AR1)
thresholds(res_AR1)
coefficients(res_AR1)
get_error_struct(res_AR1)
get_error_struct(res_AR1, type = "corr")
}

}
\seealso{
%\code{\link{predict.mvord}},
\code{\link{print.mvord}}, \code{\link{summary.mvord}}, \code{\link{coef.mvord}},
 \code{\link{thresholds.mvord}}, \code{\link{get_error_struct.mvord}},
 \code{\link{data_cr_panel}},\code{\link{data_cr_mvord}}, \code{\link{data_cr_mvord2}},
 \code{\link{data_mvord_panel}},\code{\link{data_mvord}}, \code{\link{data_mvord2}}
}
