% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Optimal_Dispersion.R
\name{optimal_dispersion}
\alias{optimal_dispersion}
\title{Maximize dispersion for K groups}
\usage{
optimal_dispersion(x, K, solver = NULL, max_dispersion_considered = NULL)
}
\arguments{
\item{x}{The data input. Can be one of two structures: (1) A
feature matrix where rows correspond to elements and columns
correspond to variables (a single numeric variable can be
passed as a vector). (2) An N x N matrix dissimilarity matrix;
can be an object of class \code{dist} (e.g., returned by
\code{\link{dist}} or \code{\link{as.dist}}) or a \code{matrix}
where the entries of the upper and lower triangular matrix
represent pairwise dissimilarities.}

\item{K}{The number of groups or a vector describing the size of each group.}

\item{solver}{Optional argument; if passed, has to be either "glpk" or
"symphony". See details.}

\item{max_dispersion_considered}{Optional argument used for early stopping. If the dispersion found
is equal to or exceeds this value, a solution having the previous best dispersion 
is returned.}
}
\value{
A list with four elements:  
   \item{dispersion}{The optimal dispersion}
   \item{groups}{An assignment of elements to groups (vector)}
   \item{edges}{A matrix of 2 columns. Each row contains the indices of 
   elements that had to be investigated to find the dispersion (i.e., each pair
   of elements cannot be part of the same group in order to achieve maximum 
   dispersion).}
   \item{dispersions_considered}{All distances that were tested until the dispersion was found.}
}
\description{
Maximize dispersion for K groups
}
\details{
The dispersion is the minimum distance between two elements
  within the same group. This function implements an optimal method
  to maximize the dispersion. If the data input \code{x} is a feature
  matrix and not a dissimilarity matrix, the pairwise Euclidean
  distance is used. It uses the algorithm presented in Max
  Diekhoff's Bachelor thesis at the Computer Science Department at
  the Heinrich Heine University Düsseldorf.

  To find out which items are not allowed to be grouped in the same
  cluster for maximum dispersion, the algorithm sequentially builds
  instances of a graph coloring problem, using an integer linear
  programming (ILP) representation (also see Fernandez et al.,
  2013).  It is possible to specify the ILP solver via the argument
  \code{solver}. This function either requires the R package
  \code{Rglpk} and the GNU linear programming kit
  (<http://www.gnu.org/software/glpk/>) or the R package
  \code{Rsymphony} and the COIN-OR SYMPHONY solver libraries
  (<https://github.com/coin-or/SYMPHONY>). If the argument
  \code{solver} is not specified, the function will try to find the
  GLPK or SYMPHONY solver by itself (it prioritizes using SYMPHONY if
  both are available). The GNU linear programming kit (\code{solver =
  "glpk"}) seems to be considerably slower for K >= 3 than the
  SYMPHPONY solver (\code{solver = "symphony"}).

  Optimally solving the maximum dispersion problem is NP-hard for K
  > 2 and therefore computationally infeasible for larger data
  sets. For K = 3 and K = 4, it seems that this approach scales up to several 100 elements, 
  or even > 1000 for K = 3 (at least when using the Symphony solver). 
  For larger data sets, use the heuristic approaches in \code{\link{anticlustering}} or
  \code{\link{bicriterion_anticlustering}}. However, note that for K = 2, 
  the optimal approach is usually much faster than the heuristics.
  
  In the output, the element \code{edges} defines which elements must be in separate 
  clusters in order to achieve maximum dispersion. All elements not listed here
  can be changed arbitrarily between clusters without reducing the dispersion.
  If the maximum possible dispersion corresponds to the minimum dispersion
  in the data set, the output elements \code{edges} and \code{groups} are set to
  \code{NULL} because all possible groupings have the same value of dispersion.
  In this case the output element \code{dispersions_considered} has length 1.
}
\note{
If the SYMPHONY solver is used, an unfortunate
"message" is printed to the console when this function terminates: 

sym_get_col_solution(): No solution has been stored!

This message is no reason to worry and instead is a direct result
of the algorithm finding the optimal value for the dispersion.
Unfortunately, this message is generated in the C code underlying the 
SYMPHONY library (via the printing function \code{printf}), which cannot be
prevented in R.
}
\examples{

N <- 30
M <- 5
K <- 3
data <- matrix(rnorm(N*M), ncol = M)
distances <- dist(data)

opt <- optimal_dispersion(distances, K = K)
opt

# Compare to bicriterion heuristic:
groups_heuristic <- anticlustering(
  distances, 
  K = K,
  method = "brusco", 
  objective = "dispersion", 
  repetitions = 100
)
c(
  OPT = dispersion_objective(distances, opt$groups),
  HEURISTIC = dispersion_objective(distances, groups_heuristic)
)

# Different group sizes are possible:
table(optimal_dispersion(distances, K = c(15, 10, 5))$groups)

# Induce cannot-link constraints by maximizing the dispersion:
solvable <- matrix(1, ncol = 6, nrow = 6)
solvable[2, 1] <- -1
solvable[3, 1] <- -1
solvable[4, 1] <- -1
solvable <- as.dist(solvable)
solvable

# An optimal solution has to put item 1 in a different group than 
# items 2, 3 and 4 -> this is possible for K = 2
optimal_dispersion(solvable, K = 2)$groups

# It no longer works when item 1 can also not be linked with item 5:
# (check out output!)
unsolvable <- as.matrix(solvable)
unsolvable[5, 1] <- -1
unsolvable <- as.dist(unsolvable)
unsolvable
optimal_dispersion(unsolvable, K = 2)


}
\references{
Diekhoff (2023). Maximizing dispersion for anticlustering. Retrieved from 
https://www.cs.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Informatik/Algorithmische_Bioinformatik/Bachelor-_Masterarbeiten/2831963_ba_ifo_AbschlArbeit_klau_mapap102_madie120_20230203_1815.pdf  

Fernández, E., Kalcsics, J., & Nickel, S. (2013). The maximum dispersion 
problem. Omega, 41(4), 721–730. https://doi.org/10.1016/j.omega.2012.09.005
}
\seealso{
\code{\link{dispersion_objective}} \code{\link{anticlustering}}
}
\author{
Max Diekhoff

Martin Papenberg \email{martin.papenberg@hhu.de}
}
