% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cpThreshold.R
\name{cpThreshold}
\alias{cpThreshold}
\title{Optimizing \code{k} And \code{I} For Clique Percolation Community Detection}
\usage{
cpThreshold(
  W,
  method = c("unweighted", "weighted", "weighted.CFinder"),
  k.range,
  I.range,
  threshold = c("largest.components.ratio", "chi", "entropy")
)
}
\arguments{
\item{W}{A qgraph object or a symmetric matrix; see also \link[qgraph]{qgraph}}

\item{method}{A string indicating the method to use 
(\code{"unweighted"}, \code{"weighted"}, or \code{"weighted.CFinder"}).
See \link{cpAlgorithm} for more information}

\item{k.range}{Integer or vector of \code{k} value(s) for which the threshold(s) are determined.
See \link{cpAlgorithm} for more information}

\item{I.range}{Number or vector of \code{I} value(s) for which the threshold(s) are determined.
See \link{cpAlgorithm} for more information}

\item{threshold}{A string or vector indicating which threshold(s) to determine
(\code{"largest.components.ratio", "chi", "entropy"}); see Details}
}
\value{
A data frame with columns for \code{k}, \code{I} (if \code{method = "weighted"}
  or \code{method = "weighted.CFinder"}), number of communities, number of isolated
  nodes, and results of the specified threshold(s).
}
\description{
Function for determining threshold value(s) (ratio of largest to second largest
community sizes, chi, entropy) for ranges of \code{k} and \code{I} values to help decide
on optimal \code{k} and \code{I} values.
}
\details{
Optimizing \code{k} (clique size) and \code{I} (intensity threshold) in clique percolation
  community detection is a difficult task. Farkas et al. (2007) recommend looking at the
  ratio of the largest to second largest community sizes
  (\code{threshold = "largest.components.ratio"}) for very large networks or at
  the variance of the community sizes after removing the largest
  community (\code{threshold = "chi"}) for somewhat smaller networks. These thresholds were
  derived from percolation theory. If \code{I} for a certain \code{k} is too high, no
  community will be identified. If \code{I} is too low, a giant community with all nodes
  emerges. Just above this \code{I}, the distribution of community sizes often follows a
  power law, which constitutes a broad community size distribution. Farkas et al. (2007)
  point out that for such \code{I}, the ratio of the largest to second largest community
  sizes is approximately 2, constituting one way to optimize \code{I} for each possible
  \code{k}. For somewhat smaller networks, the ratio can be rather unstable. Instead,
  Farkas et al. (2007, p. 8) propose looking at the variance of the community sizes after
  removing the largest community. The idea is that when \code{I} is rather low, one giant
  community and multiple equally small ones occur. Then, the variance of the community
  sizes of the small communities (after removing the giant community) is low. When \code{I}
  is high, only a few equally small communities will occur. Then, the variance of the
  community sizes (after removing the largest community) will also be low. In between,
  the variance will at some point be maximal, namely when the community size
  distribution is maximally broad (power law-distributed). Thus, the maximal variance
  can be used to optimize \code{I} for various \code{k}.
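
  As a rough illustration of the two quantities described above, the following sketch
  uses hypothetical community sizes. It is not the computation performed by
  \code{cpThreshold}; in particular, the chi threshold of Farkas et al. (2007) may be
  normalized differently from the plain sample variance assumed here.
  \preformatted{
# hypothetical community sizes from a single clique percolation solution
sizes <- c(12, 6, 4, 3, 3)
sorted <- sort(sizes, decreasing = TRUE)

# ratio of the largest to the second largest community size
ratio <- sorted[1] / sorted[2]

# variance of the community sizes after removing the largest community
# (assumption: plain sample variance, used only to illustrate the idea)
spread <- var(sorted[-1])
  }
  With these sizes the ratio equals 2, the value the ratio threshold looks for. Repeating
  the variance calculation over a range of \code{I} values and picking the maximum
  mirrors the logic of the chi threshold.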

  For very small networks, optimizing \code{k} and \code{I} based on the distribution of the
  community sizes will be impossible, as too few communities will occur. Another possible
  threshold for such networks is based on the entropy of the community sizes
  (\code{threshold = "entropy"}). Entropy can be interpreted as an indicator of how
  surprising the respective solution is. The formula used here is based on Shannon
  Information, namely
  \deqn{-\sum_{i=1}^N p_i * \log_2 p_i}
  with \eqn{p_i} being the probability that a node is part of community \eqn{i}. For instance,
  if there are two communities, one of size 5 and one of size 3, the result would be
  \deqn{-((5/8 * \log_2 (5/8)) + (3/8 * \log_2 (3/8))) = 0.95}
  When calculating entropy, the isolated nodes identified by clique percolation are treated as
  a separate community. If there is only one community or only isolated nodes, entropy is
  zero, indicating that the solution is not surprising. Compared to the ratio and chi
  thresholds, entropy favors communities that are equal in size. Thus, it should not be
  used for larger networks, for which a broader community size distribution is preferred.
  Note that the entropy threshold has not yet been validated for clique percolation.
  Initial simulation studies indicate that it consistently detects surprising community
  partitions in smaller networks, especially if there are cliques of larger \code{k}.
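
  The entropy formula can be reproduced by hand. The following is a minimal sketch with
  hypothetical community sizes, assuming non-overlapping communities so that each
  probability is simply the community size divided by the total number of nodes; it is
  not the internal code of \code{cpThreshold}.
  \preformatted{
# hypothetical solution: three communities of sizes 4, 2, and 2
sizes <- c(4, 2, 2)
p <- sizes / sum(sizes)

# Shannon entropy in bits
entropy <- -sum(p * log2(p))

# a single community containing all nodes yields zero entropy
p_single <- 1
entropy_single <- -sum(p_single * log2(p_single))
  }
  Here \code{entropy} is 1.5 and \code{entropy_single} is 0.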

  The ratio threshold can be determined only if there are at least two communities. The chi
  threshold can be determined only if there are at least three communities. If there are not
  enough communities for the respective threshold, its value is \code{NA} in the data frame.
  Entropy can always be determined.
}
\examples{
## Example for unweighted networks

# create qgraph object
W <- matrix(c(0,1,1,1,0,0,0,0,
              0,0,1,1,0,0,0,0,
              0,0,0,0,0,0,0,0,
              0,0,0,0,1,1,1,0,
              0,0,0,0,0,1,1,0,
              0,0,0,0,0,0,1,0,
              0,0,0,0,0,0,0,1,
              0,0,0,0,0,0,0,0), nrow = 8, ncol = 8, byrow = TRUE)
W <- Matrix::forceSymmetric(W)
W <- qgraph::qgraph(W)

# determine entropy threshold for k = 3 and k = 4
results <- cpThreshold(W = W, method = "unweighted", k.range = c(3,4), threshold = "entropy")

## Example for weighted networks; three large communities with I = 0.3, 0.2, and 0.1, respectively

# create qgraph object
W <- matrix(c(0,0.10,0,0,0,0,0.10,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0.10,0,0,0,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0.10,0,0,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0.10,0,0,0.10,0.20,0,0,0,0,0.20,0.20,0,0,0,0,0,0,0,
              0,0,0,0,0,0.10,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0.10,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0.10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0.20,0,0,0,0,0.20,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0.20,0,0,0,0.20,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0.20,0,0,0.20,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,0.20,0,0.20,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0.20,0.20,0.30,0,0,0,0,0.30,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.20,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0,0,0,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0,0,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.30,
              0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0), nrow = 22, ncol = 22, byrow = TRUE) 
W <- Matrix::forceSymmetric(W)
W <- qgraph::qgraph(W, layout = "spring", edge.labels = TRUE)

# determine ratio, chi, and entropy thresholds for k = 3 and I from 0.3 to 0.09
results <- cpThreshold(W = W, method = "weighted", k.range = 3,
                       I.range = c(seq(0.3, 0.09, by = -0.01)),
                       threshold = c("largest.components.ratio","chi","entropy"))

## Example with Obama data set (see ?Obama)

# get data
data(Obama)

# estimate network
net <- qgraph::EBICglasso(qgraph::cor_auto(Obama), n = nrow(Obama))

# determine entropy threshold for k from 3 to 4 and I from 0.1 to 0.5
threshold <- cpThreshold(net, method = "weighted",
                         k.range = 3:4,
                         I.range = seq(0.1, 0.5, 0.01),
                         threshold = "entropy")

}
\references{
Farkas, I., Abel, D., Palla, G., & Vicsek, T. (2007). Weighted network modules.
\emph{New Journal of Physics, 9}, 180-180. http://doi.org/10.1088/1367-2630/9/6/180
}
\author{
Jens Lange, \email{lange.jens@outlook.com}
}
