\name{CE.NB}
\alias{CE.NB}
\title{Multiple Break-point Detection via the CE Method with Negative Binomial Distribution}
\description{
Estimates both the number of break-points and their locations in discrete (count) measurements using the Cross-Entropy (CE) method. The negative binomial distribution is used to model over-dispersed count data. Break-point locations in the CE algorithm can be simulated from either the four parameter beta distribution or the truncated normal distribution. The general BIC is used to select the optimal number of break-points.
}
\usage{
CE.NB(data, Nmax = 10, eps = 0.01, rho = 0.05, M = 200, h = 5, a = 0.8, b = 0.8,
distyp = 1, parallel = FALSE)
}
\arguments{
  \item{data}{
data to be analysed. A single column array or a data frame.
}
  \item{Nmax}{
maximum number of break-points. Default value is 10. 
}
  \item{eps}{
the cut-off value for the stopping criterion in the CE method. Default value is 0.01.
}
  \item{rho}{
the fraction of the simulated sample that is retained as the best performing set of solutions (i.e., the elite sample). Default value is 0.05.
}
  \item{M}{
sample size to be used in simulating the locations of break-points. Default value is 200.
}
  \item{h}{
minimum aberration width. Default is 5.
}
  \item{a}{
a smoothing parameter value. It is used in the four parameter beta distribution to smooth both shape parameters. When simulating from the truncated normal distribution, this value is used to smooth the estimates of the mean values. Default is 0.8.
}
  \item{b}{
a smoothing parameter value. It is used in the truncated normal distribution to smooth the estimates of the standard deviation. Default is 0.8.
}
  \item{distyp}{
distribution to simulate break-point locations. Options: 1 = four parameter beta distribution, 2 = truncated normal distribution. Default is 1. 
}
  \item{parallel}{
A logical argument specifying whether parallel computation should be carried out (\code{TRUE}) or not (\code{FALSE}). Default is \code{FALSE}. On Windows, "snow" functionalities are used, whereas on Unix/Linux/Mac OS X, "multicore" functionalities are used to carry out parallel computations with the maximum number of cores available.
}
}
\details{
The negative binomial (NB) distribution is used to model the discrete (count) data. The NB model is preferred over the Poisson model when over-dispersion is observed in the count data. For each candidate number of break-points, from zero up to the user-specified maximum (\code{Nmax}), break-point locations are simulated from the chosen distribution (four parameter beta or truncated normal), and a performance function score (BIC) is calculated for each simulated solution. The solution that minimizes the BIC with respect to the number of break-points is reported as the optimal solution. The function returns a list containing the number of break-points and a vector of break-point locations.
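
As an illustrative aside (a sketch, not taken from the package source), the NB probability mass function in the \code{size}/\code{prob} parameterization of \code{rnbinom}, writing \eqn{r} for \code{size} and \eqn{p} for \code{prob}, is
\deqn{P(X = x) = \frac{\Gamma(x + r)}{\Gamma(r)\, x!}\, p^{r} (1 - p)^{x}, \quad x = 0, 1, 2, \ldots}{P(X = x) = Gamma(x + r) / (Gamma(r) x!) p^r (1 - p)^x, x = 0, 1, 2, ...}
with mean \eqn{r(1 - p)/p}{r(1 - p)/p} and variance \eqn{r(1 - p)/p^2}{r(1 - p)/p^2}, so the variance always exceeds the mean.

The BIC in its standard form (Schwarz, 1978) is
\deqn{\mathrm{BIC} = -2 \log \hat{L} + k \log n}{BIC = -2 log(L-hat) + k log(n)}
where \eqn{\hat{L}}{L-hat} is the maximized likelihood, \eqn{k} is the number of free parameters and \eqn{n} is the number of observations. This shows the general form only; that \code{CE.NB} uses exactly this penalty term internally is an assumption and is not confirmed here.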
}
\value{
A list is returned with the following items:
\item{No.BPs}{The number of break-points in the data, as estimated by the CE method.}
\item{BP.Loc}{A vector of break-point locations.}
}
\references{
  Priyadarshana, W. J. R. M. and Sofronov, G. (2012a) A Modified Cross-Entropy Method for Detecting Multiple Change-Points in DNA Count Data, In Proc. of the IEEE Conference on Evolutionary Computation (CEC), 1020-1027, DOI: 10.1109/CEC.2012.6256470.
  
  Priyadarshana, W. J. R. M. and Sofronov, G. (2012b) The Cross-Entropy Method and Multiple Change-Points Detection in Zero-Inflated DNA read count data, In: Y. T. Gu, S. C. Saha (Eds.) The 4th International Conference on Computational Methods (ICCM2012), 1-8, ISBN 978-1-921897-54-2.
  
  Rubinstein, R., and Kroese, D. (2004) The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. Springer-Verlag, New York.
  
  Schwarz, G. (1978) Estimating the dimension of a model, The Annals of Statistics, 6(2), 461-464.

}
\author{
Priyadarshana, W.J.R.M. <madawa.weerasinghe@mq.edu.au>}


\seealso{
\code{\link{CE.ZINB}} for CE with the zero-inflated negative binomial distribution,
\code{\link{profilePlot}} to obtain the mean profile plot.
}
\examples{
#### Simulated data example ###
segs <- 6 # Number of segments
M <- c(1500, 2200, 800, 2500, 1000, 2000) # Segment widths
#true.locations <- c(1501, 3701, 4501, 7001, 8001)  # True break-point locations 
seg <- NULL 
p <- c(0.45, 0.25, 0.4, 0.2, 0.3, 0.6) # Specification of p's for each segment
for(j in 1:segs){
  seg <- c(seg, rnbinom(M[j], size = 10, prob = p[j]))
}
simdata <- as.data.frame(seg)
rm(p, M, seg, segs, j)
#plot(simdata[, 1])
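
## Illustrative aside (an addition to this example, not part of CE.NB):
## NB counts are over-dispersed, i.e. their variance exceeds their mean,
## which is why the NB model is preferred over the Poisson model here.
## The name nb.check is hypothetical. With size = 10 and prob = 0.25 the
## theoretical mean is 30 and the theoretical variance is 120.
nb.check <- rnbinom(5000, size = 10, prob = 0.25)
c(mean = mean(nb.check), variance = var(nb.check))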

\dontrun{
## CE with the four parameter beta distribution ##

obj1 <- CE.NB(simdata, distyp = 1, parallel = TRUE) # Parallel computation
obj1

profilePlot(obj1, simdata) # To obtain the mean profile plot

## CE with truncated normal distribution ##

obj2 <- CE.NB(simdata, distyp = 2, parallel = TRUE) # Parallel computation
obj2

profilePlot(obj2, simdata) # To obtain the mean profile plot
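
## Inspecting the returned list (a sketch; component names are those
## documented under Value)
obj2$No.BPs # Estimated number of break-points
obj2$BP.Loc # Estimated break-point locations
## For comparison, the data above were simulated with true break-points at
## 1501, 3701, 4501, 7001 and 8001.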
}
}