% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering_functions.R
\name{KMeans_rcpp}
\alias{KMeans_rcpp}
\title{k-means using RcppArmadillo}
\usage{
KMeans_rcpp(data, clusters, num_init = 1, max_iters = 100,
  initializer = "optimal_init", fuzzy = FALSE, threads = 1,
  verbose = FALSE, CENTROIDS = NULL, tol = 1e-04,
  tol_optimal_init = 0.3, seed = 1)
}
\arguments{
\item{data}{matrix or data frame}

\item{clusters}{the number of clusters}

\item{num_init}{number of times the algorithm will be run with different centroid seeds}

\item{max_iters}{the maximum number of clustering iterations}

\item{initializer}{the method of initialization. One of \emph{optimal_init}, \emph{quantile_init}, \emph{kmeans++} or \emph{random}. See Details for more information}

\item{fuzzy}{either TRUE or FALSE. If TRUE, then prediction probabilities will be calculated using the distance between observations and centroids}

\item{threads}{an integer specifying the number of cores to run in parallel. OpenMP will be utilized to parallelize over the initializations (num_init)}

\item{verbose}{either TRUE or FALSE, indicating whether progress is printed during clustering. If threads > 1, then verbose = FALSE (by default)}

\item{CENTROIDS}{a matrix of initial cluster centroids. The number of rows of the CENTROIDS matrix should equal the number of clusters, and the number of columns should equal the number of columns of the data.}

\item{tol}{a floating-point number. If, at a given iteration (iteration > 1 and iteration < max_iters), 'tol' is greater than the squared norm of the centroids, then k-means is considered to have converged}

\item{tol_optimal_init}{tolerance value for the 'optimal_init' initializer. The higher this value is, the farther apart from each other the centroids are.}

\item{seed}{integer value for random number generator (RNG)}
}
\value{
a list with the following attributes: clusters, fuzzy_clusters (if fuzzy = TRUE), centroids, total_SSE, best_initialization, WCSS_per_cluster, obs_per_cluster, between.SS_DIV_total.SS
}
\description{
k-means using RcppArmadillo
}
\details{
This function has the following features in comparison to the KMeans_arma function:

It allows for multiple initializations (which can be parallelized if OpenMP is available).

Besides the optimal_init, quantile_init, random and kmeans++ initializations, one can specify the centroids using the CENTROIDS parameter.

The running time and convergence of the algorithm can be adjusted using the num_init, max_iters and tol parameters.

If num_init > 1, then KMeans_rcpp returns the attributes of the best initialization, using the within-cluster sum of squared error as the criterion.


---------------initializers----------------------

\strong{optimal_init}   : this initializer adds rows of the data incrementally, while checking that they do not already exist in the centroid-matrix

\strong{quantile_init}  : initialization of centroids by using the cumulative distance between observations and by removing potential duplicates

\strong{kmeans++}       : kmeans++ initialization. References: http://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf and http://stackoverflow.com/questions/5466323/how-exactly-does-k-means-work

\strong{random}         : random selection of data rows as initial centroids
}
\examples{

data(dietary_survey_IBS)

dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]

dat = center_scale(dat)

km = KMeans_rcpp(dat, clusters = 2, num_init = 5, max_iters = 100, initializer = 'optimal_init')
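
# A further sketch (an illustrative addition, not a definitive recipe):
# request soft assignments with fuzzy = TRUE and use a different initializer.
# The element names accessed below are the ones listed in the Value section;
# the object name 'km_fuzzy' is chosen here for illustration only.

km_fuzzy = KMeans_rcpp(dat, clusters = 2, num_init = 5, max_iters = 100,
                       initializer = 'kmeans++', fuzzy = TRUE, seed = 1)

str(km_fuzzy$centroids)         # centroid matrix (one row per cluster)

table(km_fuzzy$clusters)        # hard cluster assignment per observation

head(km_fuzzy$fuzzy_clusters)   # membership probabilities (fuzzy = TRUE only)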

}
\author{
Lampros Mouselimis
}
