% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/computeKmeans.R
\name{computeCanopy}
\alias{computeCanopy}
\title{Perform canopy clustering on the table to determine cluster centers.}
\usage{
computeCanopy(channel, tableName, looseDistance, tightDistance, canopy,
  tableInfo, id, include = NULL, except = NULL, scale = TRUE,
  idAlias = gsub("[^0-9a-zA-Z]+", "_", id), where = NULL,
  scaledTableName = NULL, schema = NULL, test = FALSE)
}
\arguments{
\item{channel}{connection object as returned by \code{\link{odbcConnect}}.}

\item{tableName}{Aster table name.}

\item{looseDistance}{specifies the maximum distance that any point can be from a canopy center to be considered 
part of that canopy.}

\item{tightDistance}{specifies the minimum distance that separates two canopy centers.}

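\item{canopy}{an object of class \code{"toacanopy"} returned by a previous call to \code{computeCanopy}. Supply it to re-run canopy clustering with different \code{looseDistance} and \code{tightDistance} values (see examples).}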

\item{tableInfo}{pre-built summary of data to use (required when \code{test=TRUE}). See \code{\link{getTableSummary}}.}

\item{id}{column name or SQL expression containing unique table key.}

\item{include}{a vector of names of numeric columns to use as variables. The model never contains variables other than those in this list.}

\item{except}{a vector of column names to exclude from the variables. The model never contains variables from this list.}

\item{scale}{logical: if \code{TRUE}, scale each variable in-database before clustering so that each input variable
has zero mean and unit standard deviation. When \code{FALSE}, the function only removes incomplete
data (rows containing \code{NULL}s) before clustering.}

\item{idAlias}{SQL alias for the table id. This is required when an SQL expression is given for \code{id}.}

\item{where}{specifies criteria that table rows must satisfy before the computation is applied.
The criteria are expressed as SQL predicates (the contents of a \code{WHERE} clause).}

\item{scaledTableName}{the name of the Aster table that stores the results of scaling.}

\item{schema}{name of the Aster schema that the table \code{scaledTableName} belongs to. When this argument
is supplied, make sure that the table name does not itself contain a schema prefix.}

\item{test}{logical: if TRUE, only show what would be done (similar to parameter \code{test} in \pkg{RODBC}
functions \link{sqlQuery} and \link{sqlSave}).}
}
\description{
Runs the canopy clustering algorithm in-database. Returns centroids compatible with \code{\link{computeKmeans}} and
pre-processes the data for k-means and other clustering algorithms.
}
\details{
The function first scales non-null data (if \code{scale=TRUE}) or just eliminates nulls without scaling. After
that, the data (table \code{tableName}, optionally filtered with \code{where}) are clustered using the canopy
algorithm in Aster. This results in 1) a set of centroids to use as initial cluster centers in k-means and
2) pre-processed data ready for clustering.
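
The steps above can be illustrated with a small in-memory R sketch. It is only an illustration under stated
assumptions: \code{canopy_sketch} is a hypothetical helper, not part of this package and not how Aster computes
canopies. It assumes Euclidean distance and, for reproducibility, takes the first remaining point as each new
canopy center instead of a random one.

```r
# Hypothetical in-memory sketch of canopy clustering (not the Aster implementation).
# Assumes Euclidean distance and picks the first remaining candidate as each center.
canopy_sketch <- function(x, looseDistance, tightDistance, scale = TRUE) {
  # The sketch assumes the loose distance is not smaller than the tight one.
  stopifnot(looseDistance >= tightDistance)
  x <- as.matrix(x)
  # Drop incomplete rows (containing NA), mirroring the removal of NULLs.
  x <- x[stats::complete.cases(x), , drop = FALSE]
  # Optionally scale each column to zero mean and unit standard deviation.
  if (scale) x <- base::scale(x)
  candidates <- seq_len(nrow(x))
  centers <- list()
  members <- list()
  while (length(candidates) > 0) {
    # Take the first remaining candidate as the next canopy center.
    center <- x[candidates[1], ]
    # Euclidean distance from the new center to every point.
    d <- sqrt(rowSums(sweep(x, 2, center)^2))
    k <- length(centers) + 1
    centers[[k]] <- center
    # Points within the loose distance belong to this canopy (canopies may overlap).
    members[[k]] <- which(d <= looseDistance)
    # Points within the tight distance (including the center) can no longer become centers.
    candidates <- candidates[d[candidates] > tightDistance]
  }
  list(centers = do.call(rbind, centers), members = members, data = x)
}
```

The returned list loosely mirrors the two outputs described above: \code{centers} could seed k-means, and
\code{data} holds the pre-processed (complete, optionally scaled) rows.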
}
\examples{
if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")
can = computeCanopy(conn, "batting", looseDistance = 1, tightDistance = 0.5,
                    id="playerid || '-' || stint || '-' || teamid || '-' || yearid", 
                    include=c('g','r','h'), 
                    scaledTableName='test_canopy_scaled', 
                    where="yearid > 2000")
createCentroidPlot(can)

can = computeCanopy(conn, canopy = can, looseDistance = 2, tightDistance = 0.5)
createCentroidPlot(can)

can = computeCanopy(conn, canopy = can, looseDistance = 4, tightDistance = 1)
createCentroidPlot(can)

km = computeKmeans(conn, centers=can, iterMax = 1000, persist = TRUE, 
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   centroidTableName = "kmeans_test_centroids",
                   tempTableName = "kmeans_test_temp",
                   clusteredTableName = "kmeans_test_clustered") 
createCentroidPlot(km)

}
}
\seealso{
\code{\link{computeClusterSample}}, \code{\link{computeSilhouette}}, \code{\link{computeKmeans}}
}

