Learning Clusterization

The LearnClust package helps users learn how clustering algorithms reach their solutions.

The package implements several distances between clusters.
It includes main functions that apply the algorithms and return the solution.
It also contains .details functions that explain, step by step, the process used to reach that solution.

Datasets:

We initialize some datasets to use in the algorithms:


cluster1 <- matrix(c(1,2),ncol=2)

cluster2 <- matrix(c(2,4),ncol=2)

weight <- c(0.2,0.8)

vectorData <- c(1,1,2,3,4,7,8,8,8,10)
# vectorData <- c(1:10)

matrixData <- matrix(vectorData,ncol=2,byrow=TRUE)
print(matrixData)
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> [4,]    8    8
#> [5,]    8   10

dfData <- data.frame(matrixData)
print(dfData)
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 4  8  8
#> 5  8 10
plot(dfData)


cMatrix <- matrix(c(2,4,4,2,3,5,1,1,2,2,5,5,1,0,1,1,2,1,2,4,5,1,2,1), ncol=3, byrow=TRUE)

cDataFrame <- data.frame(cMatrix)

Distances

The package includes different types of distance:

Euclidean Distance

edistance(cluster1,cluster2)
#> [1] 2.236068

Manhattan Distance

mdistance(cluster1,cluster2)
#> [1] 3

Canberra Distance

canberradistance(cluster1,cluster2)
#> [1] 0.6666667

Chebyshev Distance

chebyshevDistance(cluster1,cluster2)
#> [1] 2

Octile Distance

octileDistance(cluster1,cluster2)
#> [1] 2.414214
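
As a cross-check, the five values above can be reproduced in base R. The formulas below are the usual textbook definitions, chosen because they match the printed outputs; they are an assumption about what the package computes, not its source code, and the names `p`, `q` and `d` are only illustrative.

```r
# Two 2-D points with the same coordinates as cluster1 and cluster2
p <- c(1, 2)
q <- c(2, 4)
d <- abs(p - q)                          # per-coordinate absolute differences

euclidean <- sqrt(sum(d^2))              # 2.236068
manhattan <- sum(d)                      # 3
canberra  <- sum(d / (abs(p) + abs(q)))  # 0.6666667
chebyshev <- max(d)                      # 2
# Octile: straight moves cost 1, diagonal moves cost sqrt(2)
octile    <- max(d) + (sqrt(2) - 1) * min(d)  # 2.414214

print(c(euclidean, manhattan, canberra, chebyshev, octile))
```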

Each function has a .details version that explains how the calculation is done.

Some functions apply a weight to each element. They are used in the extra algorithm. These functions are:

Euclidean Distance with weight applied.

edistanceW(cluster1,cluster2,weight)
#> [1] 1.843909

Manhattan Distance with weight applied.

mdistanceW(cluster1,cluster2,weight)
#> [1] 1.8

Canberra Distance with weight applied.

canberradistanceW(cluster1,cluster2,weight)
#> [1] 0.3333333

Chebyshev Distance with weight applied.

chebyshevDistanceW(cluster1,cluster2,weight)
#> [1] 1.6

Octile Distance with weight applied.

octileDistanceW(cluster1,cluster2,weight)
#> [1] 1.682843
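
The weighted results above are consistent with multiplying each coordinate's contribution by its weight before combining. The sketch below reproduces the printed numbers under that assumption; it is inferred from the outputs and is not taken from the package source. The names `w` and `d` are illustrative.

```r
p <- c(1, 2)
q <- c(2, 4)
w <- c(0.2, 0.8)                 # same values as `weight`
d <- abs(p - q)                  # per-coordinate absolute differences

euclidean_w <- sqrt(sum(w * d^2))                       # 1.843909
manhattan_w <- sum(w * d)                               # 1.8
canberra_w  <- sum(w * d / (abs(p) + abs(q)))           # 0.3333333
chebyshev_w <- max(w * d)                               # 1.6
octile_w    <- max(w * d) + (sqrt(2) - 1) * min(w * d)  # 1.682843

print(c(euclidean_w, manhattan_w, canberra_w, chebyshev_w, octile_w))
```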

Agglomerative Hierarchical Clustering

This algorithm uses some functions according to the theoretical process:

We prepare the data to be used in the algorithms by creating one cluster for each value. The input can be any of several R types (vector, matrix or data.frame).

list <- toList(vectorData)

# list <- toList(matrixData)

# list <- toList(dfData)

print(list)
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    1
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    4    7    1
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    8    8    1
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    8   10    1
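
Judging only from the printed output above, each cluster seems to hold one pair of consecutive values plus a third column set to 1; later outputs show that column switching to 0 once a cluster has been merged. A minimal base R sketch of that layout follows. It is an assumption about the structure, not the package's implementation, and `values`, `pts` and `clusters` are illustrative names.

```r
values <- c(1, 1, 2, 3, 4, 7, 8, 8, 8, 10)

# Read the vector two values at a time: one row per point
pts <- matrix(values, ncol = 2, byrow = TRUE)

# One single-row cluster per point, with an extra column used as an "active" flag
clusters <- lapply(seq_len(nrow(pts)), function(i) {
  cbind(pts[i, , drop = FALSE], 1)
})

print(length(clusters))
print(clusters[[5]])
```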

We calculate the distance matrix using the clusters from the first step, with the distance and approach types that we want.

matrixDistance <- mdAgglomerative(list,'MAN','AVG')
print(matrixDistance)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    3    9   14   16
#> [2,]    3    0    6   11   13
#> [3,]    9    6    0    5    7
#> [4,]   14   11    5    0    2
#> [5,]   16   13    7    2    0
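
For comparison, the same matrix can be obtained with `dist()` from base R's stats package. While every cluster holds a single point, the approach type makes no difference, because there is only one distance per pair of clusters. This is a cross-check, not the package's code; `pts` and `dm` are illustrative names.

```r
pts <- matrix(c(1, 1, 2, 3, 4, 7, 8, 8, 8, 10), ncol = 2, byrow = TRUE)

# Manhattan distance between every pair of rows, as a full square matrix
dm <- as.matrix(dist(pts, method = "manhattan"))
dimnames(dm) <- NULL   # drop the "1".."5" labels added by dist()

print(dm)
```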

We get the minimal value from the distance matrix, that is, the distance between the two closest clusters.

minDistance <- minDistance(matrixDistance)
print(minDistance)
#> [1] 2

With the minimal distance, we look for the clusters separated by this distance. These are the clusters that will be joined.

groupedClusters <- getCluster(minDistance, matrixDistance)
print(groupedClusters)
#> [1] 4 5
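
These two steps can be sketched in base R as follows: take the smallest non-zero entry, then locate it in the matrix. This is only an illustration of the idea; how the package breaks ties is not shown in the source, so taking the first match is an assumption, and `dm`, `smallest`, `hits` and `pair` are illustrative names.

```r
dm <- matrix(c( 0,  3,  9, 14, 16,
                3,  0,  6, 11, 13,
                9,  6,  0,  5,  7,
               14, 11,  5,  0,  2,
               16, 13,  7,  2,  0), ncol = 5, byrow = TRUE)

# Smallest distance, ignoring the zeros on the diagonal
smallest <- min(dm[dm > 0])

# Row and column positions holding that value (the matrix is symmetric,
# so each pair appears twice); keep the first match and sort its indices
hits <- which(dm == smallest, arr.ind = TRUE)
pair <- sort(unname(hits[1, ]))

print(smallest)
print(pair)
```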

These two clusters are joined to create a new one.

updatedClusters <- newCluster(list, groupedClusters)
print(updatedClusters)
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    1
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    4    7    1
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    8    8    0
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    8   10    0
#> 
#> [[6]]
#>      [,1] [,2] [,3]
#> [1,]    8    8    1
#> [2,]    8   10    1
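
A minimal sketch of this joining step in base R, assuming the three-column layout shown in the output above (two coordinates plus an active flag). It is not the package's code; `clusters`, `pair` and `merged` are illustrative names.

```r
clusters <- list(
  matrix(c(1, 1, 1), nrow = 1),
  matrix(c(2, 3, 1), nrow = 1),
  matrix(c(4, 7, 1), nrow = 1),
  matrix(c(8, 8, 1), nrow = 1),
  matrix(c(8, 10, 1), nrow = 1)
)
pair <- c(4, 5)

# Stack the rows of the two selected clusters while they are still active
merged <- do.call(rbind, clusters[pair])

# Disable the originals by setting their flag column to 0
for (i in pair) {
  clusters[[i]][, 3] <- 0
}

# Append the new cluster at the end of the list
clusters[[length(clusters) + 1]] <- merged

print(clusters[[6]])
```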

We add the new cluster to the solution and repeat steps 2 to 5 until only one cluster remains.

The complete function that implements the algorithm is:

agglomerativeExample <- agglomerativeHC(dfData,'EUC','MAX')

plot(agglomerativeExample$dendrogram)

print(agglomerativeExample$clusters)
#> [[1]]
#>   X1 X2
#> 1  1  1
#> 
#> [[2]]
#>   X1 X2
#> 1  2  3
#> 
#> [[3]]
#>   X1 X2
#> 1  4  7
#> 
#> [[4]]
#>   X1 X2
#> 1  8  8
#> 
#> [[5]]
#>   X1 X2
#> 1  8 10
#> 
#> [[6]]
#>   X1 X2
#> 1  8  8
#> 2  8 10
#> 
#> [[7]]
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 
#> [[8]]
#>   X1 X2
#> 1  4  7
#> 2  8  8
#> 3  8 10
#> 
#> [[9]]
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 4  8  8
#> 5  8 10
print(agglomerativeExample$groupedClusters)
#>   cluster1 cluster2
#> 1        4        5
#> 2        1        2
#> 3        3        6
#> 4        7        8
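
The result can be compared with `hclust()` from base R's stats package. Euclidean distance with the 'MAX' approach corresponds to what `hclust()` calls complete linkage. In `hc$merge`, negative numbers refer to original observations and positive numbers to earlier merge steps, so the numbering differs from the table above even though the grouping should be the same. This is an independent cross-check, not part of LearnClust.

```r
pts <- matrix(c(1, 1, 2, 3, 4, 7, 8, 8, 8, 10), ncol = 2, byrow = TRUE)

# Complete linkage joins clusters by their largest pairwise distance
hc <- hclust(dist(pts, method = "euclidean"), method = "complete")

print(hc$merge)    # which observations or earlier merges are joined at each step
print(hc$height)   # distance at which each join happens
```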

The package includes some auxiliary functions used to implement the algorithm. These functions are:

A function that updates active clusters. If two clusters have been joined, they will not be used again as individual clusters.

cleanClusters <- usefulClusters(updatedClusters)
print(cleanClusters)
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    1
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    4    7    1
#> 
#> [[4]]
#> NULL
#> 
#> [[5]]
#> NULL
#> 
#> [[6]]
#>      [,1] [,2] [,3]
#> [1,]    8    8    1
#> [2,]    8   10    1

Two functions that calculate the distance between clusters using the given distance and approach values.

distances <- c(2,4,6,8)

clusterDistanceByApproach <- clusterDistanceByApproach(distances,'AVG')
print(clusterDistanceByApproach)
#> [1] 5

“clusterDistanceByApproach” gets the value using the approach type. This type can be “MAX”, “MIN”, or “AVG”.
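
The three approach types correspond to what is usually called complete (MAX), single (MIN) and average (AVG) linkage. A minimal sketch of that reduction in base R, using the hypothetical helper name `combine_by_approach` (not a LearnClust function):

```r
# Reduce a vector of pairwise distances to one value according to the approach
combine_by_approach <- function(distances, approach) {
  switch(approach,
         MAX = max(distances),
         MIN = min(distances),
         AVG = mean(distances),
         stop("unknown approach: ", approach))
}

d <- c(2, 4, 6, 8)
print(combine_by_approach(d, "AVG"))
print(combine_by_approach(d, "MAX"))
print(combine_by_approach(d, "MIN"))
```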

clusterDistance <- clusterDistance(cluster1,cluster2,'MAX','MAN')
print(clusterDistance)
#> [1] 3

“clusterDistance” gets the distance from each element of one cluster to each element of the other using the distance type. This type can be “EUC”, “MAN”, “CAN”, “CHE”, or “OCT”.
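
When clusters hold more than one element, there is one distance per pair of elements, and the approach type decides how they are combined. The sketch below shows this for two small clusters with Manhattan distance. It illustrates the general idea only and makes no claim about the package's internals; `a`, `b` and `pairwise` are illustrative names.

```r
a <- matrix(c(1, 1, 2, 3), ncol = 2, byrow = TRUE)    # points (1,1) and (2,3)
b <- matrix(c(8, 8, 8, 10), ncol = 2, byrow = TRUE)   # points (8,8) and (8,10)

# Manhattan distance from every row of `a` to every row of `b`
pairwise <- outer(seq_len(nrow(a)), seq_len(nrow(b)),
                  Vectorize(function(i, j) sum(abs(a[i, ] - b[j, ]))))

print(pairwise)
print(max(pairwise))    # MAX approach
print(min(pairwise))    # MIN approach
print(mean(pairwise))   # AVG approach
```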

Agglomerative Hierarchical Clustering .DETAILS

This section explains every function of the algorithm.

How clusters are initialized. The initial data can be any of several R types (vector, matrix or data.frame).

list <- toList.details(vectorData)
#>   'toList' creates a list initializing datas by creating clusters with each one

# list <- toList(matrixData)

# list <- toList(dfData)

print(list)
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    1
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    4    7    1
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    8    8    1
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    8   10    1

How the distance matrix is created.

matrixDistance <- mdAgglomerative.details(list,'MAN','AVG')
#> 
#>  'mdAgglomerative' creates the matrix distance of every cluster given by 'list' parameter.
#> 
#>  It usesMANdistance andAVGapproach.
#> 
#>  It returns the matrix with all the distances between clusters depending on distance and approach.
#> 
#>  The matrix distance is:
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    3    9   14   16
#> [2,]    3    0    6   11   13
#> [3,]    9    6    0    5    7
#> [4,]   14   11    5    0    2
#> [5,]   16   13    7    2    0

Choosing the minimal distance, ignoring zero values.

minDistance <- minDistance.details(matrixDistance)
#> 
#>  'minDistance' function gets the minimal value from a matrix.
#> 
#>  It returns the minimal value avoiding 0 values.
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    3    9   14   16
#> [2,]    3    0    6   11   13
#> [3,]    9    6    0    5    7
#> [4,]   14   11    5    0    2
#> [5,]   16   13    7    2    0
#> 
#> 
#>  2 is the minimal value of the matrix.

Using the minimal distance, we look for the clusters separated by this distance.

groupedClusters <- getCluster.details(minDistance, matrixDistance)
#> 
#>  'getCluster' method searches the clusters which have the distance given.
#>  Search for 2 in:
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    3    9   14   16
#> [2,]    3    0    6   11   13
#> [3,]    9    6    0    5    7
#> [4,]   14   11    5    0    2
#> [5,]   16   13    7    2    0
#> 
#>  The clusters with the minimum distance are: 4, 5

With these clusters, it creates a new one and disables the previous ones in the initial list.

updatedClusters <- newCluster.details(list, groupedClusters)
#> 
#>  'newCluster' function creates a new cluster from the clusters given.
#> 
#>  It adds the new cluster to 'list' and disables the clusters used to create the new one.
#> 
#>  Using 4 and 5 it searches the clusters in 'list' parameter and creates a new one with 
#>  their components.
#> 
#>  After the new cluster is created, the initial clusters must be disabled.
#> 
#>  The new cluster:
#>      [,1] [,2] [,3]
#> [1,]    8    8    1
#> [2,]    8   10    1
#> 
#> 
#>  is added to the list:

We add the new cluster to the solution and repeat steps 2 to 5 until only one cluster remains.

The complete function that explains the algorithm is:

agglomerativeExample <- agglomerativeHC.details(vectorData,'EUC','MAX')
#>   Agglomerative hierarchical clustering is a classification technique that initializes 
#>  a cluster for each data.
#> 
#>   It calculates the distance between datas depending on the approach type given and
#> 
#>  it creates a new cluster joining the most similar clusters until getting only one.
#>   'toList' creates a list initializing datas by creating clusters with each one
#> 
#>   These are the clusters with only one element:
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    1
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    4    7    1
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    8    8    1
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    8   10    1
#> 
#>   In each step: 
#>    - It calculates a matrix distance between active clusters depending on the approach and distance type.
#>    - It gets the minimum distance value from the matrix.
#>    - It creates a new cluster joining the minimum distance clusters.
#>    - It repeats these steps while final clusters do not include all datas.
#> 
#> _____________________________________________________________________________________________
#> STEP => 1
#> 
#>  Matrix Distance (distance type = EUC, approach type = MAX):
#>           [,1]     [,2]     [,3]     [,4]      [,5]
#> [1,]  0.000000 2.236068 6.708204 9.899495 11.401754
#> [2,]  2.236068 0.000000 4.472136 7.810250  9.219544
#> [3,]  6.708204 4.472136 0.000000 4.123106  5.000000
#> [4,]  9.899495 7.810250 4.123106 0.000000  2.000000
#> [5,] 11.401754 9.219544 5.000000 2.000000  0.000000
#> 
#>  The minimum distance is: 2
#> 
#>  The closest clusters are: 4, 5
#> 
#>  The grouped clusters are added to the solution.
#> 
#>  Grouping clusters 4 and cluster 5, it is created a new cluster:
#>   X1 X2
#> 1  8  8
#> 2  8 10
#> 
#>  The new cluster is added to the solution.
#> 
#> _____________________________________________________________________________________________
#> STEP => 2
#> 
#>  Matrix Distance (distance type = EUC, approach type = MAX):
#>           [,1]     [,2]     [,3] [,4] [,5]      [,6]
#> [1,]  0.000000 2.236068 6.708204    0    0 11.401754
#> [2,]  2.236068 0.000000 4.472136    0    0  9.219544
#> [3,]  6.708204 4.472136 0.000000    0    0  5.000000
#> [4,]  0.000000 0.000000 0.000000    0    0  0.000000
#> [5,]  0.000000 0.000000 0.000000    0    0  0.000000
#> [6,] 11.401754 9.219544 5.000000    0    0  0.000000
#> 
#>  The minimum distance is: 2.23606797749979
#> 
#>  The closest clusters are: 1, 2
#> 
#>  The grouped clusters are added to the solution.
#> 
#>  Grouping clusters 1 and cluster 2, it is created a new cluster:
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 
#>  The new cluster is added to the solution.
#> 
#> _____________________________________________________________________________________________
#> STEP => 3
#> 
#>  Matrix Distance (distance type = EUC, approach type = MAX):
#>      [,1] [,2]     [,3] [,4] [,5]     [,6]      [,7]
#> [1,]    0    0 0.000000    0    0  0.00000  0.000000
#> [2,]    0    0 0.000000    0    0  0.00000  0.000000
#> [3,]    0    0 0.000000    0    0  5.00000  6.708204
#> [4,]    0    0 0.000000    0    0  0.00000  0.000000
#> [5,]    0    0 0.000000    0    0  0.00000  0.000000
#> [6,]    0    0 5.000000    0    0  0.00000 11.401754
#> [7,]    0    0 6.708204    0    0 11.40175  0.000000
#> 
#>  The minimum distance is: 5
#> 
#>  The closest clusters are: 3, 6
#> 
#>  The grouped clusters are added to the solution.
#> 
#>  Grouping clusters 3 and cluster 6, it is created a new cluster:
#>   X1 X2
#> 1  4  7
#> 2  8  8
#> 3  8 10
#> 
#>  The new cluster is added to the solution.
#> 
#> _____________________________________________________________________________________________
#> STEP => 4
#> 
#>  Matrix Distance (distance type = EUC, approach type = MAX):
#>      [,1] [,2] [,3] [,4] [,5] [,6]     [,7]     [,8]
#> [1,]    0    0    0    0    0    0  0.00000  0.00000
#> [2,]    0    0    0    0    0    0  0.00000  0.00000
#> [3,]    0    0    0    0    0    0  0.00000  0.00000
#> [4,]    0    0    0    0    0    0  0.00000  0.00000
#> [5,]    0    0    0    0    0    0  0.00000  0.00000
#> [6,]    0    0    0    0    0    0  0.00000  0.00000
#> [7,]    0    0    0    0    0    0  0.00000 11.40175
#> [8,]    0    0    0    0    0    0 11.40175  0.00000
#> 
#>  The minimum distance is: 11.4017542509914
#> 
#>  The closest clusters are: 7, 8
#> 
#>  The grouped clusters are added to the solution.
#> 
#>  Grouping clusters 7 and cluster 8, it is created a new cluster:
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 4  8  8
#> 5  8 10
#> 
#>  The new cluster is added to the solution.
#> 
#>  This loop has been repeated until the last cluster contained every single clusters.

Divisive Hierarchical Clustering

This algorithm uses some functions according to the theoretical process:

We prepare the data to be used in the algorithms by creating one cluster for each value. The input can be any of several R types (vector, matrix or data.frame).

 # list <- toListDivisive(vectorData)

# list <- toListDivisive(matrixData)

 list <- toListDivisive(dfData[1:4,])

print(list)
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    2    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    8    8

From the initial clusters, the algorithm has to create every possible subcluster by joining them.

clustersList <- initClusters(list)
print(clustersList)
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    2    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    8    8
#> 
#> [[5]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> 
#> [[6]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    4    7
#> 
#> [[7]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    8    8
#> 
#> [[8]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    4    7
#> 
#> [[9]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    8    8
#> 
#> [[10]]
#>      [,1] [,2]
#> [1,]    4    7
#> [2,]    8    8
#> 
#> [[11]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> 
#> [[12]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    8    8
#> 
#> [[13]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    4    7
#> [3,]    8    8
#> 
#> [[14]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    4    7
#> [3,]    8    8
#> 
#> [[15]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> [4,]    8    8

We calculate the distance matrix using the clusters from the second step, with the distance and approach types that we prefer.

matrixDistance <- mdDivisive(clustersList,'MAN','AVG',list)
print(matrixDistance)
#>           [,1]     [,2]     [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#>  [1,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [2,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [3,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [4,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0    10
#>  [5,] 0.000000 0.000000 0.000000    0    0    0    0    0    0    10     0
#>  [6,] 0.000000 0.000000 0.000000    0    0    0    0    0    7     0     0
#>  [7,] 0.000000 0.000000 0.000000    0    0    0    0    7    0     0     0
#>  [8,] 0.000000 0.000000 0.000000    0    0    0    7    0    0     0     0
#>  [9,] 0.000000 0.000000 0.000000    0    0    7    0    0    0     0     0
#> [10,] 0.000000 0.000000 0.000000    0   10    0    0    0    0     0     0
#> [11,] 0.000000 0.000000 0.000000   10    0    0    0    0    0     0     0
#> [12,] 0.000000 0.000000 6.666667    0    0    0    0    0    0     0     0
#> [13,] 0.000000 6.666667 0.000000    0    0    0    0    0    0     0     0
#> [14,] 8.666667 0.000000 0.000000    0    0    0    0    0    0     0     0
#> [15,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>          [,12]    [,13]    [,14] [,15]
#>  [1,] 0.000000 0.000000 8.666667     0
#>  [2,] 0.000000 6.666667 0.000000     0
#>  [3,] 6.666667 0.000000 0.000000     0
#>  [4,] 0.000000 0.000000 0.000000     0
#>  [5,] 0.000000 0.000000 0.000000     0
#>  [6,] 0.000000 0.000000 0.000000     0
#>  [7,] 0.000000 0.000000 0.000000     0
#>  [8,] 0.000000 0.000000 0.000000     0
#>  [9,] 0.000000 0.000000 0.000000     0
#> [10,] 0.000000 0.000000 0.000000     0
#> [11,] 0.000000 0.000000 0.000000     0
#> [12,] 0.000000 0.000000 0.000000     0
#> [13,] 0.000000 0.000000 0.000000     0
#> [14,] 0.000000 0.000000 0.000000     0
#> [15,] 0.000000 0.000000 0.000000     0

We get the maximal value from the distance matrix, that is, the distance between the two most distant clusters.

maxDistance <- maxDistance(matrixDistance)
print(maxDistance)
#> [1] 10
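
The non-zero entries of the matrix above can be reproduced by taking every way of splitting the four points into two complementary groups and averaging the Manhattan distances between the groups. The sketch below does this in base R. It is a reconstruction that matches the printed values, not the package's code; `pts`, `dm`, `lefts` and `split_dist` are illustrative names.

```r
pts <- matrix(c(1, 1, 2, 3, 4, 7, 8, 8), ncol = 2, byrow = TRUE)
dm <- as.matrix(dist(pts, method = "manhattan"))
n <- nrow(pts)

# Every split into two non-empty groups; keeping point 1 in the left group
# avoids listing each split twice
others <- 2:n
lefts <- list(1L)
for (k in seq_len(n - 2)) {
  lefts <- c(lefts,
             lapply(combn(others, k, simplify = FALSE), function(s) c(1L, s)))
}

# AVG approach: mean distance between the left group and its complement
split_dist <- sapply(lefts, function(l) mean(dm[l, -l]))

print(round(split_dist, 6))
print(max(split_dist))
```

Two splits reach the maximum of 10 here: point 4 against the other three, and points 1 and 2 against points 3 and 4. Both appear in the matrix above.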

With the maximal distance, we look for the clusters separated by this distance. These are the clusters that will be divided.

dividedClusters <- getClusterDivisive(maxDistance, matrixDistance)
print(dividedClusters)
#> [1] 56

Two new subclusters are created from the initial one and added to the solution.
We repeat steps 2 to 5 until no cluster can be divided further.

The complete function that implements the algorithm is:

divisiveExample <- divisiveHC(dfData[1:4,],'MAN','AVG')
print(divisiveExample)
#> [[1]]
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 4  8  8
#> 
#> [[2]]
#>   X1 X2
#> 1  8  8
#> 
#> [[3]]
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 
#> [[4]]
#>   X1 X2
#> 1  4  7
#> 
#> [[5]]
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 
#> [[6]]
#>   X1 X2
#> 1  1  1
#> 
#> [[7]]
#>   X1 X2
#> 1  2  3

The package uses auxiliary functions to implement the algorithm, some of them shared with the previous one. These functions are:

clusterDistanceByApproach
clusterDistance
complementaryClusters: checks whether the clusters we are going to divide are complementary, that is, every initial cluster is in one cluster or in the other, but never in both. This condition ensures that no cluster is lost when the division is done.

data <- c(1,2,1,3,1,4,1,5)

components <- toListDivisive(data)

cluster1 <- matrix(c(1,2,1,3),ncol=2,byrow=TRUE)
cluster2 <- matrix(c(1,4,1,5),ncol=2,byrow=TRUE)
cluster3 <- matrix(c(1,6,1,7),ncol=2,byrow=TRUE)

complementaryClusters(components,cluster1,cluster2)
#> [1] TRUE

complementaryClusters(components,cluster1,cluster3)
#> [1] FALSE
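
The condition can be sketched in base R as "every component appears in exactly one of the two clusters". The helper below, `is_complementary`, is a hypothetical name used only for illustration, and the row-by-row comparison is an assumption about how such a check could be written, not the package's implementation.

```r
is_complementary <- function(components, a, b) {
  # TRUE if some row of cluster `cl` equals the coordinate vector `v`
  has_row <- function(cl, v) any(apply(cl, 1, function(r) all(r == v)))

  # Each component must be in `a` or in `b`, but not in both and not in neither
  all(vapply(components, function(comp) {
    v <- as.numeric(comp)
    xor(has_row(a, v), has_row(b, v))
  }, logical(1)))
}

m <- matrix(c(1, 2, 1, 3, 1, 4, 1, 5), ncol = 2, byrow = TRUE)
parts <- lapply(seq_len(nrow(m)), function(i) m[i, , drop = FALSE])

left  <- m[1:2, , drop = FALSE]                          # (1,2) and (1,3)
right <- m[3:4, , drop = FALSE]                          # (1,4) and (1,5)
other <- matrix(c(1, 6, 1, 7), ncol = 2, byrow = TRUE)   # points not in `parts`

print(is_complementary(parts, left, right))
print(is_complementary(parts, left, other))
```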

Its “.details” version explains how the function checks this condition:

complementaryClusters.details(components,cluster1,cluster2)
#> 
#>  'complementaryClusters' checks if clusters are complementary.
#> 
#>  Each element from 'components' list has to be in one cluster, but never included in both.
#> 
#>  If every element is in one cluster, the function will return 'TRUE',
#> 
#>  if not, the result will be 'FALSE'
#> 
#>  The clusters to be checked are:
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    1    3
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    1    5
#> 
#>  And they have to include:
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    2
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    1    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    1    4
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    1    5
#> 
#> 
#>  So, the result is TRUE.
#> [1] TRUE

Divisive Hierarchical Clustering .DETAILS

This section explains every function of the algorithm.

How clusters are initialized. The initial data can be any of several R types (vector, matrix or data.frame).

# list <- toListDivisive.details(vectorData)

# list <- toListDivisive(matrixData)

 list <- toListDivisive(dfData[1:4,])

print(list)
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    2    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    8    8

How all possible clusters to be divided are created.

clustersList <- initClusters.details(list)
#> 
#>  'initClusters' method initializes the clusters used in the divisive algorithm.
#> 
#>  To know which are the most different clusters, we need to know the distance between 
#>  every possible clusters that could be created with the initial elements.
#> 
#>  This step is the most computationally complex, so it will make the algorithm to get the 
#>  solution with delay, or even, not to find a solution because of the computers capacities.
#> 
#>  The clusters created using
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    2    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    8    8
#> 
#>  are:
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    2    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    8    8
#> 
#> [[5]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> 
#> [[6]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    4    7
#> 
#> [[7]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    8    8
#> 
#> [[8]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    4    7
#> 
#> [[9]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    8    8
#> 
#> [[10]]
#>      [,1] [,2]
#> [1,]    4    7
#> [2,]    8    8
#> 
#> [[11]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> 
#> [[12]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    8    8
#> 
#> [[13]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    4    7
#> [3,]    8    8
#> 
#> [[14]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    4    7
#> [3,]    8    8
#> 
#> [[15]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> [4,]    8    8

How the distance matrix is created.

matrixDistance <- mdDivisive.details(clustersList,'MAN','AVG',list)
#> 
#>  'mdDivisive' creates a matrix distance including every cluster from 'list'. 
#> 
#> 
#>  The function checks if the clusters are not valid, if they are the same cluster and
#> 
#>  if the clusters are not complementary. It will allocate a 0 value if any condition is not 'TRUE'.
#> 
#> 
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    2    3
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    8    8
#> 
#> [[5]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> 
#> [[6]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    4    7
#> 
#> [[7]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    8    8
#> 
#> [[8]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    4    7
#> 
#> [[9]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    8    8
#> 
#> [[10]]
#>      [,1] [,2]
#> [1,]    4    7
#> [2,]    8    8
#> 
#> [[11]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> 
#> [[12]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    8    8
#> 
#> [[13]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    4    7
#> [3,]    8    8
#> 
#> [[14]]
#>      [,1] [,2]
#> [1,]    2    3
#> [2,]    4    7
#> [3,]    8    8
#> 
#> [[15]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> [4,]    8    8
#> 
#>  The matrix distance for the list above is:
#>           [,1]     [,2]     [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#>  [1,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [2,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [3,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [4,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0    10
#>  [5,] 0.000000 0.000000 0.000000    0    0    0    0    0    0    10     0
#>  [6,] 0.000000 0.000000 0.000000    0    0    0    0    0    7     0     0
#>  [7,] 0.000000 0.000000 0.000000    0    0    0    0    7    0     0     0
#>  [8,] 0.000000 0.000000 0.000000    0    0    0    7    0    0     0     0
#>  [9,] 0.000000 0.000000 0.000000    0    0    7    0    0    0     0     0
#> [10,] 0.000000 0.000000 0.000000    0   10    0    0    0    0     0     0
#> [11,] 0.000000 0.000000 0.000000   10    0    0    0    0    0     0     0
#> [12,] 0.000000 0.000000 6.666667    0    0    0    0    0    0     0     0
#> [13,] 0.000000 6.666667 0.000000    0    0    0    0    0    0     0     0
#> [14,] 8.666667 0.000000 0.000000    0    0    0    0    0    0     0     0
#> [15,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>          [,12]    [,13]    [,14] [,15]
#>  [1,] 0.000000 0.000000 8.666667     0
#>  [2,] 0.000000 6.666667 0.000000     0
#>  [3,] 6.666667 0.000000 0.000000     0
#>  [4,] 0.000000 0.000000 0.000000     0
#>  [5,] 0.000000 0.000000 0.000000     0
#>  [6,] 0.000000 0.000000 0.000000     0
#>  [7,] 0.000000 0.000000 0.000000     0
#>  [8,] 0.000000 0.000000 0.000000     0
#>  [9,] 0.000000 0.000000 0.000000     0
#> [10,] 0.000000 0.000000 0.000000     0
#> [11,] 0.000000 0.000000 0.000000     0
#> [12,] 0.000000 0.000000 0.000000     0
#> [13,] 0.000000 0.000000 0.000000     0
#> [14,] 0.000000 0.000000 0.000000     0
#> [15,] 0.000000 0.000000 0.000000     0

Choosing the maximal distance, that of the most distant clusters.

maxDistance <- maxDistance.details(matrixDistance)
#> 
#>  'maxDistance' function gets the maximal value from a matrix.
#> 
#>  It returns the maximal value avoiding 0 values.
#>           [,1]     [,2]     [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#>  [1,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [2,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [3,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [4,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0    10
#>  [5,] 0.000000 0.000000 0.000000    0    0    0    0    0    0    10     0
#>  [6,] 0.000000 0.000000 0.000000    0    0    0    0    0    7     0     0
#>  [7,] 0.000000 0.000000 0.000000    0    0    0    0    7    0     0     0
#>  [8,] 0.000000 0.000000 0.000000    0    0    0    7    0    0     0     0
#>  [9,] 0.000000 0.000000 0.000000    0    0    7    0    0    0     0     0
#> [10,] 0.000000 0.000000 0.000000    0   10    0    0    0    0     0     0
#> [11,] 0.000000 0.000000 0.000000   10    0    0    0    0    0     0     0
#> [12,] 0.000000 0.000000 6.666667    0    0    0    0    0    0     0     0
#> [13,] 0.000000 6.666667 0.000000    0    0    0    0    0    0     0     0
#> [14,] 8.666667 0.000000 0.000000    0    0    0    0    0    0     0     0
#> [15,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>          [,12]    [,13]    [,14] [,15]
#>  [1,] 0.000000 0.000000 8.666667     0
#>  [2,] 0.000000 6.666667 0.000000     0
#>  [3,] 6.666667 0.000000 0.000000     0
#>  [4,] 0.000000 0.000000 0.000000     0
#>  [5,] 0.000000 0.000000 0.000000     0
#>  [6,] 0.000000 0.000000 0.000000     0
#>  [7,] 0.000000 0.000000 0.000000     0
#>  [8,] 0.000000 0.000000 0.000000     0
#>  [9,] 0.000000 0.000000 0.000000     0
#> [10,] 0.000000 0.000000 0.000000     0
#> [11,] 0.000000 0.000000 0.000000     0
#> [12,] 0.000000 0.000000 0.000000     0
#> [13,] 0.000000 0.000000 0.000000     0
#> [14,] 0.000000 0.000000 0.000000     0
#> [15,] 0.000000 0.000000 0.000000     0
#> 
#> 
#>  10 is the maximal value of the matrix.

Using the maximal distance, we look for the clusters separated by this distance.

dividedClusters <- getClusterDivisive.details(maxDistance, matrixDistance)
#> 
#>  'getCluster' method searches the cluster which have the distance to the target given.
#>  Search for 10 in:
#>           [,1]     [,2]     [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#>  [1,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [2,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [3,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [4,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0    10
#>  [5,] 0.000000 0.000000 0.000000    0    0    0    0    0    0    10     0
#>  [6,] 0.000000 0.000000 0.000000    0    0    0    0    0    7     0     0
#>  [7,] 0.000000 0.000000 0.000000    0    0    0    0    7    0     0     0
#>  [8,] 0.000000 0.000000 0.000000    0    0    0    7    0    0     0     0
#>  [9,] 0.000000 0.000000 0.000000    0    0    7    0    0    0     0     0
#> [10,] 0.000000 0.000000 0.000000    0   10    0    0    0    0     0     0
#> [11,] 0.000000 0.000000 0.000000   10    0    0    0    0    0     0     0
#> [12,] 0.000000 0.000000 6.666667    0    0    0    0    0    0     0     0
#> [13,] 0.000000 6.666667 0.000000    0    0    0    0    0    0     0     0
#> [14,] 8.666667 0.000000 0.000000    0    0    0    0    0    0     0     0
#> [15,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>          [,12]    [,13]    [,14] [,15]
#>  [1,] 0.000000 0.000000 8.666667     0
#>  [2,] 0.000000 6.666667 0.000000     0
#>  [3,] 6.666667 0.000000 0.000000     0
#>  [4,] 0.000000 0.000000 0.000000     0
#>  [5,] 0.000000 0.000000 0.000000     0
#>  [6,] 0.000000 0.000000 0.000000     0
#>  [7,] 0.000000 0.000000 0.000000     0
#>  [8,] 0.000000 0.000000 0.000000     0
#>  [9,] 0.000000 0.000000 0.000000     0
#> [10,] 0.000000 0.000000 0.000000     0
#> [11,] 0.000000 0.000000 0.000000     0
#> [12,] 0.000000 0.000000 0.000000     0
#> [13,] 0.000000 0.000000 0.000000     0
#> [14,] 0.000000 0.000000 0.000000     0
#> [15,] 0.000000 0.000000 0.000000     0
#> 
#>  The cluster with the minimum distance is: 56

We add the new clusters to the solution and repeat steps 2 to 5 until no cluster can be divided further.

The complete function that explains the algorithm is:

divisiveExample <- divisiveHC.details(dfData[1:4,],'MAN','AVG')
#> 
#>   Divisive hierarchical clustering is a classification technique that initializes a cluster 
#>  with every data.
#> 
#>   It calculates the distance between datas depending on the approach type given and
#> 
#>  it divides the most different clusters until any cluster can be divided again.
#> 
#>   These are the clusters with only one element:
#> [[1]]
#>   X1 X2
#> 1  1  1
#> 
#> [[2]]
#>   X1 X2
#> 1  2  3
#> 
#> [[3]]
#>   X1 X2
#> 1  4  7
#> 
#> [[4]]
#>   X1 X2
#> 1  8  8
#> 
#>   And this is the initial cluster with every element:
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 4  8  8
#> 
#>   In each step: 
#>    - It calculates a matrix distance between valid clusters depending on the approach 
#>  and the distance types.
#>    - It gets the maximal distance value from the matrixes of every cluster. We have to 
#>  look for the most different clusters available  in every cluster.
#>    - It divides the selected cluster in two new and complementary clusters using tha maximal 
#>  distance between clusters.
#>    - It repeats these steps while there isn't any cluster that can be divided again.
#> 
#> _____________________________________________________________________________________________
#> STEP => 1
#> 
#>  The algorithm calculates a matrix distance from every active cluster, but it has to find 
#>  the maximal value from all matrix and then chooses which is the selected one.
#> 
#>  It gets the maximal value from every matrix and then chooses the maximal between them. So...
#> 
#>  Matrix Distance (distance type = MAN, approach type = AVG):
#>           [,1]     [,2]     [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#>  [1,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [2,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [3,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>  [4,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0    10
#>  [5,] 0.000000 0.000000 0.000000    0    0    0    0    0    0    10     0
#>  [6,] 0.000000 0.000000 0.000000    0    0    0    0    0    7     0     0
#>  [7,] 0.000000 0.000000 0.000000    0    0    0    0    7    0     0     0
#>  [8,] 0.000000 0.000000 0.000000    0    0    0    7    0    0     0     0
#>  [9,] 0.000000 0.000000 0.000000    0    0    7    0    0    0     0     0
#> [10,] 0.000000 0.000000 0.000000    0   10    0    0    0    0     0     0
#> [11,] 0.000000 0.000000 0.000000   10    0    0    0    0    0     0     0
#> [12,] 0.000000 0.000000 6.666667    0    0    0    0    0    0     0     0
#> [13,] 0.000000 6.666667 0.000000    0    0    0    0    0    0     0     0
#> [14,] 8.666667 0.000000 0.000000    0    0    0    0    0    0     0     0
#> [15,] 0.000000 0.000000 0.000000    0    0    0    0    0    0     0     0
#>          [,12]    [,13]    [,14] [,15]
#>  [1,] 0.000000 0.000000 8.666667     0
#>  [2,] 0.000000 6.666667 0.000000     0
#>  [3,] 6.666667 0.000000 0.000000     0
#>  [4,] 0.000000 0.000000 0.000000     0
#>  [5,] 0.000000 0.000000 0.000000     0
#>  [6,] 0.000000 0.000000 0.000000     0
#>  [7,] 0.000000 0.000000 0.000000     0
#>  [8,] 0.000000 0.000000 0.000000     0
#>  [9,] 0.000000 0.000000 0.000000     0
#> [10,] 0.000000 0.000000 0.000000     0
#> [11,] 0.000000 0.000000 0.000000     0
#> [12,] 0.000000 0.000000 0.000000     0
#> [13,] 0.000000 0.000000 0.000000     0
#> [14,] 0.000000 0.000000 0.000000     0
#> [15,] 0.000000 0.000000 0.000000     0
#> 
#>  The maximal distance is: 10
#> 
#>  The most distant clusters are:
#>   X1 X2
#> 1  8  8
#> 
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 3  4  7
#> 
#>  The divided clusters are added to the solution.
#> 
#>  - The second divided cluster is added to the active clusters list because it can be divided again. 
#> 
#> 
#>  If the divided clusters can't be divided again, then they don`t be added to the active clusters.
#> 
#> _____________________________________________________________________________________________
#> STEP => 2
#> 
#>  The algorithm calculates a matrix distance from every active cluster, but it has to find 
#>  the maximal value from all matrix and then chooses which is the selected one.
#> 
#>  It gets the maximal value from every matrix and then chooses the maximal between them. So...
#> 
#>  Matrix Distance (distance type = MAN, approach type = AVG):
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,]    0  0.0  0.0  0.0  0.0    6    0
#> [2,]    0  0.0  0.0  0.0  4.5    0    0
#> [3,]    0  0.0  0.0  7.5  0.0    0    0
#> [4,]    0  0.0  7.5  0.0  0.0    0    0
#> [5,]    0  4.5  0.0  0.0  0.0    0    0
#> [6,]    6  0.0  0.0  0.0  0.0    0    0
#> [7,]    0  0.0  0.0  0.0  0.0    0    0
#> 
#>  The maximal distance is: 7.5
#> 
#>  The most distant clusters are:
#>   X1 X2
#> 1  4  7
#> 
#>   X1 X2
#> 1  1  1
#> 2  2  3
#> 
#>  The divided clusters are added to the solution.
#> 
#>  - The second divided cluster is added to the active clusters list because it can be divided again. 
#> 
#> 
#>  If the divided clusters can't be divided again, then they don`t be added to the active clusters.
#> 
#> _____________________________________________________________________________________________
#> STEP => 3
#> 
#>  The algorithm calculates a matrix distance from every active cluster, but it has to find 
#>  the maximal value from all matrix and then chooses which is the selected one.
#> 
#>  It gets the maximal value from every matrix and then chooses the maximal between them. So...
#> 
#>  Matrix Distance (distance type = MAN, approach type = AVG):
#>      [,1] [,2] [,3]
#> [1,]    0    3    0
#> [2,]    3    0    0
#> [3,]    0    0    0
#> 
#>  The maximal distance is: 3
#> 
#>  The most distant clusters are:
#>   X1 X2
#> 1  1  1
#> 
#>   X1 X2
#> 1  2  3
#> 
#>  The divided clusters are added to the solution.
#> 
#>  If the divided clusters can't be divided again, then they don`t be added to the active clusters.
#> 
#>  This loop has been repeated until there aren't any active cluster, that is, any cluster 
#>  can be divided again.
print(divisiveExample)
#> [[1]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> [4,]    8    8
#> 
#> [[2]]
#>      [,1] [,2]
#> [1,]    8    8
#> 
#> [[3]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> [3,]    4    7
#> 
#> [[4]]
#>      [,1] [,2]
#> [1,]    4    7
#> 
#> [[5]]
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2    3
#> 
#> [[6]]
#>      [,1] [,2]
#> [1,]    1    1
#> 
#> [[7]]
#>      [,1] [,2]
#> [1,]    2    3
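
The split chosen in STEP 2 of the trace above can be reproduced with a few lines of base R. This is only a minimal sketch, assuming that the 'AVG' approach averages the pairwise Manhattan distances between two complementary parts of a cluster; `avgManhattan` and `bestSplit` are hypothetical helpers written for this illustration and are not part of LearnClust.

```r
# Average Manhattan distance between every row of a and every row of b.
avgManhattan <- function(a, b) {
  total <- 0
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(nrow(b))) {
      total <- total + sum(abs(a[i, ] - b[j, ]))
    }
  }
  total / (nrow(a) * nrow(b))
}

# Hypothetical helper (not part of LearnClust): try every split of a cluster
# into two complementary, non-empty parts and keep the most distant pair.
bestSplit <- function(cluster) {
  n <- nrow(cluster)
  best <- list(distance = -Inf, first = NULL, second = NULL)
  # Masks 1 .. 2^(n-1) - 1 visit each unordered split exactly once.
  for (mask in seq_len(2^(n - 1) - 1)) {
    inFirst <- (mask %/% 2^(0:(n - 1))) %% 2 == 1
    first <- cluster[inFirst, , drop = FALSE]
    second <- cluster[!inFirst, , drop = FALSE]
    d <- avgManhattan(first, second)
    if (d > best$distance) {
      best <- list(distance = d, first = first, second = second)
    }
  }
  best
}

sketchCluster <- matrix(c(1, 1, 2, 3, 4, 7), ncol = 2, byrow = TRUE)
splitResult <- bestSplit(sketchCluster)
print(splitResult$distance)
#> [1] 7.5
```

Under these assumptions the row (4, 7) is separated from the rows (1, 1) and (2, 3), which agrees with the clusters reported in STEP 2. The sketch enumerates every split, so it is only practical for very small clusters.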

Correlative Hierarchical Clustering

This example shows how the algorithm works step by step.

Input data is initialized by creating a cluster from each data frame row.

initData <- initData(cDataFrame)
print(initData)
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    2    4    4
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    5
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    2
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    2    5    5
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    1    0    1
#> 
#> [[6]]
#>      [,1] [,2] [,3]
#> [1,]    1    2    1
#> 
#> [[7]]
#>      [,1] [,2] [,3]
#> [1,]    2    4    5
#> 
#> [[8]]
#>      [,1] [,2] [,3]
#> [1,]    1    2    1
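
The same shape of result can be obtained in base R. The following is a minimal sketch, not the package's implementation; `rowsToClusters` is a hypothetical name, and the only assumption is that each cluster is a one-row matrix without dimnames, as in the output above.

```r
# Hypothetical helper (not part of LearnClust): one 1-row matrix per data frame row.
rowsToClusters <- function(data) {
  lapply(seq_len(nrow(data)), function(i) {
    # drop = FALSE keeps the row as a data frame; unname() removes the dimnames.
    unname(as.matrix(data[i, , drop = FALSE]))
  })
}

exampleData <- data.frame(matrix(c(2, 4, 4, 2, 3, 5, 1, 1, 2), ncol = 3, byrow = TRUE))
exampleClusters <- rowsToClusters(exampleData)
print(exampleClusters[[2]])
#>      [,1] [,2] [,3]
#> [1,]    2    3    5
```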

The algorithm checks whether the input target is acceptable; if it is not, it initializes the target as a row of zeros.

target <- c(1,2,3)

initTarget <- initTarget(target,cDataFrame)
print(initTarget)
#>      [,1] [,2] [,3]
#> [1,]    1    2    3

If the user requests it, the algorithm normalizes the weight values.

weight <- c(5,7,6)

weights <- normalizeWeight(TRUE,weight,cDataFrame)
print(weights)
#> [1] 0.2777778 0.3888889 0.3333333
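
The normalization divides each weight by the sum of all weights, so the result adds up to 1. A short base R check of the values above (the variable names are chosen for this sketch only):

```r
rawWeight <- c(5, 7, 6)

# Each weight divided by the total sum (18 in this case).
normalizedWeight <- rawWeight / sum(rawWeight)
print(normalizedWeight)
#> [1] 0.2777778 0.3888889 0.3333333
```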

It calculates the distances between clusters, applying the given weights and distance definition.

cluster1 <- matrix(c(1,2,3),ncol=3)
cluster2 <- matrix(c(2,5,8),ncol=3)

weight <- c(3,7,4)

distance <- distances(cluster1,cluster2,'CHE',weight)
print(distance)
#> [1] 21
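
The value 21 is consistent with a weighted Chebyshev distance defined as the largest weighted absolute difference between coordinates. The sketch below assumes that definition; `weightedChebyshev` is a hypothetical helper and may not match the package's internal code.

```r
# Assumed definition: max over coordinates of weight * |x - y|.
weightedChebyshev <- function(x, y, w) {
  max(w * abs(x - y))
}

# Differences are 1, 3, 5; weighted they become 3, 21, 20.
print(weightedChebyshev(c(1, 2, 3), c(2, 5, 8), c(3, 7, 4)))
#> [1] 21
```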

Finally, the complete algorithm sorts the distances and orders the clusters accordingly. It presents the solution as a sorted list of clusters, as the sorted distances, or as a dendrogram.

target <- c(5,5,1)

weight <- c(3,7,5)

correlation <- correlationHC(cDataFrame, target,  weight)

print(correlation$sortedValues)
#>   cluster X1 X2 X3
#> 1       1  2  4  4
#> 2       4  2  5  5
#> 3       6  1  2  1
#> 4       8  1  2  1
#> 5       7  2  4  5
#> 6       2  2  3  5
#> 7       3  1  1  2
#> 8       5  1  0  1

print(correlation$distances)
#>   cluster sortedDistances
#> 1       1        2.294922
#> 2       4        2.670830
#> 3       6        2.720294
#> 4       8        2.720294
#> 5       7        2.756810
#> 6       2        3.000000
#> 7       3        3.316625
#> 8       5        3.855732

plot(correlation$dendrogram)
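
The cluster order above can be reproduced in base R. This is a sketch under two assumptions that the printed distances are consistent with: the weights are normalized to sum to 1, and the distance is a weighted Euclidean distance, sqrt(sum(w * (x - target)^2)). The names prefixed with `sketch` are introduced here for illustration only.

```r
sketchData <- matrix(c(2, 4, 4, 2, 3, 5, 1, 1, 2, 2, 5, 5,
                       1, 0, 1, 1, 2, 1, 2, 4, 5, 1, 2, 1),
                     ncol = 3, byrow = TRUE)
sketchTarget <- c(5, 5, 1)
sketchWeight <- c(3, 7, 5) / sum(c(3, 7, 5))

# Weighted Euclidean distance from each row to the target (assumed formula).
sketchDistances <- apply(sketchData, 1, function(row) {
  sqrt(sum(sketchWeight * (row - sketchTarget)^2))
})

# order() leaves ties in their original order, so cluster 6 stays before 8.
sketchRanking <- order(sketchDistances)
print(sketchRanking)
#> [1] 1 4 6 8 7 2 3 5
```

Indexing `sketchDistances[sketchRanking]` then gives the sorted distances, and `sketchData[sketchRanking, ]` gives the clusters in the same order.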

Correlative Hierarchical Clustering .DETAILS

This example shows how the algorithm works step by step.

How input data is initialized.

initData <- initData.details(cDataFrame)
#> 
#>  This function initializes the input data creating a cluster with each row of the data frame.
#> 
#>  It gets this data from the user:
#>   X1 X2 X3
#> 1  2  4  4
#> 2  2  3  5
#> 3  1  1  2
#> 4  2  5  5
#> 5  1  0  1
#> 6  1  2  1
#> 7  2  4  5
#> 8  1  2  1
#> 
#>  Each cluster will be a matrix with a row and the same columns as the initial data frame.
#> 
#>  Initialized data will be:
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    2    4    4
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    5
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    2
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    2    5    5
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    1    0    1
#> 
#> [[6]]
#>      [,1] [,2] [,3]
#> [1,]    1    2    1
#> 
#> [[7]]
#>      [,1] [,2] [,3]
#> [1,]    2    4    5
#> 
#> [[8]]
#>      [,1] [,2] [,3]
#> [1,]    1    2    1

How the algorithm checks whether the input target is acceptable and, if it is not, how it initializes the target.

targetValid <- c(1,2,3)

targetInvalid <- c(1,2)

initTarget <- initTarget.details(targetValid,cDataFrame)
#> 
#>  This function initializes the target and checks if it is a valid target.
#> 
#>  It gets this target from the user:
#> [1] 1 2 3
#> 
#>  After transforming the target into matrix, it checks if it is acceptable
#> 
#>  If the target has the same columns as main data and only one row, it is a valid target!
#> 
#>  Target used will be:
#>      [,1] [,2] [,3]
#> [1,]    1    2    3

initTarget <- initTarget.details(targetInvalid,cDataFrame)
#> 
#>  This function initializes the target and checks if it is a valid target.
#> 
#>  It gets this target from the user:
#> [1] 1 2
#> 
#>  After transforming the target into matrix, it checks if it is acceptable
#> 
#>  If the target does not have the same columns as main data or more than one row, 
#>  it is not a valid target!
#> 
#>  The function will initialize as a '0's matrix.
#> 
#>  Target used will be:
#>      [,1] [,2] [,3]
#> [1,]    0    0    0
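
The check described in the two traces above can be sketched in base R as follows. `checkTarget` is a hypothetical helper, not a LearnClust function; it assumes that a target is valid only when it has as many values as the data has columns, and that an invalid or missing target falls back to a row of zeros.

```r
# Hypothetical helper (not part of LearnClust).
checkTarget <- function(target, data) {
  if (is.null(target) || length(target) != ncol(data)) {
    # Invalid target: fall back to a one-row matrix of zeros.
    return(matrix(0, nrow = 1, ncol = ncol(data)))
  }
  matrix(target, nrow = 1)
}

sketchFrame <- data.frame(matrix(c(2, 4, 4, 2, 3, 5), ncol = 3, byrow = TRUE))
print(checkTarget(c(1, 2, 3), sketchFrame))
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
print(checkTarget(c(1, 2), sketchFrame))
#>      [,1] [,2] [,3]
#> [1,]    0    0    0
```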

How the normalization process is done.

weight <- c(5,7,6)

weights <- normalizeWeight.details(TRUE,weight,cDataFrame)
#> 
#>  This function normalizes weight values.
#> 
#>  This are the initial weights:
#>   5
#>   7
#>   6
#> 
#>  It checks if there is a weights vector. If not, it creates a vector with 3 '1's.
#> 
#>  Due to the fact that 'normalize' = TRUE, weight vector changes every weight as the 
#>  initial value divided between the total sum of the vector.
#> 
#>  FinalWeight[i] = WeightValue[i]/TotalSum
#> 
#>  These are the new weights:
#>   0.277777777777778
#>   0.388888888888889
#>   0.333333333333333

weights <- normalizeWeight.details(FALSE,weight,cDataFrame)
#> 
#>  This function normalizes weight values.
#> 
#>  This are the initial weights:
#>   5
#>   7
#>   6
#> 
#>  It checks if there is a weights vector. If not, it creates a vector with 3 '1's.
#> 
#>  Due to the fact that 'normalize' = FALSE, weight vector does not change.
#> 
#>  These are the new weights:
#>   5
#>   7
#>   6

weights <- normalizeWeight.details(FALSE,NULL,cDataFrame)
#> 
#>  This function normalizes weight values.
#> 
#>  It checks if there is a weights vector. If not, it creates a vector with 3 '1's.
#> 
#>  Due to the fact that 'normalize' = FALSE, weight vector does not change.
#> 
#>  These are the new weights:
#>   1
#>   1
#>   1

How it calculates the distances between clusters, applying the given weights and distance definition.

cluster1 <- matrix(c(1,2,3),ncol=3)
cluster2 <- matrix(c(2,5,8),ncol=3)

weight <- c(3,7,4)

distance <- distances.details(cluster1,cluster2,'CHE',weight)
#> 
#>  This function calculates CHE distance applying weights.
#> 
#>  It calculates the distance between:.
#>   X1 X2 X3
#> 1  1  2  3
#> 
#> 
#>   X1 X2 X3
#> 1  2  5  8
#> 
#>  Applying these weights:
#> [1] 3 7 4
#> 
#>  The distance value is: 21.

The complete function that explains the algorithm is:

target <- c(5,5,1)

weight <- c(3,7,5)

correlation <- correlationHC.details(cDataFrame, target,  weight)
#>   Correlation hierarchical function is a classification technique that initializes a cluster for each data.
#> 
#>   It calculates the distance between clusters and a target given depending on the distance type.
#> 
#>  The function applies weights to each property from main data to get weighted results.
#> 
#>  Due to normalized = TRUE, the initial weights change to a [0,1] values.
#> 
#>  These are the weight to be used:
#> [1] 0.2000000 0.4666667 0.3333333
#> 
#>  Initialized data are (more information about how to initialize data in 'initData.details'):
#> [[1]]
#>      [,1] [,2] [,3]
#> [1,]    2    4    4
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    2    3    5
#> 
#> [[3]]
#>      [,1] [,2] [,3]
#> [1,]    1    1    2
#> 
#> [[4]]
#>      [,1] [,2] [,3]
#> [1,]    2    5    5
#> 
#> [[5]]
#>      [,1] [,2] [,3]
#> [1,]    1    0    1
#> 
#> [[6]]
#>      [,1] [,2] [,3]
#> [1,]    1    2    1
#> 
#> [[7]]
#>      [,1] [,2] [,3]
#> [1,]    2    4    5
#> 
#> [[8]]
#>      [,1] [,2] [,3]
#> [1,]    1    2    1
#> 
#>  Initialized target is (more information about how to initialize data in 'initTarget.details'):
#>      [,1] [,2] [,3]
#> [1,]    5    5    1
#> 
#>  The function calculates the distances between each cluster and the target. It applies 
#>  weights and uses EUC distance type. 
#> 
#> 
#>  The calculated distances are:
#> [1] 2.294922 3.000000 3.316625 2.670830 3.855732 2.720294 2.756810 2.720294
#> 
#>  The previous distances sorted are:  
#> 
#> [1] 2.294922 2.670830 2.720294 2.720294 2.756810 3.000000 3.316625 3.855732
#> 
#>  Then, using sorted distances, the function order the clusters. 
#> 
#> 
#>  Finally, the sorted distances are:
#>   cluster sortedDistances
#> 1       1        2.294922
#> 2       4        2.670830
#> 3       6        2.720294
#> 4       8        2.720294
#> 5       7        2.756810
#> 6       2        3.000000
#> 7       3        3.316625
#> 8       5        3.855732
#> 
#>  The sorted clusters are:  
#> 
#>   cluster X1 X2 X3
#> 1       1  2  4  4
#> 2       4  2  5  5
#> 3       6  1  2  1
#> 4       8  1  2  1
#> 5       7  2  4  5
#> 6       2  2  3  5
#> 7       3  1  1  2
#> 8       5  1  0  1
#> 
#>  And the final dendrogram is on the image. 
#> 

print(correlation$sortedValues)
#>   cluster X1 X2 X3
#> 1       1  2  4  4
#> 2       4  2  5  5
#> 3       6  1  2  1
#> 4       8  1  2  1
#> 5       7  2  4  5
#> 6       2  2  3  5
#> 7       3  1  1  2
#> 8       5  1  0  1

print(correlation$distances)
#>   cluster sortedDistances
#> 1       1        2.294922
#> 2       4        2.670830
#> 3       6        2.720294
#> 4       8        2.720294
#> 5       7        2.756810
#> 6       2        3.000000
#> 7       3        3.316625
#> 8       5        3.855732

plot(correlation$dendrogram)
