Fuzzy Clustering Essentials

Fuzzy clustering is a form of soft clustering in which each element has a degree of membership, similar to a probability, in every cluster. In other words, each element has a set of membership coefficients, one per cluster, expressing how strongly it belongs to that cluster.

This is different from k-means and k-medoids clustering, where each object is assigned to exactly one cluster. K-means and k-medoids clustering are known as hard or non-fuzzy clustering.

In fuzzy clustering, points close to the center of a cluster may belong to it to a higher degree than points at its edge. The degree to which an element belongs to a given cluster is a numerical value between 0 and 1.

The fuzzy c-means (FCM) algorithm is one of the most widely used fuzzy clustering algorithms. The centroid of a cluster is calculated as the mean of all points, weighted by their degree of belonging to the cluster.
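
The weighted mean described above can be written out as follows. This is the standard fuzzy c-means centroid update. The notation is an assumption introduced here, not taken from the text: \(u_{ij}\) is the membership of point \(x_i\) in cluster \(j\), \(n\) is the number of points, and \(m > 1\) is the fuzziness exponent (often set to 2).

```latex
c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}
```

As \(m\) approaches 1, the memberships tend toward 0 or 1 and the update approaches the ordinary k-means centroid.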

In this article, we’ll describe how to compute fuzzy clustering using the R software.

Required R packages

We’ll use the following R packages: 1) cluster for computing fuzzy clustering and 2) factoextra for visualizing clusters.

Computing fuzzy clustering

The function fanny() [cluster R package] can be used to compute fuzzy clustering. FANNY stands for fuzzy analysis clustering. A simplified format is:

fanny(x, k, metric = "euclidean", stand = FALSE)
  • x: A data matrix or data frame or dissimilarity matrix
  • k: The desired number of clusters to be generated
  • metric: Metric for calculating dissimilarities between observations
  • stand: If TRUE, variables are standardized before calculating the dissimilarities
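
As an illustration of these arguments, the sketch below calls fanny() in two ways: on the raw data with stand = TRUE, and on a precomputed dissimilarity object. The object names res.std, res.dist and d are arbitrary, and USArrests is used only as convenient built-in data. Note that, according to the cluster documentation, stand = TRUE divides each variable by its mean absolute deviation, so the result is not expected to match scale() exactly.

```r
library(cluster)

# Variant 1: let fanny() standardize the variables itself
res.std <- fanny(USArrests, k = 2, metric = "euclidean", stand = TRUE)

# Variant 2: pass a dissimilarity object instead of a data matrix;
# metric and stand are ignored in this case
d <- dist(scale(USArrests))
res.dist <- fanny(d, k = 2)

# Both results carry one row of membership coefficients per observation
dim(res.std$membership)
dim(res.dist$membership)
```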

The function fanny() returns an object including the following components:

  • membership: matrix containing the degree to which each observation belongs to a given cluster. Columns correspond to clusters and rows to observations
  • coeff: Dunn’s partition coefficient F(k) of the clustering, where k is the number of clusters. F(k) is the sum of all squared membership coefficients, divided by the number of observations. Its value is between 1/k and 1. The normalized form of the coefficient is also given. It is defined as \((F(k) - 1/k) / (1 - 1/k)\), and ranges between 0 and 1. A low value of Dunn’s coefficient indicates a very fuzzy clustering, whereas a value close to 1 indicates a near-crisp clustering.
  • clustering: the clustering vector containing the nearest crisp grouping of observations
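
To make the definition of Dunn's partition coefficient concrete, here is a minimal sketch that computes it by hand. The membership matrix u is made up for illustration (it is not fanny() output), and the variable names are arbitrary.

```r
# Hypothetical membership matrix: 4 observations (rows), 2 clusters (columns).
# Each row sums to 1.
u <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.3, 0.7,
              0.5, 0.5),
            ncol = 2, byrow = TRUE)

k <- ncol(u)                      # number of clusters

# F(k): sum of all squared memberships divided by the number of observations
dunn <- sum(u^2) / nrow(u)

# Normalized form, rescaled from [1/k, 1] to [0, 1]
normalized <- (dunn - 1 / k) / (1 - 1 / k)

dunn        # 0.645
normalized  # 0.29
```

With these made-up memberships the normalized value is low, reflecting that two of the four observations are far from a crisp assignment.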

For example, the R code below applies fuzzy clustering on the USArrests data set:

library(cluster)           # Provides fanny()
df <- scale(USArrests)     # Standardize the data
res.fanny <- fanny(df, 2)  # Compute fuzzy clustering with k = 2

The different components can be extracted using the code below:

head(res.fanny$membership, 3) # Membership coefficients
##          [,1]  [,2]
## Alabama 0.664 0.336
## Alaska  0.610 0.390
## Arizona 0.686 0.314
res.fanny$coeff # Dunn's partition coefficient
## dunn_coeff normalized 
##      0.555      0.109
head(res.fanny$clustering) # Observation groups
##    Alabama     Alaska    Arizona   Arkansas California   Colorado 
##          1          1          1          2          1          1
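
One way to read the membership and clustering components together is to take, for each observation, the cluster with the largest coefficient, and to flag observations whose largest coefficient is still small. The sketch below does this on a made-up matrix; the row names, the variable names and the 0.6 cutoff are all arbitrary choices for illustration.

```r
# Hypothetical membership matrix: 4 observations (rows), 2 clusters (columns)
u <- matrix(c(0.90, 0.10,
              0.55, 0.45,
              0.30, 0.70,
              0.48, 0.52),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "d"), NULL))

# Hard label = index of the cluster with the highest membership
crisp <- apply(u, 1, which.max)

# Flag observations whose strongest membership is below a chosen cutoff
cutoff <- 0.6   # arbitrary threshold for illustration
ambiguous <- names(crisp)[apply(u, 1, max) < cutoff]

crisp      # a = 1, b = 1, c = 2, d = 2
ambiguous  # "b" "d"
```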

To visualize observation groups, use the function fviz_cluster() [factoextra package]:

library(factoextra)  # Provides fviz_cluster() and attaches ggplot2
fviz_cluster(res.fanny, ellipse.type = "norm", repel = TRUE,
             palette = "jco", ggtheme = theme_minimal(),
             legend = "right")

To evaluate the goodness of the clustering results, plot the silhouette coefficients as follows:

fviz_silhouette(res.fanny, palette = "jco",
                ggtheme = theme_minimal())
##   cluster size ave.sil.width
## 1       1   22          0.32
## 2       2   28          0.44
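
The silhouette widths can also be inspected numerically. The following is a sketch using silhouette() from the cluster package on the crisp labels. It assumes Euclidean distances on the scaled data, and the names sil and neg are arbitrary. It is one possible way to look at the values, not necessarily how fviz_silhouette() obtains them.

```r
library(cluster)

df <- scale(USArrests)
res.fanny <- fanny(df, 2)

# Silhouette widths from the crisp labels and Euclidean distances
sil <- silhouette(res.fanny$clustering, dist(df))

# Average width per cluster and overall
summary(sil)$clus.avg.widths
summary(sil)$avg.width

# Observations with a negative width may sit closer to the other cluster
neg <- which(sil[, "sil_width"] < 0)
rownames(df)[neg]
```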


Fuzzy clustering is an alternative to k-means clustering in which each data point has a membership coefficient for every cluster. Here, we demonstrated how to compute and visualize fuzzy clustering using the cluster and factoextra R packages together.