This page presents a series of course videos on clustering methods for analyzing multivariate data. The main objective is either (i) to identify groups of individuals with similar profiles, or (ii) to partition individuals into several groups based on their characteristics.
The standard clustering methods include:
- Agglomerative hierarchical clustering, which builds a hierarchical tree called a dendrogram.
- Partitioning methods, such as k-means clustering, which subdivide individuals into k groups, where k, the optimal number of groups, is defined by the analyst.
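As a minimal base-R sketch of these two families (not taken from the course videos), the code below runs agglomerative hierarchical clustering with hclust() and k-means with kmeans() on the built-in USArrests data. Standardizing the variables, using Ward's criterion (method = "ward.D2") and choosing k = 4 are illustrative assumptions.

```r
# Illustrative example: 50 US states described by 4 continuous variables
data("USArrests")
X <- scale(USArrests)  # standardize so every variable has the same weight

# Agglomerative hierarchical clustering with Ward's criterion
d  <- dist(X, method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc, cex = 0.6)              # draws the dendrogram
hc_groups <- cutree(hc, k = 4)   # cut the tree into 4 groups (assumed k)

# Partitioning with k-means, using the same assumed k = 4
set.seed(123)
km <- kmeans(X, centers = 4, nstart = 25)

# Cross-tabulate the two partitions
table(hierarchical = hc_groups, kmeans = km$cluster)
```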
The video courses below start with an introduction to hierarchical clustering and the k-means approach. A method for choosing the optimal number of groups is also shown. Next, a practical example in R using the FactoMineR package is presented.
In FactoMineR, the function HCPC() is used for clustering. HCPC stands for Hierarchical Clustering on Principal Components. This function applies clustering methods (hierarchical clustering and k-means) to the results of principal component methods (PCA, CA, MCA, FAMD).
The HCPC approach allows us to combine the three standard methods used in multivariate data analysis:
- Principal component methods (PCA, CA, MCA, FAMD, MFA),
- Hierarchical clustering and
- Partitioning clustering, particularly the k-means method.
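The combination can be sketched in base R as follows. This is a simplified illustration of the idea, not FactoMineR's implementation: it keeps the first two principal components of USArrests (an assumption), builds a Ward tree on them, cuts it into k = 4 groups (also an assumption), and then consolidates the partition with k-means started from the centroids of the hierarchical groups.

```r
data("USArrests")

# Step 1: principal component analysis on standardized variables
pca <- prcomp(USArrests, scale. = TRUE)
scores <- pca$x[, 1:2]  # keep the first 2 components (assumption)

# Step 2: hierarchical clustering (Ward) on the component scores
hc <- hclust(dist(scores), method = "ward.D2")
k <- 4                  # number of clusters (assumption)
init <- cutree(hc, k = k)

# Step 3: k-means consolidation, starting from the hierarchical centroids
# (one row per cluster, one column per component)
centroids <- apply(scores, 2, function(col) tapply(col, init, mean))
km <- kmeans(scores, centers = centroids)

# How many individuals moved during consolidation
table(before = init, after = km$cluster)
```

Starting k-means from the tree's centroids, rather than from random centers, keeps the final partition close to the hierarchical one while letting individuals move to a closer centroid.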
The HCPC approach is useful in at least two situations:
- Clustering on continuous variables. When your data set contains a large number of continuous variables, you can first use principal component analysis (PCA) to reduce the dimensionality, then apply HCPC to the PCA output. Discarding the last components, which mostly capture noise, can lead to more stable clusters.
- Clustering on categorical variables. To cluster individuals described by categorical variables, first apply CA or MCA to the data set, then compute the clustering on the CA or MCA output with HCPC.
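For the categorical case, a minimal sketch with FactoMineR might look like the following. It assumes the tea survey data shipped with FactoMineR, whose first 18 columns are categorical, keeps 5 MCA dimensions (an arbitrary choice), and uses nb.clust = -1 so that HCPC() cuts the tree automatically at its suggested level instead of waiting for a click on the dendrogram.

```r
library(FactoMineR)
data(tea)

# MCA on the 18 categorical survey questions (illustrative choice of columns)
res.mca <- MCA(tea[, 1:18], ncp = 5, graph = FALSE)

# Hierarchical clustering on the MCA coordinates, automatic cut
res.hcpc <- HCPC(res.mca, nb.clust = -1, graph = FALSE)

# Size of each cluster
table(res.hcpc$data.clust$clust)
```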
R code: Quick start guide
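A minimal quick-start sketch, assuming FactoMineR is installed and using the built-in USArrests data (50 US states, 4 continuous variables) as an illustrative example; the number of retained components (ncp = 3) is an arbitrary choice.

```r
library(FactoMineR)
data("USArrests")

# 1. Principal component analysis (variables are scaled by default)
res.pca <- PCA(USArrests, ncp = 3, graph = FALSE)

# 2. Hierarchical clustering on the principal components;
#    nb.clust = -1 cuts the tree at the automatically suggested level
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)

# 3. Inspect the results
plot(res.hcpc, choice = "tree")  # dendrogram
head(res.hcpc$data.clust)        # original data plus a 'clust' column
res.hcpc$desc.var$quanti         # variables that characterize each cluster
```

The data.clust component holds the original data with an added clust column, and desc.var describes each cluster by the variables that characterize it.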
Theory and key concepts
Hierarchical clustering: Introduction
Introduction to hierarchical clustering and data types.
Hierarchical clustering: Examples and choosing the number of clusters
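One simple way to read the number of clusters off a dendrogram is to cut where the jump between successive merge heights is largest. The base-R sketch below implements that heuristic on a Ward tree of the standardized USArrests data; it is an illustrative rule of thumb, not necessarily the criterion presented in the video or used by HCPC().

```r
data("USArrests")
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")

# Merge heights sorted from the last (highest) merge to the first
h <- rev(hc$height)

# Cutting the tree into k clusters undoes the last k - 1 merges, so the cut
# falls between h[k] and h[k - 1]; a large gap suggests well-separated groups
k_range <- 2:10
gaps <- h[k_range - 1] - h[k_range]
names(gaps) <- k_range

k_best <- k_range[which.max(gaps)]
print(round(gaps, 2))
cat("Suggested number of clusters:", k_best, "\n")
```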
K-means algorithm: A partitioning method
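The k-means algorithm alternates two steps until the assignments stop changing: assign each individual to its nearest center, then move each center to the mean of its assigned individuals. The function below is a minimal base-R sketch of this loop (Lloyd's algorithm); lloyd_kmeans() is a hypothetical helper written for illustration, and in practice stats::kmeans() should be preferred.

```r
# Hypothetical helper: Lloyd's k-means written out step by step
lloyd_kmeans <- function(X, k, max_iter = 100, seed = 1) {
  set.seed(seed)
  X <- as.matrix(X)
  # Initialization: pick k distinct individuals as starting centers
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  cluster <- integer(nrow(X))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance of every row to every center
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
    new_cluster <- max.col(-d2, ties.method = "first")
    # Stop when no individual changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each non-empty cluster's center to the mean of its members
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(X[members, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}

# Usage on the standardized USArrests data
data("USArrests")
res <- lloyd_kmeans(scale(USArrests), k = 4)
table(res$cluster)
```

Because the result depends on the random initial centers, kmeans() is usually run with several starts (its nstart argument), keeping the solution with the smallest total within-cluster sum of squares.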
Identifying and describing cluster features
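Once a partition is available, a first description of each cluster compares the mean of every variable in that cluster with the overall mean. The base-R sketch below computes both the raw cluster means and the difference expressed in standard-deviation units; the Ward tree and k = 4 are illustrative assumptions. FactoMineR's HCPC() provides a more formal description (including a v-test) in its desc.var component.

```r
data("USArrests")
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
cl <- cutree(hc, k = 4)  # assumed number of clusters

# Raw profile: mean of each original variable within each cluster
profile <- aggregate(USArrests, by = list(cluster = cl), FUN = mean)
print(profile)

# Standardized profile: (cluster mean - overall mean) / overall SD;
# large positive or negative values flag variables that characterize a cluster
zprofile <- sapply(USArrests, function(v) (tapply(v, cl, mean) - mean(v)) / sd(v))
print(round(zprofile, 2))
```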
Practical example in R using FactoMineR
Applying hierarchical clustering to the results of principal component methods (PCA, CA, MCA, FAMD).