This page presents a series of course videos on **clustering** methods for analyzing multivariate data. The main objective is either (i) to identify groups of individuals with a similar profile, or (ii) to partition individuals into several groups based on their characteristics.

The standard clustering methods include:

- **Agglomerative hierarchical clustering**, which creates a hierarchical tree called a dendrogram.
- Partitioning methods, such as **k-means** clustering, which subdivides individuals into k groups, k being the number of groups, which the analyst must define.

The video courses below start with an introduction to hierarchical clustering and k-means approaches. A method for choosing the optimal number of groups is also shown. Next, a practical example in **R** using the **FactoMineR** package is presented.

In FactoMineR, the function `HCPC()` is used for clustering. *HCPC* stands for *Hierarchical Clustering on Principal Components*. This function applies clustering methods (hierarchical clustering and k-means) to the results of principal component methods (PCA, CA, MCA, FAMD).

The HCPC approach allows us to combine the three standard methods used in multivariate data analyses:

- Principal component methods (PCA, CA, MCA, FAMD, MFA),
- Hierarchical clustering and
- Partitioning clustering, particularly the k-means method.
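The three steps can be chained in a few lines of R. The sketch below is illustrative only: it assumes the `USArrests` data set (used in the PCA section of this page) and keeps the first three components (`ncp = 3` is an arbitrary choice). `nb.clust = -1` asks `HCPC()` to cut the tree automatically at the level it suggests, and `consol = TRUE` (the default) refines that cut with k-means.

```r
library(FactoMineR)
data("USArrests")

# 1. Principal component method: standardized PCA, keeping 3 dimensions
res.pca <- PCA(USArrests, ncp = 3, graph = FALSE)

# 2. Hierarchical clustering (Ward's criterion, the default) on the retained
#    components; nb.clust = -1 cuts the tree automatically at the suggested level
# 3. consol = TRUE consolidates the partition with k-means
res.hcpc <- HCPC(res.pca, nb.clust = -1, consol = TRUE, graph = FALSE)

# Original data plus the cluster of each state in the column 'clust'
head(res.hcpc$data.clust)
table(res.hcpc$data.clust$clust)

# Dendrogram, then the factor map colored by cluster
plot(res.hcpc, choice = "tree")
plot(res.hcpc, choice = "map", draw.tree = FALSE)
```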

HCPC can be useful in at least two situations:

- **Large number of continuous variables.** You can first use principal component analysis to reduce the dimensionality of the data, then apply HCPC to the PCA outputs. This can lead to more stable clusters.
- **Clustering on categorical variables.** You can first apply CA or MCA to the data set, then compute the clustering on the CA or MCA output using the HCPC method.
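For the categorical case, a minimal sketch assuming the `poison` survey data from FactoMineR (presented in the MCA section of this page): the MCA provides continuous coordinates for the children, and `HCPC()` clusters those coordinates. Keeping 5 dimensions and the automatic cut (`nb.clust = -1`) are illustrative choices.

```r
library(FactoMineR)
data("poison")

# MCA on the categorical answers; Age/Time and Sick/Sex are supplementary
res.mca <- MCA(poison, ncp = 5, quanti.sup = 1:2, quali.sup = 3:4,
               graph = FALSE)

# Hierarchical clustering on the MCA coordinates of the children,
# with automatic cut of the tree
res.hcpc <- HCPC(res.mca, nb.clust = -1, graph = FALSE)

# Size of each cluster
table(res.hcpc$data.clust$clust)

# Variables and categories that characterize the clusters
res.hcpc$desc.var
```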

Contents:

HCPC - Hierarchical Clustering on Principal Components: Essentials

Introduction to hierarchical clustering and data types.

Apply hierarchical clustering on principal component method results (PCA, CA, MCA, FAMD).

In this article, you’ll learn how **MFA** (**Multiple Factor Analysis**) works, as well as how to easily compute and interpret **MFA** in **R** using the *FactoMineR* package.

Recall that MFA is a multivariate data analysis method for summarizing and visualizing a complex data table in which individuals are described by several sets of variables (quantitative and/or qualitative) structured into groups. The groups may, for example, correspond to data coming from different sources.

Contents:

This video describes the data format and the type of questions that can be investigated by the multiple factor analysis.

MFA can be considered as a type of PCA on a weighted matrix. The aim of the weighting is to balance the information provided by the different groups of variables. You’ll learn here how and why it’s important to balance the influence of each group of variables in the analysis.
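In MFA, every variable of a group is weighted by the inverse of the first eigenvalue of that group's separate analysis, so that no group can dominate the first dimension on its own. The sketch below reproduces this weighting by hand with base R, assuming the group structure of the `wine` data used in the FactoMineR MFA example (2 qualitative columns, then 5 groups of continuous variables, of which groups 2 to 5 are treated as active here). The helper names (`grp`, `lambda1`, ...) are illustrative.

```r
library(FactoMineR)
data("wine")

# Group structure of 'wine' (sizes and names as in the FactoMineR MFA example)
grp       <- c(2, 5, 3, 10, 9, 2)
grp.names <- c("orig", "olf", "vis", "olfag", "gust", "ens")
ends      <- cumsum(grp)
starts    <- ends - grp + 1

# First eigenvalue of the separate standardized PCA of each active group,
# i.e. the largest eigenvalue of the correlation matrix of its variables
lambda1 <- sapply(2:5, function(g) {
  X <- wine[, starts[g]:ends[g]]
  eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values[1]
})
names(lambda1) <- grp.names[2:5]

# MFA weight applied to each variable of the group: 1 / lambda1
round(cbind(lambda1, weight = 1 / lambda1), 3)

# After weighting, the first eigenvalue of every group equals 1
lambda1.weighted <- sapply(2:5, function(g) {
  X <- wine[, starts[g]:ends[g]]
  eigen(cor(X) / lambda1[g - 1], symmetric = TRUE, only.values = TRUE)$values[1]
})
lambda1.weighted
```

Without this step, a group whose variables are strongly inter-correlated (large first eigenvalue) would drive the first axis of a global PCA.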

MFA provides results on individuals and variables, just as PCA does for quantitative variables and MCA does for qualitative variables. The most important feature of MFA is that it takes several groups of variables into account.

Here, you’ll learn:

- how to compare information provided by each of these groups,
- what information is common to several groups,
- and what information is specific to certain groups.
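A minimal sketch of how these group comparisons can be obtained with `MFA()`, assuming the same `wine` group structure as in the FactoMineR MFA example (groups 1 and 6 supplementary). RV coefficients range from 0 to 1 and measure how similar the configurations of individuals induced by two groups are; values close to 1 point to information common to both groups.

```r
library(FactoMineR)
data("wine")

res.mfa <- MFA(wine,
               group = c(2, 5, 3, 10, 9, 2),
               type = c("n", rep("s", 5)),  # "n" = qualitative, "s" = standardized continuous
               ncp = 5,
               name.group = c("orig", "olf", "vis", "olfag", "gust", "ens"),
               num.group.sup = c(1, 6),     # groups 1 and 6 are supplementary
               graph = FALSE)

# RV coefficients between groups (information shared by pairs of groups)
round(res.mfa$group$RV, 2)

# Coordinates of the groups on the MFA dimensions
res.mfa$group$coord

# Graph of groups
plot(res.mfa, choix = "group")
```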

In this video, you’ll learn how to take into account groups of qualitative variables. Then, you’ll see what to do when one or more groups of variables correspond to one or more contingency tables. And lastly, you’ll see which interpretation aids are useful for interpreting the results of an MFA.

**Factor analysis of mixed data** (**FAMD**) is dedicated to analyzing a data set containing both *categorical* and *continuous* variables.

This article provides quick-start R code and a video showing a practical example, with interpretation, of FAMD in **R** using the **FactoMineR** package.

Roughly, FAMD can be seen as a mix between principal component analysis (PCA) and multiple correspondence analysis (MCA): it acts as PCA for quantitative variables and as MCA for qualitative variables.

FAMD allows one to study the similarities between individuals taking into account mixed variables and to study the relationships between all the variables (both qualitative and quantitative variables).

It also creates the graph of individuals, the correlation circle for the continuous variables and the plot of categories for categorical variables. Additionally, it produces specific graphs to visualize the relationship between both quantitative and qualitative variables.

Contents:

- Install FactoMineR package:

`install.packages("FactoMineR")`

- Compute FAMD using the demo data set `wine` [in FactoMineR]. This data set describes 21 wines by 31 variables (2 categorical and 29 continuous). We’ll compute FAMD on a subset of the columns. Categorical and continuous variables are detected automatically.

```
library(FactoMineR)
data("wine")
df <- wine[, c(1, 2, 16, 22, 29, 28, 30,31)]
res.famd <- FAMD(df, graph = FALSE)
```

- Visualize the eigenvalues (*scree plot*), showing the percentage of variance explained by each principal component.

```
eig.val <- res.famd$eig
barplot(eig.val[, 2],
names.arg = 1:nrow(eig.val),
main = "Variances Explained by Dimensions (%)",
xlab = "Principal Dimensions",
ylab = "Percentage of variances",
col ="steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eig.val), eig.val[, 2],
type = "b", pch = 19, col = "red")
```

- Graph of individuals. Qualitative variable categories are shown in bold.

`plot(res.famd, choix = "ind")`

- Relationship between the variables (qualitative and quantitative) and the principal dimensions:

`plot(res.famd, choix = "var")`

- Correlation circle of quantitative variables:

`plot(res.famd, choix = "quanti")`
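The numeric results behind these plots are stored in the `FAMD()` output. A short sketch, reusing the same subset of `wine` as above; the comments describe how FAMD measures the link between each variable and each dimension (squared correlation for continuous variables, correlation ratio for categorical ones), which is why the values in `var$coord` lie between 0 and 1.

```r
library(FactoMineR)
data("wine")
df <- wine[, c(1, 2, 16, 22, 29, 28, 30, 31)]
res.famd <- FAMD(df, graph = FALSE)

# Eigenvalues
res.famd$eig
# All variables: squared correlation (continuous) or correlation ratio
# (categorical) with each dimension, as in plot(res.famd, choix = "var")
res.famd$var$coord
# Continuous variables: correlations with the dimensions (correlation circle)
res.famd$quanti.var$coord
# Categories of the categorical variables: coordinates on the dimensions
res.famd$quali.var$coord
# Individuals
head(res.famd$ind$coord)
```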

This article presents quick-start R code and a video series for computing **MCA** (*Multiple Correspondence Analysis*) in R using the **FactoMineR** package. Recall that MCA is used for analyzing multivariate data sets containing *categorical variables*, such as survey data.

Contents:

- Install FactoMineR package:

`install.packages("FactoMineR")`

- Compute MCA using the demo data set `poison` [in FactoMineR]. This data set comes from a survey of a sample of primary-school children who suffered from food poisoning. They were asked about their symptoms and about what they ate.

```
library(FactoMineR)
data("poison")
res.mca <- MCA(poison,
quanti.sup = 1:2, # Supplementary quantitative variables (Age, Time)
quali.sup = 3:4, # Supplementary qualitative variables (Sick, Sex)
graph=FALSE)
```

Key terms:

- Active individuals and variables are used during the MCA.
- Supplementary individuals and variables: their coordinates will be predicted after the MCA.
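To see the difference in practice, the sketch below (same `MCA()` call as above) prints the projected coordinates of the supplementary elements. They are computed after the dimensions have been built from the active variables only, so they do not influence the dimensions.

```r
library(FactoMineR)
data("poison")
res.mca <- MCA(poison, quanti.sup = 1:2, quali.sup = 3:4, graph = FALSE)

# Categories of the supplementary qualitative variables (Sick, Sex)
res.mca$quali.sup$coord
# Supplementary quantitative variables (Age, Time): correlations with the dimensions
res.mca$quanti.sup$coord
```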

- Visualize the eigenvalues (*scree plot*), showing the percentage of variance explained by each principal component.

```
eig.val <- res.mca$eig
barplot(eig.val[, 2],
names.arg = 1:nrow(eig.val),
main = "Variances Explained by Dimensions (%)",
xlab = "Principal Dimensions",
ylab = "Percentage of variances",
col ="steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eig.val), eig.val[, 2],
type = "b", pch = 19, col = "red")
```

- Biplot of individuals and variables showing the link between them.

`plot(res.mca, autoLab = "yes")`

In this plot, individuals are shown in blue, variable categories in red, and the categories of the supplementary qualitative variables in dark green.

- Graph of individuals. Individuals with a similar profile are grouped together. Use the argument `invisible` to hide active and supplementary variables on the plot.

```
plot(res.mca,
invisible = c("var", "quali.sup", "quanti.sup"),
cex = 0.8,
autoLab = "yes")
```

- Graph of active variables. Use the argument `invisible` to hide individuals and supplementary variables on the plot.

```
plot(res.mca,
invisible = c("ind", "quali.sup", "quanti.sup"),
cex = 0.8,
autoLab = "yes")
```

- Color individuals by groups (the categories of the selected variables) and add confidence ellipses around the group means.

`plotellipses(res.mca, keepvar = c("Vomiting", "Fish"))`

For ggplot2-based visualization, read this: MCA - Multiple Correspondence Analysis in R: Essentials

- Access to the results:

```
# Eigenvalues
res.mca$eig
# Results for active Variables
res.var <- res.mca$var
res.var$coord # Coordinates
res.var$contrib # Contributions to the PCs
res.var$cos2 # Quality of representation
# Results for qualitative supp. variables
res.mca$quali.sup
# Results for active individuals
res.ind <- res.mca$ind
res.ind$coord # Coordinates
res.ind$contrib # Contributions to the PCs
res.ind$cos2 # Quality of representation
```

The following video series explains the basics of MCA and shows practical examples and their interpretation in R.

This video describes the data format and the goals of MCA.

This video shows how to build the point cloud of rows (individuals) and how to interpret it using the categories of the variables.

In this video, you’ll learn how to build point clouds of categories, as well as how to obtain an optimal representation of them. You will discover the link between the optimal representation of individuals and the optimal representation of categories.

This video describes some interpretation aids, shared by all principal component methods. Additionally, it shows how to use supplementary information, including supplementary variables, in MCA.

This video shows how to handle missing values in MCA using the missMDA and FactoMineR packages.

The FactoInvestigate R package makes it possible to automatically generate a report for principal component analysis. Learn more in our previous article: FactoInvestigate R Package: Automatic Reports and Interpretation of Principal Component Analyses

This page shows quick-start R code to compute *correspondence analysis* (**CA**) **in R** using the *FactoMineR* package.

Additionally, we present a series of course videos on correspondence analysis, a multivariate analysis tool for analyzing large contingency tables formed by two *categorical variables*. The aim is to study the association between row and column elements.

In the videos, the instructors start by explaining the theory and the key concepts behind CA. Next, they provide practical examples and interpretation of CA in the R programming language.

Contents:

- Install FactoMineR package:

`install.packages("FactoMineR")`

- Compute CA using the demo data set `children` [in FactoMineR]. The data set is a contingency table that summarizes the answers given by different categories of people to the following question: in your opinion, what reasons might make a woman or a couple hesitate to have children?

```
library(FactoMineR)
data("children")
res.ca <- CA(children,
row.sup = 15:18, # Supplementary rows
col.sup = 6:8, # Supplementary columns
graph = FALSE)
```

Key terms:

- Active rows and columns are used during the correspondence analysis.
- Supplementary rows and columns: their coordinates will be predicted after the CA.
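A quick way to check that supplementary elements are only projected: run CA on the active part of the table alone (rows 1 to 14, columns 1 to 5) and compare the eigenvalues with the full call. The sketch below assumes the same `row.sup`/`col.sup` indices as the quick-start code.

```r
library(FactoMineR)
data("children")

# CA with supplementary rows and columns (as in the quick-start code)
res.ca <- CA(children, row.sup = 15:18, col.sup = 6:8, graph = FALSE)
# CA on the active table only
res.active <- CA(children[1:14, 1:5], graph = FALSE)

# Same eigenvalues: the supplementary elements do not change the dimensions
cbind(with.sup = res.ca$eig[, 1], active.only = res.active$eig[, 1])

# Coordinates of the projected supplementary rows and columns
res.ca$row.sup$coord
res.ca$col.sup$coord
```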

- Visualize the eigenvalues (*scree plot*), showing the percentage of variance explained by each principal component.

```
eig.val <- res.ca$eig
barplot(eig.val[, 2],
names.arg = 1:nrow(eig.val),
main = "Variances Explained by Dimensions (%)",
xlab = "Principal Dimensions",
ylab = "Percentage of variances",
col ="steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eig.val), eig.val[, 2],
type = "b", pch = 19, col = "red")
```

- Biplot of row and column variables showing the association between row and column elements.

`plot(res.ca, autoLab = "yes")`

In this plot, row points are shown in blue, supplementary rows in dark blue, column points in red, and supplementary columns in dark red.

To plot only the row elements, specify the argument `invisible = "col"`. For column elements only, use `invisible = "row"`.

For ggplot2-based visualization, read this: CA - Correspondence Analysis in R: Essentials

- Access to the results:

```
# Eigenvalues
res.ca$eig
# Results for row variables
res.row <- res.ca$row
res.row$coord # Coordinates
res.row$contrib # Contributions to the PCs
res.row$cos2 # Quality of representation
# Results for column variables
res.col <- res.ca$col
res.col$coord # Coordinates
res.col$contrib # Contributions to the PCs
res.col$cos2 # Quality of representation
```

This video describes the data and key notations, as well as the questions that can be investigated by correspondence analysis. You’ll see that the main point of correspondence analysis is studying the links between pairs of qualitative variables.

This video presents how to plot row and column points on the same graph.

This video presents the importance of inertia in the interpretation of correspondence analysis.
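The total inertia in CA is directly linked to the chi-square test of independence: it equals the chi-square statistic divided by the grand total of the table. The sketch below checks this on the active part of the `children` table (rows 1 to 14, columns 1 to 5). `suppressWarnings()` only silences the possible warning about small expected counts, which does not affect the statistic.

```r
library(FactoMineR)
data("children")
tab <- children[1:14, 1:5]   # active rows and columns only

res.ca <- CA(tab, graph = FALSE)
# Total inertia = sum of all eigenvalues
total.inertia <- sum(res.ca$eig[, 1])

# Chi-square statistic of the independence test, divided by the grand total
chi2 <- unname(suppressWarnings(chisq.test(as.matrix(tab)))$statistic)
c(total.inertia = total.inertia, chi2.over.n = chi2 / sum(tab))
```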

This video describes how to plot row and column elements simultaneously on the same plot.

This video introduces the concepts of `quality of representation` and `contribution`.
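Both quantities can be read from the `CA()` output. A short sketch with the quick-start call: the contributions (in %) of the active rows to each dimension sum to 100, and the quality of representation (`cos2`) of a row summed over all dimensions equals 1. All dimensions are kept here because the active table (14 x 5) only has 4 of them.

```r
library(FactoMineR)
data("children")
res.ca <- CA(children, row.sup = 15:18, col.sup = 6:8, graph = FALSE)

# Contributions (%) of the active rows: each dimension sums to 100
colSums(res.ca$row$contrib)
# cos2 summed over all dimensions: 1 for every active row
rowSums(res.ca$row$cos2)
# Quality of representation of each row on the first plane (dimensions 1 and 2)
sort(rowSums(res.ca$row$cos2[, 1:2]), decreasing = TRUE)
```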

The Factoshiny package provides a graphical user interface for correspondence analysis.

The FactoInvestigate package can be used to automatically generate a report for correspondence analysis.

Learn more in our previous article: FactoInvestigate R Package: Automatic Reports and Interpretation of Principal Component Analyses

This article starts by providing quick-start R code for computing **PCA in R** using the **FactoMineR** package, and continues by presenting a series of **PCA** *video* *courses* (by François Husson).

Recall that PCA (*Principal Component Analysis*) is a multivariate data analysis method that allows us to summarize and visualize the information contained in large data sets of quantitative variables.

In these video courses, the instructor, François Husson, walks through the theory behind PCA (first 3 videos) and then through practical **PCA examples** in the R programming language using the FactoMineR package. He also presents how to interpret the results.

François Husson continues the course by presenting a convenient solution for handling data that contain `missing values`. This is made possible by the `missMDA` R package.

Additionally, he presents the `Factoshiny` package, which provides an easy-to-use graphical interface for performing PCA. This is very useful for users without an advanced programming background.

He finishes by presenting the `FactoInvestigate` R package, which makes it easy to automatically generate a report, in HTML, PDF or Word format, containing the PCA outputs and their interpretation.

Contents:

- Install FactoMineR package:

`install.packages("FactoMineR")`

- Compute PCA using the demo data set `USArrests`. The data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973, together with the percentage of the population living in urban areas (`UrbanPop`).

```
library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, graph = FALSE)
```

- Visualize the eigenvalues (*scree plot*), showing the percentage of variance explained by each principal component.

```
eig.val <- res.pca$eig
barplot(eig.val[, 2],
names.arg = 1:nrow(eig.val),
main = "Variances Explained by PCs (%)",
xlab = "Principal Components",
ylab = "Percentage of variances",
col ="steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eig.val), eig.val[, 2],
type = "b", pch = 19, col = "red")
```

- Visualize the graph of individuals. Individuals with a similar profile are grouped together.

`plot(res.pca, choix = "ind", autoLab = "yes")`

- Visualize the graph of variables. Positively correlated variables point to the same side of the plot; negatively correlated variables point to opposite sides of the graph.

`plot(res.pca, choix = "var", autoLab = "yes")`
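Why this reading works: in a standardized PCA (`scale.unit = TRUE`, the default), the coordinates of a variable on the graph of variables are its correlations with the principal components. The sketch below checks this against `cor()` on the individuals' coordinates, and prints the correlation matrix of the original variables for comparison with the plot.

```r
library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, scale.unit = TRUE, graph = FALSE)

# Coordinates of the variables on the principal components...
round(res.pca$var$coord, 3)
# ...equal the correlations between each variable and each component
round(cor(USArrests, res.pca$ind$coord), 3)

# Correlations between the original variables
round(cor(USArrests), 2)
```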

For ggplot2-based visualization, read this: PCA - Principal Component Analysis Essentials

- Access to the results:

```
# Eigenvalues
res.pca$eig
# Results for Variables
res.var <- res.pca$var
res.var$coord # Coordinates
res.var$contrib # Contributions to the PCs
res.var$cos2 # Quality of representation
# Results for individuals
res.ind <- res.pca$ind
res.ind$coord # Coordinates
res.ind$contrib # Contributions to the PCs
res.ind$cos2 # Quality of representation
```

This video presents the type of data to be used for principal component analysis and defines some useful notations. It also introduces the type of questions one can investigate with PCA.

This video presents how the coordinates of individuals and variables are calculated and visualized.

This video presents tips and tricks that help to interpret the output of PCA.
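One interpretation aid available in FactoMineR is `dimdesc()`, which describes each dimension by the variables that are significantly correlated with it. A minimal sketch on `USArrests`, keeping the first two dimensions and a 5% significance threshold (both illustrative choices):

```r
library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, graph = FALSE)

# Variables significantly correlated with dimensions 1 and 2 (p-value < proba),
# sorted by correlation coefficient
res.desc <- dimdesc(res.pca, axes = 1:2, proba = 0.05)
res.desc$Dim.1$quanti
res.desc$Dim.2$quanti
```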

This video shows how to handle missing values in PCA using the missMDA and FactoMineR packages.
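A minimal sketch of this approach, assuming the missMDA package is installed. Since `USArrests` has no missing values, 10 values are deleted at random purely for illustration (hypothetical data). `imputePCA()` fills in the missing cells with the iterative PCA algorithm, and PCA is then run on the completed table; `ncp = 2` is an illustrative choice, which `estim_ncpPCA()` can help select.

```r
library(FactoMineR)
library(missMDA)   # install.packages("missMDA") if needed
data("USArrests")

# Hypothetical example: delete 10 values at random to simulate missing data
set.seed(123)
X <- as.matrix(USArrests)
X[sample(length(X), 10)] <- NA
X <- as.data.frame(X)

# Impute the missing values with the (regularized) iterative PCA algorithm
res.comp <- imputePCA(X, ncp = 2)

# Run the PCA on the completed data set
res.pca <- PCA(res.comp$completeObs, graph = FALSE)
res.pca$eig
```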

The FactoInvestigate R package makes it possible to automatically generate a report for principal component analysis. Learn more in our previous article: FactoInvestigate R Package: Automatic Reports and Interpretation of Principal Component Analyses