This page shows quick-start R code to compute correspondence analysis (CA) in R using the FactoMineR package.
Additionally, we present a series of course videos on correspondence analysis, a multivariate analysis tool for analyzing large contingency tables formed by two categorical variables. The aim is to study the association between row and column elements.
In the videos, the instructors start by explaining the theory and the key concepts behind CA. Next, they provide practical examples and interpretation of CA in the R programming language.
- Quick start R code
- Theory and key concepts
- Correspondence analysis examples in R
- Graphical user interface: Factoshiny
- Automatic interpretation: FactoInvestigate
- Further reading
- Related Books
Quick start R code
- Install FactoMineR package:
install.packages("FactoMineR")
- Compute CA using the demo data set children [in FactoMineR]. The data set is a contingency table that summarizes the answers given by different categories of people to the following question: in your opinion, what reasons might make a woman or a couple hesitate to have children?
library(FactoMineR)
data("children")
res.ca <- CA(children,
             row.sup = 15:18,  # Supplementary rows
             col.sup = 6:8,    # Supplementary columns
             graph = FALSE)
- Active rows and columns are used during the correspondence analysis.
- Supplementary rows and columns are not used to build the dimensions; their coordinates are predicted after the CA, from the results obtained with the active elements.
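To make the active/supplementary distinction concrete, here is a minimal base R sketch of how a supplementary row can be positioned after the analysis. It uses the standard CA transition formula (row profile times the standard column coordinates); it is not taken from FactoMineR's source, and the table `N`, the vector `sup`, and all variable names are hypothetical values chosen for illustration.

```r
# Hypothetical active contingency table (illustration only, not the children data)
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25,
               8, 12, 10),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("row", 1:4), paste0("col", 1:3)))

# Hypothetical supplementary row: counts that take no part in the fit
sup <- c(col1 = 12, col2 = 6, col3 = 2)

P     <- N / sum(N)      # correspondence matrix
rmass <- rowSums(P)      # row masses
cmass <- colSums(P)      # column masses

# Standardized residuals and their singular value decomposition
S   <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec <- svd(S)
k   <- min(dim(N)) - 1   # number of non-trivial dimensions

# Principal coordinates of the active rows
row_coord <- diag(1 / sqrt(rmass)) %*% dec$u[, 1:k] %*% diag(dec$d[1:k], nrow = k)

# Standard coordinates of the active columns
col_std <- diag(1 / sqrt(cmass)) %*% dec$v[, 1:k]

# Supplementary row: its profile projected onto the existing dimensions
sup_profile <- sup / sum(sup)
sup_coord   <- sup_profile %*% col_std
print(sup_coord)
```

The same formula applied to an active row's profile returns that row's own coordinates, which is why supplementary points can be read on the same map without having influenced it.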
- Visualize eigenvalues (scree plot), showing the percentage of variance (inertia) explained by each dimension.
eig.val <- res.ca$eig
barplot(eig.val[, 2],
        names.arg = 1:nrow(eig.val),
        main = "Variances Explained by Dimensions (%)",
        xlab = "Principal Dimensions",
        ylab = "Percentage of variances",
        col = "steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eig.val), eig.val[, 2],
      type = "b", pch = 19, col = "red")
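As a rough illustration of where the percentages in a CA scree plot come from, the sketch below computes eigenvalues by hand in base R, as the squared singular values of the standardized residual matrix. The table `N` and the variable names are hypothetical and unrelated to the children data; this is a sketch of the usual textbook computation, not FactoMineR's internal code.

```r
# Hypothetical contingency table (illustration only)
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25,
               8, 12, 10),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("row", 1:4), paste0("col", 1:3)))

P     <- N / sum(N)      # correspondence matrix
rmass <- rowSums(P)      # row masses
cmass <- colSums(P)      # column masses

# Standardized residuals: departure from independence, scaled by the masses
S <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))

# Eigenvalues are the squared singular values of the non-trivial dimensions
k   <- min(dim(N)) - 1
eig <- svd(S)$d[seq_len(k)]^2

# Percentage of inertia per dimension and cumulative percentage
pct <- 100 * eig / sum(eig)
eig_table <- cbind(eigenvalue = eig, percentage = pct, cumulative = cumsum(pct))
rownames(eig_table) <- paste("dim", seq_len(k))
print(round(eig_table, 3))
```

The sum of the eigenvalues is the total inertia, which equals the chi-square statistic of the table divided by the grand total. A table with no association between rows and columns would therefore have eigenvalues close to zero.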
- Biplot of row and column variables showing the association between row and column elements.
plot(res.ca, autoLab = "yes")
blue: row points
darkblue: supplementary rows
red: column points
darkred: supplementary columns
To plot only the row variables, specify the argument invisible = "col". For column variables, type invisible = "row".
For ggplot2-based visualization, read this: CA - Correspondence Analysis in R: Essentials
- Access the results:
# Eigenvalues
res.ca$eig

# Results for row variables
res.row <- res.ca$row
res.row$coord    # Coordinates
res.row$contrib  # Contributions to the PCs
res.row$cos2     # Quality of representation

# Results for column variables
res.col <- res.ca$col
res.col$coord    # Coordinates
res.col$contrib  # Contributions to the PCs
res.col$cos2     # Quality of representation
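To clarify what contributions and quality of representation measure, the following base R sketch derives both from row coordinates using the usual definitions: a row's contribution to a dimension is its mass times its squared coordinate, as a percentage of the eigenvalue, and cos2 is its squared coordinate on a dimension divided by its squared distance to the origin. The table `N` and all names are hypothetical, and this is an illustration under those definitions rather than FactoMineR's own implementation.

```r
# Hypothetical contingency table (illustration only)
N <- matrix(c(20, 10,  5,
              10, 15, 10,
               5, 10, 25,
               8, 12, 10),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("row", 1:4), paste0("col", 1:3)))

P     <- N / sum(N)
rmass <- rowSums(P)
cmass <- colSums(P)

S   <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
dec <- svd(S)
k   <- min(dim(N)) - 1
eig <- dec$d[1:k]^2

# Principal coordinates of the rows on all k dimensions
row_coord <- diag(1 / sqrt(rmass)) %*% dec$u[, 1:k] %*% diag(dec$d[1:k], nrow = k)
dimnames(row_coord) <- list(rownames(N), paste("dim", 1:k))

# Contribution (%): share of each dimension's inertia due to each row
contrib <- 100 * sweep(rmass * row_coord^2, 2, eig, "/")

# cos2: share of each row's squared distance to the origin captured per dimension
cos2 <- row_coord^2 / rowSums(row_coord^2)

print(round(contrib, 2))
print(round(cos2, 3))
```

Contributions sum to 100 within each dimension, so they indicate which rows define an axis. cos2 values sum to 1 across all dimensions for each row, so a low cos2 on the plotted dimensions signals a point whose position on the map should be interpreted with caution.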
Theory and key concepts
Introduction and data types
This video describes the data and key notations, as well as the questions that can be investigated by correspondence analysis. You'll see that the main point of correspondence analysis is to study the links between pairs of qualitative variables.
Visualizing the row and column clouds
This video presents how to plot row and column points on the same graph.
This video presents the importance of inertia in the interpretation of correspondence analysis.
This video describes how to plot simultaneously row and column elements on the same plot.
This video introduces the concept of
quality of representation and
Correspondence analysis examples in R
CA in practice with FactoMineR
Text mining with correspondence analysis
Graphical user interface: Factoshiny
The package Factoshiny provides a graphical user interface for correspondence analysis.
Automatic interpretation: FactoInvestigate
The package FactoInvestigate can be used to automatically generate a report for correspondence analysis.
Learn more in our previous article: FactoInvestigate R Package: Automatic Reports and Interpretation of Principal Component Analyses