PCA in R Using FactoMineR: Quick Scripts and Videos

This article starts with quick-start R code for computing PCA with the FactoMineR package, then presents a series of PCA video courses by François Husson.

Recall that PCA (Principal Component Analysis) is a multivariate data analysis method that summarizes and visualizes the information contained in a large data set of quantitative variables.

In these video courses, François Husson walks through the theory behind PCA (first 3 videos), then moves on to practical PCA examples in R using the FactoMineR package. He also explains how to interpret the results.

He continues the course by presenting a convenient way to handle data that contain missing values, using the missMDA R package.

Additionally, he presents the Factoshiny package, which provides an easy-to-use graphical interface for performing PCA. This is very useful for users without an advanced programming background.

He finishes by presenting the FactoInvestigate R package, which automatically generates a report (in HTML, PDF or Word format) containing the PCA outputs and their interpretation.


Contents:

Quick start R code
Theory and key concepts
PCA examples in R
Further reading
Related Books

Quick start R code

Install the FactoMineR package:

install.packages("FactoMineR")

Compute PCA using the demo data set USArrests. The data set contains arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, along with the percentage of the population living in urban areas (UrbanPop).

library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, graph = FALSE)
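
By default, PCA() standardizes each variable to unit variance (scale.unit = TRUE). This matters here because the four USArrests columns are on very different scales. The sketch below is not part of the original quick start: it prints the raw variances and then repeats the PCA call with the scaling argument written out. The object names raw.var and res.scaled, and the choice ncp = 4 (keep all four components), are illustrative assumptions.

```r
library(FactoMineR)
data("USArrests")

# Raw variances differ by orders of magnitude, so an unscaled PCA
# would be dominated by the variable with the largest variance.
raw.var <- apply(USArrests, 2, var)
print(round(raw.var, 1))

# Same analysis as PCA(USArrests, graph = FALSE), with the scaling made explicit.
# ncp = 4 (illustrative) keeps every component in the stored results.
res.scaled <- PCA(USArrests, scale.unit = TRUE, ncp = 4, graph = FALSE)

# With standardized variables the eigenvalues sum to the number of variables.
print(sum(res.scaled$eig[, 1]))
```

Setting scale.unit = FALSE would instead run the PCA on the covariance matrix, which is usually only sensible when all variables share the same unit.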

Visualize the eigenvalues (scree plot), showing the percentage of variance explained by each principal component.

eig.val <- res.pca$eig
barplot(eig.val[, 2], 
        names.arg = 1:nrow(eig.val), 
        main = "Variances Explained by PCs (%)",
        xlab = "Principal Components",
        ylab = "Percentage of variances",
        col ="steelblue")
# Add connected line segments to the plot
lines(x = 1:nrow(eig.val), eig.val[, 2], 
      type = "b", pch = 19, col = "red")
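
The percentages plotted above are each eigenvalue divided by the sum of all eigenvalues. For a standardized PCA the eigenvalues are those of the correlation matrix, so the table can be rebuilt in base R as a cross-check. This is a minimal sketch; the object name eig.check and its column labels are illustrative and are not FactoMineR's own names.

```r
data("USArrests")

# Eigenvalues of the correlation matrix (base R only)
eig <- eigen(cor(USArrests))$values

pct <- 100 * eig / sum(eig)   # percentage of variance per component
cum.pct <- cumsum(pct)        # cumulative percentage

eig.check <- cbind(eigenvalue = eig, percentage = pct, cumulative = cum.pct)
rownames(eig.check) <- paste("comp", seq_along(eig))
print(round(eig.check, 2))
```

Assuming the default scale.unit = TRUE was used, these values should agree with the three columns of res.pca$eig.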

Visualize the graph of individuals. Individuals with a similar profile are grouped together.

plot(res.pca, choix = "ind", autoLab = "yes")
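
"Similar profile" can be made concrete by measuring distances between individuals in the plotted plane. The sketch below looks up the three states closest to a reference state on the first two dimensions. The reference state "Iowa" and the object names are arbitrary illustrative choices.

```r
library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, graph = FALSE)

# Coordinates of the states on the first two dimensions
coord <- res.pca$ind$coord[, 1:2]

# Euclidean distances between all pairs of states in that plane
d <- as.matrix(dist(coord))

# Three closest states to a reference state (illustrative choice)
ref <- "Iowa"
nearest <- names(sort(d[ref, ]))[2:4]   # position 1 is the state itself
print(nearest)
```

These distances only reflect what the first two dimensions capture, so two states that look close on the plot may still differ on later components.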

Visualize the graph of variables. Positively correlated variables point to the same side of the plot. Negatively correlated variables point to opposite sides.

plot(res.pca, choix = "var", autoLab = "yes")
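
When reading the variable plot, it can help to compare the arrow directions with the raw correlation matrix. The following base R sketch prints it for USArrests; the object name cor.mat is illustrative.

```r
data("USArrests")

# Pairwise correlations between the four variables
cor.mat <- cor(USArrests)
print(round(cor.mat, 2))
```

Because the plot shows only two dimensions, the angles between arrows approximate these correlations rather than reproduce them exactly, and the approximation is better for variables whose arrows reach close to the unit circle.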

For ggplot2-based visualization, read this: PCA - Principal Component Analysis Essentials

Access the results:

# Eigenvalues
res.pca$eig

# Results for variables
res.var <- res.pca$var
res.var$coord          # Coordinates
res.var$contrib        # Contributions to the PCs
res.var$cos2           # Quality of representation

# Results for individuals
res.ind <- res.pca$ind
res.ind$coord          # Coordinates
res.ind$contrib        # Contributions to the PCs
res.ind$cos2           # Quality of representation
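
As a hedged illustration of how these result tables might be used, the sketch below ranks the variables by their contribution to the first dimension and lists the individuals best represented on the first plane. The object name cos2.plane and the cut-off of five individuals are illustrative choices.

```r
library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, graph = FALSE)

# Contributions are percentages: each dimension (column) sums to 100
print(colSums(res.pca$var$contrib))

# Variables ranked by their contribution to the first dimension
print(sort(res.pca$var$contrib[, "Dim.1"], decreasing = TRUE))

# Quality of representation on the first plane = cos2 on Dim.1 + cos2 on Dim.2
cos2.plane <- rowSums(res.pca$ind$cos2[, 1:2])
print(head(sort(cos2.plane, decreasing = TRUE), 5))
```

A cos2 close to 1 on the plane means the individual is well represented by the first two dimensions; a value close to 0 means its position on the plot should be read with caution.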

Theory and key concepts

Data types

This video presents the type of data to be used for principal component analysis and defines some useful notation. It also introduces the kinds of questions one can investigate with PCA.

Studying individuals and variables

This video presents how the coordinates of individuals and variables are calculated and visualized.
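
The calculation can be sketched in base R for a standardized PCA. This is not taken from the video: it assumes uniform weights 1/n for the individuals (so the population standard deviation is used for scaling), and all object names are illustrative. Individuals are projected onto the eigenvectors of the correlation matrix, and variable coordinates are the eigenvectors scaled by the square root of the eigenvalues.

```r
data("USArrests")
X <- as.matrix(USArrests)
n <- nrow(X)

# Center, then scale with the population standard deviation (divisor n)
Xc <- sweep(X, 2, colMeans(X))
sd.pop <- sqrt(colMeans(Xc^2))
Z <- sweep(Xc, 2, sd.pop, "/")

# Eigen-decomposition of the correlation matrix (1/n) Z'Z
e <- eigen(crossprod(Z) / n)
lambda <- e$values
V <- e$vectors

# Individuals: projections of the standardized rows onto the eigenvectors
ind.coord <- Z %*% V

# Variables: eigenvectors scaled by sqrt(eigenvalue)
var.coord <- V %*% diag(sqrt(lambda))
dimnames(var.coord) <- list(colnames(X), paste0("Dim.", seq_along(lambda)))

# The (1/n) variance of each column of ind.coord is the matching eigenvalue,
# and var.coord equals the correlation between variables and components.
print(all.equal(colMeans(ind.coord^2), lambda))
print(all.equal(unname(var.coord), unname(cor(X, ind.coord))))
```

The sign of each eigenvector is arbitrary, so a whole column of coordinates may come out with the opposite sign in another implementation without changing the interpretation.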

PCA interpretation

This video presents tips and tricks for interpreting the output of PCA.
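
One FactoMineR helper that may assist with interpretation is dimdesc(), which lists the variables most correlated with each dimension. The sketch below is an illustration and not necessarily the approach taken in the video; the object name res.desc and the threshold proba = 0.05 are illustrative choices.

```r
library(FactoMineR)
data("USArrests")
res.pca <- PCA(USArrests, graph = FALSE)

# Describe the first two dimensions by the variables correlated with them
res.desc <- dimdesc(res.pca, axes = c(1, 2), proba = 0.05)

# Variables significantly correlated with dimension 1
print(res.desc$Dim.1)
```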

Course video materials

PCA examples in R

PCA in practice with FactoMineR

Handling missing values

This video shows how to handle missing values in PCA using the missMDA and FactoMineR packages.
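
A minimal sketch of that workflow, assuming the missMDA package is installed: missing cells are imputed with imputePCA() and the completed table is passed to PCA(). The missing values below are introduced artificially for illustration, and ncp = 2 is an arbitrary choice for the number of dimensions used in the imputation.

```r
library(FactoMineR)
library(missMDA)
data("USArrests")

# Illustrative only: blank out a few cells to simulate missing values
dat <- USArrests
dat[c(3, 17, 40), "Murder"] <- NA
dat[c(8, 25), "UrbanPop"] <- NA
dat[c(12, 33), "Rape"] <- NA

# Impute with a PCA model using 2 dimensions (illustrative choice;
# estim_ncpPCA() in missMDA can estimate this number by cross-validation)
imp <- imputePCA(dat, ncp = 2)

# PCA on the completed data set
res.imp <- PCA(imp$completeObs, graph = FALSE)
print(res.imp$eig)
```

The imputed values are estimates, so results on the completed data should be read with the amount of missing data in mind.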

PCA Graphical user interface: Factoshiny

Automatic interpretation: FactoInvestigate

The FactoInvestigate R package automatically generates a report for a principal component analysis. Learn more in our previous article: FactoInvestigate R Package: Automatic Reports and Interpretation of Principal Component Analyses

Related Books

Practical Guide to Principal Component Methods in R
Exploratory Multivariate Analysis by Example Using R
Practical Guide to Cluster Analysis in R


