# Articles - Principal Component Methods in R: Practical Guide

## PCA in R Using Ade4: Quick Scripts

This article provides quick-start R code for computing principal component analysis (PCA) with the function dudi.pca() in the ade4 R package. We’ll use the factoextra R package to visualize the PCA results. We’ll also describe how to predict the coordinates of new individuals and variables using ade4 functions.

Read more about the basics and the interpretation of principal component analysis in our previous article: PCA - Principal Component Analysis Essentials.

Install:

``````
install.packages("magrittr")   # for piping %>%
install.packages("ade4")       # PCA computation
install.packages("factoextra") # PCA visualization
``````

Load:

``````
library(ade4)
library(factoextra)
library(magrittr)
``````

## Data sets

• Demo data: `decathlon2` [in factoextra].
• Data description available at: PCA - Data format.
• Data contents:
  • Active individuals (rows 1 to 23) and active variables (columns 1 to 10). Used to compute the PCA.
  • Supplementary individuals (rows 24 to 27) and supplementary variables (columns 11 to 13). Their coordinates will be predicted using the PCA information and parameters obtained with active individuals/variables.

Load the data and extract only active individuals and variables:

``````
library("factoextra")
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
# Show the first six rows and columns
head(decathlon2.active[, 1:6])
``````

``````
##           X100m Long.jump Shot.put High.jump X400m X110m.hurdle
## SEBRLE     11.0      7.58     14.8      2.07  49.8         14.7
## CLAY       10.8      7.40     14.3      1.86  49.4         14.1
## BERNARD    11.0      7.23     14.2      1.92  48.9         15.0
## YURKOV     11.3      7.09     15.2      2.10  50.4         15.3
## ZSIVOCZKY  11.1      7.30     13.5      2.01  48.6         14.2
## McMULLEN   10.8      7.31     13.8      2.13  49.9         14.4
``````

## Compute PCA using dudi.pca()

``````
library(ade4)
res.pca <- dudi.pca(decathlon2.active,
                    scannf = FALSE,   # Hide scree plot
                    nf = 5            # Number of components kept in the results
                    )
``````

## Visualize PCA results

### Visualize using factoextra

The factoextra R package creates ggplot2-based visualizations.

1. Visualize eigenvalues (scree plot). Show the percentage of variance explained by each principal component.

``````
fviz_eig(res.pca)
``````

2. Graph of individuals. Individuals with a similar profile are grouped together.

``````
fviz_pca_ind(res.pca,
             col.ind = "cos2", # Color by the quality of representation
             repel = TRUE      # Avoid text overlapping
             )
``````

3. Graph of variables. Positively correlated variables point to the same side of the plot. Negatively correlated variables point to opposite sides of the plot.

``````
fviz_pca_var(res.pca,
             col.var = "contrib", # Color by contributions to the PC
             repel = TRUE         # Avoid text overlapping
             )
``````

4. Biplot of individuals and variables.

``````
fviz_pca_biplot(res.pca, repel = TRUE,
                col.var = "#2E9FDF", # Variables color
                col.ind = "#696969"  # Individuals color
                )
``````

### Visualize using ade4

The ade4 package creates R base plots.

``````
# Scree plot
screeplot(res.pca, main = "Screeplot - Eigenvalues")
``````

``````
# Correlation circle of variables
s.corcircle(res.pca$co)
``````

``````
# Graph of individuals
s.label(res.pca$li,
        xax = 1,     # Dimension 1
        yax = 2)     # Dimension 2
``````

``````
# Biplot of individuals and variables
scatter(res.pca,
        posieig = "none", # Hide the scree plot
        clab.row = 0      # Hide row labels
        )
``````
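The percentages shown on a scree plot follow directly from the eigenvalues: each eigenvalue is divided by their sum. The sketch below illustrates the arithmetic in base R only. It assumes `prcomp()` and the built-in `USArrests` data as stand-ins for `dudi.pca()` and `decathlon2`, and the object names are illustrative.

```r
# Stand-in data: USArrests ships with base R (it is not the article's decathlon2)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the variances of the principal components
eig <- pca$sdev^2

# Share of the total variance carried by each component, in percent
variance.percent <- 100 * eig / sum(eig)
cumulative.percent <- cumsum(variance.percent)

data.frame(eigenvalue = eig,
           variance.percent = variance.percent,
           cumulative.percent = cumulative.percent)
```

Because the variables are standardized, the eigenvalues sum to the number of variables, so a component with an eigenvalue above 1 carries more variance than any single original variable.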

## Access to the PCA results

``````
library(factoextra)
# Eigenvalues
eig.val <- get_eigenvalue(res.pca)
eig.val

# Results for variables
res.var <- get_pca_var(res.pca)
res.var$coord          # Coordinates
res.var$contrib        # Contributions to the PCs
res.var$cos2           # Quality of representation

# Results for individuals
res.ind <- get_pca_ind(res.pca)
res.ind$coord          # Coordinates
res.ind$contrib        # Contributions to the PCs
res.ind$cos2           # Quality of representation
``````
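To make the meaning of `coord`, `cos2` and `contrib` concrete, here is a minimal base R sketch of the usual definitions for variables in a standardized PCA. It is an illustration, not the factoextra source code: `prcomp()` and `USArrests` are stand-ins, and the names `var.coord`, `var.cos2` and `var.contrib` are hypothetical.

```r
# Stand-in data and function (base R only)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variable coordinates: loadings multiplied by the component standard deviations.
# For standardized data these equal the variable-component correlations.
var.coord <- sweep(pca$rotation, 2, pca$sdev, "*")

# Quality of representation: squared coordinates
var.cos2 <- var.coord^2

# Contribution (%) of each variable to each component:
# its cos2 divided by the total cos2 of that component
var.contrib <- 100 * sweep(var.cos2, 2, colSums(var.cos2), "/")

round(var.contrib, 1)
```

Under these definitions, the contributions within a component sum to 100, and the cos2 values of a variable sum to 1 across all components.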

## Predict using PCA

In this section, we’ll show how to predict the coordinates of supplementary individuals and variables using only the information provided by the previously performed PCA.
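In outline, projecting a new individual takes two steps: standardize it with the means and standard deviations of the active individuals, then multiply by the variable loadings. The base R sketch below shows this with `prcomp()` and the built-in `USArrests` data as stand-ins. The active/new split and the object names are assumptions made for illustration, and ade4 may use slightly different normalization conventions, so the numbers are not expected to match `suprow()` exactly.

```r
# Stand-in data: the first 40 rows play the role of active individuals
active  <- USArrests[1:40, ]
new.ind <- USArrests[41:50, ]

pca <- prcomp(active, center = TRUE, scale. = TRUE)

# 1. Standardize the new rows with the means and standard deviations
#    of the active rows (never with their own)
z <- scale(new.ind, center = pca$center, scale = pca$scale)

# 2. Project onto the principal axes
new.coord <- z %*% pca$rotation

# predict() performs the same two steps
all.equal(new.coord, predict(pca, newdata = new.ind))
```

Reusing the active means and standard deviations is what keeps the new individuals from influencing the axes they are projected on.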

### Supplementary individuals

1. Data: rows 24 to 27 and columns 1 to 10 [in the decathlon2 data set]. The new data must contain columns (variables) with the same names and in the same order as the active data used to compute the PCA.

``````
# Data for the supplementary individuals
ind.sup <- decathlon2[24:27, 1:10]
ind.sup[, 1:6]
``````

``````
##         X100m Long.jump Shot.put High.jump X400m X110m.hurdle
## KARPOV   11.0      7.30     14.8      2.04  48.4         14.1
## WARNERS  11.1      7.60     14.3      1.98  48.7         14.2
## Nool     10.8      7.53     14.3      1.88  48.8         14.8
## Drews    10.9      7.38     13.1      1.88  48.5         14.0
``````

2. Predict the coordinates of the new individuals.

``````
ind.sup.coord <- suprow(res.pca, ind.sup) %>%
  .$lisup
ind.sup.coord[, 1:4]
``````

``````
##          Axis1   Axis2 Axis3  Axis4
## KARPOV  -0.795  0.7795 1.633  1.724
## WARNERS  0.386 -0.1216 1.739 -0.706
## Nool     0.559  1.9775 0.483 -2.278
## Drews    1.109  0.0174 3.049 -1.534
``````

3. Graph of individuals including the supplementary individuals:

``````
# Plot of active individuals
p <- fviz_pca_ind(res.pca, repel = TRUE)
# Add supplementary individuals
fviz_add(p, ind.sup.coord, color = "blue")
``````

### Supplementary variables

#### Qualitative / categorical variables

The `decathlon2` data set contains a supplementary qualitative variable in column 13, corresponding to the type of competition.

Qualitative / categorical variables can be used to color individuals by group. The grouping variable must have the same length as the number of active individuals (here 23).

• factoextra-based plots

``````
groups <- as.factor(decathlon2$Competition[1:23])
fviz_pca_ind(res.pca,
             col.ind = groups, # color by groups
             palette = c("#00AFBB",  "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             ellipse.type = "confidence",
             legend.title = "Groups",
             repel = TRUE
             )
``````

• ade4-based plots

``````
groups <- as.factor(decathlon2$Competition[1:23])
s.class(res.pca$li,
        fac = groups,  # color by groups
        col = c("#00AFBB",  "#FC4E07")
        )
``````

``````
# Biplot
res <- scatter(res.pca, clab.row = 0, posieig = "none")
s.class(res.pca$li,
        fac = groups,
        col = c("#00AFBB",  "#FC4E07"),
        add.plot = TRUE,         # Draw on top of the biplot
        cstar = 0,               # Remove stars
        cellipse = 0             # Remove ellipses
        )
``````

#### Quantitative variables

Data: columns 11 and 12 (Rank and Points). Each supplementary variable must have the same length as the number of active individuals (here 23).

``````
quanti.sup <- decathlon2[1:23, 11:12, drop = FALSE]
head(quanti.sup)
``````

``````
##           Rank Points
## SEBRLE       1   8217
## CLAY         2   8122
## BERNARD      4   8067
## YURKOV       5   8036
## ZSIVOCZKY    7   8004
## McMULLEN     8   7995
``````

The coordinates of a given quantitative variable are calculated as the correlations between that variable and the principal components.
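The correlation rule can be checked by hand. The following base R sketch treats one column of the built-in `USArrests` data as a supplementary variable and uses `prcomp()` in place of `dudi.pca()`. Both choices, and all object names, are assumptions made for illustration.

```r
# Stand-in data: three active variables and one supplementary variable
active  <- USArrests[, c("Murder", "Assault", "Rape")]
sup.var <- USArrests[, "UrbanPop", drop = FALSE]

pca <- prcomp(active, center = TRUE, scale. = TRUE)

# Coordinates: correlation of the supplementary variable with each component
sup.coord <- cor(sup.var, pca$x)

# The same values, obtained by projecting the standardized variable
# onto the standardized component scores
by.projection <- crossprod(scale(sup.var), scale(pca$x)) / (nrow(active) - 1)

# Quality of representation: squared coordinates
sup.cos2 <- sup.coord^2

round(rbind(coord = sup.coord[1, ], cos2 = sup.cos2[1, ]), 3)
```

The projection form shows why a supplementary variable is standardized before being projected: only then do its coordinates equal correlations and stay within the unit circle of the variable plot.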

``````
# Predict coordinates and compute cos2
quanti.coord <- supcol(res.pca, scale(quanti.sup)) %>%
  .$cosup
quanti.cos2 <- quanti.coord^2
# Graph of variables including supplementary variables
p <- fviz_pca_var(res.pca)
fviz_add(p, quanti.coord, color = "blue", geom = "arrow")
``````