Explore the outputs of a principal component analysis - R software and data mining


Description

The functions get_eig(), get_pca_ind() and get_pca_var() can be used to explore the outputs of several PCA functions: PCA() from the FactoMineR package; prcomp() and princomp() from the stats package; and dudi.pca() from the ade4 package.

These three functions are included in the R package factoextra.

Install and load factoextra

The devtools package is required for the installation because factoextra is hosted on GitHub.

# install.packages("devtools")
library("devtools")
install_github("kassambara/factoextra")

Load factoextra:

library("factoextra")

Usage

get_eig(X)
get_pca_var(res.pca)
get_pca_ind(res.pca)

Arguments

  • X, res.pca: an object of class PCA (FactoMineR); prcomp or princomp (stats); or dudi and pca (ade4).

Examples

Principal component analysis

A principal component analysis (PCA) is performed using the built-in R function prcomp() and the iris data set:

data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# The variable Species (column 5) is removed before the PCA;
# scale. = TRUE standardizes the variables to unit variance
res.pca <- prcomp(iris[, -5], scale. = TRUE)
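Before extracting anything with factoextra, it can help to look at what prcomp() itself returns. The short sketch below uses base R only and re-runs the PCA so that it is self-contained:

```r
# Re-run the PCA on the four numeric columns of iris
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Components of a prcomp object:
#   sdev     standard deviations of the principal components
#   rotation variable loadings (eigenvectors)
#   center   column means used for centering
#   scale    column standard deviations used for scaling
#   x        coordinates of the individuals on the components
names(res.pca)

# One row per individual, one column per component
dim(res.pca$x)
```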

Extract the eigenvalues/variances

eig <- get_eig(res.pca)
eig
      eigenvalue variance.percent cumulative.variance.percent
Dim.1 2.91849782       72.9624454                    72.96245
Dim.2 0.91403047       22.8507618                    95.81321
Dim.3 0.14675688        3.6689219                    99.48213
Dim.4 0.02071484        0.5178709                   100.00000
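The values in this table can be cross-checked with base R alone. The sketch below assumes that the eigenvalues are the squared standard deviations stored in the prcomp object (res.pca$sdev); the name eig.manual is illustrative and is not part of factoextra.

```r
# Re-run the PCA so that this block is self-contained
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Eigenvalues are the variances of the principal components
eigenvalue <- res.pca$sdev^2

# Share of the total variance carried by each component, in percent
variance.percent <- 100 * eigenvalue / sum(eigenvalue)

eig.manual <- data.frame(
  eigenvalue = eigenvalue,
  variance.percent = variance.percent,
  cumulative.variance.percent = cumsum(variance.percent),
  row.names = paste0("Dim.", seq_along(eigenvalue))
)
eig.manual
```

Because the variables are standardized, the eigenvalues sum to the number of variables (4 here), which is a quick sanity check on the result.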

Extract the results for variables

The function get_pca_var() returns a list of matrices containing all the results for the active variables (coordinates, correlations between variables and axes, squared cosines and contributions).

var <- get_pca_var(res.pca)
names(var)
[1] "coord"   "cor"     "cos2"    "contrib"
# Coordinates of variables
head(var$coord)
                  Dim.1       Dim.2       Dim.3       Dim.4
Sepal.Length  0.8901688 -0.36082989  0.27565767  0.03760602
Sepal.Width  -0.4601427 -0.88271627 -0.09361987 -0.01777631
Petal.Length  0.9915552 -0.02341519 -0.05444699 -0.11534978
Petal.Width   0.9649790 -0.06399985 -0.24298265  0.07535950
# Cos2 of variables
head(var$cos2)
                 Dim.1       Dim.2       Dim.3        Dim.4
Sepal.Length 0.7924004 0.130198208 0.075987149 0.0014142127
Sepal.Width  0.2117313 0.779188012 0.008764681 0.0003159971
Petal.Length 0.9831817 0.000548271 0.002964475 0.0133055723
Petal.Width  0.9311844 0.004095980 0.059040571 0.0056790544
# Contribution of variables
head(var$contrib)
                 Dim.1       Dim.2     Dim.3     Dim.4
Sepal.Length 27.150969 14.24440565 51.777574  6.827052
Sepal.Width   7.254804 85.24748749  5.972245  1.525463
Petal.Length 33.687936  0.05998389  2.019990 64.232089
Petal.Width  31.906291  0.44812296 40.230191 27.415396
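To make the relationship between these three matrices explicit, here is a minimal base R sketch that recomputes them from the prcomp object. It assumes the usual PCA conventions (coordinates are loadings multiplied by the component standard deviations, cos2 is the squared coordinate, and contributions are cos2 expressed as a percentage of each axis total); the names var.coord, var.cos2 and var.contrib are illustrative, and the sign of a whole axis may differ between platforms.

```r
# Re-run the PCA so that this block is self-contained
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Coordinates: loadings (eigenvectors) scaled by the component standard deviations
var.coord <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")
colnames(var.coord) <- paste0("Dim.", seq_len(ncol(var.coord)))

# Squared cosine: quality of representation of each variable on each axis
var.cos2 <- var.coord^2

# Contribution (in percent) of each variable to each axis
var.contrib <- 100 * sweep(var.cos2, 2, colSums(var.cos2), "/")

round(var.contrib, 3)
```

For a PCA on standardized variables, these coordinates are also the correlations between the original variables and the components, which can be checked with cor(iris[, -5], res.pca$x).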

Extract the results for individuals

The function get_pca_ind() returns a list of matrices containing all the results for the active individuals (coordinates, squared cosines and contributions).

ind <- get_pca_ind(res.pca)
names(ind)
[1] "coord"   "cos2"    "contrib"
# Coordinates of individuals
head(ind$coord)
      Dim.1      Dim.2       Dim.3        Dim.4
1 -2.257141 -0.4784238  0.12727962  0.024087508
2 -2.074013  0.6718827  0.23382552  0.102662845
3 -2.356335  0.3407664 -0.04405390  0.028282305
4 -2.291707  0.5953999 -0.09098530 -0.065735340
5 -2.381863 -0.6446757 -0.01568565 -0.035802870
6 -2.068701 -1.4842053 -0.02687825  0.006586116
# Cos2 of individuals
head(ind$cos2)
      Dim.1      Dim.2        Dim.3        Dim.4
1 0.9539975 0.04286032 0.0030335249 1.086460e-04
2 0.8927725 0.09369248 0.0113475382 2.187482e-03
3 0.9790410 0.02047578 0.0003422122 1.410446e-04
4 0.9346682 0.06308947 0.0014732682 7.690193e-04
5 0.9315095 0.06823959 0.0000403979 2.104697e-04
6 0.6600989 0.33978301 0.0001114335 6.690714e-06
# Contribution of individuals
head(ind$contrib)
      Dim.1      Dim.2       Dim.3       Dim.4
1 1.1637691 0.16694510 0.073591567 0.018672867
2 0.9825900 0.32925696 0.248367113 0.339198420
3 1.2683043 0.08469576 0.008816151 0.025742863
4 1.1996857 0.25856249 0.037605617 0.139067312
5 1.2959338 0.30313118 0.001117674 0.041253702
6 0.9775628 1.60670454 0.003281801 0.001396002
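The same quantities can be sketched in base R for the individuals. This is an illustration under stated assumptions, not necessarily how factoextra computes them internally: cos2 is taken as the squared coordinate divided by the squared distance of the individual to the centre of the standardized data, and each individual is given weight 1/n in the contributions. The names ind.coord, ind.cos2 and ind.contrib are illustrative.

```r
# Re-run the PCA so that this block is self-contained
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Coordinates of the individuals on the principal components
ind.coord <- res.pca$x
colnames(ind.coord) <- paste0("Dim.", seq_len(ncol(ind.coord)))

# Squared distance of each individual to the centre of the standardized data
d2 <- rowSums(scale(iris[, -5])^2)

# Squared cosine: each row of squared coordinates is divided by that row's d2
ind.cos2 <- ind.coord^2 / d2

# Contribution (in percent), assuming a weight of 1/n per individual
eigenvalue <- res.pca$sdev^2
n <- nrow(ind.coord)
ind.contrib <- 100 * sweep(ind.coord^2, 2, n * eigenvalue, "/")

head(round(ind.contrib, 4))
```

Dividing by n reproduces the values printed above. Because prcomp() computes variances with n - 1, the columns of ind.contrib then sum to 100 * (n - 1) / n rather than exactly 100.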

Infos

This analysis was performed using R software (ver. 3.1.2) and factoextra (ver. 1.0.2).

