Explore the outputs of a principal component analysis - R software and data mining
Description
The functions get_eig(), get_pca_ind() and get_pca_var() can be used to explore the outputs of several PCA functions: PCA() from the FactoMineR package; prcomp() and princomp() from the stats package; and dudi.pca() from the ade4 package.
These 3 functions are included in the R package factoextra.
Install and load factoextra
The devtools package is required for the installation, as factoextra is hosted on GitHub.
# install.packages("devtools")
library("devtools")
install_github("kassambara/factoextra")
Load factoextra:
library("factoextra")
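Since this article was written, factoextra has also been published on CRAN, so install.packages("factoextra") is an alternative to the GitHub route. The short sketch below only checks whether the package is available before loading it; the variable name has_factoextra is illustrative and not part of the package.

```r
# Check whether factoextra is already installed before attempting to load it.
# requireNamespace() returns TRUE or FALSE and does not attach the package.
has_factoextra <- requireNamespace("factoextra", quietly = TRUE)

if (has_factoextra) {
  library("factoextra")
} else {
  # Nothing is installed automatically; the user decides which route to take.
  message("factoextra is not installed; run install.packages(\"factoextra\") ",
          "or the install_github() call shown above.")
}
```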
Usage
get_eig(X)
get_pca_var(res.pca)
get_pca_ind(res.pca)
Arguments
- X, res.pca: an object of class PCA (FactoMineR); prcomp or princomp (stats); or pca and dudi (ade4).
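As a quick illustration of what such objects look like, the sketch below builds the two stats variants and prints their classes. It uses only base R; the object names res.prcomp and res.princomp are illustrative. Objects from FactoMineR or ade4 would be created the same way with PCA() or dudi.pca(), provided those packages are installed.

```r
data(iris)
X <- iris[, -5]  # keep the four numeric columns only

# Two PCA objects from the stats package, both on standardized variables
res.prcomp   <- prcomp(X, scale. = TRUE)   # SVD-based PCA
res.princomp <- princomp(X, cor = TRUE)    # eigen decomposition of the correlation matrix

# The class attribute identifies which kind of PCA result the object is
class(res.prcomp)
class(res.princomp)
```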
Examples
Principal component analysis
A principal component analysis (PCA) is performed using the built-in R function prcomp() and the iris data set:
data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# The variable Species (column 5) is removed before the PCA;
# the remaining variables are scaled to unit variance
res.pca <- prcomp(iris[, -5], scale. = TRUE)
Extract the eigenvalues/variances
eig <- get_eig(res.pca)
eig
      eigenvalue variance.percent cumulative.variance.percent
Dim.1 2.91849782       72.9624454                    72.96245
Dim.2 0.91403047       22.8507618                    95.81321
Dim.3 0.14675688        3.6689219                    99.48213
Dim.4 0.02071484        0.5178709                   100.00000
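For a prcomp object, the same table can be rebuilt by hand, which helps to see what each column means. The sketch below is not factoextra's own code; it relies only on the fact that the eigenvalues of a prcomp fit are the squared standard deviations of the components. The name eig.manual is illustrative.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Eigenvalue of each component = variance of that component
eigenvalue <- res.pca$sdev^2

# Share of the total variance carried by each component, in percent
variance.percent <- 100 * eigenvalue / sum(eigenvalue)

eig.manual <- data.frame(
  eigenvalue                  = eigenvalue,
  variance.percent            = variance.percent,
  cumulative.variance.percent = cumsum(variance.percent),
  row.names                   = paste0("Dim.", seq_along(eigenvalue))
)
eig.manual
```

Because the variables were scaled, the eigenvalues sum to the number of variables (4 here).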
Extract the results for variables
The function get_pca_var() returns a list of matrices containing all the results for the active variables (coordinates, correlations between variables and axes, squared cosines and contributions).
var <- get_pca_var(res.pca)
names(var)
[1] "coord" "cor" "cos2" "contrib"
# Coordinates of variables
head(var$coord)
                  Dim.1       Dim.2       Dim.3       Dim.4
Sepal.Length  0.8901688 -0.36082989  0.27565767  0.03760602
Sepal.Width  -0.4601427 -0.88271627 -0.09361987 -0.01777631
Petal.Length  0.9915552 -0.02341519 -0.05444699 -0.11534978
Petal.Width   0.9649790 -0.06399985 -0.24298265  0.07535950
# Cos2 of variables
head(var$cos2)
                 Dim.1       Dim.2       Dim.3        Dim.4
Sepal.Length 0.7924004 0.130198208 0.075987149 0.0014142127
Sepal.Width  0.2117313 0.779188012 0.008764681 0.0003159971
Petal.Length 0.9831817 0.000548271 0.002964475 0.0133055723
Petal.Width  0.9311844 0.004095980 0.059040571 0.0056790544
# Contribution of variables
head(var$contrib)
                 Dim.1       Dim.2     Dim.3     Dim.4
Sepal.Length 27.150969 14.24440565 51.777574  6.827052
Sepal.Width   7.254804 85.24748749  5.972245  1.525463
Petal.Length 33.687936  0.05998389  2.019990 64.232089
Petal.Width  31.906291  0.44812296 40.230191 27.415396
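To make the relationship between these matrices explicit, the values above can be recomputed from the prcomp object with base R. This is a sketch under the assumption of a scaled PCA, not a copy of factoextra's internals; the names var.coord, var.cor, var.cos2 and var.contrib are illustrative.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Coordinates: each loading column multiplied by the component's standard deviation
var.coord <- sweep(res.pca$rotation, 2, res.pca$sdev, `*`)
colnames(var.coord) <- paste0("Dim.", seq_len(ncol(var.coord)))

# With scaled variables, the coordinates equal the correlations between
# the original variables and the component scores
var.cor <- cor(iris[, -5], res.pca$x)

# Squared cosines: squared coordinates (quality of representation)
var.cos2 <- var.coord^2

# Contributions (in percent): each cos2 divided by the column total
var.contrib <- 100 * sweep(var.cos2, 2, colSums(var.cos2), `/`)

round(var.contrib, 3)
```

Each column of var.contrib sums to 100, so a value can be read directly as the percentage of a component that is due to a given variable.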
Extract the results for individuals
The function get_pca_ind() returns a list of matrices containing all the results for the active individuals (coordinates, squared cosines and contributions).
ind <- get_pca_ind(res.pca)
names(ind)
[1] "coord" "cos2" "contrib"
# Coordinates of individuals
head(ind$coord)
      Dim.1      Dim.2       Dim.3        Dim.4
1 -2.257141 -0.4784238  0.12727962  0.024087508
2 -2.074013  0.6718827  0.23382552  0.102662845
3 -2.356335  0.3407664 -0.04405390  0.028282305
4 -2.291707  0.5953999 -0.09098530 -0.065735340
5 -2.381863 -0.6446757 -0.01568565 -0.035802870
6 -2.068701 -1.4842053 -0.02687825  0.006586116
# Cos2 of individuals
head(ind$cos2)
      Dim.1      Dim.2        Dim.3        Dim.4
1 0.9539975 0.04286032 0.0030335249 1.086460e-04
2 0.8927725 0.09369248 0.0113475382 2.187482e-03
3 0.9790410 0.02047578 0.0003422122 1.410446e-04
4 0.9346682 0.06308947 0.0014732682 7.690193e-04
5 0.9315095 0.06823959 0.0000403979 2.104697e-04
6 0.6600989 0.33978301 0.0001114335 6.690714e-06
# Contribution of individuals
head(ind$contrib)
      Dim.1      Dim.2       Dim.3       Dim.4
1 1.1637691 0.16694510 0.073591567 0.018672867
2 0.9825900 0.32925696 0.248367113 0.339198420
3 1.2683043 0.08469576 0.008816151 0.025742863
4 1.1996857 0.25856249 0.037605617 0.139067312
5 1.2959338 0.30313118 0.001117674 0.041253702
6 0.9775628 1.60670454 0.003281801 0.001396002
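The results for individuals can be recomputed in the same spirit. The sketch below reproduces the numbers shown above for this example, but it is an illustration rather than factoextra's actual implementation. It assumes that all four components are kept, so the squared distance of an individual to the origin equals the sum of its squared coordinates. The names ind.coord, ind.cos2 and ind.contrib are illustrative.

```r
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Coordinates of individuals: the component scores stored by prcomp()
ind.coord <- res.pca$x
colnames(ind.coord) <- paste0("Dim.", seq_len(ncol(ind.coord)))

# Squared distance of each individual to the origin
# (valid here because every component is retained)
d2 <- rowSums(ind.coord^2)

# Squared cosines: share of each individual's squared distance
# that lies along each axis (row i is divided by d2[i])
ind.cos2 <- ind.coord^2 / d2

# Contributions (in percent): squared coordinate, weighted by 1/n,
# relative to the eigenvalue of the component
n <- nrow(ind.coord)
eigenvalue <- res.pca$sdev^2
ind.contrib <- 100 * (1 / n) * sweep(ind.coord^2, 2, eigenvalue, `/`)

head(round(ind.cos2, 4))
head(round(ind.contrib, 4))
```

Each row of ind.cos2 sums to 1. Individuals with a cos2 close to 1 on the first two axes are well represented on the usual two-dimensional factor map.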
Infos
This analysis was performed using R software (ver. 3.1.2) and factoextra (ver. 1.0.2).