<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Sun, 17 May 2026 05:38:08 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Easy Guides]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/wiki/48" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles of the category: simplyR]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[Regression Analysis Essentials For Machine Learning]]></title>
			<link>https://www.sthda.com/english/wiki/regression-analysis-essentials-for-machine-learning</link>
			<guid>https://www.sthda.com/english/wiki/regression-analysis-essentials-for-machine-learning</guid>
			<description><![CDATA[<!-- START HTML -->


  <div id="rdoc">




<p><strong>Regression analysis</strong> consists of a set of <em>machine learning</em> methods that allow us to predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x).</p>
<p>Briefly, the goal of a regression model is to build a mathematical equation that defines y as a function of the x variables. This equation can then be used to predict the outcome (y) from new values of the predictor variables (x).</p>
<p><a href="https://www.sthda.com/english/articles/40-regression-analysis/165-linear-regression-essentials-in-r/"><strong>Linear regression</strong></a> is the simplest and most popular technique for predicting a continuous variable. It assumes a linear relationship between the outcome and the predictor variables.</p>
<p>The linear regression equation can be written as <code>y = b0 + b*x + e</code>, where:</p>
<ul>
<li>b0 is the intercept,</li>
<li>b is the regression weight or coefficient associated with the predictor variable x,</li>
<li>e is the residual error.</li>
</ul>
<p>Technically, the linear regression coefficients are determined so that the error in predicting the outcome value is minimized; more precisely, they minimize the sum of the squared residual errors. This method of computing the coefficients (b0 and b) is called the <a href="https://www.sthda.com/english/articles/40-regression-analysis/165-linear-regression-essentials-in-r/"><strong>Ordinary Least Squares</strong></a> method.</p>
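<p>As a minimal sketch of the least squares idea, the R code below fits a simple linear regression with <code>lm()</code> and checks that its coefficients match the closed-form least squares solution. The built-in <code>mtcars</code> data set and the choice of <code>mpg</code> as outcome and <code>wt</code> as predictor are illustrative assumptions, not part of the original material.</p>
<pre class="r"><code># Illustrative data: predict fuel consumption (mpg) from car weight (wt)
data("mtcars")

# Fit y = b0 + b*x + e by ordinary least squares
model &lt;- lm(mpg ~ wt, data = mtcars)
coef(model)  # b0 (intercept) and b (slope)

# Same coefficients from the normal equations: (X'X) b = X'y
X &lt;- cbind(1, mtcars$wt)
y &lt;- mtcars$mpg
b_manual &lt;- solve(t(X) %*% X, t(X) %*% y)

# The two solutions should agree up to numerical precision
all.equal(unname(coef(model)), as.numeric(b_manual))

# Use the fitted equation to predict the outcome for new predictor values
predict(model, newdata = data.frame(wt = c(2.5, 3.5)))</code></pre>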
<p>When you have multiple predictor variables, say x1 and x2, the regression equation can be written as <code>y = b0 + b1*x1 + b2*x2 + e</code>. In some situations, there might be an <a href="https://www.sthda.com/english/articles/40-regression-analysis/164-interaction-effect-in-multiple-regression-essentials/"><strong>interaction effect</strong></a> between some predictors: for example, increasing the value of the predictor variable x1 may increase the effectiveness of the predictor x2 in explaining the variation in the outcome variable.</p>
<p>Note also that linear regression models can incorporate both continuous and <a href="https://www.sthda.com/english/articles/40-regression-analysis/163-regression-with-categorical-variables-dummy-coding-essentials-in-r/"><strong>categorical predictor variables</strong></a>.</p>
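<p>The multiple regression, interaction and categorical cases described above can be sketched with R model formulas as follows. The <code>mtcars</code> data set and the variables <code>wt</code>, <code>hp</code> and <code>cyl</code> are assumptions chosen only for illustration.</p>
<pre class="r"><code>data("mtcars")

# Additive model: y = b0 + b1*x1 + b2*x2 + e
fit_additive &lt;- lm(mpg ~ wt + hp, data = mtcars)

# Interaction model: wt * hp expands to wt + hp + wt:hp
fit_inter &lt;- lm(mpg ~ wt * hp, data = mtcars)

# Categorical predictor: factor() makes R create dummy variables
fit_cat &lt;- lm(mpg ~ wt + factor(cyl), data = mtcars)

names(coef(fit_inter))  # includes the product term "wt:hp"
names(coef(fit_cat))    # one dummy per non-reference level of cyl</code></pre>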
<p>When you build a linear regression model, you need to <a href="https://www.sthda.com/english/articles/39-regression-model-diagnostics/"><strong>diagnose</strong></a> whether a linear model is suitable for your data.</p>
<p>In some cases, the relationship between the outcome and the predictor variables is not linear. In these situations, you need to build a <strong>non-linear regression</strong> model, such as <a href="https://www.sthda.com/english/articles/40-regression-analysis/162-nonlinear-regression-essentials-in-r-polynomial-and-spline-regression-models/"><em>polynomial and spline regression</em></a>.</p>
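<p>A minimal sketch of polynomial and spline regression in R, assuming the <code>mtcars</code> data set and an illustrative relationship between <code>mpg</code> and <code>hp</code>; the polynomial degree and the spline degrees of freedom are arbitrary choices made for this example.</p>
<pre class="r"><code>library(splines)  # ships with R
data("mtcars")

# Straight-line fit, for reference
fit_linear &lt;- lm(mpg ~ hp, data = mtcars)

# Second-degree polynomial regression
fit_poly &lt;- lm(mpg ~ poly(hp, 2), data = mtcars)

# Cubic B-spline regression with 4 degrees of freedom
fit_spline &lt;- lm(mpg ~ bs(hp, df = 4), data = mtcars)

# Compare the adjusted R-squared of the three fits
sapply(
  list(linear = fit_linear, polynomial = fit_poly, spline = fit_spline),
  function(m) summary(m)$adj.r.squared
)</code></pre>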
<p>When you have multiple predictors in the regression model, you might want to select the best combination of predictor variables to build an optimal predictive model. This process, called <a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/"><strong>model selection</strong></a>, consists of comparing multiple models containing different sets of predictors in order to select the best-performing model, that is, the one that minimizes the prediction error. Linear model selection approaches include <a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/155-best-subsets-regression-essentials-in-r/"><strong>best subsets regression</strong></a> and <a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/154-stepwise-regression-essentials-in-r/"><strong>stepwise regression</strong></a>.</p>
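<p>One possible way to try stepwise selection is base R's <code>step()</code> function, which adds and removes predictors according to the AIC. This is only a sketch: the <code>mtcars</code> data set is an illustrative assumption, and the linked articles may rely on other packages and selection criteria.</p>
<pre class="r"><code>data("mtcars")

# Start from the full model containing all available predictors
full &lt;- lm(mpg ~ ., data = mtcars)

# Stepwise selection in both directions, guided by AIC (trace = 0 hides the log)
selected &lt;- step(full, direction = "both", trace = 0)

formula(selected)            # predictors retained by the procedure
c(full = AIC(full), selected = AIC(selected))</code></pre>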
<p>In some situations, such as in genomics, you might have a large multivariate data set containing correlated predictors. In this case, the information in the original data set can be summarized into a few new variables (called principal components) that are linear combinations of the original variables. These few principal components can then be used to build a linear model, which might perform better on your data. This approach is known as <strong>principal component-based methods</strong>, which include <a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/152-principal-component-and-partial-least-squares-regression-essentials/"><strong>principal component regression</strong> and <strong>partial least squares regression</strong></a>.</p>
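<p>The principal component regression idea can be sketched in base R by combining <code>prcomp()</code> and <code>lm()</code>. The <code>mtcars</code> data set and the choice of two components are assumptions made for illustration; partial least squares regression is not shown here.</p>
<pre class="r"><code>data("mtcars")

# Predictors only (all columns except the outcome mpg)
predictors &lt;- mtcars[, setdiff(names(mtcars), "mpg")]

# Summarize the correlated predictors into principal components
pca &lt;- prcomp(predictors, center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:3]  # variance explained by the first components

# Regress the outcome on the first two principal components
pc_data &lt;- data.frame(mpg = mtcars$mpg, pca$x[, 1:2])
pcr_fit &lt;- lm(mpg ~ PC1 + PC2, data = pc_data)
summary(pcr_fit)$adj.r.squared</code></pre>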
<p>An alternative way to simplify a large multivariate model is to use <a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/153-penalized-regression-essentials-ridge-lasso-elastic-net/"><strong>penalized regression</strong></a>, which penalizes the model for having too many variables. The best-known penalized regression methods include <strong>ridge regression</strong> and <strong>lasso regression</strong>.</p>
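<p>A minimal sketch of ridge and lasso regression, assuming the <strong>glmnet</strong> package is installed (<code>install.packages("glmnet")</code>). The <code>mtcars</code> data set, the seed and the number of folds are illustrative assumptions.</p>
<pre class="r"><code>library(glmnet)
data("mtcars")

# glmnet expects a numeric predictor matrix and an outcome vector
x &lt;- as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y &lt;- mtcars$mpg

set.seed(123)
# alpha = 0 gives ridge, alpha = 1 gives lasso;
# cv.glmnet() picks the penalty strength lambda by cross-validation
ridge &lt;- cv.glmnet(x, y, alpha = 0, nfolds = 5)
lasso &lt;- cv.glmnet(x, y, alpha = 1, nfolds = 5)

# Coefficients at the lambda with the lowest cross-validation error
coef(ridge, s = "lambda.min")
coef(lasso, s = "lambda.min")  # lasso can shrink some coefficients to exactly 0</code></pre>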
<p>You can apply all these different regression models to your data, compare them and finally select the approach that best explains your data. To do so, you need statistical metrics that compare how well the different models explain your data and predict the outcome of new test data.</p>
<p>The best model is defined as the model that has the lowest prediction error. The <a href="https://www.sthda.com/english/articles/38-regression-model-validation/158-regression-model-accuracy-metrics-r-square-aic-bic-cp-and-more/">most popular metrics for comparing regression models</a> include:</p>
<ul>
<li><strong>Root Mean Squared Error</strong> (RMSE), which measures the model prediction error. It is the square root of the average squared difference between the observed values of the outcome and the values predicted by the model. RMSE is computed as <code>RMSE = mean((observeds - predicteds)^2) %>% sqrt()</code>, where the <code>%>%</code> pipe comes from the magrittr package. The lower the RMSE, the better the model.</li>
<li><strong>Adjusted R-square</strong>, representing the proportion of variation (i.e., information) in your data explained by the model, adjusted for the number of predictors. It reflects the overall quality of the model. The higher the adjusted R2, the better the model.</li>
</ul>
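<p>The two metrics above can be computed by hand as in the following sketch. The model and the <code>mtcars</code> data set are illustrative assumptions, and the metrics are computed here on the training data purely to show the arithmetic.</p>
<pre class="r"><code>data("mtcars")
model &lt;- lm(mpg ~ wt + hp, data = mtcars)

observeds &lt;- mtcars$mpg
predicteds &lt;- predict(model, mtcars)

# RMSE: square root of the mean squared prediction error
rmse &lt;- sqrt(mean((observeds - predicteds)^2))

# R2 and adjusted R2 (n observations, p predictors)
n &lt;- length(observeds)
p &lt;- length(coef(model)) - 1
r2 &lt;- 1 - sum((observeds - predicteds)^2) / sum((observeds - mean(observeds))^2)
adj_r2 &lt;- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# The manual value should match the one reported by summary()
all.equal(adj_r2, summary(model)$adj.r.squared)</code></pre>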
<p>Note that the above-mentioned metrics should be computed on new test data that has not been used to train (i.e., build) the model. If you have a large data set with many records, you can randomly split the data into a training set (for example, 80% of the data, used to build the predictive model) and a test or validation set (the remaining 20%, used to evaluate the model performance).</p>
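<p>A minimal base R sketch of the 80/20 split described above. The <code>mtcars</code> data set, the seed and the model formula are assumptions used only for illustration; with such a small data set the test RMSE will vary a lot from one split to another.</p>
<pre class="r"><code>data("mtcars")
set.seed(123)  # make the random split reproducible

# Randomly pick 80% of the rows for training
n &lt;- nrow(mtcars)
train_idx &lt;- sample(seq_len(n), size = floor(0.8 * n))
train &lt;- mtcars[train_idx, ]
test &lt;- mtcars[-train_idx, ]

# Build the model on the training set only
model &lt;- lm(mpg ~ wt + hp, data = train)

# Evaluate the prediction error on the held-out test set
predicteds &lt;- predict(model, newdata = test)
test_rmse &lt;- sqrt(mean((test$mpg - predicteds)^2))
test_rmse</code></pre>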
<p>One of the most robust and popular approaches for estimating model performance is <a href="https://www.sthda.com/english/articles/38-regression-model-validation/157-cross-validation-essentials-in-r/"><strong>k-fold cross-validation</strong></a>. It can be applied even to a small data set. k-fold cross-validation works as follows:</p>
<ol style="list-style-type: decimal">
<li>Randomly split the data set into k subsets, or folds (for example, 5 subsets).</li>
<li>Reserve one subset and train the model on all other subsets.</li>
<li>Test the model on the reserved subset and record the prediction error.</li>
<li>Repeat this process until each of the k subsets has served as the test set.</li>
<li>Compute the average of the k recorded errors. This is called the cross-validation error serving as the performance metric for the model.</li>
</ol>
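<p>The five steps above can be sketched in base R as follows. The <code>mtcars</code> data set, <code>k = 5</code>, the seed and the model formula are illustrative assumptions.</p>
<pre class="r"><code>data("mtcars")
set.seed(123)
k &lt;- 5
n &lt;- nrow(mtcars)

# Step 1: randomly assign each row to one of the k folds
fold_id &lt;- sample(rep(seq_len(k), length.out = n))

# Steps 2 to 4: each fold serves once as the test set
fold_rmse &lt;- sapply(seq_len(k), function(i) {
  train &lt;- mtcars[fold_id != i, ]
  test &lt;- mtcars[fold_id == i, ]
  model &lt;- lm(mpg ~ wt + hp, data = train)
  predicteds &lt;- predict(model, newdata = test)
  sqrt(mean((test$mpg - predicteds)^2))  # prediction error for this fold
})

# Step 5: the cross-validation error is the average of the k recorded errors
cv_error &lt;- mean(fold_rmse)
cv_error</code></pre>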
<p>Taken together, the best model is the one with the lowest cross-validation error (here, the lowest cross-validated RMSE).</p>
<p>In this part, you will learn different methods for regression analysis, with practical examples in <strong>R</strong>.</p>
<p>The content is organized as follows:</p>
<ol style="list-style-type: decimal">
<li><a href="https://www.sthda.com/english/articles/40-regression-analysis/">Regression Analysis</a>
<ul>
<li><a href="https://www.sthda.com/english/articles/40-regression-analysis/165-linear-regression-essentials-in-r/">Linear Regression</a></li>
<li><a href="https://www.sthda.com/english/articles/40-regression-analysis/164-interaction-effect-in-multiple-regression-essentials/">Interaction Effect in Multiple Regression</a></li>
<li><a href="https://www.sthda.com/english/articles/40-regression-analysis/163-regression-with-categorical-variables-dummy-coding-essentials-in-r/">Regression with Categorical Variables: Dummy Coding Essentials in R</a></li>
<li><a href="https://www.sthda.com/english/articles/40-regression-analysis/162-nonlinear-regression-essentials-in-r-polynomial-and-spline-regression-models/">Nonlinear Regression Essentials in R: Polynomial and Spline Regression Models</a></li>
</ul></li>
<li><a href="https://www.sthda.com/english/articles/39-regression-model-diagnostics/">Regression Model Diagnostics</a>
<ul>
<li><a href="https://www.sthda.com/english/articles/39-regression-model-diagnostics/161-linear-regression-assumptions-and-diagnostics-in-r-essentials/">Linear Regression Assumptions and Diagnostics in R</a></li>
<li><a href="https://www.sthda.com/english/articles/39-regression-model-diagnostics/160-multicollinearity-essentials-and-vif-in-r/">Multicollinearity Essentials and VIF in R</a></li>
<li><a href="https://www.sthda.com/english/articles/39-regression-model-diagnostics/159-confounding-variable-essentials/">Confounding Variable Essentials</a></li>
</ul></li>
<li><a href="https://www.sthda.com/english/articles/38-regression-model-validation/">Regression Model Validation</a>
<ul>
<li><a href="https://www.sthda.com/english/articles/38-regression-model-validation/158-regression-model-accuracy-metrics-r-square-aic-bic-cp-and-more/">Regression Model Accuracy Metrics: R-square, AIC, BIC, Cp and more</a></li>
<li><a href="https://www.sthda.com/english/articles/38-regression-model-validation/157-cross-validation-essentials-in-r/">Cross-Validation Essentials in R</a></li>
<li><a href="https://www.sthda.com/english/articles/38-regression-model-validation/156-bootstrap-resampling-essentials-in-r/">Bootstrap Resampling Essentials in R</a></li>
</ul></li>
<li><a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/">Model Selection Essentials in R</a>
<ul>
<li><a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/155-best-subsets-regression-essentials-in-r/">Best Subsets Regression Essentials in R</a></li>
<li><a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/154-stepwise-regression-essentials-in-r/">Stepwise Regression Essentials in R</a></li>
<li><a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/153-penalized-regression-essentials-ridge-lasso-elastic-net/">Penalized Regression Essentials: Ridge, Lasso &amp; Elastic Net</a></li>
<li><a href="https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/152-principal-component-and-partial-least-squares-regression-essentials/">Principal Component and Partial Least Squares Regression Essentials</a></li>
</ul></li>
</ol>


</div><!--end rdoc-->



<!-- END HTML -->]]></description>
			<pubDate>Wed, 21 Mar 2018 07:35:58 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[ggpubr: Create Easily Publication Ready Plots]]></title>
			<link>https://www.sthda.com/english/wiki/ggpubr-create-easily-publication-ready-plots</link>
			<guid>https://www.sthda.com/english/wiki/ggpubr-create-easily-publication-ready-plots</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">

<p>The <strong>ggpubr</strong> R package facilitates the creation of beautiful ggplot2-based graphs for researchers with non-advanced programming backgrounds.</p>
<p>The current material presents a collection of articles for simply creating and customizing publication-ready plots using ggpubr. To see some examples of plots created with ggpubr click the following link: <a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/">ggpubr examples</a>.</p>
<p>ggpubr Key features:</p>
<ul>
<li>Wrapper around the <strong>ggplot2</strong> package with a <strong>less opaque syntax</strong> for beginners in R programming.</li>
<li>Helps researchers with non-advanced R programming skills to easily create <strong>publication-ready plots</strong>.</li>
<li>Makes it possible to automatically <strong>add p-values and significance levels</strong> to box plots, bar plots, line plots, and more.</li>
<li>Makes it easy to <strong>arrange and annotate multiple plots</strong> on the same page.</li>
<li>Makes it easy to <strong>change graphical parameters</strong> such as colors and labels.</li>
</ul>
<p>Official online documentation: <a href="https://www.sthda.com/english/rpkgs/ggpubr" class="uri">https://www.sthda.com/english/rpkgs/ggpubr</a>.</p>
<p><img src="https://www.sthda.com/english/sthda-upload/images/symplyr/ggpubr-300.png" alt="ggpubr: publication ready plots" /></p>
<div id="install-and-load-ggpubr" class="section level2">
<h2>Install and load ggpubr</h2>
<ul>
<li>Install from <a href="https://cran.r-project.org/package=ggpubr">CRAN</a> as follows:</li>
</ul>
<pre class="r"><code>install.packages("ggpubr")</code></pre>
<ul>
<li>Or, install the latest version from <a href="https://github.com/kassambara/ggpubr">GitHub</a> as follows:</li>
</ul>
<pre class="r"><code># Install devtools if needed, then install ggpubr from GitHub
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")</code></pre>
<ul>
<li>Load ggpubr:</li>
</ul>
<pre class="r"><code>library("ggpubr")</code></pre>
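<p>Once the package is loaded, a first plot might look like the sketch below, which draws a box plot, adds a p-value and arranges two plots on one page. The built-in <code>ToothGrowth</code> data set, the t-test and the panel labels are assumptions chosen for illustration.</p>
<pre class="r"><code>library("ggpubr")
data("ToothGrowth")

# Box plot of tooth length by supplement type, with jittered points
p &lt;- ggboxplot(ToothGrowth, x = "supp", y = "len",
               color = "supp", add = "jitter")

# Add the p-value of a two-group comparison of means
p_annotated &lt;- p + stat_compare_means(method = "t.test")

# Arrange and label both plots on the same page
ggarrange(p, p_annotated, ncol = 2, labels = c("A", "B"))</code></pre>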
</div>
<div id="related-articles" class="section level2">
<h2>Related articles</h2>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/77-facilitating-exploratory-data-visualization-application-to-tcga-genomic-data/"><i class = "fa fa-file-text"> </i> Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data</a></p>
<p>Description: Plot one or a list of variables at once.</p>
<p>Contents:</p>
<ul>
<li>Gene expression data</li>
<li>Box plots</li>
<li>Violin plots</li>
<li>Stripcharts and dot plots</li>
<li>Density plots</li>
<li>Histogram plots</li>
<li>Empirical cumulative density function</li>
<li>Quantile - Quantile plot</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/"><i class = "fa fa-file-text"> </i> Add P-values and Significance Levels to ggplots</a></p>
<p>Description: Automatically compute and add p-values and significance levels to ggplots.</p>
<p>Contents:</p>
<ul>
<li>Methods for comparing means</li>
<li>R functions to add p-values
<ul>
<li>compare_means()</li>
<li>stat_compare_means()</li>
</ul></li>
<li>Compare two independent groups</li>
<li>Compare two paired samples</li>
<li>Compare more than two groups</li>
<li>Multiple grouping variables</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/78-perfect-scatter-plots-with-correlation-and-marginal-histograms/"><i class = "fa fa-file-text"> </i> Perfect Scatter Plots with Correlation and Marginal Histograms</a></p>
<p>Description: Create beautiful scatter plots with correlation coefficients and marginal histograms/density.</p>
<p>Contents:</p>
<ul>
<li>Basic plots</li>
<li>Color by groups</li>
<li>Add concentration ellipses</li>
<li>Add point labels</li>
<li>Bubble chart</li>
<li>Color by a continuous variable</li>
<li>Add marginal plots</li>
<li>Add 2d density estimation</li>
<li>Application to gene expression data</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/79-plot-meansmedians-and-error-bars/"><i class = "fa fa-file-text"> </i> Plot Means/Medians and Error Bars</a></p>
<p>Description: Easily plot means or medians with error bars.</p>
<p>Contents:</p>
<ul>
<li>Error plots</li>
<li>Line plots</li>
<li>Bar plots</li>
<li>Add labels</li>
<li>Application to gene expression data</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/80-bar-plots-and-modern-alternatives/"><i class = "fa fa-file-text"> </i> Bar Plots and Modern Alternatives</a></p>
<p>Description: Easily create basic and ordered bar plots, as well as some modern alternatives to bar plots, including lollipop charts and Cleveland’s dot plots.</p>
<p>Contents:</p>
<ul>
<li>Basic bar plots</li>
<li>Multiple grouping variables</li>
<li>Ordered bar plots</li>
<li>Deviation graphs</li>
<li>Alternatives to bar plots
<ul>
<li>Lollipop chart</li>
<li>Cleveland’s dot plot</li>
</ul></li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/84-add-text-labels-to-histogram-and-density-plots/"><i class = "fa fa-file-text"> </i> Add Text Labels to Histogram and Density Plots</a></p>
<p>Description: Create histograms/density plots and highlight some key elements on the plot.</p>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/"><i class = "fa fa-file-text"> </i> ggplot2 - Easy Way to Mix Multiple Graphs on The Same Page</a></p>
<p>Description: Step-by-step guide to combining multiple ggplots on the same page, as well as over multiple pages.</p>
<p>Contents:</p>
<ul>
<li>Create some plots</li>
<li>Arrange on one page</li>
<li>Annotate the arranged figure</li>
<li>Align plot panels</li>
<li>Change column/row span of a plot</li>
<li>Use common legend for combined ggplots</li>
<li>Scatter plot with marginal density plots</li>
<li>Mix table, text and ggplot2 graphs</li>
<li>Insert a graphical element inside a ggplot
<ul>
<li>Place a table within a ggplot</li>
<li>Place a box plot within a ggplot</li>
<li>Add background image to ggplot2 graphs</li>
</ul></li>
<li>Arrange over multiple pages</li>
<li>Nested layout with ggarrange()</li>
<li>Export plots</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/82-ggplot2-easy-way-to-change-graphical-parameters/"><i class = "fa fa-file-text"> </i> ggplot2 - Easy Way to Change Graphical Parameters</a></p>
<p>Description: Describes the function ggpar() [in ggpubr], which can be used to simply and easily customize any ggplot2-based graph.</p>
<p>Contents:</p>
<ul>
<li>Change titles and axis labels</li>
<li>Change legend position &amp; appearance</li>
<li>Change color palettes
<ul>
<li>Group colors</li>
<li>Gradient colors</li>
</ul></li>
<li>Change axis limits and scales</li>
<li>Customize axis text and ticks</li>
<li>Rotate a plot</li>
<li>Change themes</li>
<li>Remove ggplot components</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/83-create-and-customize-multi-panel-ggplots-easy-guide-to-facet/"><i class = "fa fa-file-text"> </i> Create and Customize Multi-panel ggplots: Easy Guide to Facet</a></p>
<p>Description: Split up your data by one or more variables and visualize the subsets of the data together.</p>
<p>Contents:</p>
<ul>
<li>Facet by one grouping variable</li>
<li>Facet by two grouping variables</li>
<li>Modifying panel label appearance</li>
</ul>
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Sep 2017 08:38:34 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[The Ultimate Guide To Partitioning Clustering]]></title>
			<link>https://www.sthda.com/english/wiki/the-ultimate-guide-to-partitioning-clustering</link>
			<guid>https://www.sthda.com/english/wiki/the-ultimate-guide-to-partitioning-clustering</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p>In this first volume of <a href="https://www.sthda.com/english/wiki/simplyr">simplyR</a>, we are excited to share our <a href="https://www.sthda.com/english/articles/27-partitioning-clustering-essentials/">Practical Guides to Partitioning Clustering</a>.</p>
<p><img src="https://www.sthda.com/english/sthda-upload/images/symplyr/partitioning-clustering.png" alt="Partitioning clustering methods" /></p>
<p>The course materials contain 3 chapters organized as follows:</p>
<p><a href="https://www.sthda.com/english/articles/27-partitioning-clustering-essentials/87-k-means-clustering-essentials/"><i class = "fa fa-file-text"> </i> K-Means Clustering Essentials</a></p>
<p>Contents:</p>
<ul>
<li>K-means basic ideas</li>
<li>K-means algorithm</li>
<li>Computing k-means clustering in R
<ul>
<li>Data</li>
<li>Required R packages and functions: <strong>stats::kmeans</strong>()</li>
<li>Estimating the optimal number of clusters: <strong>factoextra::fviz_nbclust</strong>()</li>
<li>Computing k-means clustering</li>
<li>Accessing the results of the kmeans() function</li>
<li>Visualizing k-means clusters: <strong>factoextra::fviz_cluster</strong>()</li>
</ul></li>
<li>K-means clustering advantages and disadvantages</li>
<li>Alternative to k-means clustering</li>
</ul>
<p><a href="https://www.sthda.com/english/articles/27-partitioning-clustering-essentials/88-k-medoids-essentials/"><i class = "fa fa-file-text"> </i> K-Medoids Essentials: PAM clustering</a></p>
<p>Contents:</p>
<ul>
<li>PAM concept</li>
<li>PAM algorithm</li>
<li>Computing PAM in R
<ul>
<li>Data</li>
<li>Required R packages and functions: <strong>cluster::pam</strong>() or <strong>fpc::pamk</strong>()</li>
<li>Estimating the optimal number of clusters: <strong>factoextra::fviz_nbclust</strong>()</li>
<li>Computing PAM clustering</li>
<li>Accessing the results of the pam() function</li>
<li>Visualizing PAM clusters: <strong>factoextra::fviz_cluster</strong>()</li>
</ul></li>
</ul>
<p><a href="https://www.sthda.com/english/articles/27-partitioning-clustering-essentials/89-clara-clustering-large-applications/"><i class = "fa fa-file-text"> </i> CLARA - Clustering Large Applications</a></p>
<p>Contents:</p>
<ul>
<li>CLARA concept</li>
<li>CLARA Algorithm</li>
<li>Computing CLARA in R
<ul>
<li>Data format and preparation</li>
<li>Required R packages and functions: <strong>cluster::clara</strong>()</li>
<li>Estimating the optimal number of clusters: <strong>factoextra::fviz_nbclust</strong>()</li>
<li>Computing CLARA</li>
<li>Visualizing CLARA clusters: <strong>factoextra::fviz_cluster</strong>()</li>
</ul></li>
</ul>
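<p>As a rough sketch of how the three methods listed above are called in R, the code below runs k-means, PAM and CLARA on the same data. The built-in <code>USArrests</code> data set, the choice of 4 clusters and the seed are assumptions made for illustration; the commented lines show the factoextra helpers named in the chapter outlines and require that package to be installed.</p>
<pre class="r"><code>library(cluster)  # provides pam() and clara()
data("USArrests")

# Standardize the variables so that they are comparable
df &lt;- scale(USArrests)

set.seed(123)
km &lt;- kmeans(df, centers = 4, nstart = 25)   # k-means
pam_res &lt;- pam(df, k = 4)                    # k-medoids (PAM)
clara_res &lt;- clara(df, k = 4, samples = 50)  # CLARA, designed for large data sets

table(km$cluster)  # cluster sizes found by k-means
pam_res$medoids    # the observations chosen as cluster medoids

# Optional, with factoextra installed:
# factoextra::fviz_nbclust(df, kmeans, method = "wss")  # choose the number of clusters
# factoextra::fviz_cluster(km, data = df)               # visualize the clusters</code></pre>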
<p>Example of plots:</p>
<p><img src="https://www.sthda.com/english/sthda-upload/images/symplyr/kmeans-clustering-plot.png" alt="K means clustering plots" width = "400px"/></p>
<br/>
Licence: <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/us/" target="_blank"><img alt="Licence Creative Commons" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/3.0/fr/88x31.png" /></a>
<br/><br/>
</div><!--end rdoc-->
 
<!-- END HTML -->]]></description>
			<pubDate>Wed, 06 Sep 2017 13:47:55 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Practical Guide to Principal Component Methods in R]]></title>
			<link>https://www.sthda.com/english/wiki/practical-guide-to-principal-component-methods-in-r</link>
			<guid>https://www.sthda.com/english/wiki/practical-guide-to-principal-component-methods-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--STHDA_START-->
  <div id="rdoc">
<div id="introduction" class="section level2">
<h2>Introduction</h2>
<p>Although there are several good books on <strong>principal component methods</strong> (PCMs) and related topics, we felt that many of them are either too theoretical or too advanced.</p>
<p>This book provides solid practical guidance for summarizing, visualizing and interpreting the most important information in large multivariate data sets, using principal component methods in R.</p>
<p>Where to find the book:</p>
<ul>
<li>Download the <strong>PDF</strong> through <a href="https://payhip.com/b/shrk">payhip</a></li>
<li>Read the <strong>ebook</strong> on <a href="https://play.google.com/store/books/author?id=Alboukadel+Kassambara">google play</a></li>
<li>Order a <strong>physical copy</strong> from <a href="https://www.amazon.com/dp/1975721136">amazon</a></li>
<li>(Download the <a href="https://www.sthda.com/english/upload/principal_component_methods_in_r_preview.pdf">book preview</a>)</li>
</ul>
<p><a href ="https://payhip.com/b/shrk" target = "_blank" rel="nofollow" title = "Download now the PDF on Payhip"><img src = "/english/sthda-upload/images/principal-component-methods-cover-200px.png" /></a><br/><br/></p>
<p><a href ="https://www.amazon.com/dp/1975721136" target = "_blank" rel="nofollow"> <img src = "/english/sthda-upload/images/buy-on-amazon.png"/></a><a href ="https://payhip.com/b/shrk" target = "_blank" rel="nofollow"> <img src = "/english/sthda-upload/images/payhip.png"/></a> <a href ="https://play.google.com/store/books/details/Alboukadel_KASSAMBARA_Practical_Guide_To_Principal?id=eFEyDwAAQBAJ" target = "_blank" rel="nofollow"> <img src = "/english/sthda-upload/images/google-play.png"/></a></p>
<p>The following figure illustrates the type of analysis to be performed depending on the type of variables contained in the data set.</p>
<p><img src="https://www.sthda.com/english/english/sthda-upload/images/multivariate-analysis-factoextra.png" alt="Principal component methods" /></p>
<p>There are a number of R packages implementing principal component methods. These packages include: <em>FactoMineR</em>, <em>ade4</em>, <em>stats</em>, <em>ca</em>, <em>MASS</em> and <em>ExPosition</em>.</p>
<p>However, the results are presented differently depending on the package used.</p>
<p>To help in the interpretation and in the visualization of multivariate analysis - such as <a href="https://www.sthda.com/english/web/5-bookadvisor/17-practical-guide-to-cluster-analysis-in-r/">cluster analysis</a> and principal component methods - we developed an easy-to-use R package named <a href="https://www.sthda.com/english/rpkgs/factoextra"><strong>factoextra</strong></a> (official online documentation: <a href="https://www.sthda.com/english/rpkgs/factoextra" class="uri">https://www.sthda.com/english/rpkgs/factoextra</a>).</p>
<div class="block">
<p>
No matter which package you decide to use for computing principal component methods, the factoextra R package can help you easily extract the analysis results from the different packages mentioned above, in a human-readable data format. factoextra also provides convenient solutions for creating beautiful ggplot2-based graphs.
</p>
</div>
<p>The methods whose outputs can be visualized using the factoextra package are shown in the figure below:</p>
<p><img src="https://www.sthda.com/english/english/sthda-upload/images/factoextra-r-package.png" alt="Principal component methods and clustering methods supported by the factoextra R package" /></p>
<p>In this book, we’ll use mainly:</p>
<div class="success">
<ul>
<li>
the <strong>FactoMineR</strong> package to compute principal component methods;
</li>
<li>
and the <strong>factoextra</strong> package for extracting, visualizing and interpreting the results.
</li>
</ul>
<p>
The other packages (ade4, ExPosition, etc.) will also be presented briefly.
</p>
</div>
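<p>A minimal sketch of this workflow, assuming both packages are installed: FactoMineR computes the PCA and factoextra extracts and plots the results. The built-in <code>iris</code> data set is an illustrative assumption, not necessarily the data used in the book.</p>
<pre class="r"><code>library(FactoMineR)
library(factoextra)

# Compute a PCA on the four continuous variables of iris
res.pca &lt;- PCA(iris[, 1:4], scale.unit = TRUE, graph = FALSE)

# Extract the eigenvalues and the variance explained by each component
eig &lt;- get_eigenvalue(res.pca)
eig

# Visualize: scree plot, variables colored by contribution,
# individuals colored by their quality of representation (cos2)
fviz_eig(res.pca, addlabels = TRUE)
fviz_pca_var(res.pca, col.var = "contrib")
fviz_pca_ind(res.pca, col.ind = "cos2", label = "none")</code></pre>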
</div>
<div id="how-this-book-is-organized" class="section level2">
<h2>How this book is organized</h2>
<p>This book contains 4 parts.</p>
<p><img src="https://www.sthda.com/english/english/sthda-upload/images/principal-component-methods-book-structure.png" alt="Principal Component Methods book structure" /></p>
<p><strong>Part I</strong> provides a quick introduction to R and presents the key features of FactoMineR and factoextra.</p>
<p><img src="https://www.sthda.com/english/english/sthda-upload/images/r-packages-multivariate-analysis.png" alt="Key features of FactoMineR and factoextra for multivariate analysis" /></p>
<p><strong>Part II</strong> describes classical principal component methods to analyze data sets containing, predominantly, either continuous or categorical variables. These methods include:</p>
<ul>
<li>Principal Component Analysis (PCA, for continuous variables),</li>
<li>Simple correspondence analysis (CA, for large contingency tables formed by two categorical variables)</li>
<li>Multiple correspondence analysis (MCA, for a data set with more than 2 categorical variables).</li>
</ul>
<p>In <strong>Part III</strong>, you’ll learn advanced methods for analyzing a data set containing a mix of variables (continuous and categorical) structured or not into groups:</p>
<ul>
<li>Factor Analysis of Mixed Data (FAMD) and,</li>
<li>Multiple Factor Analysis (MFA).</li>
</ul>
<p><strong>Part IV</strong> covers hierarchical clustering on principal components (HCPC), which is useful for clustering a data set that contains only categorical variables or a mix of categorical and continuous variables.</p>
</div>
<div id="key-features-of-this-book" class="section level2">
<h2>Key features of this book</h2>
<p>This book presents the basic principles of the different methods and provides many examples in R. It offers solid guidance in data mining for students and researchers.</p>
<p>Key features:</p>
<ul>
<li>Covers principal component methods and implementation in R</li>
<li>Highlights the most important information in your data set using ggplot2-based elegant visualization</li>
<li>Short, self-contained chapters with tested examples that allow for flexibility in designing a course and for easy reference</li>
</ul>
<div class="block">
<p>
At the end of each chapter, we present R lab sections in which we systematically work through applications of the various methods discussed in that chapter. Additionally, we provide links to other resources and to our hand-curated list of videos on principal component methods for further learning.
</p>
</div>
</div>
<div id="examples-of-plots" class="section level2">
<h2>Examples of plots</h2>
<p>Some examples of plots generated in this book are shown hereafter. You’ll learn how to create, customize and interpret these plots.</p>
<ol style="list-style-type: decimal">
<li><strong>Eigenvalues/variances of principal components</strong>. Proportion of information retained by each principal component.</li>
</ol>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-eigenvalue-1.png" width="432" style="margin-bottom:10px;" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>PCA - Graph of variables</strong>:</li>
</ol>
<ul>
<li>Control variable colors using their contributions to the principal components.</li>
</ul>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-variable-colors-by-contributions-1.png" width="480" style="margin-bottom:10px;" /></p>
<ul>
<li>Highlight the variables that contribute most to each principal dimension:</li>
</ul>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-variable-contributions-1.png" width="288" style="margin-bottom:10px;" /><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-variable-contributions-2.png" width="288" style="margin-bottom:10px;" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>PCA - Graph of individuals</strong>:</li>
</ol>
<ul>
<li>Automatically control the color of individuals using the cos2 (the quality of representation of the individuals on the factor map)</li>
</ul>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-individuals-1.png" width="528" style="margin-bottom:10px;" /></p>
<ul>
<li>Change the point size according to the cos2 of the corresponding individuals:</li>
</ul>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-graph-individuals-point-size-by-cos2-1.png" width="528" style="margin-bottom:10px;" /></p>
<ol start="4" style="list-style-type: decimal">
<li><strong>PCA - Biplot of individuals and variables</strong></li>
</ol>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-pca-color-individuals-and-variables-by-groups-1.png" width="528" style="margin-bottom:10px;" /></p>
<ol start="5" style="list-style-type: decimal">
<li><strong>Correspondence analysis</strong>. Association between categorical variables.</li>
</ol>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-correspondence-analysis-1.png" width="528" style="margin-bottom:10px;" /></p>
<ol start="6" style="list-style-type: decimal">
<li><strong>FAMD/MFA</strong> - Analyzing mixed and structured data</li>
</ol>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-famd-plot-ellipse-1.png" width="499.2" style="margin-bottom:10px;" /></p>
<ol start="7" style="list-style-type: decimal">
<li><strong>Clustering on principal components</strong></li>
</ol>
<p><img src="https://www.sthda.com/english/english/sthda-upload/figures/principal-component-methods/principal-component-methods-book-intro-hierarchical-clustering-on-principal-component-1.png" width="528" style="margin-bottom:10px;" /></p>
</div>
<div id="book-preview" class="section level2">
<h2>Book preview</h2>
<p>Download the preview of the book at: <a href="https://www.sthda.com/english/upload/principal_component_methods_in_r_preview.pdf">Principal Component Methods in R (Book preview)</a></p>
</div>
<div id="order-now" class="section level2">
<h2>Order now</h2>
<p><a href ="https://payhip.com/b/shrk" target = "_blank" rel="nofollow" title = "Download now the PDF on Payhip"><img src = "/english/sthda-upload/images/principal-component-methods-cover-200px.png" /></a><br/><br/></p>
<p><a href ="https://www.amazon.com/dp/1975721136" target = "_blank" rel="nofollow"> <img src = "/english/sthda-upload/images/buy-on-amazon.png"/></a> <a href ="https://payhip.com/b/shrk" target = "_blank" rel="nofollow"> <img src = "/english/sthda-upload/images/payhip.png"/></a> <a href ="https://play.google.com/store/books/details/Alboukadel_KASSAMBARA_Practical_Guide_To_Principal?id=eFEyDwAAQBAJ" target = "_blank" rel="nofollow"> <img src = "/english/sthda-upload/images/google-play.png"/></a></p>
</div>
<div id="about-the-author" class="section level2">
<h2>About the author</h2>
<p>Alboukadel Kassambara holds a PhD in Bioinformatics and Cancer Biology. He has worked for many years on genomic data analysis and visualization (read more: <a href="http://www.alboukadel.com/" class="uri">http://www.alboukadel.com/</a>).</p>
<p>He has experience in statistical and computational methods for identifying prognostic and predictive biomarker signatures through integrative analysis of large-scale genomic and clinical data sets.</p>
<p>He created GenomicScape (www.genomicscape.com), an easy-to-use bioinformatics web tool for gene expression data analysis and visualization.</p>
<p>He also developed a training website on data science, STHDA (Statistical Tools for High-throughput Data Analysis, www.sthda.com/english), which contains many tutorials on data analysis and visualization using R software and packages.</p>
<p>He is the author of many popular R packages for:</p>
<ul>
<li>multivariate data analysis (<strong>factoextra</strong>, <a href="https://www.sthda.com/english/rpkgs/factoextra" class="uri">https://www.sthda.com/english/rpkgs/factoextra</a>),</li>
<li>survival analysis (<strong>survminer</strong>, <a href="https://www.sthda.com/english/rpkgs/survminer/" class="uri">https://www.sthda.com/english/rpkgs/survminer/</a>),</li>
<li>correlation analysis (<strong>ggcorrplot</strong>, <a href="https://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2" class="uri">https://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2</a>),</li>
<li>creating publication-ready plots in R (<strong>ggpubr</strong>, <a href="https://www.sthda.com/english/rpkgs/ggpubr" class="uri">https://www.sthda.com/english/rpkgs/ggpubr</a>).</li>
</ul>
<p>Recently, he published three books on data analysis and visualization:</p>
<ol style="list-style-type: decimal">
<li>Practical Guide to Cluster Analysis in R (<a href="https://goo.gl/DmJ5y5" class="uri">https://goo.gl/DmJ5y5</a>).</li>
<li>Guide to Create Beautiful Graphics in R (<a href="https://goo.gl/vJ0OYb" class="uri">https://goo.gl/vJ0OYb</a>).</li>
<li>Complete Guide to 3D Plots in R (<a href="https://goo.gl/v5gwl0" class="uri">https://goo.gl/v5gwl0</a>).</li>
</ol>
</div>
</div>
</div><!--end rdoc-->
<!--STHDA_END-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 24 Aug 2017 14:04:17 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[simplyR]]></title>
			<link>https://www.sthda.com/english/wiki/simplyr</link>
			<guid>https://www.sthda.com/english/wiki/simplyr</guid>
			<description><![CDATA[<!-- START HTML -->

<p><strong>simplyR</strong> is a web space where we’ll be posting practical, easy guides for solving real, important problems using the R programming language.</p>
<p>As we aren’t fans of unnecessary complications, we’ll keep our tutorials and R code as simple as possible.</p>
<p>Many tutorials are coming soon.</p>
<p>Topics we love include:</p>
<ul>
<li>R programming</li>
<li>Biostatistics</li>
<li>Genomic data analysis</li>
<li>Survival analysis</li>
<li>Machine/statistical learning</li>
<li>Data visualization</li>
</ul>
<p>A sample of our recent publications on R &amp; Data Science:</p>
<ul>
<li><a href="https://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software">Correlation matrix : A quick start guide to analyze, format and visualize a correlation matrix using R software</a></li>
<li><a href="https://www.sthda.com/english/wiki/ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page">ggplot2 - Easy way to mix multiple graphs on the same page</a></li>
<li><a href="https://www.sthda.com/english/wiki/bar-plots-and-modern-alternatives">Bar Plots and Modern Alternatives</a></li>
<li><a href="https://www.sthda.com/english/wiki/facilitating-exploratory-data-visualization-application-to-tcga-genomic-data">Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data</a></li>
<li><a href="https://www.sthda.com/english/wiki/add-p-values-and-significance-levels-to-ggplots">Add P-values and Significance Levels to ggplots</a></li>
<li><a href="https://www.sthda.com/english/wiki/fastqcr-an-r-package-facilitating-quality-controls-of-sequencing-data-for-large-numbers-of-samples">fastqcr: An R Package Facilitating Quality Controls of Sequencing Data for Large Numbers of Samples</a></li>
<li><a href="https://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization">Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization</a></li>
<li><a href="https://www.sthda.com/english/wiki/survival-analysis">Survival Analysis</a></li>
<li><a href="https://www.sthda.com/english/wiki/cluster-analysis-in-r-unsupervised-machine-learning">Cluster Analysis</a></li>
<li><a href="https://www.sthda.com/english/wiki/r-xlsx-package-a-quick-start-guide-to-manipulate-excel-files-in-r">R xlsx package : A quick start guide to manipulate Excel files in R</a></li>
<li><a href="https://www.sthda.com/english/wiki/wiki.php">See More…</a></li>
</ul>
<p>If you want to contribute, read this: <a href="https://www.sthda.com/english/pages/contribute-to-sthda" class="uri">https://www.sthda.com/english/pages/contribute-to-sthda</a></p>

<!-- END HTML -->]]></description>
			<pubDate>Sat, 19 Aug 2017 23:56:23 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
