<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Fri, 05 Jun 2026 09:03:10 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Last articles - STHDA : Hierarchical Clustering Essentials]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/articles/28" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles - STHDA : Hierarchical Clustering Essentials]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[Divisive Hierarchical Clustering Essentials]]></title>
			<link>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/94-divisive-hierarchical-clustering-essentials/</link>
			<guid>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/94-divisive-hierarchical-clustering-essentials/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p><strong>Divisive hierarchical clustering</strong>, also known as <em>DIANA</em> (<em>DIvisive ANAlysis</em>), is the inverse of agglomerative clustering.</p>
<div class="block">
<p>
This article introduces the divisive clustering algorithm and provides practical examples showing how to compute divisive clustering using R.
</p>
</div>
<div id="algorithm" class="section level2">
<h2>Algorithm</h2>
<p>It starts by including all objects in a single large cluster. At each iteration, the most heterogeneous cluster is divided into two. The process is repeated until every object is in its own cluster.</p>
<p>Recall that divisive clustering is good at identifying large clusters, while agglomerative clustering is good at identifying small clusters.</p>
</div>
<div id="computation" class="section level2">
<h2>Computation</h2>
<p>The R function <strong>diana</strong>() [<em>cluster</em> package] can be used to compute divisive clustering. It returns an object of class “diana” (see ?diana.object), which also has methods for the functions print(), summary(), plot(), pltree(), as.dendrogram(), as.hclust() and cutree().</p>
<p>The output of DIANA can be visualized as a <em>dendrogram</em> using the function <em>fviz_dend</em>() [<em>factoextra</em> package]. For example, the following R code shows how to compute and visualize divisive clustering:</p>
<pre class="r"><code># Compute diana()
library(cluster)
res.diana <- diana(USArrests, stand = TRUE)
# Plot the dendrogram
library(factoextra)
fviz_dend(res.diana, cex = 0.5,
          k = 4, # Cut in four groups
          palette = "jco" # Color palette
          )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/009c-divisive-hierarchical-clustering-compute-diana-1.png" width="518.4" /></p>
<p>For interpreting dendrograms, read the “agglomerative clustering” chapter.</p>
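<p>As a minimal sketch of the methods listed above, the cluster memberships can be extracted by converting the result with as.hclust() and cutting the tree with cutree(). The object name <em>grp</em> is only illustrative; the <em>dc</em> component holds the divisive coefficient described in ?diana.object.</p>
<pre class="r"><code>library(cluster)
res.diana <- diana(USArrests, stand = TRUE)
# Divisive coefficient (values closer to 1 suggest a stronger clustering structure)
res.diana$dc
# Cut the tree into 4 groups
grp <- cutree(as.hclust(res.diana), k = 4)
# Number of observations in each group
table(grp)</code></pre>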
</div>
<br/>
<p>Related Book:</p>
<div class = "small-block content-privileged-friends cluster-book">
    <center>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/17-practical-guide-to-cluster-analysis-in-r/">
          <img src = "https://www.sthda.com/english/sthda-upload/images/cluster-analysis/clustering-book-cover.png" /><br/>
      Practical Guide to Cluster Analysis in R
      </a>
      </center>
</div>
<div class="spacer"></div>
</div><!--end rdoc-->

<!-- END HTML -->]]></description>
			<pubDate>Wed, 06 Sep 2017 17:22:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Heatmap - Static and Interactive: Absolute Guide]]></title>
			<link>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/93-heatmap-static-and-interactive-absolute-guide/</link>
			<guid>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/93-heatmap-static-and-interactive-absolute-guide/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p>A <strong>heatmap</strong> (or <strong>heat map</strong>) is another way to visualize hierarchical clustering. It’s also called a false colored image, where data values are transformed to a color scale.</p>
<p>Heat maps allow us to simultaneously visualize clusters of samples and features. First, hierarchical clustering is performed on both the rows and the columns of the data matrix. The columns/rows of the data matrix are then re-ordered according to the hierarchical clustering result, putting similar observations close to each other, so that blocks of ‘high’ and ‘low’ values become adjacent in the data matrix. Finally, a color scheme is applied and the data matrix is displayed. Visualizing the data matrix in this way can help to find the variables that appear to be characteristic of each sample cluster.</p>
<div class="block">
<p>
Previously, we described how to visualize dendrograms. Here, we’ll demonstrate how to draw and arrange a heatmap in R.
</p>
</div>
<br/>
<p>Contents: </p>
<div id="TOC">
<ul>
<li><a href="#r-packagesfunctions-for-drawing-heatmaps">R Packages/functions for drawing heatmaps</a></li>
<li><a href="#data-preparation">Data preparation</a></li>
<li><a href="#r-base-heatmap-heatmap">R base heatmap: heatmap()</a></li>
<li><a href="#enhanced-heat-maps-heatmap.2">Enhanced heat maps: heatmap.2()</a></li>
<li><a href="#pretty-heat-maps-pheatmap">Pretty heat maps: pheatmap()</a></li>
<li><a href="#interactive-heat-maps-d3heatmap">Interactive heat maps: d3heatmap()</a></li>
<li><a href="#enhancing-heatmaps-using-dendextend">Enhancing heatmaps using dendextend</a></li>
<li><a href="#complex-heatmap">Complex heatmap</a><ul>
<li><a href="#simple-heatmap">Simple heatmap</a></li>
<li><a href="#splitting-heatmap-by-rows">Splitting heatmap by rows</a></li>
<li><a href="#heatmap-annotation">Heatmap annotation</a></li>
</ul></li>
<li><a href="#application-to-gene-expression-matrix">Application to gene expression matrix</a></li>
<li><a href="#visualizing-the-distribution-of-columns-in-matrix">Visualizing the distribution of columns in matrix</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
</div>
<br/>
<div class="spacer"></div>
<div id="r-packagesfunctions-for-drawing-heatmaps" class="section level2">
<h2>R Packages/functions for drawing heatmaps</h2>
<p>There are multiple R packages and functions for drawing interactive and static heatmaps, including:</p>
<ul>
<li><em>heatmap</em>() [R base function, <em>stats</em> package]: Draws a simple heatmap</li>
<li><em>heatmap.2</em>() [<em>gplots</em> R package]: Draws an enhanced heatmap compared to the R base function.</li>
<li><em>pheatmap</em>() [<em>pheatmap</em> R package]: Draws pretty heatmaps and provides more control to change the appearance of heatmaps.</li>
<li><em>d3heatmap</em>() [<em>d3heatmap</em> R package]: Draws an interactive/clickable heatmap</li>
<li><em>Heatmap</em>() [<em>ComplexHeatmap</em> R/Bioconductor package]: Draws, annotates and arranges complex heatmaps (very useful for genomic data analysis)</li>
</ul>
<p>Here, we start by describing these five R functions for drawing heatmaps. Next, we’ll focus on the <em>ComplexHeatmap</em> package, which provides a flexible solution to arrange and annotate multiple heatmaps. It also makes it possible to visualize the association between different data from different sources.</p>
</div>
<div id="data-preparation" class="section level2">
<h2>Data preparation</h2>
<p>We use mtcars data as a demo data set. We start by standardizing the data to make variables comparable:</p>
<pre class="r"><code>df <- scale(mtcars)</code></pre>
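<p>A quick check, sketched below with base R only, confirms what scale() does here: each column ends up with mean 0 and standard deviation 1.</p>
<pre class="r"><code>df <- scale(mtcars)
# Column means are numerically zero
round(colMeans(df), 10)
# Column standard deviations are all 1
apply(df, 2, sd)</code></pre>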
</div>
<div id="r-base-heatmap-heatmap" class="section level2">
<h2>R base heatmap: heatmap()</h2>
<p>The built-in R <em>heatmap</em>() function [in <em>stats</em> package] can be used.</p>
<p>A simplified format is:</p>
<pre class="r"><code>heatmap(x, scale = "row")</code></pre>
<ul>
<li><strong>x</strong>: a numeric matrix</li>
<li><strong>scale</strong>: a character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. Allowed values are in c(“row”, “column”, “none”). Default is “row”.</li>
</ul>
<pre class="r"><code># Default plot
heatmap(df, scale = "none")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-r-base-heatmap-1.png" width="518.4" /></p>
<div class="success">
<p>
In the plot above, high values are in red and low values are in yellow.
</p>
</div>
<p>It’s possible to specify a color palette using the argument <em>col</em>, which can be defined as follows:</p>
<ul>
<li>Using custom colors:</li>
</ul>
<pre class="r"><code>col <- colorRampPalette(c("red", "white", "blue"))(256)</code></pre>
<ul>
<li>Or, using RColorBrewer color palette:</li>
</ul>
<pre class="r"><code>library("RColorBrewer")
col <- colorRampPalette(brewer.pal(10, "RdYlBu"))(256)</code></pre>
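<p>As a small illustration using base R only, colorRampPalette() returns a function; calling that function with 256 yields a character vector of 256 hexadecimal colors running from the first color to the last. The name <em>pal</em> is only illustrative.</p>
<pre class="r"><code>pal <- colorRampPalette(c("red", "white", "blue"))
col <- pal(256)
length(col)      # number of colors generated
col[c(1, 256)]   # the two end points of the palette</code></pre>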
<p>Additionally, you can use the arguments <em>RowSideColors</em> and <em>ColSideColors</em> to annotate rows and columns, respectively.</p>
<p>For example, the R code below customizes the heatmap as follows:</p>
<ol style="list-style-type: decimal">
<li>An RColorBrewer color palette name is used to change the appearance</li>
<li>The arguments <em>RowSideColors</em> and <em>ColSideColors</em> are used to annotate rows and columns, respectively. Each expects a vector of color names, one per row/column, specifying the class of that row/column.</li>
</ol>
<pre class="r"><code># Use RColorBrewer color palette names
library("RColorBrewer")
col <- colorRampPalette(brewer.pal(10, "RdYlBu"))(256)
heatmap(df, scale = "none", col =  col, 
        RowSideColors = rep(c("blue", "pink"), each = 16),
        ColSideColors = c(rep("purple", 5), rep("orange", 6)))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-r-base-heatmap-color-1.png" width="518.4" /></p>
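<p>In practice, the side colors usually come from a grouping variable rather than a fixed pattern. A minimal sketch, assuming we want to color rows by number of cylinders (the names <em>cyl_cols</em> and <em>row_cols</em> are illustrative):</p>
<pre class="r"><code>df <- scale(mtcars)
# One color per level of cyl
cyl_cols <- c("4" = "green", "6" = "gray", "8" = "darkred")
# Map each car to the color of its cyl level
row_cols <- unname(cyl_cols[as.character(mtcars$cyl)])
heatmap(df, scale = "none", RowSideColors = row_cols)</code></pre>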
</div>
<div id="enhanced-heat-maps-heatmap.2" class="section level2">
<h2>Enhanced heat maps: heatmap.2()</h2>
<p>The function <em>heatmap.2</em>() [in <em>gplots</em> package] provides many extensions to the standard R <em>heatmap</em>() function presented in the previous section.</p>
<pre class="r"><code># install.packages("gplots")
library("gplots")
heatmap.2(df, scale = "none", col = bluered(100), 
          trace = "none", density.info = "none")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-gplots-heatmap-2-1.png" width="518.4" /></p>
<p>Other arguments can be used including:</p>
<ul>
<li><em>labRow</em>, <em>labCol</em></li>
<li><em>hclustfun</em>: hclustfun = function(x) hclust(x, method = "ward.D")</li>
</ul>
<p>In the R code above, the <em>bluered</em>() function [in <em>gplots</em> package] is used to generate a smoothly varying set of colors. You can also use the following color generator functions:</p>
<ul>
<li><em>colorpanel</em>(n, low, mid, high)
<ul>
<li><em>n</em>: Desired number of color elements to be generated</li>
<li><em>low, mid, high</em>: Colors to use for the lowest, middle, and highest values. mid may be omitted.</li>
</ul></li>
<li><em>redgreen</em>(n), <em>greenred</em>(n), <em>bluered</em>(n) and <em>redblue</em>(n)</li>
</ul>
</div>
<div id="pretty-heat-maps-pheatmap" class="section level2">
<h2>Pretty heat maps: pheatmap()</h2>
<p>First, install the <em>pheatmap</em> package with install.packages("pheatmap"), then type this:</p>
<pre class="r"><code>library("pheatmap")
pheatmap(df, cutree_rows = 4)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-pheatmap-1.png" width="518.4" /></p>
<p>Arguments are available for changing the default clustering metric (“euclidean”) and method (“complete”). It’s also possible to annotate rows and columns using grouping variables.</p>
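<p>For instance, the sketch below switches to a correlation-based distance with average linkage and annotates rows by number of cylinders. The data frame name <em>annot_row</em> is hypothetical, and the call is guarded so that it only runs when pheatmap is installed.</p>
<pre class="r"><code>df <- scale(mtcars)
# Row annotation: row names must match those of the data matrix
annot_row <- data.frame(cyl = factor(mtcars$cyl),
                        row.names = rownames(mtcars))
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(df,
                     clustering_distance_rows = "correlation",
                     clustering_method = "average",
                     annotation_row = annot_row)
}</code></pre>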
</div>
<div id="interactive-heat-maps-d3heatmap" class="section level2">
<h2>Interactive heat maps: d3heatmap()</h2>
<p>First, install the <em>d3heatmap</em> package with install.packages("d3heatmap"), then type this:</p>
<pre class="r"><code>library("d3heatmap")
d3heatmap(scale(mtcars), colors = "RdYlBu",
          k_row = 4, # Number of groups in rows
          k_col = 2 # Number of groups in columns
          )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/images/cluster-analysis/interactive-heatmap.png" alt="Interactive heatmap" /></p>
<p>The <em>d3heatmap</em>() function makes it possible to:</p>
<ul>
<li>Put the mouse on a heatmap cell of interest to view the row and the column names as well as the corresponding value.</li>
<li>Select an area for zooming. After zooming, click on the heatmap again to go back to the previous display</li>
</ul>
</div>
<div id="enhancing-heatmaps-using-dendextend" class="section level2">
<h2>Enhancing heatmaps using dendextend</h2>
<p>The package <em>dendextend</em> can be used to enhance functions from other packages. The <em>mtcars</em> data is used in the following sections. We’ll start by defining the order and the appearance of rows and columns using dendextend. These results are then used in other functions from other packages.</p>
<p>The order and the appearance of rows and columns can be defined as follows:</p>
<pre class="r"><code>library(dendextend)
# order for rows
Rowv  <- mtcars %>% scale %>% dist %>% hclust %>% as.dendrogram %>%
   set("branches_k_color", k = 3) %>% set("branches_lwd", 1.2) %>%
   ladderize
# Order for columns: We must transpose the data
Colv  <- mtcars %>% scale %>% t %>% dist %>% hclust %>% as.dendrogram %>%
   set("branches_k_color", k = 2, value = c("orange", "blue")) %>%
   set("branches_lwd", 1.2) %>%
   ladderize</code></pre>
<p>The arguments above can be used in the functions below:</p>
<ol style="list-style-type: decimal">
<li>The standard <em>heatmap</em>() function [in <em>stats</em> package]:</li>
</ol>
<pre class="r"><code>heatmap(scale(mtcars), Rowv = Rowv, Colv = Colv,
        scale = "none")</code></pre>
<ol start="2" style="list-style-type: decimal">
<li>The enhanced <em>heatmap.2</em>() function [in <em>gplots</em> package]:</li>
</ol>
<pre class="r"><code>library(gplots)
heatmap.2(scale(mtcars), scale = "none", col = bluered(100), 
          Rowv = Rowv, Colv = Colv,
          trace = "none", density.info = "none")</code></pre>
<ol start="3" style="list-style-type: decimal">
<li>The interactive heatmap generator <em>d3heatmap</em>() function [in <em>d3heatmap</em> package]:</li>
</ol>
<pre class="r"><code>library("d3heatmap")
d3heatmap(scale(mtcars), colors = "RdBu",
          Rowv = Rowv, Colv = Colv)</code></pre>
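<p>If dendextend is not available, a plainer version of the same idea can be sketched with base R: build the row and column dendrograms with hclust() and as.dendrogram(), then pass them to heatmap(). The names <em>mat</em>, <em>Rowv_base</em> and <em>Colv_base</em> are illustrative, and the branch coloring is lost.</p>
<pre class="r"><code>mat <- scale(mtcars)
# Row dendrogram
Rowv_base <- as.dendrogram(hclust(dist(mat)))
# Column dendrogram: cluster the transposed matrix
Colv_base <- as.dendrogram(hclust(dist(t(mat))))
heatmap(mat, Rowv = Rowv_base, Colv = Colv_base, scale = "none")</code></pre>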
</div>
<div id="complex-heatmap" class="section level2">
<h2>Complex heatmap</h2>
<p><strong>ComplexHeatmap</strong> is an R/Bioconductor package, developed by Zuguang Gu, which provides a flexible solution to arrange and annotate multiple heatmaps. It also makes it possible to visualize the association between different data from different sources.</p>
<p>It can be installed as follows (the biocLite() installer used in older tutorials has been replaced by the BiocManager package):</p>
<pre class="r"><code>if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")</code></pre>
<div id="simple-heatmap" class="section level3">
<h3>Simple heatmap</h3>
<p>You can draw a simple heatmap as follows:</p>
<pre class="r"><code>library(ComplexHeatmap)
Heatmap(df, 
        name = "mtcars", #title of legend
        column_title = "Variables", row_title = "Samples",
        row_names_gp = gpar(fontsize = 7) # Text size for row names
        )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-simple-heatmap-1.png" width="518.4" /></p>
<p>Additional arguments:</p>
<ol style="list-style-type: decimal">
<li>show_row_names, show_column_names: whether to show row and column names, respectively. Default value is TRUE</li>
<li>show_row_hclust, show_column_hclust: logical value; whether to show row and column dendrograms. Default is TRUE. In recent versions of ComplexHeatmap these arguments are named show_row_dend and show_column_dend.</li>
<li>clustering_distance_rows, clustering_distance_columns: metric for clustering (“euclidean”, “maximum”, “manhattan”, “canberra”, “binary”, “minkowski”, “pearson”, “spearman”, “kendall”)</li>
<li>clustering_method_rows, clustering_method_columns: clustering methods: “ward.D”, “ward.D2”, “single”, “complete”, “average”, … (see <strong>?hclust</strong>).</li>
</ol>
<p>To specify custom colors, you must use the <em>colorRamp2</em>() function [<em>circlize</em> package], as follows:</p>
<pre class="r"><code>library(circlize)
mycols <- colorRamp2(breaks = c(-2, 0, 2), 
                    colors = c("green", "white", "red"))
Heatmap(df, name = "mtcars", col = mycols)</code></pre>
<p>It’s also possible to use <strong>RColorBrewer</strong> color palettes:</p>
<pre class="r"><code>library("circlize")
library("RColorBrewer")
Heatmap(df, name = "mtcars",
        col = colorRamp2(c(-2, 0, 2), brewer.pal(n=3, name="RdBu")))</code></pre>
<p>We can also customize the appearance of dendrograms using the function <em>color_branches</em>() [<em>dendextend</em> package]:</p>
<pre class="r"><code>library(dendextend)
row_dend = hclust(dist(df)) # row clustering
col_dend = hclust(dist(t(df))) # column clustering
Heatmap(df, name = "mtcars", 
        row_names_gp = gpar(fontsize = 6.5),
        cluster_rows = color_branches(row_dend, k = 4),
        cluster_columns = color_branches(col_dend, k = 2))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-dendogram-appearance-1.png" width="518.4" /></p>
</div>
<div id="splitting-heatmap-by-rows" class="section level3">
<h3>Splitting heatmap by rows</h3>
<p>You can split the heatmap using either the k-means algorithm or a grouping variable.</p>
<div class="notice">
<p>
It’s important to use the set.seed() function when performing k-means so that the results obtained can be reproduced precisely at a later time.
</p>
</div>
<ul>
<li>To split the dendrogram using k-means, type this:</li>
</ul>
<pre class="r"><code># Divide into 2 groups
set.seed(2)
Heatmap(df, name = "mtcars", km = 2)</code></pre>
<ul>
<li>To split by a grouping variable, use the argument <em>split</em>. In the following example we’ll use the levels of the factor variable cyl [in mtcars data set] to split the heatmap by rows. Recall that the column cyl corresponds to the number of cylinders.</li>
</ul>
<pre class="r"><code># Split by a vector specifying row groups
Heatmap(df, name = "mtcars", split = mtcars$cyl,
        row_names_gp = gpar(fontsize = 7))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-split-heatmap-1.png" width="518.4" /></p>
<div class="notice">
<p>
Note that <em>split</em> can also be a data frame, in which case the different combinations of levels split the rows of the heatmap.
</p>
</div>
<pre class="r"><code># Split by combining multiple variables
Heatmap(df, name ="mtcars", 
        split = data.frame(cyl = mtcars$cyl, am = mtcars$am),
        row_names_gp = gpar(fontsize = 7))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-split-heatmap-multiple-variables-1.png" width="518.4" /></p>
<ul>
<li>It’s also possible to combine km and split:</li>
</ul>
<pre class="r"><code>Heatmap(df, name = "mtcars", col = mycols,
        km = 2, split =  mtcars$cyl)</code></pre>
<ul>
<li>If you want to use another partitioning method rather than k-means, you can do so by assigning the partitioning vector to split. In the R code below, we’ll use the <em>pam</em>() function [cluster package]. pam() stands for Partitioning of the data into k clusters “around medoids”, a more robust version of k-means.</li>
</ul>
<pre class="r"><code># install.packages("cluster")
library("cluster")
set.seed(2)
pa = pam(df, k = 3)
Heatmap(df, name = "mtcars", col = mycols,
        split = paste0("pam", pa$clustering))</code></pre>
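<p>Any vector with one label per row can play this role. As a sketch, the partition from base R’s kmeans() is used below; <em>mat</em>, <em>km_res</em> and <em>split_vec</em> are illustrative names, and the plot is only drawn when ComplexHeatmap is installed.</p>
<pre class="r"><code>mat <- scale(mtcars)
set.seed(2)
km_res <- kmeans(mat, centers = 2, nstart = 25)
# One group label per row of the matrix
split_vec <- paste0("km", km_res$cluster)
table(split_vec)
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(mat, name = "mtcars", split = split_vec)
  ComplexHeatmap::draw(ht)
}</code></pre>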
</div>
<div id="heatmap-annotation" class="section level3">
<h3>Heatmap annotation</h3>
<p>The <em>HeatmapAnnotation</em> class is used to define annotations on rows or columns. A simplified format is:</p>
<pre class="r"><code>HeatmapAnnotation(df, name, col, show_legend)</code></pre>
<ul>
<li><strong>df</strong>: a data.frame with column names</li>
<li><strong>name</strong>: the name of the heatmap annotation</li>
<li><strong>col</strong>: a list of colors which contains color mapping to columns in df</li>
</ul>
<p>For the example below, we’ll transpose our data to have the observations in columns and the variables in rows.</p>
<pre class="r"><code>df <- t(df)</code></pre>
<div id="simple-annotation" class="section level4">
<h4>Simple annotation</h4>
<p>A vector, containing discrete or continuous values, is used to annotate rows or columns. We’ll use the qualitative variables <em>cyl</em> (levels = “4”, “6” and “8”) and <em>am</em> (levels = “0” and “1”), and the continuous variable <em>mpg</em> to annotate columns.</p>
<p>For each of these 3 variables, custom colors are defined as follows:</p>
<pre class="r"><code># Annotation data frame
annot_df <- data.frame(cyl = mtcars$cyl, am = mtcars$am, 
                       mpg = mtcars$mpg)
# Define colors for each levels of qualitative variables
# Define gradient color for continuous variable (mpg)
col = list(cyl = c("4" = "green", "6" = "gray", "8" = "darkred"),
            am = c("0" = "yellow", "1" = "orange"),
            mpg = circlize::colorRamp2(c(17, 25), 
                                       c("lightblue", "purple")) )
# Create the heatmap annotation
ha <- HeatmapAnnotation(df = annot_df, col = col)
# Combine the heatmap and the annotation
Heatmap(df, name = "mtcars",
        top_annotation = ha)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/012-heatmap-heatmap-annotation-1.png" width="518.4" /></p>
<div class="block">
<p>It’s possible to hide the annotation legend using the argument <em>show_legend = FALSE</em> as follows:</p>
<pre class="r"><code>ha <- HeatmapAnnotation(df = annot_df, col = col, show_legend = FALSE)
Heatmap(df, name = "mtcars", top_annotation = ha)</code></pre>
</div>
</div>
<div id="complex-annotation" class="section level4">
<h4>Complex annotation</h4>
<p>In this section we’ll see how to combine heatmap and some basic graphs to show the data distribution. For simple annotation graphics, the following functions can be used: <em>anno_points</em>(), <em>anno_barplot</em>(), <em>anno_boxplot</em>(), <em>anno_density</em>() and <em>anno_histogram</em>().</p>
<p>An example is shown below:</p>
<pre class="r"><code># Define some graphics to display the distribution of columns
.hist = anno_histogram(df, gp = gpar(fill = "lightblue"))
.density = anno_density(df, type = "line", gp = gpar(col = "blue"))
ha_mix_top = HeatmapAnnotation(hist = .hist, density = .density)
# Define some graphics to display the distribution of rows
.violin = anno_density(df, type = "violin", 
                       gp = gpar(fill = "lightblue"), which = "row")
.boxplot = anno_boxplot(df, which = "row")
ha_mix_right = HeatmapAnnotation(violin = .violin, bxplt = .boxplot,
                              which = "row", width = unit(4, "cm"))
# Combine annotation with heatmap
Heatmap(df, name = "mtcars", 
        column_names_gp = gpar(fontsize = 8),
        top_annotation = ha_mix_top, 
        top_annotation_height = unit(3.8, "cm")) + ha_mix_right</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/images/cluster-analysis/complex-heatmap-annotation.png" alt="Complex heatmap annotation" /></p>
</div>
<div id="combining-multiple-heatmaps" class="section level4">
<h4>Combining multiple heatmaps</h4>
<p>Multiple heatmaps can be arranged as follows:</p>
<pre class="r"><code># Heatmap 1
ht1 = Heatmap(df, name = "ht1", km = 2,
              column_names_gp = gpar(fontsize = 9))
# Heatmap 2
ht2 = Heatmap(df, name = "ht2", 
        col = circlize::colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
        column_names_gp = gpar(fontsize = 9))
# Combine the two heatmaps
ht1 + ht2</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/images/cluster-analysis/combine-multiple-heatmaps.png" alt="Combine multiple heatmaps" /></p>
<div class="notice">
<p>
You can use the option width = unit(3, “cm”) to control the size of the heatmaps.
</p>
</div>
<div class="warning">
<p>
Note that when combining multiple heatmaps, the first heatmap is considered as the main heatmap. Some settings of the remaining heatmaps are auto-adjusted according to the setting of the main heatmap. These include: removing row clusters and titles, and adding splitting.
</p>
</div>
<p>The <em>draw</em>() function can be used to customize the appearance of the final image:</p>
<pre class="r"><code>draw(ht1 + ht2, 
    row_title = "Two heatmaps, row title", 
    row_title_gp = gpar(col = "red"),
    column_title = "Two heatmaps, column title", 
    column_title_side = "bottom",
    # Gap between heatmaps
    gap = unit(0.5, "cm"))</code></pre>
<div class="notice">
<p>
Legends can be removed using the arguments <em>show_heatmap_legend = FALSE</em>, <em>show_annotation_legend = FALSE</em>.
</p>
</div>
</div>
</div>
</div>
<div id="application-to-gene-expression-matrix" class="section level2">
<h2>Application to gene expression matrix</h2>
<p>In gene expression data, rows are genes and columns are samples. More information about genes can be attached after the expression heatmap such as gene length and type of genes.</p>
<pre class="r"><code>expr <- readRDS(paste0(system.file(package = "ComplexHeatmap"),
                      "/extdata/gene_expression.rds"))
mat <- as.matrix(expr[, grep("cell", colnames(expr))])
type <- gsub("s\\d+_", "", colnames(mat))
ha = HeatmapAnnotation(df = data.frame(type = type))
Heatmap(mat, name = "expression", km = 5, top_annotation = ha, 
    top_annotation_height = unit(4, "mm"), 
    show_row_names = FALSE, show_column_names = FALSE) +
Heatmap(expr$length, name = "length", width = unit(5, "mm"),
    col = circlize::colorRamp2(c(0, 100000), c("white", "orange"))) +
Heatmap(expr$type, name = "type", width = unit(5, "mm")) +
Heatmap(expr$chr, name = "chr", width = unit(5, "mm"),
    col = circlize::rand_color(length(unique(expr$chr))))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/images/cluster-analysis/heatmap-gene-expression-data.png" alt="Heatmap gene expression data" /></p>
<div class="notice">
<p>
It’s also possible to visualize genomic alterations and to integrate different molecular levels (gene expression, DNA methylation, …). Read the vignette, on Bioconductor, for further examples.
</p>
</div>
</div>
<div id="visualizing-the-distribution-of-columns-in-matrix" class="section level2">
<h2>Visualizing the distribution of columns in matrix</h2>
<pre class="r"><code>densityHeatmap(scale(mtcars))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/images/cluster-analysis/matrix-column-distribution.png" alt="Matrix column distribution" /></p>
<p>The dashed lines on the heatmap correspond to the five quantile numbers. The labels for the five quantile levels are added to the right of the heatmap.</p>
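<p>Assuming the five quantile numbers are the minimum, the three quartiles and the maximum (the defaults of the base R quantile() function), they can be computed per column as a rough cross-check. The names <em>mat</em> and <em>q</em> are illustrative.</p>
<pre class="r"><code>mat <- scale(mtcars)
# Rows: 0%, 25%, 50%, 75%, 100%; columns: variables
q <- apply(mat, 2, quantile)
round(q[, 1:3], 2)</code></pre>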
</div>
<div id="summary" class="section level2">
<h2>Summary</h2>
<p>We described many functions for drawing heatmaps in R (from basic to complex heatmaps). A basic heatmap can be produced using either the R base function <em>heatmap</em>() or the function <em>heatmap.2</em>() [in the <em>gplots</em> package].
The <em>pheatmap</em>() function, in the package of the same name, creates pretty heatmaps and gives better control over some graphical parameters such as cell size.</p>
<p>The <em>Heatmap</em>() function [in the <em>ComplexHeatmap</em> package] allows us to easily draw, annotate and arrange complex heatmaps. This can be very useful in genomics.</p>
</div>
</div><!--end rdoc-->
 
<!-- END HTML -->]]></description>
			<pubDate>Wed, 06 Sep 2017 12:46:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Visualizing Dendrograms: Ultimate Guide]]></title>
			<link>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/92-visualizing-dendrograms-ultimate-guide/</link>
			<guid>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/92-visualizing-dendrograms-ultimate-guide/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p>As described in previous chapters, a <strong>dendrogram</strong> is a tree-based representation of data created using hierarchical clustering methods. In this article, we provide R code for <strong>visualizing</strong> and customizing dendrograms. Additionally, we show how to save and zoom in on a large dendrogram.</p>
<br/>
<p>Contents: </p>
<div id="TOC">
<ul>
<li><a href="#visualizing-dendrograms">Visualizing dendrograms</a></li>
<li><a href="#case-of-dendrogram-with-large-data-sets">Case of dendrogram with large data sets</a><ul>
<li><a href="#zooming-in-the-dendrogram">Zooming in the dendrogram</a></li>
<li><a href="#plotting-a-sub-tree-of-dendrograms">Plotting a sub-tree of dendrograms</a></li>
<li><a href="#saving-dendrogram-into-a-large-pdf-page">Saving dendrogram into a large PDF page</a></li>
</ul></li>
<li><a href="#manipulating-dendrograms-using-dendextend">Manipulating dendrograms using dendextend</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
</div>
<br/>
<div class="spacer"></div>
<p>We start by computing hierarchical clustering using the USArrests data set:</p>
<pre class="r"><code># Load data
data(USArrests)
# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")</code></pre>
<p>To visualize the dendrogram, we’ll use the following R functions and packages:</p>
<ul>
<li><em>fviz_dend</em>() [in the factoextra R package] to easily create a beautiful ggplot2-based dendrogram.</li>
<li><em>dendextend</em> package to manipulate dendrograms</li>
</ul>
<p>Before continuing, install the required packages as follows:</p>
<pre class="r"><code>install.packages(c("factoextra", "dendextend"))</code></pre>
<div id="visualizing-dendrograms" class="section level2">
<h2>Visualizing dendrograms</h2>
<p>We’ll use the function <em>fviz_dend</em>() [in the <em>factoextra</em> R package] to easily create a beautiful dendrogram using either the R base plot or ggplot2. It also provides an option for drawing circular dendrograms and phylogenic-like trees.</p>
<p>To create a basic dendrogram, type this:</p>
<pre class="r"><code>library(factoextra)
fviz_dend(hc, cex = 0.5)</code></pre>
<p>You can use the arguments main, sub, xlab and ylab to change plot titles as follows:</p>
<pre class="r"><code>fviz_dend(hc, cex = 0.5, 
          main = "Dendrogram - ward.D2",
          xlab = "Objects", ylab = "Distance", sub = "")</code></pre>
<p>To draw a horizontal dendrogram, type this:</p>
<pre class="r"><code>fviz_dend(hc, cex = 0.5, horiz = TRUE)</code></pre>
<p>It’s also possible to cut the tree at a given height to partition the data into multiple groups, as described in the previous chapter on agglomerative hierarchical clustering. In this case, it’s possible to color branches by groups and to add a rectangle around each group.</p>
<p>For example:</p>
<pre class="r"><code>fviz_dend(hc, k = 4, # Cut in four groups
          cex = 0.5, # label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          color_labels_by_k = TRUE, # color labels by groups
          rect = TRUE, # Add rectangle around groups
          rect_border = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"), 
          rect_fill = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-cutree-1.png" width="518.4" /></p>
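<p>As a complement to the plot, the group memberships behind the colored branches can be recovered with the base R function <em>cutree</em>(). The snippet below is a minimal sketch: it rebuilds hc as defined at the start of this article, and the object name grp is only illustrative. The groups should correspond to those drawn by <em>fviz_dend</em>() with k = 4, although the group numbering may differ from the left-to-right order in the plot.</p>
<pre class="r"><code># Rebuild the tree used above
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")
# Cut the tree into 4 groups; returns a named integer vector
grp <- cutree(hc, k = 4)
# Number of states in each group
table(grp)
# Names of the states assigned to group 1
names(grp)[grp == 1]</code></pre>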
<p>To change the plot theme, use the argument ggtheme, whose allowed values include the official ggplot2 themes [ <em>theme_gray</em>(), <em>theme_bw</em>(), <em>theme_minimal</em>(), <em>theme_classic</em>(), <em>theme_void</em>()] or any other user-defined ggplot2 theme.</p>
<pre class="r"><code>fviz_dend(hc, k = 4,                 # Cut in four groups
          cex = 0.5,                 # label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          color_labels_by_k = TRUE,  # color labels by groups
          ggtheme = theme_gray()     # Change theme
          )</code></pre>
<p>Allowed values for k_colors include brewer palettes from the <em>RColorBrewer</em> package (e.g. “RdBu”, “Blues”, “Dark2”, “Set2”, …) and scientific journal palettes from the <em>ggsci</em> R package (e.g. “npg”, “aaas”, “lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty”).</p>
<p>In the R code below, we’ll change group colors using “jco” (journal of clinical oncology) color palette:</p>
<pre class="r"><code>fviz_dend(hc, cex = 0.5, k = 4, # Cut in four groups
          k_colors = "jco")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-ggplot2-dendrogram-color-palette-1.png" width="518.4" /></p>
<p>If you want to draw a horizontal dendrogram with rectangles around the clusters, use this:</p>
<pre class="r"><code>fviz_dend(hc, k = 4, cex = 0.4, horiz = TRUE,  k_colors = "jco", 
          rect = TRUE, rect_border = "jco", rect_fill = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-horizontal-dendrograms-rectangle-1.png" width="518.4" /></p>
<p>Additionally, you can plot a circular dendrogram using the option type = “circular”.</p>
<pre class="r"><code>fviz_dend(hc, cex = 0.5, k = 4, 
          k_colors = "jco", type = "circular")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-ggplot2-dendrogram-circular-1.png" width="355.2" /></p>
<p>To plot a phylogenic-like tree, use type = “phylogenic” and repel = TRUE (to avoid label overplotting). This functionality requires the R package <em>igraph</em>. Make sure that it’s installed before typing the following R code.</p>
<pre class="r"><code>require("igraph")
fviz_dend(hc, k = 4, k_colors = "jco",
          type = "phylogenic", repel = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-phylogenic-tree-1.png" width="624" /></p>
<p>The default layout for phylogenic trees is “layout.auto”. Allowed values are one of: c(“layout.auto”, “layout_with_drl”, “layout_as_tree”, “layout.gem”, “layout.mds”, “layout_with_lgl”). To learn more about these layouts, see the documentation of the igraph R package.</p>
<p>Let’s try phylo_layout = “layout.gem”:</p>
<pre class="r"><code>require("igraph")
fviz_dend(hc, k = 4, # Cut in four groups
          k_colors = "jco",
          type = "phylogenic", repel = TRUE,
          phylo_layout = "layout.gem")</code></pre>
</div>
<div id="case-of-dendrogram-with-large-data-sets" class="section level2">
<h2>Case of dendrogram with large data sets</h2>
<p>If you compute hierarchical clustering on a large data set, you might want to zoom in on the dendrogram or plot only a subset of it.</p>
<p>Alternatively, you could also plot the dendrogram to a large page on a PDF, which can be zoomed without loss of resolution.</p>
<div id="zooming-in-the-dendrogram" class="section level3">
<h3>Zooming in the dendrogram</h3>
<p>If you want to zoom in on the first clusters, it’s possible to use the options xlim and ylim to limit the plot area. For example, type the code below:</p>
<pre class="r"><code>fviz_dend(hc, xlim = c(1, 20), ylim = c(1, 8))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-dendrogram-zoom-1.png" width="518.4" /></p>
</div>
<div id="plotting-a-sub-tree-of-dendrograms" class="section level3">
<h3>Plotting a sub-tree of dendrograms</h3>
<p>To plot a sub-tree, we’ll follow the procedure below:</p>
<ol style="list-style-type: decimal">
<li><p>Create the whole dendrogram using <em>fviz_dend</em>() and save the result into an object, named dend_plot for example.</p></li>
<li><p>Use the R base function <em>cut.dendrogram</em>() to cut the dendrogram, at a given height (h), into multiple sub-trees. This returns a list with two components: $upper, a truncated version of the original tree (also of class dendrogram), and $lower, a list of the branches obtained from cutting the tree, each of which is a dendrogram.</p></li>
<li><p>Visualize sub-trees using <em>fviz_dend</em>().</p></li>
</ol>
<p>The R code is as follows.</p>
<ul>
<li>Cut the dendrogram and visualize the truncated version:</li>
</ul>
<pre class="r"><code># Create a plot of the whole dendrogram,
# and extract the dendrogram data
dend_plot <- fviz_dend(hc, k = 4, # Cut in four groups
          cex = 0.5, # label size
          k_colors = "jco"
          )
dend_data <- attr(dend_plot, "dendrogram") # Extract dendrogram data
# Cut the dendrogram at height h = 10
dend_cuts <- cut(dend_data, h = 10)
# Visualize the truncated version containing
# two branches
fviz_dend(dend_cuts$upper)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-cut-dendrogram-1.png" width="288" /></p>
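<p>Before plotting the individual sub-trees, it can help to check how many branches the cut produced and how many leaves each one holds. The following is a minimal sketch using base R only; it rebuilds the dendrogram directly from <em>hclust</em>() rather than extracting it from the plot object, and the name n_leaves is illustrative.</p>
<pre class="r"><code># Rebuild the dendrogram and cut it at height h = 10
dend <- as.dendrogram(hclust(dist(scale(USArrests)), method = "ward.D2"))
dend_cuts <- cut(dend, h = 10)
# Number of sub-trees found below the cut
length(dend_cuts$lower)
# Number of leaves in each sub-tree (stored in the "members" attribute)
n_leaves <- sapply(dend_cuts$lower, function(d) attr(d, "members"))
n_leaves
# Together, the sub-trees cover all 50 states
sum(n_leaves)</code></pre>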
<ul>
<li>Plot dendrograms sub-trees:</li>
</ul>
<pre class="r"><code># Plot the whole dendrogram
print(dend_plot)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-whole-dendrogam-1.png" width="518.4" /></p>
<pre class="r"><code># Plot subtree 1
fviz_dend(dend_cuts$lower[[1]], main = "Subtree 1")
# Plot subtree 2
fviz_dend(dend_cuts$lower[[2]], main = "Subtree 2")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-subtree-1.png" width="307.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-subtree-2.png" width="307.2" /></p>
<p>You can also plot circular trees as follows:</p>
<pre class="r"><code>fviz_dend(dend_cuts$lower[[2]], type = "circular")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/011-visualizing-dendrograms-sub-tree-circular-1.png" width="307.2" /></p>
</div>
<div id="saving-dendrogram-into-a-large-pdf-page" class="section level3">
<h3>Saving dendrogram into a large PDF page</h3>
<p>If you have a large dendrogram, you can save it to a large PDF page, which can be zoomed without loss of resolution.</p>
<pre class="r"><code>pdf("dendrogram.pdf", width=30, height=15)            # Open a PDF
p <- fviz_dend(hc, k = 4, cex = 1, k_colors = "jco" ) # Do plotting
print(p)
dev.off()                                             # Close the PDF</code></pre>
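<p>Since <em>fviz_dend</em>() produces a ggplot2-based plot by default, the <em>ggsave</em>() function from ggplot2 is a possible alternative to opening and closing the PDF device by hand. This is a sketch under that assumption; the file name and page size simply mirror the example above.</p>
<pre class="r"><code>library(factoextra)
library(ggplot2)
# Rebuild the tree and the plot
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
p <- fviz_dend(hc, k = 4, cex = 1, k_colors = "jco")
# Save the plot to a 30 x 15 inch PDF page
ggsave("dendrogram.pdf", plot = p, width = 30, height = 15, units = "in")</code></pre>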
</div>
</div>
<div id="manipulating-dendrograms-using-dendextend" class="section level2">
<h2>Manipulating dendrograms using dendextend</h2>
<p>The package <em>dendextend</em> provides functions for easily changing the appearance of a dendrogram and for comparing dendrograms.</p>
<p>In this section, we’ll use the chaining operator (<em>%>%</em>) to simplify our code. The chaining operator turns x %>% f(y) into f(x, y), so you can rewrite multiple operations so that they read from left to right, top to bottom. For instance, the two R code snippets below perform the same sequence of steps (note that the second one uses only the first five rows of the data).</p>
<ul>
<li>Standard R code for creating a dendrogram:</li>
</ul>
<pre class="r"><code>data <- scale(USArrests)
dist.res <- dist(data)
hc <- hclust(dist.res, method = "ward.D2")
dend <- as.dendrogram(hc)
plot(dend)</code></pre>
<ul>
<li>R code for creating a dendrogram using chaining operator:</li>
</ul>
<pre class="r"><code>library(dendextend)
dend <- USArrests[1:5,] %>% # data
        scale %>% # Scale the data
        dist %>% # calculate a distance matrix, 
        hclust(method = "ward.D2") %>% # Hierarchical clustering 
        as.dendrogram # Turn the object into a dendrogram.
plot(dend)</code></pre>
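<p>To check that the chained and the nested forms really build the same tree, both can be run on the full data set and compared. This is a small sketch; the names hc_nested and hc_piped are illustrative, and the operator <em>%>%</em> is assumed to be available after loading dendextend, as in the example above. Both comparisons should return TRUE.</p>
<pre class="r"><code>library(dendextend) # makes %>% available, as in the example above
# Nested function calls
hc_nested <- hclust(dist(scale(USArrests)), method = "ward.D2")
# Same steps written with the chaining operator
hc_piped <- USArrests %>% scale %>% dist %>% hclust(method = "ward.D2")
# Compare the merge order and the merge heights of the two trees
identical(hc_nested$merge, hc_piped$merge)
all.equal(hc_nested$height, hc_piped$height)</code></pre>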
<ul>
<li>Functions to customize dendrograms: The function <em>set</em>() [in dendextend package] can be used to change the parameters of a dendrogram. The format is:</li>
</ul>
<pre class="r"><code>set(object, what, value)</code></pre>
<div class="block">
<ol style="list-style-type: decimal">
<li>
<strong>object</strong>: a dendrogram object
</li>
<li>
<strong>what</strong>: a character indicating what is the property of the tree that should be set/updated
</li>
<li>
<strong>value</strong>: a vector with the value to set in the tree (the type of the value depends on the “what”).
</li>
</ol>
</div>
<p>Possible values for the argument <strong>what</strong> include:</p>
<table style="width:88%;">
<colgroup>
<col width="45%" />
<col width="41%" />
</colgroup>
<thead>
<tr class="header">
<th>Value for the argument <strong>what</strong></th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>labels</strong></td>
<td>set the labels</td>
</tr>
<tr class="even">
<td><strong>labels_colors</strong> and <strong>labels_cex</strong></td>
<td>Set the color and the size of labels, respectively</td>
</tr>
<tr class="odd">
<td><strong>leaves_pch</strong>, <strong>leaves_cex</strong> and <strong>leaves_col</strong></td>
<td>set the point type, size and color for leaves, respectively</td>
</tr>
<tr class="even">
<td><strong>nodes_pch</strong>, <strong>nodes_cex</strong> and <strong>nodes_col</strong></td>
<td>set the point type, size and color for nodes, respectively</td>
</tr>
<tr class="odd">
<td><strong>hang_leaves</strong></td>
<td>hang the leaves</td>
</tr>
<tr class="even">
<td><strong>branches_k_color</strong></td>
<td>color the branches</td>
</tr>
<tr class="odd">
<td><strong>branches_col</strong>, <strong>branches_lwd </strong>, <strong>branches_lty</strong></td>
<td>Set the color, the line width and the line type of branches, respectively</td>
</tr>
<tr class="even">
<td><strong>by_labels_branches_col</strong>, <strong>by_labels_branches_lwd</strong> and <strong>by_labels_branches_lty </strong></td>
<td>Set the color, the line width and the line type of branches with specific labels, respectively</td>
</tr>
<tr class="odd">
<td><strong>clear_branches</strong> and <strong>clear_leaves</strong></td>
<td>Clear branches and leaves, respectively</td>
</tr>
</tbody>
</table>
<ul>
<li>Examples:</li>
</ul>
<pre class="r"><code>library(dendextend)
# 1. Create a customized dendrogram
mycols <- c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07")
dend <-  as.dendrogram(hc) %>%
   set("branches_lwd", 1) %>% # Branches line width
   set("branches_k_color", mycols, k = 4) %>% # Color branches by groups
   set("labels_colors", mycols, k = 4) %>%  # Color labels by groups
   set("labels_cex", 0.5) # Change label size
# 2. Create plot 
fviz_dend(dend) </code></pre>
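<p>The leaf-related properties listed in the table can be combined in the same way. The sketch below works on a 10-row subset to keep the plot readable and draws it with the base <em>plot</em>() function; the object name dend_small and the chosen values are only illustrative.</p>
<pre class="r"><code>library(dendextend)
# Build a small dendrogram and customize its leaves
dend_small <- USArrests[1:10, ] %>%
  scale %>%
  dist %>%
  hclust(method = "ward.D2") %>%
  as.dendrogram %>%
  set("leaves_pch", 19) %>%        # Point type for leaves
  set("leaves_cex", 0.8) %>%       # Point size for leaves
  set("leaves_col", "#FC4E07") %>% # Point color for leaves
  set("labels_cex", 0.7)           # Label size
plot(dend_small)</code></pre>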
</div>
<div id="summary" class="section level2">
<h2>Summary</h2>
<p>We described functions and packages for visualizing and customizing dendrograms including:</p>
<ul>
<li><em>fviz_dend</em>() [in the factoextra R package], which provides a convenient way to plot an attractive dendrogram. It can be used to create rectangular and circular dendrograms, as well as phylogenic trees.</li>
<li>and the <em>dendextend</em> package, which provides flexible methods to customize dendrograms.</li>
</ul>
<p>Additionally, we described how to plot a subset of large dendrograms.</p>
</div>
</div><!--end rdoc-->

<!-- END HTML -->]]></description>
			<pubDate>Wed, 06 Sep 2017 11:29:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Comparing Dendrograms: Essentials]]></title>
			<link>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/91-comparing-dendrograms-essentials/</link>
			<guid>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/91-comparing-dendrograms-essentials/</guid>
			<description><![CDATA[<!-- START HTML -->


  <div id="rdoc">





<p>After showing how to compute hierarchical clustering (Chapter @ref(agglomerative-clustering)), we describe, here, how to <strong>compare two dendrograms</strong> using the <em>dendextend</em> R package.</p>
<p>The <em>dendextend</em> package provides several functions for comparing dendrograms. Here, we’ll focus on two functions:</p>
<ul>
<li><em>tanglegram</em>() for visual comparison of two dendrograms</li>
<li>and <em>cor.dendlist</em>() for computing a correlation matrix between dendrograms.</li>
</ul>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#data-preparation">Data preparation</a></li>
<li><a href="#dendrograms-comparison">Dendrograms comparison</a></li>
</ul>
</div><br/>
<p>Related Book:</p>
<div class = "small-block content-privileged-friends cluster-book">
    <center>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/17-practical-guide-to-cluster-analysis-in-r/">
          <img src = "https://www.sthda.com/english/sthda-upload/images/cluster-analysis/clustering-book-cover.png" /><br/>
      Practical Guide to Cluster Analysis in R
      </a>
      </center>
</div>
<div class="spacer"></div>


<div id="data-preparation" class="section level2">
<h2>Data preparation</h2>
<p>We’ll use the R base USArrests data set, and we start by standardizing the variables using the function <em>scale</em>() as follows:</p>
<pre class="r"><code>df <- scale(USArrests)</code></pre>
<p>To keep the plots generated in the next sections readable, we’ll work with a small random subset of the data set. We’ll use the function <em>sample</em>() to randomly select 10 of the 50 observations contained in the data set:</p>
<pre class="r"><code># Subset containing 10 rows
set.seed(123)
ss <- sample(1:50, 10)
df <- df[ss,]</code></pre>
</div>
<div id="dendrograms-comparison" class="section level2">
<h2>Dendrograms comparison</h2>
<p>We start by creating a list of two dendrograms by computing hierarchical clustering (HC) using two different linkage methods (“average” and “ward.D2”). Next, we transform the results as dendrograms and create a list to hold the two dendrograms.</p>
<pre class="r"><code>library(dendextend)

# Compute distance matrix
res.dist <- dist(df, method = "euclidean")

# Compute 2 hierarchical clusterings
hc1 <- hclust(res.dist, method = "average")
hc2 <- hclust(res.dist, method = "ward.D2")

# Create two dendrograms
dend1 <- as.dendrogram (hc1)
dend2 <- as.dendrogram (hc2)

# Create a list to hold dendrograms
dend_list <- dendlist(dend1, dend2)</code></pre>
<ol style="list-style-type: decimal">
<li><strong>Visual comparison of two dendrograms</strong></li>
</ol>
<p>To visually compare two dendrograms, we’ll use the following R functions [<em>dendextend</em> package]:</p>
<ul>
<li><em>untangle</em>(): finds the best layout to align dendrogram lists, using heuristic methods</li>
<li><em>tanglegram</em>(): plots the two dendrograms, side by side, with their labels connected by lines.</li>
<li><p><em>entanglement</em>(): computes the quality of the alignment of the two trees. Entanglement is a measure between 1 (full entanglement) and 0 (no entanglement). A lower entanglement coefficient corresponds to a good alignment.</p></li>
<li><p>Draw a tanglegram:</p></li>
</ul>
<pre class="r"><code># Align and plot two dendrograms side by side
dendlist(dend1, dend2) %>%
  untangle(method = "step1side") %>% # Find the best alignment layout
  tanglegram()                       # Draw the two dendrograms</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/010-comparing-dendrograms-compare-dendrogram-tanglegram-1.png" width="672" /></p>
<pre class="r"><code># Compute alignment quality. Lower value = good alignment quality
dendlist(dend1, dend2) %>%
  untangle(method = "step1side") %>% # Find the best alignment layout
  entanglement()                     # Alignment quality</code></pre>
<pre><code>## [1] 0.0384</code></pre>
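<p>To see what <em>untangle</em>() contributes, the entanglement can be computed before and after the alignment step. The following is a minimal, self-contained sketch that repeats the data preparation above; the name dl is illustrative. The aligned layout typically has an entanglement no higher than the original one.</p>
<pre class="r"><code>library(dendextend)
# Same data preparation as above
df <- scale(USArrests)
set.seed(123)
df <- df[sample(1:50, 10), ]
res.dist <- dist(df, method = "euclidean")
dl <- dendlist(as.dendrogram(hclust(res.dist, method = "average")),
               as.dendrogram(hclust(res.dist, method = "ward.D2")))
# Entanglement of the original layout
entanglement(dl)
# Entanglement after searching for a better alignment
entanglement(untangle(dl, method = "step1side"))</code></pre>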
<ul>
<li>Customize the tanglegram using other options as follows:</li>
</ul>
<pre class="r"><code>dendlist(dend1, dend2) %>%
  untangle(method = "step1side") %>% 
  tanglegram(
    highlight_distinct_edges = FALSE, # Turn-off dashed lines
    common_subtrees_color_lines = FALSE, # Turn-off line colors
    common_subtrees_color_branches = TRUE # Color common branches 
    )</code></pre>
<div class="success">
<p>
Note that, “unique” nodes, with a combination of labels/items not present in the other tree, are highlighted with dashed lines.
</p>
</div>
<div class="warning">
<p>
Note that, just because we can get two trees to have horizontal connecting lines, it doesn’t mean these trees are identical (or even very similar topologically).
</p>
<p>
In the following section, we’ll perform correlation analysis to measure the similarity between dendrograms.
</p>
</div>
<ol start="2" style="list-style-type: decimal">
<li><strong>Correlation matrix between a list of dendrograms</strong></li>
</ol>
<p>The function <em>cor.dendlist</em>() is used to compute a “<em>Baker</em>” or “<em>Cophenetic</em>” correlation matrix between a list of trees. The value ranges from -1 to 1, with values near 0 meaning that the two trees are not statistically similar.</p>
<pre class="r"><code># Cophenetic correlation matrix
cor.dendlist(dend_list, method = "cophenetic")</code></pre>
<pre><code>##       [,1]  [,2]
## [1,] 1.000 0.965
## [2,] 0.965 1.000</code></pre>
<pre class="r"><code># Baker correlation matrix
cor.dendlist(dend_list, method = "baker")</code></pre>
<pre><code>##       [,1]  [,2]
## [1,] 1.000 0.962
## [2,] 0.962 1.000</code></pre>
<p>The correlation between two trees can also be computed as follows:</p>
<pre class="r"><code># Cophenetic correlation coefficient
cor_cophenetic(dend1, dend2)</code></pre>
<pre><code>## [1] 0.965</code></pre>
<pre class="r"><code># Baker correlation coefficient
cor_bakers_gamma(dend1, dend2)</code></pre>
<pre><code>## [1] 0.962</code></pre>
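<p>The cophenetic correlation can also be approximated with base R alone, which makes explicit what is being correlated: the cophenetic distances of the two trees. This is a sketch that repeats the data preparation above and assumes the default Pearson correlation; the names coph1 and coph2 are illustrative. The result should agree with the <em>cor_cophenetic</em>() value shown above.</p>
<pre class="r"><code># Same data preparation and clusterings as above
df <- scale(USArrests)
set.seed(123)
df <- df[sample(1:50, 10), ]
res.dist <- dist(df, method = "euclidean")
hc1 <- hclust(res.dist, method = "average")
hc2 <- hclust(res.dist, method = "ward.D2")
# Cophenetic distances of each tree
coph1 <- cophenetic(hc1)
coph2 <- cophenetic(hc2)
# Correlation between the two sets of cophenetic distances
cor(as.vector(coph1), as.vector(coph2))</code></pre>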
<p>It’s also possible to compare multiple dendrograms simultaneously. The chaining operator <em>%>%</em> lets you apply several functions in sequence, which is useful for simplifying the code:</p>

<pre class="r"><code># Create multiple dendrograms by chaining
dend1 <- df %>% dist %>% hclust("complete") %>% as.dendrogram
dend2 <- df %>% dist %>% hclust("single") %>% as.dendrogram
dend3 <- df %>% dist %>% hclust("average") %>% as.dendrogram
dend4 <- df %>% dist %>% hclust("centroid") %>% as.dendrogram
# Compute correlation matrix
dend_list <- dendlist("Complete" = dend1, "Single" = dend2,
                      "Average" = dend3, "Centroid" = dend4)
cors <- cor.dendlist(dend_list)
# Print correlation matrix
round(cors, 2)</code></pre>
<pre><code>##          Complete Single Average Centroid
## Complete     1.00   0.76    0.99     0.75
## Single       0.76   1.00    0.80     0.84
## Average      0.99   0.80    1.00     0.74
## Centroid     0.75   0.84    0.74     1.00</code></pre>
<pre class="r"><code># Visualize the correlation matrix using corrplot package
library(corrplot)
corrplot(cors, "pie", "lower")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/010-comparing-dendrograms-compare-multiple-dendrograms-1.png" width="384" /></p>
</div>


</div><!--end rdoc-->



<!-- END HTML -->]]></description>
			<pubDate>Wed, 06 Sep 2017 03:11:00 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Agglomerative Clustering Essentials]]></title>
			<link>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/90-agglomerative-clustering-essentials/</link>
			<guid>https://www.sthda.com/english/articles/28-hierarchical-clustering-essentials/90-agglomerative-clustering-essentials/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p>The <strong>agglomerative clustering</strong> is the most common type of hierarchical clustering used to group objects in clusters based on their similarity. It’s also known as <em>AGNES</em> (<em>Agglomerative Nesting</em>). The algorithm starts by treating each object as a singleton cluster. Next, pairs of clusters are successively merged until all clusters have been merged into one big cluster containing all objects. The result is a tree-based representation of the objects, named <em>dendrogram</em>.</p>
<div class="block">
<p>
In this article, we start by describing the agglomerative clustering algorithm. Next, we provide R lab sections with many examples for computing and visualizing hierarchical clustering. We continue by explaining how to interpret a dendrogram. Finally, we provide R code for cutting dendrograms into groups.
</p>
</div>
<br/>
<p>Contents: </p>
<div id="TOC">
<ul>
<li><a href="#algorithm">Algorithm</a></li>
<li><a href="#steps-to-agglomerative-hierarchical-clustering">Steps to agglomerative hierarchical clustering</a><ul>
<li><a href="#data-structure-and-preparation">Data structure and preparation</a></li>
<li><a href="#similarity-measures">Similarity measures</a></li>
<li><a href="#linkage">Linkage</a></li>
<li><a href="#dendrogram">Dendrogram</a></li>
</ul></li>
<li><a href="#verify-the-cluster-tree">Verify the cluster tree</a></li>
<li><a href="#cut-the-dendrogram-into-different-groups">Cut the dendrogram into different groups</a></li>
<li><a href="#cluster-r-package">Cluster R package</a></li>
<li><a href="#application-of-hierarchical-clustering-to-gene-expression-data-analysis">Application of hierarchical clustering to gene expression data analysis</a></li>
<li><a href="#summary">Summary</a></li>
</ul>
</div>
<br/>
<p>Related Books:</p>
<div class = "small-block content-privileged-friends cluster-book">
    <center>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/17-practical-guide-to-cluster-analysis-in-r/">
          <img src = "https://www.sthda.com/english/sthda-upload/images/cluster-analysis/clustering-book-cover.png" /><br/>
      Practical Guide to Cluster Analysis in R
      </a>
      </center>
</div>
<div class="spacer"></div>
<div id="algorithm" class="section level2">
<h2>Algorithm</h2>
<p>Agglomerative clustering works in a “bottom-up” manner. That is, each object is initially considered as a single-element cluster (leaf). At each step of the algorithm, the two clusters that are the most similar are combined into a new, bigger cluster (node). This procedure is iterated until all points are members of one single big cluster (root) (see figure below).</p>
<p>The inverse of agglomerative clustering is <em>divisive clustering</em>, also known as DIANA (<em>DIvisive ANAlysis</em>), which works in a “top-down” manner. It begins with the root, in which all objects are included in a single cluster. At each step of iteration, the most heterogeneous cluster is divided into two. The process is iterated until all objects are in their own cluster (see figure below).</p>
<p><span class="warning">Note that, agglomerative clustering is good at identifying small clusters. Divisive clustering is good at identifying large clusters. In this article, we’ll focus mainly on agglomerative hierarchical clustering.</span></p>
<p><img src="https://www.sthda.com/english/sthda-upload/images/cluster-analysis/hierarchical-clustering-agnes-diana.png" alt="Hierarchical clustering methods" /></p>
</div>
<div id="steps-to-agglomerative-hierarchical-clustering" class="section level2">
<h2>Steps to agglomerative hierarchical clustering</h2>
<p>We’ll follow the steps below to perform agglomerative hierarchical clustering using R software:</p>
<ol style="list-style-type: decimal">
<li><p>Preparing the data</p></li>
<li><p>Computing (dis)similarity information between every pair of objects in the data set.</p></li>
<li><p>Using a linkage function to group objects into a hierarchical cluster tree, based on the distance information generated at step 2. Objects/clusters that are in close proximity are linked together using the linkage function.</p></li>
<li><p>Determining where to cut the hierarchical tree into clusters. This creates a partition of the data.</p></li>
</ol>
<p>We’ll describe each of these steps in the next section.</p>
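<p>As a preview, the four steps above can be sketched in a few lines of base R. This is only an outline under illustrative choices: Euclidean distance, Ward linkage and a cut into k = 4 groups are assumptions made for the example, not requirements of the method.</p>
<pre class="r"><code># 1. Prepare the data: standardize the variables
df <- scale(USArrests)
# 2. Compute the dissimilarity between every pair of objects
res.dist <- dist(df, method = "euclidean")
# 3. Link objects into a hierarchical cluster tree
res.hc <- hclust(res.dist, method = "ward.D2")
# 4. Cut the tree into groups (k = 4 is an arbitrary choice here)
grp <- cutree(res.hc, k = 4)
head(grp)</code></pre>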
<div id="data-structure-and-preparation" class="section level3">
<h3>Data structure and preparation</h3>
<p>The data should be a numeric matrix with:</p>
<ul>
<li>rows representing observations (individuals);</li>
<li>and columns representing variables.</li>
</ul>
<p>Here, we’ll use the R base USArrests data set.</p>
<div class="warning">
<p>
Note that it’s generally recommended to standardize the variables in the data set before performing subsequent analysis. Standardization makes variables comparable when they are measured on different scales. For example, one variable can measure height in meters and another can measure weight in kg. The R function <em>scale</em>() can be used for standardization; see the ?scale documentation.
</p>
</div>
<pre class="r"><code># Load the data
data("USArrests")
# Standardize the data
df <- scale(USArrests)
# Show the first 6 rows
head(df, n = 6)</code></pre>
<pre><code>##            Murder Assault UrbanPop     Rape
## Alabama    1.2426   0.783   -0.521 -0.00342
## Alaska     0.5079   1.107   -1.212  2.48420
## Arizona    0.0716   1.479    0.999  1.04288
## Arkansas   0.2323   0.231   -1.074 -0.18492
## California 0.2783   1.263    1.759  2.06782
## Colorado   0.0257   0.399    0.861  1.86497</code></pre>
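<p>To make the effect of <em>scale</em>() concrete, the short check below confirms that each standardized column has mean 0 and standard deviation 1, and reproduces one column by hand. It is a minimal sketch; the name murder_z is illustrative.</p>
<pre class="r"><code>df <- scale(USArrests)
# Column means are (numerically) zero
round(colMeans(df), 10)
# Column standard deviations are one
apply(df, 2, sd)
# Manual standardization of one variable: (x - mean) / sd
murder_z <- (USArrests$Murder - mean(USArrests$Murder)) / sd(USArrests$Murder)
all.equal(unname(df[, "Murder"]), murder_z)</code></pre>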
</div>
<div id="similarity-measures" class="section level3">
<h3>Similarity measures</h3>
<p>In order to decide which objects/clusters should be combined or divided, we need methods for measuring the similarity between objects.</p>
<p>There are many methods to calculate the (dis)similarity information, including Euclidean and Manhattan distances (Chapter @ref(clustering-distance-measures)). In R, you can use the function <em>dist</em>() to compute the distance between every pair of objects in a data set. The result of this computation is known as a distance or dissimilarity matrix.</p>
<p>By default, the function <em>dist</em>() computes the Euclidean distance between objects; however, it’s possible to indicate other metrics using the argument method. See ?dist for more information.</p>
<p>For example, consider the R base data set USArrests, you can compute the distance matrix as follow:</p>
<pre class="r"><code># Compute the dissimilarity matrix
# df = the standardized data
res.dist <- dist(df, method = "euclidean")</code></pre>
<div class="success">
<p>
Note that the function <em>dist</em>() computes the distance between the rows of a data matrix using the specified distance measure method.
</p>
</div>
<p>To inspect the distance information between objects more easily, we reformat the results of the function <em>dist</em>() into a matrix using the <em>as.matrix</em>() function. In this matrix, the value in the cell formed by row i and column j represents the distance between object i and object j in the original data set. For instance, element 1,1 represents the distance between object 1 and itself (which is zero). Element 1,2 represents the distance between object 1 and object 2, and so on.</p>
<p>The R code below displays the first 6 rows and columns of the distance matrix:</p>
<pre class="r"><code>as.matrix(res.dist)[1:6, 1:6]</code></pre>
<pre><code>##            Alabama Alaska Arizona Arkansas California Colorado
## Alabama       0.00   2.70    2.29     1.29       3.26     2.65
## Alaska        2.70   0.00    2.70     2.83       3.01     2.33
## Arizona       2.29   2.70    0.00     2.72       1.31     1.37
## Arkansas      1.29   2.83    2.72     0.00       3.76     2.83
## California    3.26   3.01    1.31     3.76       0.00     1.29
## Colorado      2.65   2.33    1.37     2.83       1.29     0.00</code></pre>
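<p>A single entry of this matrix can be reproduced by hand, which shows what the Euclidean distance measures: the square root of the sum of squared differences between two rows. The sketch below does this for Alabama and Alaska; the name manual is illustrative. Both values should be close to the 2.70 shown in the matrix above.</p>
<pre class="r"><code>df <- scale(USArrests)
res.dist <- dist(df, method = "euclidean")
# Euclidean distance between two rows, computed by hand
manual <- sqrt(sum((df["Alabama", ] - df["Alaska", ])^2))
manual
# Same entry taken from the distance matrix
as.matrix(res.dist)["Alabama", "Alaska"]</code></pre>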
</div>
<div id="linkage" class="section level3">
<h3>Linkage</h3>
<p>The linkage function takes the distance information, returned by the function <em>dist</em>(), and groups pairs of objects into clusters based on their similarity. Next, these newly formed clusters are linked to each other to create bigger clusters. This process is iterated until all the objects in the original data set are linked together in a hierarchical tree.</p>
<p>For example, given a distance matrix “res.dist” generated by the function <em>dist</em>(), the R base function <em>hclust</em>() can be used to create the hierarchical tree.</p>
<p><em>hclust</em>() can be used as follow:</p>
<pre class="r"><code>res.hc <- hclust(d = res.dist, method = "ward.D2")</code></pre>
<ul>
<li><strong>d</strong>: a dissimilarity structure as produced by the <strong>dist()</strong> function.</li>
<li><strong>method</strong>: The agglomeration (linkage) method to be used for computing the distance between clusters. The allowed values are “ward.D”, “ward.D2”, “single”, “complete”, “average”, “mcquitty”, “median” and “centroid”.</li>
</ul>
<p>There are many cluster agglomeration methods (i.e, linkage methods). The most common linkage methods are described below.</p>
<div class="block">
<ul>
<li>
<p>
Maximum or <em>complete linkage</em>: The distance between two clusters is defined as the maximum value of all pairwise distances between the elements in cluster 1 and the elements in cluster 2. It tends to produce more compact clusters.
</p>
</li>
<li>
<p>
Minimum or <em>single linkage</em>: The distance between two clusters is defined as the minimum value of all pairwise distances between the elements in cluster 1 and the elements in cluster 2. It tends to produce long, “loose” clusters.
</p>
</li>
<li>
<p>
Mean or <em>average linkage</em>: The distance between two clusters is defined as the average distance between the elements in cluster 1 and the elements in cluster 2.
</p>
</li>
<li>
<p>
<em>Centroid linkage</em>: The distance between two clusters is defined as the distance between the centroid for cluster 1 (a mean vector of length p variables) and the centroid for cluster 2.
</p>
</li>
<li>
<p>
<em>Ward’s minimum variance method</em>: It minimizes the total within-cluster variance. At each step the pair of clusters with minimum between-cluster distance are merged.
</p>
</li>
</ul>
</div>
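<p>The definitions above can be illustrated by computing the distance between two small clusters by hand. The sketch below uses two arbitrary, hypothetical clusters (c1 and c2) chosen from the states shown in the distance matrix earlier; it is meant to illustrate the definitions, not to reproduce the internals of <em>hclust</em>(). Given that matrix, the complete and single linkage distances should come out near 3.76 and 2.29, respectively.</p>
<pre class="r"><code>df <- scale(USArrests)
d <- as.matrix(dist(df, method = "euclidean"))
# Two illustrative clusters
c1 <- c("Alabama", "Arkansas")
c2 <- c("Arizona", "California", "Colorado")
# All pairwise distances between elements of c1 and elements of c2
pair <- d[c1, c2]
# Complete, single and average linkage distances
c(complete = max(pair), single = min(pair), average = mean(pair))
# Centroid linkage: distance between the two cluster mean vectors
sqrt(sum((colMeans(df[c1, ]) - colMeans(df[c2, ]))^2))</code></pre>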
<p>Note that, at each stage of the clustering process the two clusters, that have the smallest linkage distance, are linked together.</p>
<div class="success">
<p>
Complete linkage and Ward’s method are generally preferred.
</p>
</div>
</div>
<div id="dendrogram" class="section level3">
<h3>Dendrogram</h3>
<p>Dendrograms correspond to the graphical representation of the hierarchical tree generated by the function <em>hclust</em>(). Dendrogram can be produced in R using the base function <em>plot</em>(res.hc), where res.hc is the output of <em>hclust</em>(). Here, we’ll use the function <em>fviz_dend</em>()[ in <em>factoextra</em> R package] to produce a beautiful dendrogram.</p>
<p>First, install factoextra by typing install.packages("factoextra"); next, visualize the dendrogram as follows:</p>
<pre class="r"><code># cex: label size
library("factoextra")
fviz_dend(res.hc, cex = 0.5)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/009b-agglomerative-clustering-visualize-dendrogram-1.png" width="518.4" /></p>
<p>In the dendrogram displayed above, each leaf corresponds to one object. As we move up the tree, objects that are similar to each other are combined into branches, which are themselves fused at a higher height.</p>
<p>The height of the fusion, provided on the vertical axis, indicates the (dis)similarity/distance between two objects/clusters. The higher the height of the fusion, the less similar the objects are. This height is known as the <em>cophenetic distance</em> between the two objects.</p>
<div class="notice">
<p>
Note that conclusions about the proximity of two objects can be drawn only from the height at which the branches containing those two objects are first fused. We cannot use the proximity of two objects along the horizontal axis as a criterion of their similarity.
</p>
</div>
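<p>The link between fusion height and cophenetic distance can be checked on a toy example. The sketch below assumes four hypothetical objects (a, b, c, d) placed on a line and uses complete linkage; the data are invented for illustration.</p>
<pre class="r"><code># Four hypothetical objects on a line
x <- matrix(c(0, 1, 5, 6), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), NULL))
hc <- hclust(dist(x), method = "complete")
hc$height                  # fusion heights, in merge order
coph <- as.matrix(cophenetic(hc))
coph["a", "b"]             # height at which a and b are first fused
coph["b", "c"]             # height at which the branches of b and c are fused</code></pre>
<p>The objects a and b are fused at height 1, so their cophenetic distance is 1. The objects b and c are only 4 units apart, but their branches are first fused when the two pairs are joined at height 6, so their cophenetic distance is 6.</p>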
<p>In order to identify sub-groups, we can cut the dendrogram at a certain height as described in the next sections.</p>
</div>
</div>
<div id="verify-the-cluster-tree" class="section level2">
<h2>Verify the cluster tree</h2>
<p>After linking the objects in a data set into a hierarchical cluster tree, you might want to assess whether the distances (i.e., heights) in the tree reflect the original distances accurately.</p>
<p>One way to measure how well the cluster tree generated by the <em>hclust</em>() function reflects your data is to compute the correlation between the <em>cophenetic</em> distances and the original distance data generated by the <em>dist</em>() function. If the clustering is valid, the linking of objects in the cluster tree should have a strong correlation with the distances between objects in the original distance matrix.</p>
<p>The closer the value of the correlation coefficient is to 1, the more accurately the clustering solution reflects your data. Values above 0.75 are generally considered good. The “average” linkage method tends to produce high values of this statistic, which may be one reason it is so popular.</p>
<p>The R base function <em>cophenetic</em>() can be used to compute the cophenetic distances for hierarchical clustering.</p>
<pre class="r"><code># Compute cophenetic distance
res.coph <- cophenetic(res.hc)
# Correlation between cophenetic distance and
# the original distance
cor(res.dist, res.coph)</code></pre>
<pre><code>## [1] 0.698</code></pre>
<p>Execute the <em>hclust</em>() function again using the average linkage method. Next, call <em>cophenetic</em>() to evaluate the clustering solution.</p>
<pre class="r"><code>res.hc2 <- hclust(res.dist, method = "average")
cor(res.dist, cophenetic(res.hc2))</code></pre>
<pre><code>## [1] 0.718</code></pre>
<p>The correlation coefficient shows that using a different linkage method creates a tree that represents the original distances slightly better.</p>
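<p>The same check can be repeated for several linkage methods at once. The sketch below is self-contained and assumes that <em>res.dist</em> was built from the scaled USArrests data with the Euclidean distance; the list of methods is an arbitrary choice for illustration.</p>
<pre class="r"><code># Rebuild the distance matrix (assumption: scaled USArrests, Euclidean)
df <- scale(USArrests)
res.dist <- dist(df, method = "euclidean")
# Cophenetic correlation for a few linkage methods
methods <- c("ward.D2", "complete", "average", "single")
coph.cor <- sapply(methods, function(m) {
  cor(res.dist, cophenetic(hclust(res.dist, method = m)))
})
round(sort(coph.cor, decreasing = TRUE), 3)</code></pre>
<p>A higher value only indicates that the tree preserves the original pairwise distances better; it does not by itself say which partition is the most useful.</p>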
</div>
<div id="cut-the-dendrogram-into-different-groups" class="section level2">
<h2>Cut the dendrogram into different groups</h2>
<p>One of the problems with hierarchical clustering is that it does not tell us how many clusters there are, or where to cut the dendrogram to form clusters.</p>
<p>You can cut the hierarchical tree at a given height in order to partition your data into clusters. The R base function <em>cutree</em>() can be used to cut a tree, generated by the <em>hclust</em>() function, into several groups either by specifying the desired number of groups or the cut height. It returns a vector containing the cluster number of each observation.</p>
<pre class="r"><code># Cut tree into 4 groups
grp <- cutree(res.hc, k = 4)
head(grp, n = 4)</code></pre>
<pre><code>##  Alabama   Alaska  Arizona Arkansas 
##        1        2        2        3</code></pre>
<pre class="r"><code># Number of members in each cluster
table(grp)</code></pre>
<pre><code>## grp
##  1  2  3  4 
##  7 12 19 12</code></pre>
<pre class="r"><code># Get the names for the members of cluster 1
rownames(df)[grp == 1]</code></pre>
<pre><code>## [1] "Alabama"        "Georgia"        "Louisiana"      "Mississippi"   
## [5] "North Carolina" "South Carolina" "Tennessee"</code></pre>
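<p>Cutting by height instead of by number of groups works the same way. The sketch below is self-contained and assumes the tree was built with Ward linkage ("ward.D2") on the scaled USArrests data; the cut height <em>h.cut</em> is a name introduced here for illustration.</p>
<pre class="r"><code># Rebuild the tree (assumption: scaled data, Euclidean distance, ward.D2)
df <- scale(USArrests)
res.hc <- hclust(dist(df, method = "euclidean"), method = "ward.D2")
n <- nrow(df)
# Merge heights are stored in increasing order. Once n - 4 merges have
# been performed, 4 groups remain, so any height between merge n - 4
# and merge n - 3 yields 4 groups.
h.cut <- mean(res.hc$height[c(n - 4, n - 3)])
grp.k <- cutree(res.hc, k = 4)
grp.h <- cutree(res.hc, h = h.cut)
all(grp.k == grp.h)</code></pre>
<p>Specifying <em>k</em> is usually more convenient, since a sensible value of <em>h</em> depends on the scale of the distances and on the linkage method.</p>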
<p>The result of the cuts can be visualized easily using the function <em>fviz_dend</em>() [in factoextra]:</p>
<pre class="r"><code># Cut in 4 groups and color by groups
fviz_dend(res.hc, k = 4, # Cut in four groups
          cex = 0.5, # label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          color_labels_by_k = TRUE, # color labels by groups
          rect = TRUE # Add rectangle around groups
          )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/009b-agglomerative-clustering-cutree-cut-dendrogram-1.png" width="518.4" /></p>
<p>Using the function <em>fviz_cluster</em>() [in <em>factoextra</em>], we can also visualize the result in a scatter plot. Observations are represented by points in the plot, using principal components. A frame is drawn around each cluster.</p>
<pre class="r"><code>fviz_cluster(list(data = df, cluster = grp),
             palette = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"), 
             ellipse.type = "convex", # Convex hull around each cluster
             repel = TRUE, # Avoid label overplotting (slow)
             show.clust.cent = FALSE, ggtheme = theme_minimal())</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/cluster-analysis/009b-agglomerative-clustering-cluster-plot-1.png" width="576" /></p>
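<p>For readers without factoextra, a rough base R approximation of this plot is sketched below. It assumes that the projection is a principal component analysis of the scaled data and draws a convex hull around each group; it is not the code used by <em>fviz_cluster</em>() itself.</p>
<pre class="r"><code># Rebuild the groups (assumption: scaled USArrests, ward.D2, k = 4)
df <- scale(USArrests)
grp <- cutree(hclust(dist(df), method = "ward.D2"), k = 4)
# Project the observations on the first two principal components
scores <- prcomp(df)$x[, 1:2]
plot(scores, col = grp, pch = 19, xlab = "PC1", ylab = "PC2")
# Draw a convex hull around the members of each cluster
for (g in unique(grp)) {
  pts <- scores[grp == g, , drop = FALSE]
  polygon(pts[chull(pts), , drop = FALSE], border = g)
}</code></pre>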
</div>
<div id="cluster-r-package" class="section level2">
<h2>Cluster R package</h2>
<p>The R package <em>cluster</em> makes it easy to perform cluster analysis in R. It provides the functions <em>agnes</em>() and <em>diana</em>() for computing agglomerative and divisive clustering, respectively. These functions perform all the necessary steps for you. You don’t need to execute the <em>scale</em>(), <em>dist</em>() and <em>hclust</em>() functions separately.</p>
<p>The functions can be executed as follows:</p>
<pre class="r"><code>library("cluster")
# Agglomerative Nesting (Hierarchical Clustering)
res.agnes <- agnes(x = USArrests, # data matrix
                   stand = TRUE, # Standardize the data
                   metric = "euclidean", # metric for distance matrix
                   method = "ward" # Linkage method
                   )
# DIvisive ANAlysis Clustering
res.diana <- diana(x = USArrests, # data matrix
                   stand = TRUE, # standardize the data
                   metric = "euclidean" # metric for distance matrix
                   )</code></pre>
<p>After running <em>agnes</em>() and <em>diana</em>(), you can use the function <em>fviz_dend</em>()[in <em>factoextra</em>] to visualize the output:</p>
<pre class="r"><code>fviz_dend(res.agnes, cex = 0.6, k = 4)</code></pre>
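<p>The objects returned by <em>agnes</em>() and <em>diana</em>() can also be inspected without plotting. The self-contained sketch below reads the agglomerative and divisive coefficients and compares the two solutions after cutting each tree into 4 groups; the choice of k = 4 is only for illustration.</p>
<pre class="r"><code>library("cluster")
res.agnes <- agnes(x = USArrests, stand = TRUE,
                   metric = "euclidean", method = "ward")
res.diana <- diana(x = USArrests, stand = TRUE, metric = "euclidean")
# Agglomerative and divisive coefficients (closer to 1 suggests
# a stronger clustering structure)
res.agnes$ac
res.diana$dc
# Convert to hclust objects, cut into 4 groups and cross-tabulate
grp.agnes <- cutree(as.hclust(res.agnes), k = 4)
grp.diana <- cutree(as.hclust(res.diana), k = 4)
table(grp.agnes, grp.diana)</code></pre>
<p>Note that <em>stand = TRUE</em> standardizes each variable by its mean absolute deviation, so the result can differ slightly from a tree built on data standardized with <em>scale</em>(), which divides by the standard deviation.</p>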
</div>
<div id="application-of-hierarchical-clustering-to-gene-expression-data-analysis" class="section level2">
<h2>Application of hierarchical clustering to gene expression data analysis</h2>
<p>In <em>gene expression data analysis</em>, <em>clustering</em> is generally used as one of the first steps to explore the data. We are interested in whether there are groups of genes or groups of samples that have similar gene expression patterns.</p>
<p>Several distance measures (Chapter @ref(clustering-distance-measures)) have been described for assessing the similarity or the dissimilarity between items, in order to decide which items have to be grouped together or not. These measures can be used to cluster genes or samples that are similar.</p>
<p>For most common clustering software, the default distance measure is the Euclidean distance. For gene expression data, the most popular approach is to use log2(expression + 0.25), correlation distance and complete linkage agglomerative clustering.</p>
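<p>A minimal sketch of this workflow is given below. The expression matrix <em>expr</em> is simulated and purely hypothetical (20 genes in rows, 6 samples in columns); with real data you would start from your own matrix of expression values.</p>
<pre class="r"><code>set.seed(123)
# Hypothetical expression matrix: 20 genes (rows) x 6 samples (columns)
expr <- matrix(rexp(120, rate = 0.01), nrow = 20,
               dimnames = list(paste0("gene", 1:20),
                               paste0("sample", 1:6)))
# Log-transform, with a small offset to avoid log2(0)
log.expr <- log2(expr + 0.25)
# Correlation distance between genes; cor() works on columns,
# so the matrix is transposed to correlate gene profiles
gene.dist <- as.dist(1 - cor(t(log.expr), method = "pearson"))
# Complete linkage agglomerative clustering
gene.hc <- hclust(gene.dist, method = "complete")
plot(gene.hc, cex = 0.6)</code></pre>
<p>To cluster samples instead of genes, compute the correlation on <em>log.expr</em> directly, without transposing.</p>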
<p>Single and complete linkage depend only on the rank order of the pairwise distances. They therefore give the same tree structure under any transformation that preserves this order (only the merge heights change), because what matters is which pairs have the smallest distance. The other methods are sensitive to the measurement scale.</p>
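<p>This order-preserving property can be checked numerically. The sketch below uses squared distances as one example of an order-preserving transformation (an assumption chosen for illustration) and compares the merge sequence before and after the transformation.</p>
<pre class="r"><code>d <- dist(scale(USArrests))
same.tree <- sapply(c("single", "complete", "average"), function(m) {
  hc.raw <- hclust(d, method = m)
  hc.sq  <- hclust(d^2, method = m)  # squaring keeps the order of distances
  # TRUE if both trees merge the same objects in the same sequence
  identical(hc.raw$merge, hc.sq$merge)
})
same.tree</code></pre>
<p>The entries for single and complete linkage are expected to be TRUE. Average linkage is included for comparison only; its merge sequence is not guaranteed to be preserved.</p>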
<div class="notice">
<p>
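Note that, when each profile is standardized to z-scores, the squared Euclidean distance between two profiles is proportional to their correlation distance (1 - r). The two measures therefore rank pairs of items in the same way.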
</p>
<p>
Pearson’s correlation is quite sensitive to outliers. When clustering genes, it is important to be aware of the possible impact of outliers. An alternative option is to use Spearman’s correlation instead of Pearson’s correlation.
</p>
</div>
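<p>Both remarks can be illustrated in a few lines of base R. The vectors below are hypothetical profiles generated for this sketch. For z-scores computed with <em>scale</em>(), the squared Euclidean distance equals 2(n - 1)(1 - r), where r is Pearson’s correlation and n is the profile length.</p>
<pre class="r"><code>set.seed(42)
# Two hypothetical profiles of length 10
x <- rnorm(10)
y <- rnorm(10)
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
d2 <- sum((zx - zy)^2)   # squared Euclidean distance of the z-scores
r  <- cor(x, y)          # Pearson correlation
all.equal(d2, 2 * (length(x) - 1) * (1 - r))

# Effect of one outlier on Pearson and Spearman correlation
a <- 1:10
b <- c(1:9, 100)         # same ordering as a, with one extreme value
cor(a, b, method = "pearson")
cor(a, b, method = "spearman")</code></pre>
<p>In the second part, the ranks of <em>a</em> and <em>b</em> are identical, so Spearman’s correlation is 1, whereas Pearson’s correlation is pulled below 1 by the single extreme value.</p>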
<p>In principle, it is possible to cluster all the genes, although visualizing a huge dendrogram might be problematic. Usually, some type of preliminary analysis, such as differential expression analysis, is used to select genes for clustering.</p>
<p>Selecting genes based on differential expression analysis removes genes which are likely to have only chance patterns. This should enhance the patterns found in the gene clusters.</p>
</div>
<div id="summary" class="section level2">
<h2>Summary</h2>
<p>Hierarchical clustering is a cluster analysis method that produces a tree-based representation (i.e., a dendrogram) of the data. Objects in the dendrogram are linked together based on their similarity.</p>
<p>To perform hierarchical cluster analysis in R, the first step is to calculate the pairwise distance matrix using the function <em>dist</em>(). Next, the result of this computation is used by the <em>hclust</em>() function to produce the hierarchical tree. Finally, you can use the function <em>fviz_dend</em>() [in factoextra R package] to easily plot a beautiful dendrogram.</p>
<p>It’s also possible to cut the tree at a given height for partitioning the data into multiple groups (R function <em>cutree</em>()).</p>
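<p>The whole workflow can be condensed into a short base R sketch. It assumes the scaled USArrests data, the Euclidean distance and Ward linkage ("ward.D2"), and uses the base functions <em>plot</em>() and <em>rect.hclust</em>() in place of <em>fviz_dend</em>() so that no extra package is required.</p>
<pre class="r"><code># 1. Standardize the data and compute the distance matrix
df <- scale(USArrests)
res.dist <- dist(df, method = "euclidean")
# 2. Build the hierarchical tree
res.hc <- hclust(res.dist, method = "ward.D2")
# 3. Plot the dendrogram and outline 4 groups
plot(res.hc, cex = 0.6, hang = -1)
rect.hclust(res.hc, k = 4, border = 2:5)
# 4. Cut the tree into 4 groups and count the members of each
grp <- cutree(res.hc, k = 4)
table(grp)</code></pre>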
</div>
</div><!--end rdoc-->

<!-- END HTML -->]]></description>
			<pubDate>Wed, 06 Sep 2017 02:44:00 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
