As described in previous chapters, a dendrogram is a tree-based representation of data created using hierarchical clustering methods (Chapter @ref(agglomerative-clustering)). In this article, we provide R code for visualizing and customizing dendrograms. We also show how to save a large dendrogram and how to zoom in on it.
We start by computing hierarchical clustering using the USArrests data set:
# Load data
data(USArrests)
# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")
To visualize the dendrogram, we’ll use the following R functions and packages:
- fviz_dend() [in the factoextra R package] to easily create a beautiful ggplot2-based dendrogram.
- the dendextend package to manipulate dendrograms.
Before continuing, install the required packages as follows:
install.packages(c("factoextra", "dendextend"))
We’ll use the function fviz_dend() [in the factoextra R package] to easily create a beautiful dendrogram using either the R base plot or ggplot2. It also provides an option for drawing circular dendrograms and phylogenic-like trees.
To create a basic dendrogram, type this:
library(factoextra)
fviz_dend(hc, cex = 0.5)
You can use the arguments main, sub, xlab and ylab to change the plot titles as follows:
fviz_dend(hc, cex = 0.5, main = "Dendrogram - ward.D2", xlab = "Objects", ylab = "Distance", sub = "")
To draw a horizontal dendrogram, type this:
fviz_dend(hc, cex = 0.5, horiz = TRUE)
It’s also possible to cut the tree into a given number of groups (argument k), partitioning the data as described in the previous chapter on hierarchical clustering (Chapter @ref(agglomerative-clustering)). In this case, you can color the branches by group and add a rectangle around each group.
fviz_dend(hc, k = 4,                 # Cut in four groups
          cex = 0.5,                 # Label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          color_labels_by_k = TRUE,  # Color labels by groups
          rect = TRUE,               # Add rectangle around groups
          rect_border = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          rect_fill = TRUE)
To change the plot theme, use the argument ggtheme, whose allowed values include the official ggplot2 themes [theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void()] or any other user-defined ggplot2 theme.
fviz_dend(hc, k = 4,                 # Cut in four groups
          cex = 0.5,                 # Label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07"),
          color_labels_by_k = TRUE,  # Color labels by groups
          ggtheme = theme_gray()     # Change theme
          )
Allowed values for k_colors include brewer palettes from the RColorBrewer package (e.g. "RdBu", "Blues", "Dark2", "Set2", …) and scientific journal palettes from the ggsci R package (e.g. "npg", "aaas", "lancet", "jco", "ucscgb", "uchicago", "simpsons" and "rickandmorty").
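The groups that are colored in the plot can also be obtained as a plain vector. The following is a minimal sketch using the base R function cutree(); it recomputes hc so that it runs on its own, and the name grp is only illustrative. The resulting partition should correspond to the four colored groups, although the group numbering need not follow the left-to-right order of the plot.

```r
# Recompute the clustering so this block runs on its own
data(USArrests)
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")

# Cut the tree into k = 4 groups; the result is a named integer vector
grp <- cutree(hc, k = 4)

# Number of states in each group
table(grp)

# Group membership of the first few states
head(grp)
```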
In the R code below, we’ll change the group colors using the "jco" (Journal of Clinical Oncology) color palette:
fviz_dend(hc, cex = 0.5, k = 4,  # Cut in four groups
          k_colors = "jco")
If you want to draw a horizontal dendrogram with rectangles around the clusters, use this:
fviz_dend(hc, k = 4, cex = 0.4, horiz = TRUE, k_colors = "jco", rect = TRUE, rect_border = "jco", rect_fill = TRUE)
Additionally, you can plot a circular dendrogram using the option type = "circular".
fviz_dend(hc, cex = 0.5, k = 4, k_colors = "jco", type = "circular")
To plot a phylogenic-like tree, use type = "phylogenic" and repel = TRUE (to avoid label overplotting). This functionality requires the R package igraph. Make sure that it’s installed before typing the following R code.
require("igraph")
fviz_dend(hc, k = 4, k_colors = "jco",
          type = "phylogenic", repel = TRUE)
The default layout for phylogenic trees is "layout.auto". Allowed values are one of: c("layout.auto", "layout_with_drl", "layout_as_tree", "layout.gem", "layout.mds", "layout_with_lgl"). To learn more about these layouts, see the documentation of the igraph R package.
Let’s try phylo_layout = "layout.gem":
require("igraph")
fviz_dend(hc, k = 4,  # Cut in four groups
          k_colors = "jco",
          type = "phylogenic", repel = TRUE,
          phylo_layout = "layout.gem")
Case of dendrogram with large data sets
If you compute hierarchical clustering on a large data set, you might want to zoom in on the dendrogram or plot only a subset of it.
Alternatively, you can plot the dendrogram on a large PDF page, which can be zoomed without loss of resolution.
Zooming in the dendrogram
If you want to zoom in on the first clusters, you can use the options xlim and ylim to limit the plot area. For example, type the code below:
fviz_dend(hc, xlim = c(1, 20), ylim = c(1, 8))
Plotting a sub-tree of dendrograms
To plot a sub-tree, we’ll follow the procedure below:
1. Create the whole dendrogram using fviz_dend() and save the result into an object, named dend_plot for example.
2. Use the R base function cut.dendrogram() to cut the dendrogram, at a given height (h), into multiple sub-trees. This returns a list with the components $upper and $lower: the first is a truncated version of the original tree, also of class dendrogram; the second is a list of the branches obtained from cutting the tree, each of which is itself a dendrogram.
3. Visualize the sub-trees using fviz_dend().
The R code is as follows.
- Cut the dendrogram and visualize the truncated version:
# Create a plot of the whole dendrogram,
# and extract the dendrogram data
dend_plot <- fviz_dend(hc, k = 4,  # Cut in four groups
                       cex = 0.5,  # Label size
                       k_colors = "jco"
                       )
dend_data <- attr(dend_plot, "dendrogram")  # Extract dendrogram data
# Cut the dendrogram at height h = 10
dend_cuts <- cut(dend_data, h = 10)
# Visualize the truncated version containing
# two branches
fviz_dend(dend_cuts$upper)
- Plot the dendrogram sub-trees:
# Plot the whole dendrogram
print(dend_plot)
# Plot subtree 1
fviz_dend(dend_cuts$lower[[1]], main = "Subtree 1")
# Plot subtree 2
fviz_dend(dend_cuts$lower[[2]], main = "Subtree 2")
You can also plot circular trees as follows:
# Any element of dend_cuts$lower can be used here; the second subtree is shown
fviz_dend(dend_cuts$lower[[2]], type = "circular")
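Before plotting sub-trees, it can help to inspect what cut() returns. The following is a minimal sketch using base R only; it recomputes the clustering so that it runs on its own, and the names dend, cuts and n_leaves are illustrative rather than part of the code above.

```r
# Recompute the clustering so this block runs on its own
data(USArrests)
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
dend <- as.dendrogram(hc)

# Cut at height h = 10: $upper is the truncated tree,
# $lower is the list of branches below the cut
cuts <- cut(dend, h = 10)
class(cuts$upper)
length(cuts$lower)

# Number of leaves in each lower branch, read from the "members" attribute
n_leaves <- sapply(cuts$lower, function(d) attr(d, "members"))
n_leaves

# Every observation ends up in exactly one lower branch
sum(n_leaves) == nrow(USArrests)
```

The length of cuts$lower tells you which indices are valid in dend_cuts$lower[[i]] for a given cut height.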
Saving dendrogram into a large PDF page
If you have a large dendrogram, you can save it to a large PDF page, which can be zoomed without loss of resolution.
pdf("dendrogram.pdf", width = 30, height = 15)  # Open a PDF
p <- fviz_dend(hc, k = 4, cex = 1, k_colors = "jco")  # Do plotting
print(p)
dev.off()  # Close the PDF
Manipulating dendrograms using dendextend
The dendextend package provides functions for easily changing the appearance of a dendrogram and for comparing dendrograms.
In this section we’ll use the chaining operator (%>%) to simplify our code. The chaining operator turns x %>% f(y) into f(x, y), so you can use it to rewrite multiple operations such that they can be read from left to right, top to bottom. For instance, the two pieces of R code below build a dendrogram with the same sequence of steps; note that the second applies them to only the first five rows of USArrests.
- Standard R code for creating a dendrogram:
data <- scale(USArrests)
dist.res <- dist(data)
hc <- hclust(dist.res, method = "ward.D2")
dend <- as.dendrogram(hc)
plot(dend)
- R code for creating a dendrogram using the chaining operator:
library(dendextend)
dend <- USArrests[1:5,] %>%        # Data
  scale %>%                        # Scale the data
  dist %>%                         # Calculate a distance matrix
  hclust(method = "ward.D2") %>%   # Hierarchical clustering
  as.dendrogram                    # Turn the object into a dendrogram
plot(dend)
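The rewriting rule x %>% f(y) into f(x, y) can be checked directly. The sketch below assumes that the magrittr package, the usual source of %>%, is installed; the names x, d_nested and d_piped are illustrative. It builds the same dendrogram once with nested calls and once with the chaining operator, then compares the two results.

```r
library(magrittr)  # provides %>%

data(USArrests)
x <- USArrests[1:5, ]

# Nested calls, read from the inside out
d_nested <- as.dendrogram(hclust(dist(scale(x)), method = "ward.D2"))

# The same steps chained, read from left to right
d_piped <- x %>%
  scale() %>%
  dist() %>%
  hclust(method = "ward.D2") %>%
  as.dendrogram()

# Compare the leaf order and the root height of the two trees
identical(labels(d_nested), labels(d_piped))
all.equal(attr(d_nested, "height"), attr(d_piped, "height"))
```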
- Functions to customize dendrograms: The function set() [in dendextend package] can be used to change the parameters of a dendrogram. The format is:
set(object, what, value)
- object: a dendrogram object
- what: a character string indicating which property of the tree should be set or updated
- value: a vector with the value to set in the tree (the type of the value depends on what).
Possible values for the argument what include:
| Value for the argument what | Description |
|---|---|
| labels | Set the labels |
| labels_colors and labels_cex | Set the color and the size of labels, respectively |
| leaves_pch, leaves_cex and leaves_col | Set the point type, size and color for leaves, respectively |
| nodes_pch, nodes_cex and nodes_col | Set the point type, size and color for nodes, respectively |
| hang_leaves | Hang the leaves |
| branches_k_color | Color the branches |
| branches_col, branches_lwd and branches_lty | Set the color, the line width and the line type of branches, respectively |
| by_labels_branches_col, by_labels_branches_lwd and by_labels_branches_lty | Set the color, the line width and the line type of branches with specific labels, respectively |
| clear_branches and clear_leaves | Clear branches and leaves, respectively |
library(dendextend)
# 1. Create a customized dendrogram
mycols <- c("#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07")
dend <- as.dendrogram(hc) %>%
  set("branches_lwd", 1) %>%                     # Branches line width
  set("branches_k_color", mycols, k = 4) %>%     # Color branches by groups
  set("labels_colors", mycols, k = 4) %>%        # Color labels by groups
  set("labels_cex", 0.5)                         # Change label size
# 2. Create plot
fviz_dend(dend)
We described functions and packages for visualizing and customizing dendrograms, including:
- fviz_dend() [in the factoextra R package], which provides a convenient way to easily plot a beautiful dendrogram. It can be used to create rectangular and circular dendrograms, as well as phylogenic-like trees.
- the dendextend package, which provides flexible methods to customize dendrograms.
Additionally, we described how to plot a subset of large dendrograms.