<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Sun, 03 May 2026 02:48:46 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Last articles - STHDA : R Graphics Essentials]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/articles/32" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles - STHDA : R Graphics Essentials]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[R Basics for Data Visualization]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/134-r-basics-for-data-visualization/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/134-r-basics-for-data-visualization/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p>R is free and powerful statistical software for analyzing and visualizing data.</p>
<p>In this chapter, you’ll learn:</p>
<ul>
<li>the basics of R programming for importing and manipulating your data:
<ul>
<li>filtering and ordering rows,</li>
<li>renaming and adding columns,</li>
<li>computing summary statistics</li>
</ul></li>
<li>R graphics systems and packages for data visualization:
<ul>
<li>R traditional base plots</li>
<li>Lattice plotting system that aims to improve on R base graphics</li>
<li>ggplot2 package, a powerful and flexible R package for producing elegant graphics piece by piece.</li>
<li>ggpubr package, which facilitates the creation of beautiful ggplot2-based graphs for researchers with non-advanced programming backgrounds.</li>
<li>ggformula package, an extension of ggplot2 based on formula interfaces (much like the lattice interface).</li>
</ul></li>
</ul>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#install-r-and-rstudio">Install R and RStudio</a></li>
<li><a href="#install-and-load-required-r-packages">Install and load required R packages</a></li>
<li><a href="#data-format">Data format</a></li>
<li><a href="#import-your-data-in-r">Import your data in R</a></li>
<li><a href="#demo-data-sets">Demo data sets</a></li>
<li><a href="#data-manipulation">Data manipulation</a></li>
<li><a href="#r-graphics-systems">R graphics systems</a><ul>
<li><a href="#r-base-graphs">R base graphs</a></li>
<li><a href="#lattice-graphics">Lattice graphics</a></li>
<li><a href="#ggplot2-graphics">ggplot2 graphics</a></li>
<li><a href="#ggpubr-for-publication-ready-plots">ggpubr for publication ready plots</a></li>
</ul></li>
<li><a href="#export-r-graphics">Export R graphics</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>
<div id="install-r-and-rstudio" class="section level2">
<h2>Install R and RStudio</h2>
<p>RStudio is an integrated development environment (IDE) for R that makes using R easier. R and RStudio can be installed on Windows, macOS and Linux.</p>
<ol style="list-style-type: decimal">
<li>R can be downloaded and installed from the Comprehensive R Archive Network (CRAN) webpage (<a href="http://cran.r-project.org/" class="uri">http://cran.r-project.org/</a>)</li>
<li>After installing R, install RStudio, available at: <a href="http://www.rstudio.com/products/RStudio/" class="uri">http://www.rstudio.com/products/RStudio/</a>.</li>
<li>Launch RStudio and start using R inside it.</li>
</ol>
</div>
<div id="install-and-load-required-r-packages" class="section level2">
<h2>Install and load required R packages</h2>
<p>An R package is a collection of functionalities that extends the capabilities of base R. To use the R code provided in this book, you should install the following R packages:</p>
<ul>
<li><code>tidyverse</code> packages, which are a collection of R packages that share the same programming philosophy. These packages include:
<ul>
<li><code>readr</code>: for importing data into R</li>
<li><code>dplyr</code>: for data manipulation</li>
<li><code>ggplot2</code>: for data visualization.</li>
</ul></li>
<li><code>ggpubr</code> package, which makes it easy for beginners to create publication-ready plots.</li>
</ul>
<ol style="list-style-type: decimal">
<li><strong>Install the tidyverse package</strong>. Installing tidyverse automatically installs readr, dplyr, ggplot2 and more. Type the following code in the R console:</li>
</ol>
<pre class="r"><code>install.packages("tidyverse")</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><strong>Install the ggpubr package</strong>.</li>
</ol>
<ul>
<li>We recommend installing the latest development version of ggpubr as follows:</li>
</ul>
<pre class="r"><code>if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")</code></pre>
<ul>
<li>If the above R code fails, you can install the latest stable version from CRAN:</li>
</ul>
<pre class="r"><code>install.packages("ggpubr")</code></pre>
<ol start="3" style="list-style-type: decimal">
<li><strong>Load required packages</strong>. After installation, you must load a package before you can use its functions. The function <code>library()</code> is used for this task; an alternative is <code>require()</code>. For example, to load the ggplot2 and ggpubr packages, type this:</li>
</ol>
<pre class="r"><code>library("ggplot2")
library("ggpubr")</code></pre>
<p>Now you can use R functions such as <code>ggscatter()</code> [in the ggpubr package] to create a scatter plot.</p>
<p>To learn more about a given function, say <code>ggscatter()</code>, type this in the R console: <code>?ggscatter</code>.</p>
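<p>The practical difference between <code>library()</code> and <code>require()</code> shows up when a package is missing: <code>library()</code> stops with an error, whereas <code>require()</code> returns <code>FALSE</code> with a warning, which makes it convenient inside <code>if</code> statements. A minimal sketch, where <code>notARealPackage123</code> is a hypothetical package name assumed not to be installed:</p>
<pre class="r"><code># "notARealPackage123" is a hypothetical package name, assumed not installed
# library() signals an error when the package is missing
lib_result <- tryCatch(
  library("notARealPackage123"),
  error = function(e) "error"
)
# require() returns FALSE instead (its message and warning are silenced here)
req_result <- suppressMessages(suppressWarnings(require("notARealPackage123")))
lib_result  # "error"
req_result  # FALSE</code></pre>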
</div>
<div id="data-format" class="section level2">
<h2>Data format</h2>
<p>Your data should be in rectangular format, where columns are variables and rows are observations (individuals or samples).</p>
<ul>
<li><p>Column names should be compatible with R naming conventions. Avoid column names with blank spaces and special characters. Good column names: <code>long_jump</code> or <code>long.jump</code>. Bad column name: <code>long jump</code>.</p></li>
<li><p>Avoid beginning column names with a number; start with a letter instead. Good column names: <code>sport_100m</code> or <code>x100m</code>. Bad column name: <code>100m</code>.</p></li>
<li><p>Replace missing values with <code>NA</code> (not available).</p></li>
</ul>
<p>For example, your data should look like this:</p>
<pre><code>  manufacturer model displ year cyl      trans drv
1         audi    a4   1.8 1999   4   auto(l5)   f
2         audi    a4   1.8 1999   4 manual(m5)   f
3         audi    a4   2.0 2008   4 manual(m6)   f
4         audi    a4   2.0 2008   4   auto(av)   f</code></pre>
<p>Read more at: <a href="https://www.sthda.com/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best Practices in Preparing Data Files for Importing into R</a></p>
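<p>If you are unsure whether a column name is syntactically valid, base R’s <code>make.names()</code> shows the name R would use instead. This is also what base <code>read.csv()</code> applies to headers by default (<code>check.names = TRUE</code>), whereas readr’s functions keep the original names, so a non-syntactic name then has to be quoted with backticks. A short illustration using the column names from the list above:</p>
<pre class="r"><code># Convert candidate column names into syntactically valid R names
candidate_names <- c("long jump", "100m", "sport_100m")
valid_names <- make.names(candidate_names)
valid_names  # "long.jump", "X100m", "sport_100m"

# A non-syntactic name must be wrapped in backticks to be used directly
df <- data.frame(`long jump` = c(7.1, 6.8), check.names = FALSE)
df$`long jump`</code></pre>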
</div>
<div id="import-your-data-in-r" class="section level2">
<h2>Import your data in R</h2>
<p>First, save your data as a tab-delimited .txt file or a .csv file, then import it as follows (you will be asked to choose the file):</p>
<pre class="r"><code>library("readr")
# Reads tab delimited files (.txt tab)
my_data <- read_tsv(file.choose())
# Reads comma (,) delimited files (.csv)
my_data <- read_csv(file.choose())
# Reads semicolon (;) separated files (.csv), with a comma as decimal mark
my_data <- read_csv2(file.choose())</code></pre>
<p>Read more about how to import data into R at this link: <a href="https://www.sthda.com/english/wiki/importing-data-into-r" class="uri">https://www.sthda.com/english/wiki/importing-data-into-r</a></p>
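<p>Because <code>file.choose()</code> opens an interactive dialog, scripts usually pass a file path instead. The sketch below writes a few rows of the built-in <code>iris</code> data to a temporary CSV file (the <code>tempfile()</code> path stands in for your own file) and reads it back with <code>read_csv()</code>; supplying <code>col_types = cols()</code> only silences the column-specification message.</p>
<pre class="r"><code>library("readr")

# Write a small example file; replace csv_path with the path to your own file
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris), csv_path, row.names = FALSE)

# Read it back; cols() with no arguments lets readr guess every column type
my_data <- read_csv(csv_path, col_types = cols())
my_data</code></pre>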
</div>
<div id="demo-data-sets" class="section level2">
<h2>Demo data sets</h2>
<p>R comes with several demo data sets for playing with R functions. The most commonly used include <strong>USArrests</strong>, <strong>iris</strong> and <strong>mtcars</strong>. To load a demo data set, use the function <code>data()</code> as follows. The function <code>head()</code> is used to inspect the data.</p>
<pre class="r"><code>data("iris")   # Loading
head(iris, n = 3)  # Print the first n = 3 rows</code></pre>
<pre><code>##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa</code></pre>
<p>To learn more about the iris data set, type this:</p>
<pre class="r"><code>?iris</code></pre>
<p>After typing the above R code, you will see the description of the <code>iris</code> data set, which gives the measurements in centimeters of sepal length and width and petal length and width for 50 flowers from each of 3 species of iris: Iris setosa, versicolor, and virginica.</p>
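<p>Besides <code>head()</code>, a few base R functions give a quick overview of a data set’s size, column types and group sizes. For example, with <code>iris</code>:</p>
<pre class="r"><code>data("iris")
dim(iris)                   # number of rows and columns: 150 5
str(iris)                   # type of each column (4 numeric, 1 factor)
table(iris$Species)         # number of flowers per species: 50 each
summary(iris$Sepal.Length)  # min, quartiles, mean and max</code></pre>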
</div>
<div id="data-manipulation" class="section level2">
<h2>Data manipulation</h2>
<p>After importing your data into R, you can easily manipulate it using the <code>dplyr</code> package <span class="citation">(Wickham et al. 2017)</span>, which can be installed using the R code: <code>install.packages("dplyr")</code>.</p>
<p>After loading dplyr, you can use the following R functions:</p>
<ul>
<li><code>filter()</code>: Pick rows (observations/samples) based on their values.</li>
<li><code>distinct()</code>: Remove duplicate rows.</li>
<li><code>arrange()</code>: Reorder the rows.</li>
<li><code>select()</code>: Select columns (variables) by their names.</li>
<li><code>rename()</code>: Rename columns.</li>
<li><code>mutate()</code>: Add/create new variables.</li>
<li><code>summarise()</code>: Compute statistical summaries (e.g., computing the mean or the sum)</li>
<li><code>group_by()</code>: Operate on subsets of the data set.</li>
</ul>
<div class="success">
<p>
Note that dplyr lets you use the forward-pipe operator (%>%) to chain multiple operations: x %>% f is equivalent to f(x), so the output of each operation is passed as the first argument of the next one. This makes data-manipulation code easier to write and read.
</p>
</div>
<p>We’ll show you how these functions work in the different chapters of this book.</p>
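<p>As a quick preview, the sketch below chains several of these verbs on the <code>iris</code> data set with the pipe. The 7 cm cutoff and the new column names are arbitrary choices made for illustration.</p>
<pre class="r"><code>library(dplyr)

# Long-sepal flowers: pick rows and columns, rename, add a column, sort
iris_long <- iris %>%
  filter(Sepal.Length > 7) %>%                      # pick rows
  select(Species, Sepal.Length, Petal.Length) %>%   # pick columns
  rename(sepal_length = Sepal.Length,               # new_name = old_name
         petal_length = Petal.Length) %>%
  mutate(ratio = petal_length / sepal_length) %>%   # add a column
  arrange(desc(sepal_length))                       # longest sepals first

# Summary statistics computed separately for each species
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(n = n(), mean_sepal = mean(Sepal.Length))

# Unique values of Species (one row per species)
species_list <- iris %>% distinct(Species)</code></pre>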
</div>
<div id="r-graphics-systems" class="section level2">
<h2>R graphics systems</h2>
<p>Several <a href="https://www.sthda.com/english/wiki/data-visualization">graphics packages are available in R</a> for visualizing your data: 1) R base graphs, 2) lattice graphs <span class="citation">(Sarkar 2016)</span> and 3) ggplot2 <span class="citation">(Wickham and Chang 2017)</span>.</p>
<p>In this section, we start with a quick overview of R base and lattice plots, and then move on to the ggplot2 graphics system. The vast majority of plots in this book are based on the modern and flexible <strong>ggplot2</strong> R package.</p>
<div id="r-base-graphs" class="section level3">
<h3>R base graphs</h3>
<p>R comes with simple functions to create many types of graphs. For example:</p>
<table>
<thead>
<tr class="header">
<th>Plot Types</th>
<th>R base function</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Scatter plot</td>
<td>plot()</td>
</tr>
<tr class="even">
<td>Scatter plot matrix</td>
<td>pairs()</td>
</tr>
<tr class="odd">
<td>Box plot</td>
<td>boxplot()</td>
</tr>
<tr class="even">
<td>Strip chart</td>
<td>stripchart()</td>
</tr>
<tr class="odd">
<td>Histogram</td>
<td>hist()</td>
</tr>
<tr class="even">
<td>Density plot</td>
<td>plot(density())</td>
</tr>
<tr class="odd">
<td>Bar plot</td>
<td>barplot()</td>
</tr>
<tr class="even">
<td>Line plot</td>
<td>plot() and lines()</td>
</tr>
<tr class="odd">
<td>Pie charts</td>
<td>pie()</td>
</tr>
<tr class="even">
<td>Dot charts</td>
<td>dotchart()</td>
</tr>
<tr class="odd">
<td>Add text to a plot</td>
<td>text()</td>
</tr>
</tbody>
</table>
<p>In most cases, you can use the following arguments to customize the plot:</p>
<ul>
<li><code>pch</code>: change point shapes. The standard point symbols are the integers from 0 to 25.</li>
<li><code>cex</code>: change point size. Example: <code>cex = 0.8</code>.</li>
<li><code>col</code>: change point color. Example: <code>col = "blue"</code>.</li>
<li><code>frame</code>: logical value. <code>frame = FALSE</code> removes the plot panel border frame.</li>
<li><code>main</code>, <code>xlab</code>, <code>ylab</code>: specify the main title and the x- and y-axis labels, respectively.</li>
<li><code>las</code>: orientation of the axis tick labels. For vertical x-axis labels, use <code>las = 2</code>.</li>
</ul>
<p>In the following R code, we’ll use the iris data set to create:</p>
<ol style="list-style-type: decimal">
<li>a scatter plot of Sepal.Length (x-axis) and Sepal.Width (y-axis);</li>
<li>a box plot of Sepal.Length (y-axis) by Species (x-axis).</li>
</ol>
<pre class="r"><code># (1) Create a scatter plot
plot(
  x = iris$Sepal.Length, y = iris$Sepal.Width,
  pch = 19, cex = 0.8, frame = FALSE,
  xlab = "Sepal Length",ylab = "Sepal Width"
  )
# (2) Create a box plot
boxplot(Sepal.Length ~ Species, data = iris,
        ylab = "Sepal.Length", 
        frame = FALSE, col = "lightgray")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-r-base-graphics-examples-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-r-base-graphics-examples-2.png" width="316.8" /></p>
<div class="success">
<p>
Read more examples at: R base Graphics on STHDA, <a href="https://www.sthda.com/english/wiki/r-base-graphs" class="uri">https://www.sthda.com/english/wiki/r-base-graphs</a>
</p>
</div>
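<p>Two entries in the table above are easy to misuse: <code>density()</code> only computes a kernel density estimate, which is then drawn with <code>plot()</code> or added to an existing plot with <code>lines()</code>; and <code>lines()</code> adds to the current plot instead of creating a new one. A minimal sketch combining these with the customization arguments listed earlier (the temporary PDF device only keeps the example from opening a window):</p>
<pre class="r"><code># Draw into a temporary PDF file instead of the screen
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# (1) Histogram on the density scale, with a kernel density curve on top:
# density() computes the estimate and lines() adds it to the current plot
h <- hist(iris$Sepal.Length, freq = FALSE, col = "lightgray",
          main = "Sepal length", xlab = "Sepal Length (cm)")
lines(density(iris$Sepal.Length), col = "steelblue", lwd = 2)

# (2) Bar plot of the number of flowers per species;
# las = 2 writes the axis labels perpendicular to the axes
species_counts <- table(iris$Species)
barplot(species_counts, col = "lightgray", las = 2, ylab = "Count")

dev.off()</code></pre>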
</div>
<div id="lattice-graphics" class="section level3">
<h3>Lattice graphics</h3>
<p>The <strong>lattice</strong> R package provides a plotting system that aims to improve on R base graphs. After installing the package with the R command <code>install.packages("lattice")</code>, you can test the following functions.</p>
<ul>
<li>Main functions in the lattice package:</li>
</ul>
<table>
<thead>
<tr class="header">
<th>Plot types</th>
<th>Lattice functions</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Scatter plot</td>
<td>xyplot()</td>
</tr>
<tr class="even">
<td>Scatter plot matrix</td>
<td>splom()</td>
</tr>
<tr class="odd">
<td>3D scatter plot</td>
<td>cloud()</td>
</tr>
<tr class="even">
<td>Box plot</td>
<td>bwplot()</td>
</tr>
<tr class="odd">
<td>Strip plot (1-D scatter plot)</td>
<td>stripplot()</td>
</tr>
<tr class="even">
<td>Dot plot</td>
<td>dotplot()</td>
</tr>
<tr class="odd">
<td>Bar chart</td>
<td>barchart()</td>
</tr>
<tr class="even">
<td>Histogram</td>
<td>histogram()</td>
</tr>
<tr class="odd">
<td>Density plot</td>
<td>densityplot()</td>
</tr>
<tr class="even">
<td>Theoretical quantile plot</td>
<td>qqmath()</td>
</tr>
<tr class="odd">
<td>Two-sample quantile plot</td>
<td>qq()</td>
</tr>
<tr class="even">
<td>Contour plot of surfaces</td>
<td>contourplot()</td>
</tr>
<tr class="odd">
<td>False color level plot of surfaces</td>
<td>levelplot()</td>
</tr>
<tr class="even">
<td>Parallel coordinates plot</td>
<td>parallelplot()</td>
</tr>
<tr class="odd">
<td>3D wireframe graph</td>
<td>wireframe()</td>
</tr>
</tbody>
</table>
<div class="warning">
<p>
The lattice package uses a formula interface. For example, in lattice terminology, the formula y ~ x | group means that we want to plot the y variable against the x variable, splitting the plot into multiple panels by the variable group.
</p>
</div>
<ul>
<li><strong>Create a basic scatter plot of y by x</strong>. Syntax: <code>y ~ x</code>. Color the points by group and use <code>auto.key = TRUE</code> to show a legend:</li>
</ul>
<pre class="r"><code>library("lattice")
xyplot(
  Sepal.Length ~ Petal.Length, groups = Species, 
  data = iris, auto.key = TRUE, pch = 19, cex = 0.5
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-lattice-scatter-plot-1.png" width="288" /></p>
<ul>
<li><strong>Multiple panel plots by groups</strong>. Syntax: <code>y ~ x | group</code>.</li>
</ul>
<pre class="r"><code>xyplot(
  Sepal.Length ~ Petal.Length | Species, 
  layout = c(3, 1),               # 3 columns and 1 row of panels
  groups = Species, data = iris,
  type = c("p", "smooth"),        # Show points and smoothed line
  scales = "free"                 # Make panels axis scales independent
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-lattice-scatter-plot-multiple-panels-1.png" width="576" /></p>
<div class="success">
<p>
Read more examples at: <a href="https://www.sthda.com/english/wiki/lattice-graphs">Lattice Graphics on STHDA</a>
</p>
</div>
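<p>Unlike base graphics, lattice functions return a <code>"trellis"</code> object that is drawn only when it is printed. At the console this happens automatically, but inside a loop, a function or a script run with <code>source()</code> you need to call <code>print()</code> explicitly. A minimal sketch that builds a box plot and a conditioned histogram, then prints them (the temporary PDF device only keeps the example self-contained):</p>
<pre class="r"><code>library("lattice")

# Build the plots; nothing is drawn yet
box_plot <- bwplot(Sepal.Length ~ Species, data = iris, ylab = "Sepal Length")
hist_plot <- histogram(~ Sepal.Length | Species, data = iris,
                       layout = c(3, 1))   # one panel per species, side by side

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
print(box_plot)   # explicit print() is needed in loops, functions and sourced scripts
print(hist_plot)
dev.off()</code></pre>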
</div>
<div id="ggplot2-graphics" class="section level3">
<h3>ggplot2 graphics</h3>
<p><strong>ggplot2</strong> is a powerful and flexible R package, created by Hadley Wickham, for producing elegant graphics piece by piece. The <strong>gg</strong> in ggplot2 stands for <em>Grammar of Graphics</em>, a framework that describes plots using a “grammar”. In ggplot2, a basic plot can be described by three fundamental parts: <strong>Plot = data + Aesthetics + Geometry</strong></p>
<ul>
<li><strong>data</strong>: a data frame</li>
<li><strong>aesthetics</strong>: used to indicate the <strong>x</strong> and <strong>y</strong> variables. They can also be used to control the <strong>color</strong>, <strong>size</strong> and <strong>shape</strong> of points, etc.</li>
<li><strong>geometry</strong>: corresponds to the type of graphic (histogram, box plot, line plot, etc.)</li>
</ul>
<div class="warning">
<p>
The ggplot2 syntax might seem opaque to beginners, but once you understand the basics, you can create and customize any kind of plot you want.
</p>
<p>
To make ggplot2 easier to use, we created an R package named <strong>ggpubr</strong> (ggplot2 Based Publication Ready Plots) for students and researchers with non-advanced programming backgrounds. We’ll present ggpubr in the next section.
</p>
</div>
<p>After installing and loading the ggplot2 package, you can use the following key functions:</p>
<table>
<thead>
<tr class="header">
<th>Plot types</th>
<th>ggplot2 functions</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Initialize a ggplot</td>
<td>ggplot()</td>
</tr>
<tr class="even">
<td>Scatter plot</td>
<td>geom_point()</td>
</tr>
<tr class="odd">
<td>Box plot</td>
<td>geom_boxplot()</td>
</tr>
<tr class="even">
<td>Violin plot</td>
<td>geom_violin()</td>
</tr>
<tr class="odd">
<td>Strip chart</td>
<td>geom_jitter()</td>
</tr>
<tr class="even">
<td>Dot plot</td>
<td>geom_dotplot()</td>
</tr>
<tr class="odd">
<td>Bar chart</td>
<td>geom_bar()</td>
</tr>
<tr class="even">
<td>Line plot</td>
<td>geom_line()</td>
</tr>
<tr class="odd">
<td>Histogram</td>
<td>geom_histogram()</td>
</tr>
<tr class="even">
<td>Density plot</td>
<td>geom_density()</td>
</tr>
<tr class="odd">
<td>Error bars</td>
<td>geom_errorbar()</td>
</tr>
<tr class="even">
<td>QQ plot</td>
<td>stat_qq()</td>
</tr>
<tr class="odd">
<td>ECDF plot</td>
<td>stat_ecdf()</td>
</tr>
<tr class="even">
<td>Title and axis labels</td>
<td>labs()</td>
</tr>
</tbody>
</table>
<p>The main function in the ggplot2 package is <code>ggplot()</code>, which can be used to initialize the plotting system with data and x/y variables.</p>
<p>For example, the following R code takes the <code>iris</code> data set to initialize the ggplot and then a layer (<code>geom_point()</code>) is added onto the ggplot to create a scatter plot of <code>x = Sepal.Length</code> by <code>y = Sepal.Width</code>:</p>
<pre class="r"><code>library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point()
# Change point size, color and shape
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point(size = 1.2, color = "steelblue", shape = 21)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-scatter-plot-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-scatter-plot-2.png" width="288" /></p>
<p>Note that, in the code above, the point shape is specified as a number. To display the point shapes available in R, type this:</p>
<pre class="r"><code>ggpubr::show_point_shapes()</code></pre>
<p>It’s also possible to control point shape and color by a grouping variable (here, <code>Species</code>). For example, in the code below, we map the point color and shape to the <code>Species</code> grouping variable.</p>
<pre class="r"><code># Control points color by groups
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point(aes(color = Species, shape = Species))
# Change the default color manually.
# Use the scale_color_manual() function
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point(aes(color = Species, shape = Species))+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-aesthetic-mapping-control-points-color-shape-and-size-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-aesthetic-mapping-control-points-color-shape-and-size-2.png" width="316.8" /></p>
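<p>A common pitfall is the difference between <em>setting</em> an aesthetic to a fixed value (outside <code>aes()</code>) and <em>mapping</em> it to a variable (inside <code>aes()</code>). A constant such as <code>"steelblue"</code> placed inside <code>aes()</code> is treated as a one-level variable and colored by the default scale, so the points do not come out steel blue. The sketch below checks the colors ggplot2 actually used, via <code>layer_data()</code>:</p>
<pre class="r"><code>library(ggplot2)

# Setting: the color is a fixed value, given outside aes()
p_set <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue")

# Mapping a constant: "steelblue" becomes a data value with a legend entry,
# and the default discrete color scale picks the color actually drawn
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = "steelblue"))

unique(layer_data(p_set)$colour)     # "steelblue"
unique(layer_data(p_mapped)$colour)  # a default hue, not "steelblue"</code></pre>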
<p>You can also split the plot into multiple panels according to a grouping variable, using the function <code>facet_wrap()</code>. Another useful feature of ggplot2 is the ability to combine multiple layers on the same plot. For example, with the following R code, we’ll:</p>
<ul>
<li>Add points with <code>geom_point()</code>, colored by groups.</li>
<li>Add a fitted smoothed line using <code>geom_smooth()</code>. By default, <code>geom_smooth()</code> draws a smoothed fit (loess for small data sets such as iris) together with its confidence band. You can map the line color and the confidence band fill to groups.</li>
<li>Facet the plot into multiple panels by groups</li>
<li>Change color and fill manually using the functions <code>scale_color_manual()</code> and <code>scale_fill_manual()</code></li>
</ul>
<pre class="r"><code>ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point(aes(color = Species))+               
  geom_smooth(aes(color = Species, fill = Species))+
  facet_wrap(~Species, ncol = 3, nrow = 1)+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-scatter-plot-with-regression-line-1.png" width="624" /></p>
<p>Note that the default ggplot2 theme is <code>theme_gray()</code> (or <code>theme_grey()</code>), which has a grey background and white grid lines. Other themes are better suited to professional presentations or publications, including <code>theme_bw()</code>, <code>theme_classic()</code> and <code>theme_minimal()</code>.</p>
<p>To change the theme of a given ggplot (p), use this: <code>p + theme_classic()</code>. To change the default theme to <code>theme_classic()</code> for all the future ggplots during your entire R session, type the following R code:</p>
<pre class="r"><code>theme_set(
  theme_classic()
)</code></pre>
<p>Now you can create ggplots with <code>theme_classic()</code> as default theme:</p>
<pre class="r"><code>ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-examples-of-plots-1.png" width="288" /></p>
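<p>Because ggplot2 builds plots from layers, a plot can be stored in a variable and extended later with <code>+</code>, for example to add a title and axis labels with <code>labs()</code> or to change the theme of that single plot. <code>theme_set()</code> also returns the previous theme, which you can use to restore it. A minimal sketch (the title and labels are illustrative):</p>
<pre class="r"><code>library(ggplot2)

# Store the base plot in a variable
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species))

# Extend it: title and axis labels, plus theme_minimal() for this plot only
p2 <- p +
  labs(title = "Sepal size of iris flowers",
       x = "Sepal Length (cm)", y = "Sepal Width (cm)") +
  theme_minimal()
print(p2)

# Change the session default theme, then restore the previous one
old_theme <- theme_set(theme_bw())
print(p)              # drawn with theme_bw()
theme_set(old_theme)</code></pre>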
</div>
<div id="ggpubr-for-publication-ready-plots" class="section level3">
<h3>ggpubr for publication ready plots</h3>
<p>The <strong>ggpubr</strong> R package facilitates the creation of beautiful ggplot2-based graphs for researchers with non-advanced programming backgrounds <span class="citation">(Kassambara 2017)</span>.</p>
<p>For example, to create a density plot of <code>Sepal.Length</code>, colored by group (<code>Species</code>), type this:</p>
<pre class="r"><code>library(ggpubr)
# Density plot with mean lines and marginal rug
ggdensity(iris, x = "Sepal.Length",
   add = "mean", rug = TRUE,             # Add mean line and marginal rugs
   color = "Species", fill = "Species",  # Color by groups
   palette = "jco")                      # use jco journal color palette</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-density-plot-1.png" width="288" /></p>
<div class="notice">
<p>
Note that the argument <code>palette</code> can also take a custom color palette, for example <code>palette = c("#00AFBB", "#E7B800", "#FC4E07")</code>.
</p>
</div>
<ul>
<li>Create a box plot with p-values comparing groups:</li>
</ul>
<pre class="r"><code># Groups that we want to compare
my_comparisons <- list(
  c("setosa", "versicolor"), c("versicolor", "virginica"),
  c("setosa", "virginica")
)
# Create the box plot. Change colors by groups: Species
# Add jitter points
ggboxplot(
  iris, x = "Species", y = "Sepal.Length",
  color = "Species", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  add = "jitter"
  )+
  stat_compare_means(comparisons = my_comparisons, method = "t.test")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/001-r-basics-for-data-visualization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-box-plot-with-strip-charts-and-p-values-1.png" width="384" /></p>
<div class="success">
<p>
Learn more on STHDA at: <a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/">ggpubr: Publication Ready Plots</a>
</p>
</div>
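<p>The <code>ggscatter()</code> function mentioned earlier follows the same pattern as <code>ggdensity()</code> and <code>ggboxplot()</code>, and ggpubr’s <code>ggarrange()</code> combines several plots into one figure. A sketch, assuming a recent ggpubr version:</p>
<pre class="r"><code>library(ggpubr)

# Scatter plot colored by species, with a linear regression line
# and its confidence band for each group
p1 <- ggscatter(iris, x = "Sepal.Length", y = "Petal.Length",
                color = "Species", palette = "jco",
                add = "reg.line", conf.int = TRUE)

# Box plot of sepal length by species
p2 <- ggboxplot(iris, x = "Species", y = "Sepal.Length",
                color = "Species", palette = "jco")

# Arrange both plots side by side, with panel labels and a shared legend
fig <- ggarrange(p1, p2, ncol = 2, labels = c("A", "B"),
                 common.legend = TRUE, legend = "bottom")
print(fig)</code></pre>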
</div>
</div>
<div id="export-r-graphics" class="section level2">
<h2>Export R graphics</h2>
<p>You can export R graphics to many file formats, including PDF, PostScript, SVG vector files, Windows MetaFile (WMF), PNG, TIFF and JPEG.</p>
<p>The standard procedure to save any graphics from R is as follows:</p>
<ol style="list-style-type: decimal">
<li><strong>Open a graphics device</strong> using one of the following functions:</li>
</ol>
<ul>
<li><code>pdf("r-graphics.pdf")</code>,</li>
<li><code>postscript("r-graphics.ps")</code>,</li>
<li><code>svg("r-graphics.svg")</code>,</li>
<li><code>png("r-graphics.png")</code>,</li>
<li><code>tiff("r-graphics.tiff")</code>,</li>
<li><code>jpeg("r-graphics.jpg")</code>,</li>
<li><code>win.metafile("r-graphics.wmf")</code> (Windows only),</li>
<li>and so on.</li>
</ul>
<p>You can also specify the width and height of the graphics region as additional arguments to these functions. For <code>pdf()</code>, <code>postscript()</code> and <code>svg()</code> they are given in inches, whereas <code>png()</code>, <code>jpeg()</code> and <code>tiff()</code> use pixels by default (see their <code>units</code> argument).</p>
<ol start="2" style="list-style-type: decimal">
<li><p><strong>Create a plot</strong></p></li>
<li><p><strong>Close the graphics device</strong> using the function <code>dev.off()</code></p></li>
</ol>
<p>For example, you can export R base plots to a PDF file as follows:</p>
<pre class="r"><code>pdf("r-base-plot.pdf") 
# Plot 1 --> in the first page of PDF
plot(x = iris$Sepal.Length, y = iris$Sepal.Width)
# Plot 2 ---> in the second page of the PDF
hist(iris$Sepal.Length)
dev.off()</code></pre>
<p>To export ggplot2 graphs, the R code looks like this:</p>
<pre class="r"><code># Create some plots
library(ggplot2)
myplot1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) + 
  geom_point()
myplot2 <- ggplot(iris, aes(Species, Sepal.Length)) + 
  geom_boxplot()
# Print plots to a pdf file
pdf("ggplot.pdf")
print(myplot1)     # Plot 1 --> in the first page of PDF
print(myplot2)     # Plot 2 ---> in the second page of the PDF
dev.off() </code></pre>
<p>Note that for a ggplot, you can also use the following functions to export the graphic:</p>
<ul>
<li><code>ggsave()</code> [in ggplot2]. Makes it easy to save a ggplot. It guesses the type of graphics device from the file extension.</li>
<li><code>ggexport()</code> [in ggpubr]. Makes it easy to arrange and export multiple ggplots at once.</li>
</ul>
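<p>For example, <code>ggsave()</code> saves the last displayed plot, or the one passed through its <code>plot</code> argument, at an explicit size. A minimal sketch that saves a scatter plot to a temporary PDF file (replace the path with your own file name):</p>
<pre class="r"><code>library(ggplot2)

myplot1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point()

# The .pdf extension selects the PDF device; width and height are in inches
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = myplot1, width = 6, height = 4, units = "in")
file.exists(out_file)</code></pre>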
<div class="success">
<p>
See also the following blog post to <a href="https://www.sthda.com/english/wiki/saving-high-resolution-ggplots-how-to-preserve-semi-transparency">save high-resolution ggplots</a>
</p>
</div>
</div>
<div id="references" class="section level2 unnumbered">
<h2>References</h2>
<div id="refs" class="references">
<div id="ref-R-ggpubr">
<p>Kassambara, Alboukadel. 2017. <em>Ggpubr: ’Ggplot2’ Based Publication Ready Plots</em>. <a href="https://www.sthda.com/english/rpkgs/ggpubr" class="uri">https://www.sthda.com/english/rpkgs/ggpubr</a>.</p>
</div>
<div id="ref-R-lattice">
<p>Sarkar, Deepayan. 2016. <em>Lattice: Trellis Graphics for R</em>. <a href="https://CRAN.R-project.org/package=lattice" class="uri">https://CRAN.R-project.org/package=lattice</a>.</p>
</div>
<div id="ref-R-ggplot2">
<p>Wickham, Hadley, and Winston Chang. 2017. <em>Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics</em>.</p>
</div>
<div id="ref-R-dplyr">
<p>Wickham, Hadley, Romain Francois, Lionel Henry, and Kirill Müller. 2017. <em>Dplyr: A Grammar of Data Manipulation</em>. <a href="https://CRAN.R-project.org/package=dplyr" class="uri">https://CRAN.R-project.org/package=dplyr</a>.</p>
</div>
</div>
</div>
</div><!--end rdoc-->

<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 21:42:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Plot One Variable: Frequency Graph, Density Distribution and More]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/133-plot-one-variable-frequency-graph-density-distribution-and-more/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/133-plot-one-variable-frequency-graph-density-distribution-and-more/</guid>
			<description><![CDATA[<!-- START HTML -->


  <div id="rdoc">

<p>To visualize one variable, the type of graphs to use depends on the type of the variable:</p>
<ul>
<li>For <strong>categorical variables</strong> (or grouping variables), you can visualize the count of each category using a <strong>bar plot</strong>, or the proportion of each category using a <strong>pie chart</strong>.</li>
<li>For <strong>continuous variables</strong>, you can visualize the distribution of the variable using <strong>density plots</strong>, <strong>histograms</strong> and alternatives.</li>
</ul>
<p>In this R graphics tutorial, you’ll learn how to:</p>
<ul>
<li>Visualize the frequency distribution of a categorical variable using bar plots, dot charts and pie charts</li>
<li>Visualize the distribution of a continuous variable using:
<ul>
<li>density and histogram plots,</li>
<li>other alternatives, such as frequency polygons, area plots, dot plots, box plots, empirical cumulative distribution function (ECDF) plots and quantile-quantile (QQ) plots,</li>
<li>density ridgeline plots, which are useful for visualizing changes in the distribution of a continuous variable over time or space,</li>
<li>bar plots and modern alternatives, including lollipop charts and Cleveland’s dot plots.</li>
</ul></li>
</ul>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#one-categorical-variable">One categorical variable</a><ul>
<li><a href="#bar-plot-of-counts">Bar plot of counts</a></li>
<li><a href="#pie-charts">Pie charts</a></li>
<li><a href="#dot-charts">Dot charts</a></li>
</ul></li>
<li><a href="#one-continuous-variable">One continuous variable</a><ul>
<li><a href="#data-format">Data format</a></li>
<li><a href="#basic-plots">Basic plots</a></li>
<li><a href="#density-plots">Density plots</a></li>
<li><a href="#histogram-plots">Histogram plots</a></li>
<li><a href="#alternative-to-density-and-histogram-plots">Alternative to density and histogram plots</a></li>
<li><a href="#density-ridgeline-plots">Density ridgeline plots</a></li>
<li><a href="#bar-plot-and-modern-alternatives">Bar plot and modern alternatives</a></li>
</ul></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<p>Load required packages and set the theme function <code>theme_pubr()</code> [in ggpubr] as the default theme:</p>
<pre class="r"><code>library(ggplot2)
library(ggpubr)
theme_set(theme_pubr())</code></pre>
</div>
<div id="one-categorical-variable" class="section level2">
<h2>One categorical variable</h2>
<div id="bar-plot-of-counts" class="section level3">
<h3>Bar plot of counts</h3>

<ul>
<li>Plot types: Bar plot of the count of group levels</li>
<li>Key function: <code>geom_bar()</code></li>
<li>Key arguments: <code>alpha</code>, <code>color</code>, <code>fill</code>, <code>linetype</code> and <code>size</code></li>
</ul>
<p>Demo data set: <code>diamonds</code> [in ggplot2]. It contains the prices and other attributes of almost 54,000 diamonds. The column <code>cut</code> contains the quality of each diamond’s cut (Fair, Good, Very Good, Premium, Ideal).</p>
<p>The R code below creates a bar plot showing the number of diamonds in each cut category.</p>
<pre class="r"><code>ggplot(diamonds, aes(cut)) +
  geom_bar(fill = "#0073C2FF") +
  theme_pubclean()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-frequency-graph-using-geom_bar-discrete-variable-1.png" width="451.2" /></p>
<p>To compute the frequency of each category and add the counts as labels on the bar plot:</p>
<ul>
<li>the <code>dplyr</code> package is used to summarise the data;</li>
<li><code>geom_bar()</code> with the option <code>stat = "identity"</code> plots the summary values as they are, instead of counting the rows again;</li>
<li><code>geom_text()</code> adds the text labels. Adjust their position using <code>hjust</code> (horizontal justification) and <code>vjust</code> (vertical justification). Values between 0 and 1 justify the text within its own bounds; values outside this range move it further out, e.g. <code>vjust = -0.3</code> below places each label just above its bar.</li>
</ul>
<pre class="r"><code># Compute the frequency
library(dplyr)
df <- diamonds %>%
  group_by(cut) %>%
  summarise(counts = n())
df</code></pre>
<pre><code>## # A tibble: 5 x 2
##         cut counts
##       <ord>  <int>
## 1      Fair   1610
## 2      Good   4906
## 3 Very Good  12082
## 4   Premium  13791
## 5     Ideal  21551</code></pre>
<pre class="r"><code># Create the bar plot. Use theme_pubclean() [in ggpubr]
ggplot(df, aes(x = cut, y = counts)) +
  geom_bar(fill = "#0073C2FF", stat = "identity") +
  geom_text(aes(label = counts), vjust = -0.3) + 
  theme_pubclean()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-annotated-frequency-bar-plot-1.png" width="451.2" /></p>
</div>
<div id="pie-charts" class="section level3">
<h3>Pie charts</h3>
<p>A pie chart is just a stacked bar chart in polar coordinates.</p>
<p>First,</p>
<ul>
<li>Arrange the grouping variable (<code>cut</code>) in descending order. This is needed to compute the y coordinates of the labels: ggplot2 stacks the last level of <code>cut</code> (Ideal) at the bottom, so the cumulative sums must start from that level.</li>
<li>Compute the percentage (counts*100/total) of each category.</li>
<li>Compute the position of the text labels from the cumulative sum of the percentages. To put the labels in the center of the slices, use <code>cumsum(prop) - 0.5*prop</code> as the label position.</li>
</ul>
<pre class="r"><code>df <- df %>%
  arrange(desc(cut)) %>%
  mutate(prop = round(counts*100/sum(counts), 1),
         lab.ypos = cumsum(prop) - 0.5*prop)
head(df, 4)</code></pre>
<pre><code>## # A tibble: 4 x 4
##         cut counts  prop lab.ypos
##       <ord>  <int> <dbl>    <dbl>
## 1     Ideal  21551  40.0     20.0
## 2   Premium  13791  25.6     52.8
## 3 Very Good  12082  22.4     76.8
## 4      Good   4906   9.1     92.5</code></pre>
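<p>To make the label arithmetic explicit, the base R sketch below recomputes <code>prop</code> and <code>lab.ypos</code> from the counts printed earlier (hard-coded here so that it runs without ggplot2 or dplyr). Each slice spans the interval from the running total before it to the running total after it, and <code>cumsum(prop) - 0.5*prop</code> is simply the midpoint of that interval.</p>

```r
# Counts of diamonds per cut, copied from the summary table above,
# in the order produced by arrange(desc(cut)) (Ideal first)
counts <- c(Ideal = 21551, Premium = 13791, "Very Good" = 12082,
            Good = 4906, Fair = 1610)

# Percentage of each category, rounded to one decimal as above
prop <- round(counts * 100 / sum(counts), 1)

# Each slice spans [lower, upper] on the stacked y axis
upper <- cumsum(prop)
lower <- upper - prop

# Label position = cumsum(prop) - 0.5 * prop, i.e. the slice midpoint
lab.ypos <- cumsum(prop) - 0.5 * prop

print(data.frame(prop, lower, upper, lab.ypos))
```

<p>Because the percentages are rounded to one decimal, they add up to 100.1 rather than exactly 100 in this example.</p>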
<ul>
<li>Create the pie charts using ggplot2 verbs. Key function: <code>coord_polar()</code>.</li>
</ul>
<pre class="r"><code>ggplot(df, aes(x = "", y = prop, fill = cut)) +
  geom_bar(width = 1, stat = "identity", color = "white") +
  geom_text(aes(y = lab.ypos, label = prop), color = "white")+
  coord_polar("y", start = 0)+
  ggpubr::fill_palette("jco")+
  theme_void()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-pie-charts-1.png" width="384" /></p>
<ul>
<li>An alternative way to easily create a pie chart is the function <code>ggpie()</code> [in ggpubr]:</li>
</ul>
<pre class="r"><code>ggpie(
  df, x = "prop", label = "prop",
  lab.pos = "in", lab.font = list(color = "white"), 
  fill = "cut", color = "white",
  palette = "jco"
)</code></pre>
</div>
<div id="dot-charts" class="section level3">
<h3>Dot charts</h3>
<p>A dot chart is an alternative to bar plots. Key functions:</p>
<ul>
<li><code>geom_linerange()</code>: creates vertical line segments from <code>ymin</code> to <code>ymax</code> at each x position</li>
<li><code>geom_point()</code>: adds dots</li>
<li><code>ggpubr::color_palette()</code>: changes color palette.</li>
</ul>
<pre class="r"><code>ggplot(df, aes(cut, prop)) +
  geom_linerange(
    aes(x = cut, ymin = 0, ymax = prop), 
    color = "lightgray", size = 1.5
    )+
  geom_point(aes(color = cut), size = 2)+
  ggpubr::color_palette("jco")+
  theme_pubclean()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-frequency-dot-charts-1.png" width="480" /></p>
<p>An easier way to create a dot chart is the function <code>ggdotchart()</code> [in ggpubr]:</p>
<pre class="r"><code>ggdotchart(
  df, x = "cut", y = "prop",
  color = "cut", size = 3,      # Points color and size
  add = "segment",              # Add line segments
  add.params = list(size = 2), 
  palette = "jco",
  ggtheme = theme_pubclean()
)</code></pre>
</div>
</div>
<div id="one-continuous-variable" class="section level2">
<h2>One continuous variable</h2>
<p>Different types of graphs can be used to visualize the distribution of a continuous variable, including density plots and histograms.</p>
<div id="data-format" class="section level3">
<h3>Data format</h3>
<p>Create some data (<code>wdata</code>) containing the weights by sex (M for male; F for female):</p>
<pre class="r"><code>set.seed(1234)
wdata = data.frame(
        sex = factor(rep(c("F", "M"), each=200)),
        weight = c(rnorm(200, 55), rnorm(200, 58))
        )

head(wdata, 4)</code></pre>
<pre><code>##   sex weight
## 1   F   53.8
## 2   F   55.3
## 3   F   56.1
## 4   F   52.7</code></pre>
<p>Compute the mean weight by sex using the <code>dplyr</code> package. First, the data is grouped by sex and then summarized by computing the mean weight by groups. The operator <code>%>%</code> is used to combine multiple operations:</p>
<pre class="r"><code>library("dplyr")
mu <- wdata %>% 
  group_by(sex) %>%
  summarise(grp.mean = mean(weight))
mu</code></pre>
<pre><code>## # A tibble: 2 x 2
##      sex grp.mean
##   <fctr>    <dbl>
## 1      F     54.9
## 2      M     58.1</code></pre>
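<p>For comparison, the same group means can be computed in base R with <code>tapply()</code>. This sketch recreates <code>wdata</code> with the same seed so that it is self-contained; it is not required for the plots in this chapter.</p>

```r
# Recreate the demo data exactly as above
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# Base R equivalent of: group_by(sex) %>% summarise(grp.mean = mean(weight))
grp.mean <- tapply(wdata$weight, wdata$sex, mean)
print(round(grp.mean, 1))
```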
</div>
<div id="basic-plots" class="section level3">
<h3>Basic plots</h3>
<p>We start by creating a plot, named <code>a</code>, that we’ll finish in the next section by adding a layer.</p>
<pre class="r"><code>a <- ggplot(wdata, aes(x = weight))</code></pre>
<p>Possible layers include: <code>geom_density()</code> (for density plots) and <code>geom_histogram()</code> (for histogram plots).</p>
<p>Key arguments to customize the plots:</p>
<ul>
<li><code>color, size, linetype</code>: change the line color, size and type, respectively</li>
<li><code>fill</code>: change the area fill color (for bar plots, histograms and density plots)</li>
<li><code>alpha</code>: create a semi-transparent color.</li>
</ul>
</div>
<div id="density-plots" class="section level3">
<h3>Density plots</h3>
<p>Key function: <code>geom_density()</code>. </p>
<ol style="list-style-type: decimal">
<li><strong>Create basic density plots</strong>. Add a vertical line corresponding to the mean value of the weight variable (<code>geom_vline()</code>):</li>
</ol>
<pre class="r"><code># y axis scale = ..density.. (default behaviour)
a + geom_density() +
  geom_vline(aes(xintercept = mean(weight)), 
             linetype = "dashed", size = 0.6)
  
# Change y axis to count instead of density
a + geom_density(aes(y = ..count..), fill = "lightgray") +
  geom_vline(aes(xintercept = mean(weight)), 
             linetype = "dashed", size = 0.6,
             color = "#FC4E07")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-basic-density-plot-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-basic-density-plot-2.png" width="288" /></p>
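<p>As a rough illustration of the difference between the two y scales, the base R sketch below applies <code>stats::density()</code> to the same weight values. The density curve encloses an area of about 1, whereas the count scale is, as far as we understand ggplot2’s <code>stat_density()</code>, the density multiplied by the number of observations, so it encloses an area of about n (here 400). The areas are approximated with a simple Riemann sum over the evaluation grid.</p>

```r
# Recreate the weight values used in this chapter
set.seed(1234)
weight <- c(rnorm(200, 55), rnorm(200, 58))
n <- length(weight)

# Kernel density estimate on an evenly spaced grid (512 points by default)
d <- density(weight)
dx <- d$x[2] - d$x[1]                  # grid spacing

# Approximate the areas under the curves with a Riemann sum
area_density <- sum(d$y) * dx          # close to 1
count_curve  <- d$y * n                # density rescaled to counts
area_count   <- sum(count_curve) * dx  # close to n = 400

print(c(area_density = area_density, area_count = area_count))
```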
<ol start="2" style="list-style-type: decimal">
<li><strong>Change area fill and line colors by groups</strong> (sex):</li>
</ol>
<ul>
<li>Add vertical mean lines using <code>geom_vline()</code>. Data: <code>mu</code>, which contains the mean values of weights by sex (computed in the previous section).</li>
<li>Change color manually:
<ul>
<li>use <code>scale_color_manual()</code> or <code>scale_colour_manual()</code> for changing line color</li>
<li>use <code>scale_fill_manual()</code> for changing area fill colors.</li>
</ul></li>
</ul>
<pre class="r"><code># Change line color by sex
a + geom_density(aes(color = sex)) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

# Change fill color by sex and add mean line
# Use semi-transparent fill: alpha = 0.4
a + geom_density(aes(fill = sex), alpha = 0.4) +
  geom_vline(aes(xintercept = grp.mean, color = sex),
             data = mu, linetype = "dashed") +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#868686FF", "#EFC000FF"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-density-change-color-by-groups-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-density-change-color-by-groups-2.png" width="316.8" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Simple solution to create ggplot2-based density plots</strong>: use <code>ggdensity()</code> [in ggpubr].</li>
</ol>
<pre class="r"><code>library(ggpubr)

# Basic density plot with mean line and marginal rug
ggdensity(wdata, x = "weight", 
          fill = "#0073C2FF", color = "#0073C2FF",
          add = "mean", rug = TRUE)
     
# Change outline and fill colors by groups ("sex")
# Use a custom palette
ggdensity(wdata, x = "weight",
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#0073C2FF", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-density-plots-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-density-plots-2.png" width="316.8" /></p>
</div>
<div id="histogram-plots" class="section level3">
<h3>Histogram plots</h3>
<p>An alternative to density plots is the histogram, which represents the distribution of a continuous variable by dividing its range into bins and counting the number of observations in each bin.</p>
<p>Key function: <code>geom_histogram()</code>. The basic usage is quite similar to <code>geom_density()</code>.</p>
<ol style="list-style-type: decimal">
<li><strong>Create a basic histogram</strong>. Add a vertical line corresponding to the mean value of the weight variable:</li>
</ol>
<pre class="r"><code>a + geom_histogram(bins = 30, color = "black", fill = "gray") +
  geom_vline(aes(xintercept = mean(weight)), 
             linetype = "dashed", size = 0.6)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-basic-histograms-1.png" width="316.8" /></p>
<div class="notice">
<p>
Note that:
</p>
<ul>
<li>
By default, <code>geom_histogram()</code> uses 30 bins, which is not necessarily a good choice for your data. You can change the number of bins (e.g.: <code>bins = 50</code>) or the bin width (e.g.: <code>binwidth = 0.5</code>).
</li>
<li>
The y axis corresponds to the count of weight values. If you want to change the plot in order to have the density on y axis, specify the argument <code>y = ..density..</code> in <code>aes()</code>.
</li>
</ul>
</div>
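<p>The binning itself is easy to reproduce. The base R sketch below groups the weight values into bins of width 0.5 with <code>cut()</code>, counts them with <code>table()</code>, and converts the counts to the density scale by dividing by <code>n * binwidth</code>, which makes the total bar area equal to 1. The bin boundaries here (starting at a whole number) are an assumption for illustration; <code>geom_histogram()</code> places its bins differently by default (see its <code>boundary</code> and <code>center</code> arguments), so the exact counts may differ.</p>

```r
# Recreate the weight values used in this chapter
set.seed(1234)
weight <- c(rnorm(200, 55), rnorm(200, 58))

# Bins of width 0.5 covering the whole range of the data
binwidth <- 0.5
breaks <- seq(floor(min(weight)), ceiling(max(weight)), by = binwidth)
bins <- cut(weight, breaks = breaks, include.lowest = TRUE)

# Count scale (default y axis): number of observations per bin
counts <- as.vector(table(bins))

# Density scale: counts / (n * binwidth), so the bar areas sum to 1
dens <- counts / (length(weight) * binwidth)

print(head(data.frame(bin = levels(bins), counts, dens)))
```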
<ol start="2" style="list-style-type: decimal">
<li><strong>Change area fill and line colors by groups</strong> (sex):</li>
</ol>
<ul>
<li>Add vertical mean lines using <code>geom_vline()</code>. Data: <code>mu</code>, which contains the mean values of weights by sex.</li>
<li>Change color manually:
<ul>
<li>use <code>scale_color_manual()</code> or <code>scale_colour_manual()</code> for changing line color</li>
<li>use <code>scale_fill_manual()</code> for changing area fill colors.</li>
</ul></li>
<li>Adjust the position of histogram bars by using the argument <code>position</code>. Allowed values: “identity”, “stack”, “dodge”. Default value is “stack”.</li>
</ul>
<pre class="r"><code># Change line color by sex
a + geom_histogram(aes(color = sex), fill = "white", 
                   position = "identity") +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) 

# change fill and outline color manually 
a + geom_histogram(aes(color = sex, fill = sex),
                         alpha = 0.4, position = "identity") +
  scale_fill_manual(values = c("#00AFBB", "#E7B800")) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-histogram-change-color-by-groups-1.png" width="326.4" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-histogram-change-color-by-groups-2.png" width="326.4" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Combine histogram and density plots</strong>:</li>
</ol>
<ul>
<li>Plot the histogram with density values on the y-axis (instead of count values).</li>
<li>Overlay a density curve (with a semi-transparent fill in the first example).</li>
</ul>
<pre class="r"><code># Histogram with density plot
a + geom_histogram(aes(y = ..density..), 
                   colour="black", fill="white") +
  geom_density(alpha = 0.2, fill = "#FF6666") 
     

# Color by groups
a + geom_histogram(aes(y = ..density.., color = sex), 
                   fill = "white",
                   position = "identity")+
  geom_density(aes(color = sex), size = 1) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-combine-density-and-histogram-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-combine-density-and-histogram-2.png" width="316.8" /></p>
<ol start="4" style="list-style-type: decimal">
<li><strong>Simple solution to create ggplot2-based histograms</strong>: use <code>gghistogram()</code> [in ggpubr].</li>
</ol>
<pre class="r"><code>library(ggpubr)

# Basic histogram plot with mean line and marginal rug
gghistogram(wdata, x = "weight", bins = 30, 
            fill = "#0073C2FF", color = "#0073C2FF",
            add = "mean", rug = TRUE)
     
# Change outline and fill colors by groups ("sex")
# Use a custom palette
gghistogram(wdata, x = "weight", bins = 30,
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#0073C2FF", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-histogram-1.png" width="326.4" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-histogram-2.png" width="326.4" /></p>
</div>
<div id="alternative-to-density-and-histogram-plots" class="section level3">
<h3>Alternative to density and histogram plots</h3>
<ol style="list-style-type: decimal">
<li><strong>Frequency polygon</strong>. Very close to histogram plots, but it uses lines instead of bars. 
<ul>
<li>Key function: <code>geom_freqpoly()</code>.</li>
<li>Key arguments: <code>color</code>, <code>size</code>, <code>linetype</code>: change, respectively, line color, size and type.</li>
</ul></li>
<li><strong>Area plots</strong>. This is a continuous analog of a stacked bar plot. 
<ul>
<li>Key function: <code>geom_area()</code>.</li>
<li>Key arguments:
<ul>
<li><code>color</code>, <code>size</code>, <code>linetype</code>: change, respectively, line color, size and type.</li>
<li><code>fill</code>: change area fill color.</li>
</ul></li>
</ul></li>
</ol>
<p>In this section, we’ll use the theme <code>theme_pubclean()</code> [in ggpubr]. This is a theme without axis lines, to direct more attention to the data. Type this to use the theme:</p>
<pre class="r"><code>theme_set(theme_pubclean())</code></pre>
<ul>
<li>Create a basic frequency polygon and basic area plots:</li>
</ul>
<pre class="r"><code># Basic frequency polygon
a + geom_freqpoly(bins = 30) 

# Basic area plots, which can be filled by color
a + geom_area( stat = "bin", bins = 30,
               color = "black", fill = "#00AFBB")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-frequency-polygon-and-area-plot-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-frequency-polygon-and-area-plot-2.png" width="316.8" /></p>
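<p>A frequency polygon joins points placed at the middle of each bin, at a height equal to the bin count. The base R sketch below extracts those points with <code>hist(plot = FALSE)</code>; the one-unit bins are an assumption for illustration and do not match the 30 bins used by <code>geom_freqpoly(bins = 30)</code>.</p>

```r
# Recreate the weight values used in this chapter
set.seed(1234)
weight <- c(rnorm(200, 55), rnorm(200, 58))

# Illustrative breaks: one-unit bins spanning the whole data range
breaks <- seq(floor(min(weight)), ceiling(max(weight)), by = 1)
h <- hist(weight, breaks = breaks, plot = FALSE)

# Vertices of the frequency polygon: (bin midpoint, bin count)
vertices <- data.frame(x = h$mids, y = h$counts)
print(vertices)
```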
<ul>
<li>Change colors by groups (sex):</li>
</ul>
<pre class="r"><code># Frequency polygon: 
# Change line colors and types by groups
a + geom_freqpoly( aes(color = sex, linetype = sex),
                   bins = 30, size = 1.5) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))

# Area plots: change fill colors by sex
# Create stacked area plots
a + geom_area(aes(fill = sex), color = "white", 
              stat ="bin", bins = 30) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-frequency-polygon-and-area-plot-color-by-groups-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-frequency-polygon-and-area-plot-color-by-groups-2.png" width="316.8" /></p>
<div class="notice">
<p>
As in histogram plots, the default y value is the count. To have density values on the y axis, specify <code>y = ..density..</code> in <code>aes()</code>.
</p>
</div>
<ol start="3" style="list-style-type: decimal">
<li><strong>Dot plots</strong>. Another alternative to histograms and density plots for visualizing a continuous variable. Dots are stacked, with each dot representing one observation. The width of a dot corresponds to the bin width.</li>
</ol>
<ul>
<li>Key function: <code>geom_dotplot()</code>.</li>
<li>Key arguments: <code>alpha</code>, <code>color</code>, <code>fill</code> and <code>dotsize</code>.</li>
</ul>
<p>Create a dot plot colored by groups (sex):</p>
<pre class="r"><code>a + geom_dotplot(aes(fill = sex), binwidth = 1/4) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_dotplot-one-variable-1.png" width="384" /></p>
<ol start="4" style="list-style-type: decimal">
<li><strong>Box plot</strong>: 
<ul>
<li>Create a box plot of one continuous variable: <code>geom_boxplot()</code></li>
<li>Add jittered points, where each point corresponds to an individual observation: <code>geom_jitter()</code>. Change the color and the shape of points by groups (sex)</li>
</ul></li>
</ol>
<pre class="r"><code>ggplot(wdata, aes(x = factor(1), y = weight)) +
  geom_boxplot(width = 0.4, fill = "white") +
  geom_jitter(aes(color = sex, shape = sex), 
              width = 0.1, size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) + 
  labs(x = NULL)   # Remove x axis label</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_boxplot-box-plot-one-variable-1.png" width="288" /></p>
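<p>For reference, the box drawn by <code>geom_boxplot()</code> spans the first to the third quartile, with a line at the median; by default, the whiskers extend to the most extreme observations lying within 1.5 times the interquartile range (IQR) of the box, and points beyond are drawn as outliers. The base R sketch below computes these quantities for the weight values. It uses the default method of <code>quantile()</code> (type 7), which we assume matches <code>stat_boxplot()</code>.</p>

```r
# Recreate the weight values used in this chapter
set.seed(1234)
weight <- c(rnorm(200, 55), rnorm(200, 58))

# Quartiles (default quantile type 7)
q <- quantile(weight, probs = c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Whiskers: most extreme data points within 1.5 * IQR of the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
whisker_low  <- min(weight[weight >= lower_fence])
whisker_high <- max(weight[weight <= upper_fence])

# Observations outside the fences are drawn as individual outlier points
outliers <- weight[weight < lower_fence | weight > upper_fence]

print(c(q, whisker_low = whisker_low, whisker_high = whisker_high))
print(length(outliers))
```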
<ol start="5" style="list-style-type: decimal">
<li><strong>Empirical cumulative distribution function (ECDF)</strong>. Provides another way to visualize a distribution. For any given value, it reports the proportion of individuals whose value is less than or equal to that threshold.</li>
</ol>
<p>For example, in the following plot, you can see that:</p>
<ul>
<li>about 50% of females weigh less than 55</li>
<li>about 50% of males weigh less than 58</li>
</ul>
<pre class="r"><code># geom = "step" draws a staircase; geom = "point" is another option
a + stat_ecdf(aes(color = sex,linetype = sex), 
              geom = "step", size = 1.5) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))+
  labs(y = "f(weight)")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-stat_ecdf-empirical-cumulative-distribution-function-1.png" width="480" /></p>
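<p>The ECDF can also be evaluated directly with the base R function <code>ecdf()</code>, which returns a step function giving, for any value t, the proportion of observations less than or equal to t. The sketch below recreates <code>wdata</code> and evaluates the female and male ECDFs at 55 and 58, respectively.</p>

```r
# Recreate the demo data exactly as above
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

# ecdf() returns a step function: Fn(t) = proportion of values <= t
Fn_female <- ecdf(wdata$weight[wdata$sex == "F"])
Fn_male   <- ecdf(wdata$weight[wdata$sex == "M"])

# Proportion of females weighing at most 55, and of males at most 58
print(c(female_55 = Fn_female(55), male_58 = Fn_male(58)))
```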
<ol start="6" style="list-style-type: decimal">
<li><strong>Quantile-quantile plot</strong> (QQ plot). Used to check whether the data follow a given theoretical distribution, by default the normal distribution.</li>
</ol>
<ul>
<li>Key function: <code>stat_qq()</code>.</li>
<li>Key arguments: <code>color</code>, <code>shape</code> and <code>size</code> to change point color, shape and size.</li>
</ul>
<p>Create a QQ plot of weight. Change color by groups (sex):</p>
<pre class="r"><code># Change point colors by groups
ggplot(wdata, aes(sample = weight)) +
  stat_qq(aes(color = sex)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))+
  labs(y = "Weight")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-quantile-quantile-qq-plot-1.png" width="480" /></p>
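<p>Behind a normal QQ plot, each sorted sample value is paired with the theoretical standard normal quantile at an evenly spaced probability. The base R sketch below computes those pairs with <code>ppoints()</code> and <code>qnorm()</code> for the female weights and compares them with <code>qqnorm(plot.it = FALSE)</code>. We assume <code>stat_qq()</code> uses the same <code>ppoints()</code> probabilities by default; check its documentation for your ggplot2 version.</p>

```r
# Female weights, generated as the first 200 values of wdata
set.seed(1234)
weight <- rnorm(200, 55)

n <- length(weight)
theoretical <- qnorm(ppoints(n))   # expected N(0, 1) quantiles
sample_q    <- sort(weight)        # observed quantiles

# qqnorm() returns the same pairs, in the original order of the data
qq <- qqnorm(weight, plot.it = FALSE)

# Points close to a straight line (correlation near 1) suggest normality
print(cor(theoretical, sample_q))
```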
<p>Alternative plot using the function <code>ggqqplot()</code> [in ggpubr]. The 95% confidence band is shown by default.</p>
<pre class="r"><code>library(ggpubr)
ggqqplot(wdata, x = "weight",
   color = "sex", 
   palette = c("#0073C2FF", "#FC4E07"),
   ggtheme = theme_pubclean())</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggqqplot-quantile-quantile-plot-1.png" width="480" /></p>
</div>
<div id="density-ridgeline-plots" class="section level3">
<h3>Density ridgeline plots</h3>
<p>The density ridgeline plot is an alternative to the standard <code>geom_density()</code> function that can be useful for visualizing changes in the distribution of a continuous variable over time or space. Ridgeline plots are partially overlapping line plots that create the impression of a mountain range.</p>
<p>This functionality is provided in the R package <code>ggridges</code> <span class="citation">(Wilke 2017)</span>.</p>
<ol style="list-style-type: decimal">
<li><strong>Installation</strong>:</li>
</ol>
<pre class="r"><code>install.packages("ggridges")</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><strong>Load and set the default theme</strong> to <code>theme_ridges()</code> [in ggridges]:</li>
</ol>
<pre class="r"><code>library(ggplot2)
library(ggridges)
theme_set(theme_ridges())</code></pre>
<ol start="3" style="list-style-type: decimal">
<li><strong>Example 1: Simple distribution plots by groups</strong>. Distribution of Sepal.Length by Species using the <code>iris</code> data set. The grouping variable Species will be mapped to the y-axis:</li>
</ol>
<pre class="r"><code>ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(aes(fill = Species)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggridges-1.png" width="528" /></p>
<div class="notice">
<p>
You can control the overlap between the different densities using the <code>scale</code> option. Default value is 1. Smaller values create a separation between the curves, and larger values create more overlap.
</p>
</div>
<pre class="r"><code>ggplot(iris, aes(x = Sepal.Length, y = Species)) +
  geom_density_ridges(scale = 0.9) </code></pre>
<ol start="4" style="list-style-type: decimal">
<li><strong>Example 2: Visualize temperature data</strong>.</li>
</ol>
<ul>
<li><p>Data set: <code>lincoln_weather</code> [in ggridges]. Weather in Lincoln, Nebraska in 2016.</p></li>
<li><p>Create the density ridge plots of the <code>Mean Temperature</code> by <code>Month</code> and change the fill color according to the temperature value (on x axis). A gradient color is created using the function <code>geom_density_ridges_gradient()</code></p></li>
</ul>
<pre class="r"><code>ggplot(
  lincoln_weather, 
  aes(x = `Mean Temperature [F]`, y = `Month`)
  ) +
  geom_density_ridges_gradient(
    aes(fill = ..x..), scale = 3, size = 0.3
    ) +
  scale_fill_gradientn(
    colours = c("#0D0887FF", "#CC4678FF", "#F0F921FF"),
    name = "Temp. [F]"
    )+
  labs(title = 'Temperatures in Lincoln NE') </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggridges-density-gradient-1.png" width="672" /></p>
<p>For more examples, type the following R code:</p>
<pre class="r"><code>browseVignettes("ggridges")</code></pre>
</div>
<div id="bar-plot-and-modern-alternatives" class="section level3">
<h3>Bar plot and modern alternatives</h3>
<p>In this section, we’ll describe how to easily create basic and ordered bar plots using the ggplot2-based helper functions available in the ggpubr R package. We’ll also present some modern alternatives to bar plots, including lollipop charts and Cleveland’s dot plots.</p>
<ul>
<li>Load required packages:</li>
</ul>
<pre class="r"><code>library(ggpubr)</code></pre>
<ul>
<li>Load and prepare data:</li>
</ul>
<pre class="r"><code># Load data
dfm <- mtcars
# Convert the cyl variable to a factor
dfm$cyl <- as.factor(dfm$cyl)
# Add the name column
dfm$name <- rownames(dfm)
# Inspect the data
head(dfm[, c("name", "wt", "mpg", "cyl")])</code></pre>
<pre><code>##                                name   wt  mpg cyl
## Mazda RX4                 Mazda RX4 2.62 21.0   6
## Mazda RX4 Wag         Mazda RX4 Wag 2.88 21.0   6
## Datsun 710               Datsun 710 2.32 22.8   4
## Hornet 4 Drive       Hornet 4 Drive 3.21 21.4   6
## Hornet Sportabout Hornet Sportabout 3.44 18.7   8
## Valiant                     Valiant 3.46 18.1   6</code></pre>
<ul>
<li>Create an ordered bar plot of the <code>mpg</code> variable. Change the fill color by the grouping variable “cyl”. Sorting will be done globally, but not by groups.</li>
</ul>
<pre class="r"><code>ggbarplot(dfm, x = "name", y = "mpg",
          fill = "cyl",               # change fill color by cyl
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette. see ?ggpar
          sort.val = "asc",           # Sort the values in ascending order
          sort.by.groups = FALSE,     # Don't sort inside each group
          x.text.angle = 90,          # Rotate x axis texts vertically
          ggtheme = theme_pubclean()
          )+
  font("x.text", size = 8, vjust = 0.5)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ordered-bar-plots-1.png" width="672" /></p>
<div class="notice">
<p>
To sort bars inside each group, use the argument <strong>sort.by.groups = TRUE</strong>
</p>
</div>
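<p>Conceptually, the two sorting modes correspond to two different orderings of the rows. The base R sketch below mimics them with <code>order()</code> on the <code>mtcars</code>-based data prepared above: a global ascending sort on <code>mpg</code>, and a sort by <code>cyl</code> first, then by <code>mpg</code> within each group. This illustrates the idea only; it is not how ggpubr implements <code>sort.by.groups</code>.</p>

```r
# Same preparation as above (mtcars ships with base R)
dfm <- mtcars
dfm$cyl <- as.factor(dfm$cyl)
dfm$name <- rownames(dfm)

# Global ascending sort on mpg (the idea behind sort.by.groups = FALSE)
global <- dfm[order(dfm$mpg), c("name", "cyl", "mpg")]

# Sort by group first, then by mpg within each group
# (the idea behind sort.by.groups = TRUE)
by_group <- dfm[order(dfm$cyl, dfm$mpg), c("name", "cyl", "mpg")]

print(head(global, 3))
print(head(by_group, 3))
```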
<ul>
<li>Create a Lollipop chart:
<ul>
<li>Color by groups and set a custom color palette.</li>
<li>Sort values in ascending order.</li>
<li>Add segments from y = 0 to dots. Change segment color and size.</li>
</ul></li>
</ul>
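<pre class="r"><code>ggdotchart(dfm, x = "name", y = "mpg",
           color = "cyl",                                # Color by groups
           palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
           sorting = "asc", sort.by.groups = TRUE,       # Sort values in ascending order
           add = "segments",                             # Add segments from y = 0 to dots
           add.params = list(color = "lightgray", size = 2), # Change segment color and size
           group = "cyl",                                # Order by groups
           dot.size = 4,                                 # Large dot size
           ggtheme = theme_pubclean()
           )+
  font("x.text", size = 8, vjust = 0.5)</code></pre>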
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-lollipop-chart-1.png" width="672" /></p>
<p>Read more: <a href="https://goo.gl/eSggcW">Bar Plots and Modern Alternatives</a></p>
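<p>The same lollipop design can be sketched with ggplot2 verbs: a <code>geom_segment()</code> layer draws the stems from y = 0 and a <code>geom_point()</code> layer draws the dots. This minimal sketch assumes ggplot2 3.4.0 or later (for the <code>linewidth</code> argument) and orders the cars by <code>cyl</code> first and then by ascending <code>mpg</code>, which mimics, but is not guaranteed to match exactly, the ordering produced by <code>ggdotchart()</code>.</p>
<pre class="r"><code>library(ggplot2)

# Assumption: dfm is a copy of mtcars, prepared as above
dfm <- mtcars
dfm$cyl <- as.factor(dfm$cyl)
dfm$name <- rownames(dfm)

# Sort by group (cyl), then by ascending mpg within each group,
# and freeze that order as the factor levels of name
dfm <- dfm[order(dfm$cyl, dfm$mpg), ]
dfm$name <- factor(dfm$name, levels = dfm$name)

p_lollipop <- ggplot(dfm, aes(x = name, y = mpg)) +
  geom_segment(aes(xend = name, y = 0, yend = mpg),   # stems from y = 0
               color = "lightgray", linewidth = 2) +
  geom_point(aes(color = cyl), size = 4) +             # dots colored by group
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 8))</code></pre>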
</div>
</div>
<div id="conclusion" class="section level2">
<h2>Conclusion</h2>
<ul>
<li>Create a bar plot of a grouping variable:</li>
</ul>
<pre class="r"><code>ggplot(diamonds, aes(cut)) +
  geom_bar(fill = "#0073C2FF") +
  theme_minimal()</code></pre>
<ul>
<li>Visualize a continuous variable:</li>
</ul>
<p>Start by creating a plot, named <code>a</code>, that we’ll finish by adding a layer.</p>
<pre class="r"><code>a <- ggplot(wdata, aes(x = weight))</code></pre>
<p>Possible layers include:</p>
<div class="block">
<ul>
<li>
<strong>geom_density()</strong>: density plot
</li>
<li>
<strong>geom_histogram()</strong>: histogram plot
</li>
<li>
<strong>geom_freqpoly()</strong>: frequency polygon
</li>
<li>
<strong>geom_area()</strong>: area plot
</li>
<li>
<strong>geom_dotplot()</strong>: dot plot
</li>
<li>
<strong>stat_ecdf()</strong>: empirical cumulative distribution function
</li>
<li>
<strong>stat_qq()</strong>: quantile-quantile plot
</li>
</ul>
</div>
<p>Key arguments to customize the plots:</p>
<ul>
<li><code>color, size, linetype</code>: change the line color, size and type, respectively</li>
<li><code>fill</code>: change the fill color of areas (for bar plots, histograms and density plots)</li>
<li><code>alpha</code>: create a semi-transparent color.</li>
</ul>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-4.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-5.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-6.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/005-plot-one-variable-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-one-continuous-variable-7.png" width="153.6" /></p>
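<p>The sketch below finishes the base plot <code>a</code> with each of the layers listed above. Because <code>wdata</code> is created earlier in the chapter and not shown here, the code builds a hypothetical stand-in with simulated weights for two groups; the bin settings are illustrative choices. Note that <code>stat_qq()</code> expects the <code>sample</code> aesthetic rather than <code>x</code>, so it gets its own base plot.</p>
<pre class="r"><code>library(ggplot2)

# Hypothetical stand-in for wdata: simulated weights for two groups
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

a <- ggplot(wdata, aes(x = weight))

# Finish the base plot with one layer at a time
plots <- list(
  density   = a + geom_density(fill = "lightgray", alpha = 0.5),
  histogram = a + geom_histogram(bins = 30, color = "white"),
  freqpoly  = a + geom_freqpoly(bins = 30),
  area      = a + geom_area(stat = "bin", bins = 30),
  dotplot   = a + geom_dotplot(binwidth = 0.2),
  ecdf      = a + stat_ecdf(),
  # stat_qq() needs the sample aesthetic instead of x
  qq        = ggplot(wdata, aes(sample = weight)) + stat_qq()
)</code></pre>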
</div>
<div id="references" class="section level2 unnumbered">
<h2>References</h2>
<div id="refs" class="references">
<div id="ref-R-ggridges">
<p>Wilke, Claus O. 2017. <em>Ggridges: Ridgeline Plots in ’Ggplot2’</em>. <a href="https://CRAN.R-project.org/package=ggridges" class="uri">https://CRAN.R-project.org/package=ggridges</a>.</p>
</div>
</div>
</div>


</div><!--end rdoc-->



<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 18:42:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Plot Grouped Data: Box plot, Bar Plot and More]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/132-plot-grouped-data-box-plot-bar-plot-and-more/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/132-plot-grouped-data-box-plot-bar-plot-and-more/</guid>
			<description><![CDATA[<!-- START HTML -->


  <div id="rdoc">
<p>In this chapter, we’ll show how to plot data grouped by the levels of a categorical variable.</p>
<p>We start by describing how to plot grouped or <strong>stacked frequencies</strong> of two categorical variables. This can be done using <strong>bar plots</strong> and <strong>dot charts</strong>. You’ll also learn how to add labels to dodged and stacked bar plots.</p>
<p>Next, we’ll show how to display a continuous variable with multiple groups. In this situation, the grouping variable is used as the x-axis and the continuous variable as the y-axis. You’ll learn how to:</p>
<ul>
<li>Visualize a grouped continuous variable using <strong>box plot</strong>, <strong>violin plots</strong>, <strong>stripcharts</strong> and alternatives.</li>
<li>Automatically add t-test / Wilcoxon test p-values comparing the groups.</li>
<li>Create mean and median plots of groups with error bars.</li>
</ul>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#grouped-categorical-variables">Grouped categorical variables</a></li>
<li><a href="#grouped-continuous-variables">Grouped continuous variables</a><ul>
<li><a href="#data-format">Data format</a></li>
<li><a href="#box-plots">Box plots</a></li>
<li><a href="#violin-plots">Violin plots</a></li>
<li><a href="#dot-plots">Dot plots</a></li>
<li><a href="#stripcharts">Stripcharts</a></li>
<li><a href="#sinaplot">Sinaplot</a></li>
<li><a href="#mean-and-median-plots-with-error-bars">Mean and median plots with error bars</a></li>
<li><a href="#add-p-values-and-significance-levels">Add p-values and significance levels</a></li>
</ul></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<p>Load required packages and set the theme function <code>theme_pubclean()</code> [in ggpubr] as the default theme:</p>
<pre class="r"><code>library(dplyr) 
library(ggplot2)
library(ggpubr)
theme_set(theme_pubclean())</code></pre>
</div>
<div id="grouped-categorical-variables" class="section level2">
<h2>Grouped categorical variables</h2>
<ul>
<li>Plot types: grouped bar plots of the frequencies of the categories. Key function: <code>geom_bar()</code>.</li>
<li>Demo dataset: <code>diamonds</code> [in ggplot2]. The categorical variables to be used in the demo example are:
<ul>
<li><code>cut</code>: quality of the diamond’s cut (Fair, Good, Very Good, Premium, Ideal)</li>
<li><code>color</code>: diamond colour, from J (worst) to D (best).</li>
</ul></li>
</ul>
<p>In our demo example, we’ll plot only a subset of the data (colors J and D). The different steps are as follows:</p>
<ul>
<li>Filter the data to keep only diamonds whose colors are in (“J”, “D”).</li>
<li>Group the data by the quality of the cut and the diamond color</li>
<li>Count the number of records by groups</li>
<li>Create the bar plot</li>
</ul>
<ol style="list-style-type: decimal">
<li><strong>Filter and count the number of records by groups</strong>:</li>
</ol>
<pre class="r"><code>df <- diamonds %>%
  filter(color %in% c("J", "D")) %>%
  group_by(cut, color) %>%
  summarise(counts = n()) 
head(df, 4)</code></pre>
<pre><code>## # A tibble: 4 x 3
## # Groups:   cut [2]
##     cut color counts
##   <ord> <ord>  <int>
## 1  Fair     D    163
## 2  Fair     J    119
## 3  Good     D    662
## 4  Good     J    307</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><strong>Create the grouped bar plots</strong>: 
<ul>
<li>Key function: <code>geom_bar()</code>. Key argument: <code>stat = "identity"</code> to plot the data as it is.</li>
<li>Use the functions <code>scale_color_manual()</code> and <code>scale_fill_manual()</code> to manually set the bar border colors and area fill colors.</li>
</ul></li>
</ol>
<pre class="r"><code># Stacked bar plots of y = counts by x = cut,
# colored by the variable color
ggplot(df, aes(x = cut, y = counts)) +
  geom_bar(
    aes(color = color, fill = color),
    stat = "identity", position = position_stack()
    ) +
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))

# Use position = position_dodge() 
p <- ggplot(df, aes(x = cut, y = counts)) +
  geom_bar(
    aes(color = color, fill = color),
    stat = "identity", position = position_dodge(0.8),
    width = 0.7
    ) +
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))
p</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_bar-grouped-bar-plot-of-frequencies-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_bar-grouped-bar-plot-of-frequencies-2.png" width="316.8" /></p>
<div class="notice">
<p>
Note that <code>position_stack()</code> automatically stacks values in reverse order of the group aesthetic. This default ensures that the bar colors align with the default legend. You can change this behavior by using <code>position = position_stack(reverse = TRUE)</code>.
</p>
</div>
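<p>To see the effect of <code>reverse = TRUE</code>, the stacked bar plot can be rebuilt with <code>geom_col()</code>, which is shorthand for <code>geom_bar(stat = "identity")</code>. This is a minimal sketch; the counts are recomputed with <code>dplyr::count()</code> (its <code>name</code> argument assumes dplyr 0.8.0 or later) so that the block runs on its own.</p>
<pre class="r"><code>library(dplyr)
library(ggplot2)

# Count diamonds by cut and color, keeping only colors J and D
df <- diamonds %>%
  filter(color %in% c("J", "D")) %>%
  count(cut, color, name = "counts")

# Stack the groups in the order of the factor levels (D at the bottom)
p_rev <- ggplot(df, aes(x = cut, y = counts, fill = color)) +
  geom_col(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))</code></pre>
<p>With <code>reverse = TRUE</code>, adding <code>guides(fill = guide_legend(reverse = TRUE))</code> may help keep the legend order aligned with the stacking order.</p>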
<p>Alternatively, you can easily create a dot chart with the <code>ggpubr</code> package:</p>
<pre class="r"><code>ggdotchart(df, x = "cut", y ="counts",
           color = "color", palette = "jco", size = 3, 
           add = "segment", 
           add.params = list(color = "lightgray", size = 1.5),
           position = position_dodge(0.3),
           ggtheme = theme_pubclean()
           )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-grouped-dot-chart-1.png" width="480" /></p>
<p>Or, if you prefer the ggplot2 verbs, type this:</p>
<pre class="r"><code>ggplot(df, aes(cut, counts)) +
  geom_linerange(
    aes(x = cut, ymin = 0, ymax = counts, group = color), 
    color = "lightgray", size = 1.5,
    position = position_dodge(0.3)
    )+
  geom_point(
    aes(color = color),
    position = position_dodge(0.3), size = 3
    )+
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  theme_pubclean()</code></pre>
<ol start="3" style="list-style-type: decimal">
<li><strong>Add labels to the dodged bar plots</strong>:</li>
</ol>
<pre class="r"><code>p + geom_text(
  aes(label = counts, group = color), 
  position = position_dodge(0.8),
  vjust = -0.3, size = 3.5
)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-add-labels-to-dodged-bar-plots-1.png" width="595.2" /></p>
<ol start="4" style="list-style-type: decimal">
<li><strong>Add labels to stacked bar plots</strong>. Three steps are required to compute the position of the text labels:
<ul>
<li>Sort the data by the cut and color columns. As <code>position_stack()</code> reverses the group order, the <code>color</code> column should be sorted in descending order.</li>
<li>Calculate the cumulative sum of counts for each cut category; these sums are used as the y coordinates of the labels. To put each label in the middle of its bar segment, we’ll use <code>cumsum(counts) - 0.5 * counts</code>.</li>
<li>Create the bar graph and add labels</li>
</ul></li>
</ol>
<pre class="r"><code># Arrange/sort and compute cumulative sums
df <- df %>%
  arrange(cut, desc(color)) %>%
  mutate(lab_ypos = cumsum(counts) - 0.5 * counts) 
head(df, 4)</code></pre>
<pre><code>## # A tibble: 4 x 4
## # Groups:   cut [2]
##     cut color counts lab_ypos
##   <ord> <ord>  <int>    <dbl>
## 1  Fair     J    119     59.5
## 2  Fair     D    163    200.5
## 3  Good     J    307    153.5
## 4  Good     D    662    638.0</code></pre>
<pre class="r"><code># Create stacked bar graphs with labels
ggplot(df, aes(x = cut, y = counts)) +
  geom_bar(aes(color = color, fill = color), stat = "identity") +
  geom_text(
    aes(y = lab_ypos, label = counts, group = color),
    color = "white"
  ) + 
  scale_color_manual(values = c("#0073C2FF", "#EFC000FF"))+
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF")) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-add-labels-to-stacked-bar-plots-1.png" width="480" /></p>
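<p>As an alternative to computing <code>lab_ypos</code> by hand, ggplot2 (version 2.2.0 or later) can center the labels itself with <code>position_stack(vjust = 0.5)</code>. In this sketch, the labels inherit the <code>fill = color</code> mapping, so they are grouped and stacked exactly like the bars, and no sorting or cumulative sum is needed.</p>
<pre class="r"><code>library(dplyr)
library(ggplot2)

df <- diamonds %>%
  filter(color %in% c("J", "D")) %>%
  count(cut, color, name = "counts")

p_lab <- ggplot(df, aes(x = cut, y = counts, fill = color)) +
  geom_col() +
  # vjust = 0.5 puts each label halfway up its own bar segment
  geom_text(aes(label = counts),
            position = position_stack(vjust = 0.5), color = "white") +
  scale_fill_manual(values = c("#0073C2FF", "#EFC000FF"))</code></pre>
<p>Under these assumptions, the label positions should match the <code>lab_ypos</code> values computed above (for example, 59.5 and 200.5 for the Fair cut).</p>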
<p>Alternatively, you can easily create the above plot using the function <code>ggbarplot()</code> [in ggpubr]:</p>
<pre class="r"><code>ggbarplot(df, x = "cut", y = "counts",
          color = "color", fill = "color",
          palette = c("#0073C2FF", "#EFC000FF"),
          label = TRUE, lab.pos = "in", lab.col = "white",
          ggtheme = theme_pubclean()
          )</code></pre>
<ol start="5" style="list-style-type: decimal">
<li><strong>Alternative to bar plots</strong>. Instead of creating a bar plot of the counts, you can plot two discrete variables with a discrete x-axis and a discrete y-axis. Individual points are shown by group: for a given group, the number of points corresponds to the number of records in that group.</li>
</ol>
<p>Key function: <code>geom_jitter()</code>. Arguments: alpha, color, fill, shape and size.</p>
<p>In the example below, we’ll plot a small fraction (1/5) of the diamonds dataset.</p>
<pre class="r"><code>diamonds.frac <- dplyr::sample_frac(diamonds, 1/5)
ggplot(diamonds.frac, aes(cut, color)) +
  geom_jitter(aes(color = cut), size = 0.3)+
  ggpubr::color_palette("jco")+
  ggpubr::theme_pubclean()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-plot-two-categorical-variables-1.png" width="432" /></p>
</div>
<div id="grouped-continuous-variables" class="section level2">
<h2>Grouped continuous variables</h2>
<p>In this section, we’ll show how to plot a grouped continuous variable using box plots, violin plots, strip charts and alternatives.</p>
<p>We’ll also describe how to automatically add p-values comparing groups.</p>
<p>In this section, we’ll set the theme <code>theme_bw()</code> as the default ggplot theme:</p>
<pre class="r"><code>theme_set(
  theme_bw()
)</code></pre>
<div id="data-format" class="section level3">
<h3>Data format</h3>
<ul>
<li>Demo dataset: <code>ToothGrowth</code>
<ul>
<li>Continuous variable: <code>len</code> (tooth length). Used on y-axis</li>
<li>Grouping variable: <code>dose</code> (dose levels of vitamin C: 0.5, 1, and 2 mg/day). Used on x-axis.</li>
</ul></li>
</ul>
<p>First, convert the variable <code>dose</code> from a numeric to a discrete factor variable:</p>
<pre class="r"><code>data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth)</code></pre>
<pre><code>##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5</code></pre>
</div>
<div id="box-plots" class="section level3">
<h3>Box plots</h3>
<ul>
<li>Key function: <code>geom_boxplot()</code></li>
<li>Key arguments to customize the plot:
<ul>
<li><code>width</code>: the width of the box plot</li>
<li><code>notch</code>: logical. If TRUE, creates a notched box plot. The notch displays a confidence interval around the median, normally based on <code>median +/- 1.58*IQR/sqrt(n)</code>. Notches are used to compare groups: if the notches of two boxes do not overlap, this is strong evidence that the medians differ.</li>
<li><code>color</code>, <code>size</code>, <code>linetype</code>: Border line color, size and type</li>
<li><code>fill</code>: box plot areas fill color</li>
<li><code>outlier.colour</code>, <code>outlier.shape</code>, <code>outlier.size</code>: The color, the shape and the size for outlying points.</li>
</ul></li>
</ul>
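<p>The notch formula quoted above can be checked by hand. The sketch below computes <code>median +/- 1.58 * IQR / sqrt(n)</code> for each dose group of <code>ToothGrowth</code>, using base R's <code>median()</code> and <code>IQR()</code>; it illustrates the rule rather than reproducing ggplot2's internal code, and <code>notch_limits()</code> is a hypothetical helper name.</p>
<pre class="r"><code>data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Hypothetical helper: notch limits around the median of x
notch_limits <- function(x) {
  m <- median(x)
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = m - half_width, median = m, upper = m + half_width)
}

# One row per dose level, columns lower / median / upper
notches <- t(sapply(split(ToothGrowth$len, ToothGrowth$dose), notch_limits))
notches

# Do the notches of the 0.5 and 1 mg/day groups overlap?
notches["0.5", "upper"] >= notches["1", "lower"]</code></pre>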

<ol style="list-style-type: decimal">
<li><strong>Create basic box plots</strong>:</li>
</ol>
<ul>
<li>Standard and notched box plots:</li>
</ul>
<pre class="r"><code># Default plot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
e + geom_boxplot()

# Notched box plot with mean points
e + geom_boxplot(notch = TRUE, fill = "lightgray")+
  stat_summary(fun = mean, geom = "point",
               shape = 18, size = 2.5, color = "#FC4E07")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_boxplot-create-basic-box-plots-1.png" width="240" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_boxplot-create-basic-box-plots-2.png" width="240" /></p>
<ul>
<li>Change box plot colors by groups:</li>
</ul>
<pre class="r"><code># Color by group (dose)
e + geom_boxplot(aes(color = dose))+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Change fill color by group (dose)
e + geom_boxplot(aes(fill = dose)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_boxplot-color-by-groups-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_boxplot-color-by-groups-2.png" width="316.8" /></p>
<p>Note that you can use the function <code>scale_x_discrete()</code> for:</p>
<ul>
<li>choosing which items to display: for example c(“0.5”, “2”),</li>
<li>changing the order of items: for example from c(“0.5”, “1”, “2”) to c(“2”, “0.5”, “1”)</li>
</ul>
<p>For example, type this:</p>
<pre class="r"><code># Choose which items to display: group "0.5" and "2"
e + geom_boxplot() + 
  scale_x_discrete(limits=c("0.5", "2"))

# Change the default order of items
e + geom_boxplot() +
  scale_x_discrete(limits=c("2", "0.5", "1"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-scale_x_discre_box-plot-change-group-order-1.png" width="192" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-scale_x_discre_box-plot-change-group-order-2.png" width="192" /></p>
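<p>Another way to change the order of the groups is to reorder the factor levels in the data itself, so that every layer and legend uses the new order without setting scale limits. A minimal sketch, working on a copy named <code>tg</code> (a name chosen here for illustration):</p>
<pre class="r"><code>library(ggplot2)

tg <- ToothGrowth
# Convert dose to a factor whose levels follow the desired display order
tg$dose <- factor(tg$dose, levels = c("2", "0.5", "1"))

p_order <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot()</code></pre>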
<ol start="2" style="list-style-type: decimal">
<li><strong>Create a box plot with multiple groups</strong>:</li>
</ol>
<p>Two different grouping variables are used: <code>dose</code> on x-axis and <code>supp</code> as fill color (legend variable).</p>
<p>The space between the grouped box plots is adjusted using the function <code>position_dodge()</code>.</p>
<pre class="r"><code>e2 <- e + geom_boxplot(
  aes(fill = supp),
  position = position_dodge(0.9) 
  ) +
  scale_fill_manual(values = c("#999999", "#E69F00"))
e2</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-box-plot-multiple-groups-1.png" width="336" /></p>
<p>Split the plot into multiple panels using the function <code>facet_wrap()</code>:</p>
<pre class="r"><code>e2 + facet_wrap(~supp)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-multiple-panel-box-plot-1.png" width="576" /></p>
</div>
<div id="violin-plots" class="section level3">
<h3>Violin plots</h3>
<p>Violin plots are similar to box plots, except that they also show the kernel probability density of the data at different values. Typically, violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots. </p>
<p>Key function:</p>
<ul>
<li><code>geom_violin()</code>: Creates violin plots. Key arguments:
<ul>
<li><code>color</code>, <code>size</code>, <code>linetype</code>: Border line color, size and type</li>
<li><code>fill</code>: Areas fill color</li>
<li><code>trim</code>: logical value. If TRUE (default), trim the tails of the violins to the range of the data. If FALSE, don’t trim the tails.</li>
</ul></li>
<li><code>stat_summary()</code>: Adds summary statistics (mean, median, …) on the violin plots.</li>
</ul>
<ol style="list-style-type: decimal">
<li><strong>Create basic violin plots with summary statistics</strong>:</li>
</ol>
<pre class="r"><code># Add mean points +/- SD
# Use geom = "pointrange" or geom = "crossbar"
e + geom_violin(trim = FALSE) + 
  stat_summary(
    fun.data = "mean_sdl",  fun.args = list(mult = 1), 
    geom = "pointrange", color = "black"
    )
    
# Combine with box plot to add median and quartiles
# Change color by groups
e + geom_violin(aes(fill = dose), trim = FALSE) + 
  geom_boxplot(width = 0.2)+
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  theme(legend.position = "none")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_violin-violin-plot-with-summary-statistics-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_violin-violin-plot-with-summary-statistics-2.png" width="288" /></p>
<div class="notice">
<p>
The function <code>mean_sdl</code> is used for adding the mean and standard deviation. It computes the mean plus or minus a constant times the standard deviation. In the R code above, the constant is specified using the argument <code>mult</code> (mult = 1). By default, mult = 2. The mean +/- SD can be added as a crossbar or a pointrange. Note that <code>mean_sdl()</code> is a wrapper around <code>Hmisc::smean.sdl()</code>, so the Hmisc package must be installed.
</p>
</div>
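<p>To see what <code>mean_sdl(mult = 1)</code> returns, it can be compared with a manual mean +/- 1 SD computation for each dose group. This sketch assumes the Hmisc package is installed, since <code>mean_sdl()</code> relies on it.</p>
<pre class="r"><code>library(ggplot2)

data("ToothGrowth")
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# ggplot2 summary: a one-row data frame with columns y, ymin and ymax
via_mean_sdl <- lapply(len_by_dose, mean_sdl, mult = 1)

# Manual equivalent: mean minus / plus one standard deviation
via_manual <- lapply(len_by_dose, function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
})

via_mean_sdl[["1"]]</code></pre>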
<ol start="2" style="list-style-type: decimal">
<li><strong>Create violin plots with multiple groups</strong>:</li>
</ol>
<pre class="r"><code>e + geom_violin(
  aes(color = supp), trim = FALSE,
  position = position_dodge(0.9) 
  ) +
  geom_boxplot(
    aes(color = supp), width = 0.15,
    position = position_dodge(0.9)
    ) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_violin-violin-plot-with-multiple-groups-1.png" width="480" /></p>
</div>
<div id="dot-plots" class="section level3">
<h3>Dot plots</h3>

<ul>
<li>Key function: <code>geom_dotplot()</code>. Creates stacked dots, with each dot representing one observation.</li>
<li>Key arguments:
<ul>
<li><code>stackdir</code>: which direction to stack the dots. “up” (default), “down”, “center”, “centerwhole” (centered, but with dots aligned).</li>
<li><code>stackratio</code>: how close to stack the dots. Default is 1, where dots just touch. Use smaller values for closer, overlapping dots.</li>
<li><code>color</code>, <code>fill</code>: Dot border color and area fill</li>
<li><code>dotsize</code>: The diameter of the dots relative to binwidth, default 1.</li>
</ul></li>
</ul>
<p>As for violin plots, summary statistics are usually added to dot plots.</p>
<ol style="list-style-type: decimal">
<li><strong>Create basic dot plots</strong>:</li>
</ol>
<pre class="r"><code># Violin plots with mean points +/- SD
e + geom_dotplot(
  binaxis = "y", stackdir = "center",
  fill = "lightgray"
  ) + 
  stat_summary(
    fun.data = "mean_sdl", fun.args = list(mult=1), 
    geom = "pointrange", color = "red"
    )

# Combine with box plots
e + geom_boxplot(width = 0.5) + 
  geom_dotplot(
    binaxis = "y", stackdir = "center",
    fill = "white"
    ) 

# Dot plot + violin plot + stat summary
e + geom_violin(trim = FALSE) +
  geom_dotplot(
    binaxis = "y", stackdir = "center",
    color = "black", fill = "#999999"
    ) +
  stat_summary(
    fun.data="mean_sdl",  fun.args = list(mult=1), 
    geom = "pointrange", color = "#FC4E07", size = 0.4
    )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_dotplot-dot-plot-with-summary-statistics-1.png" width="220.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_dotplot-dot-plot-with-summary-statistics-2.png" width="220.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_dotplot-dot-plot-with-summary-statistics-3.png" width="220.8" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Create dot plots with multiple groups</strong>:</li>
</ol>
<pre class="r"><code># Color dots by groups
e + geom_boxplot(width = 0.5, size = 0.4) +
  geom_dotplot(
    aes(fill = supp),
    binaxis = "y", stackdir = "center"
  )+
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))

# Change the position: dodge the box plots and dot plots of each supp group
e + geom_boxplot(
  aes(color = supp), width = 0.5, size = 0.4,
  position = position_dodge(0.8)
  ) +
  geom_dotplot(
    aes(fill = supp, color = supp),
    binaxis = "y", stackdir = "center", dotsize = 0.8,
    position = position_dodge(0.8)
  )+
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))+
  scale_color_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_dotplot-dotplot-plot-with-multiple-groups-1.png" width="336" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_dotplot-dotplot-plot-with-multiple-groups-2.png" width="336" /></p>
</div>
<div id="stripcharts" class="section level3">
<h3>Stripcharts</h3>
<p>Stripcharts are also known as one-dimensional scatter plots. They are more suitable than box plots when sample sizes are small.</p>
<ul>
<li>Key function: <code>geom_jitter()</code></li>
<li>Key arguments: <code>color</code>, <code>fill</code>, <code>size</code>, <code>shape</code>. Change the points’ color, fill, size and shape.</li>
</ul>
<ol style="list-style-type: decimal">
<li><strong>Create a basic stripchart</strong>:</li>
</ol>
<ul>
<li>Change points shape and color by groups</li>
<li>Adjust the degree of jittering: <code>position_jitter(0.2)</code></li>
<li>Add summary statistics:</li>
</ul>
<pre class="r"><code>e + geom_jitter(
  aes(shape = dose, color = dose), 
  position = position_jitter(0.2),
  size = 1.2
  ) +
  stat_summary(
    aes(color = dose),
    fun.data="mean_sdl",  fun.args = list(mult=1), 
    geom = "pointrange",  size = 0.4
    )+
  scale_color_manual(values =  c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_jitter-basic-stripcharts-1.png" width="384" /></p>
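<p>Because jittering is random, the exact point positions differ between R sessions. In ggplot2 3.0.0 and later, <code>position_jitter()</code> accepts a <code>seed</code> argument that makes the jitter reproducible. The sketch below builds the same stripchart twice from two separately created position objects with the same seed and checks that the jittered x positions match; <code>make_stripchart()</code> is a hypothetical helper, and <code>height = 0</code> keeps the y values (tooth lengths) unchanged.</p>
<pre class="r"><code>library(ggplot2)

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Hypothetical helper: a stripchart with a fixed jitter seed
make_stripchart <- function() {
  ggplot(ToothGrowth, aes(x = dose, y = len)) +
    geom_point(position = position_jitter(width = 0.2, height = 0, seed = 123))
}

# Build the plot twice and compare the jittered x positions
x1 <- ggplot_build(make_stripchart())$data[[1]]$x
x2 <- ggplot_build(make_stripchart())$data[[1]]$x
identical(x1, x2)</code></pre>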
<ol start="2" style="list-style-type: decimal">
<li><strong>Create stripcharts for multiple groups</strong>. The R code is similar to what we have seen in the dot plots section. However, to create dodged jitter points, you should use the function <code>position_jitterdodge()</code> instead of <code>position_dodge()</code>.</li>
</ol>
<pre class="r"><code>e + geom_jitter(
  aes(shape = supp, color = supp), 
  position = position_jitterdodge(jitter.width = 0.2, dodge.width = 0.8),
  size = 1.2
  ) +
  stat_summary(
    aes(color = supp),
    fun.data="mean_sdl",  fun.args = list(mult=1), 
    geom = "pointrange",  size = 0.4,
    position = position_dodge(0.8)
    )+
  scale_color_manual(values =  c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_jitter-stripcharts-for-multiple-groups-1.png" width="432" /></p>
</div>
<div id="sinaplot" class="section level3">
<h3>Sinaplot</h3>
<p><strong>Sinaplot</strong> is inspired by the strip chart and the violin plot. By letting the normalized density of points restrict the jitter along the x-axis, the plot displays the same contour as a violin plot, but resembles a simple strip chart for a small number of data points <span class="citation">(Sidiropoulos et al. 2015)</span>.</p>
<p>In this way, the plot conveys information about the number of data points, the density distribution, outliers and spread in a very simple, comprehensible and condensed format.</p>
<p>Key function: <code>geom_sina()</code> [ggforce]:</p>
<pre class="r"><code>library(ggforce)
# Create some random data (fixed seed for reproducibility)
set.seed(123)
d1 <- data.frame(
  y = c(rnorm(200, 4, 1), rnorm(200, 5, 2), rnorm(400, 6, 1.5)),
  group = rep(c("Grp1", "Grp2", "Grp3"), c(200, 200, 400))
  )

# Sinaplot
ggplot(d1, aes(group, y)) +
  geom_sina(aes(color = group), size = 0.7)+
  scale_color_manual(values =  c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-sinaplot-1.png" width="432" /></p>
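<p>The core idea behind the sinaplot, scaling the horizontal jitter of each point by the local density of the data, can be sketched in a few lines of base R. This is a simplified illustration of the principle described above, not the algorithm implemented in <code>geom_sina()</code>: it evaluates a kernel density estimate at each observation with <code>density()</code> and <code>approx()</code>, then lets the maximum jitter half-width grow with that density.</p>
<pre class="r"><code>set.seed(42)
y <- rnorm(200, mean = 5, sd = 2)

# Kernel density estimate of y, evaluated at every observation
dens <- density(y)
dens_at_y <- approx(dens$x, dens$y, xout = y)$y

# Normalize so that the densest region gets the full half-width of 0.4
half_width <- 0.4 * dens_at_y / max(dens_at_y)

# Jitter around x = 1: points in dense regions spread wider
x_jit <- 1 + runif(length(y), min = -half_width, max = half_width)

plot(x_jit, y, xlim = c(0.5, 1.5), pch = 19, cex = 0.5,
     xlab = "group", ylab = "y")</code></pre>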
</div>
<div id="mean-and-median-plots-with-error-bars" class="section level3">
<h3>Mean and median plots with error bars</h3>
<p>In this section, we’ll show how to plot summary statistics of a continuous variable organized into groups by one or multiple grouping variables.  </p>
<p>Note that the ggpubr package provides an easier way, with less typing, to create mean/median plots. See the associated article at: <a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/">ggpubr-Plot Means/Medians and Error Bars</a></p>
<p>Set the default theme to <code>theme_pubr()</code> [in ggpubr]:</p>
<pre class="r"><code>theme_set(ggpubr::theme_pubr())</code></pre>
<ol style="list-style-type: decimal">
<li><strong>Basic mean/median plots</strong>. Case of one continuous variable and one grouping variable:</li>
</ol>
<ul>
<li>Prepare the data: <code>ToothGrowth</code> data set.</li>
</ul>
<pre class="r"><code>df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df, 3)</code></pre>
<pre><code>##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5</code></pre>
<ul>
<li>Compute summary statistics for the variable <code>len</code> organized into groups by the variable <code>dose</code>:</li>
</ul>
<pre class="r"><code>library(dplyr)
df.summary <- df %>%
  group_by(dose) %>%
  summarise(
    sd = sd(len, na.rm = TRUE),
    len = mean(len)
  )
df.summary</code></pre>
<pre><code>## # A tibble: 3 x 3
##     dose    sd   len
##   <fctr> <dbl> <dbl>
## 1    0.5  4.50  10.6
## 2      1  4.42  19.7
## 3      2  3.77  26.1</code></pre>
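<p>The same summary can also be computed without dplyr. Here is a minimal base-R sketch using <code>tapply()</code>; the name <code>df.summary.base</code> is illustrative only.</p>
<pre class="r"><code>df <- ToothGrowth
df$dose <- as.factor(df$dose)

# Mean and standard deviation of len for each dose level
means <- tapply(df$len, df$dose, mean, na.rm = TRUE)
sds   <- tapply(df$len, df$dose, sd, na.rm = TRUE)

# Assemble a data frame with the same columns as df.summary
df.summary.base <- data.frame(
  dose = factor(names(means), levels = levels(df$dose)),
  sd   = as.vector(sds),
  len  = as.vector(means)
  )
df.summary.base</code></pre>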
<ul>
<li>Create error plots using the summary statistics data. Key functions:
<ul>
<li><code>geom_crossbar()</code> for a hollow bar with the middle indicated by a horizontal line</li>
<li><code>geom_errorbar()</code> for error bars</li>
<li><code>geom_errorbarh()</code> for horizontal error bars</li>
<li><code>geom_linerange()</code> for drawing an interval represented by a vertical line</li>
<li><code>geom_pointrange()</code> for creating an interval represented by a vertical line, with a point in the middle.</li>
</ul></li>
</ul>
<p>Start by initializing ggplot with the summary statistics data:</p>
<ul>
<li>Specify x and y as usual.</li>
<li>Specify <code>ymin = len-sd</code> and <code>ymax = len+sd</code> to add lower and upper error bars. If you want to add only upper error bars, use <code>ymin = len</code> (instead of <code>len-sd</code>) and <code>ymax = len+sd</code>.</li>
</ul>
<pre class="r"><code># Initialize ggplot with data
f <- ggplot(
  df.summary, 
  aes(x = dose, y = len, ymin = len-sd, ymax = len+sd)
  )</code></pre>
<p>Possible error plots:</p>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-bars-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-bars-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-bars-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-bars-4.png" width="153.6" /></p>
<p>Create simple error plots:</p>
<pre class="r"><code># Vertical line with point in the middle
f + geom_pointrange()

# Error bars (mean +/- SD) with mean points
f + geom_errorbar(width = 0.2) +
  geom_point(size = 1.5)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-error-bars-1.png" width="240" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-error-bars-2.png" width="240" /></p>
<p>Create horizontal error bars: put <code>dose</code> on the y-axis and <code>len</code> on the x-axis, and specify <code>xmin</code> and <code>xmax</code>.</p>
<pre class="r"><code># Horizontal error bars with mean points
# Change the color by groups
ggplot(
  df.summary, 
  aes(x = len, y = dose, xmin = len-sd, xmax = len+sd)
  ) +
  geom_point(aes(color = dose)) +
  geom_errorbarh(aes(color = dose), height=.2)+
  theme_light()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2_errorbarh-horizontal-error-bars-1.png" width="336" /></p>
<ul>
<li>Add jitter points (representing individual observations), dot plots or violin plots. To do so, initialize ggplot with the original data (<code>df</code>) and pass the <code>df.summary</code> data to the error plot function, here <code>geom_pointrange()</code>.</li>
</ul>
<pre class="r"><code># Combine with jitter points
ggplot(df, aes(dose, len)) +
  geom_jitter(
    position = position_jitter(0.2), color = "darkgray"
    ) + 
  geom_pointrange(
    aes(ymin = len-sd, ymax = len+sd),
    data = df.summary
    )

# Combine with violin plots
ggplot(df, aes(dose, len)) +
  geom_violin(color = "darkgray", trim = FALSE) + 
  geom_pointrange(
    aes(ymin = len-sd, ymax = len+sd),
    data = df.summary
    )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-error-bars-with-jitter-points-dot-plots-violin-plots-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-error-bars-with-jitter-points-dot-plots-violin-plots-2.png" width="288" /></p>
<ul>
<li>Create basic bar/line plots of mean +/- SD. Only the <code>df.summary</code> data is needed.
<ul>
<li><ol style="list-style-type: decimal">
<li>Add lower and upper error bars for the line plot: <code>ymin = len-sd</code> and <code>ymax = len+sd</code>.</li>
</ol></li>
<li><ol start="2" style="list-style-type: decimal">
<li>Add only upper error bars for the bar plot: <code>ymin = len</code> (instead of <code>len-sd</code>) and <code>ymax = len+sd</code>.</li>
</ol></li>
</ul></li>
</ul>
<div class="warning">
<p>
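Note that, for a line plot with a single line, you should specify <code>group = 1</code> in <code>aes()</code>. Otherwise, because the x-axis variable is a factor, ggplot2 treats each level as a separate group containing a single point, and no line is drawn.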
</p>
</div>
<pre class="r"><code># (1) Line plot
ggplot(df.summary, aes(dose, len)) +
  geom_line(aes(group = 1)) +
  geom_errorbar( aes(ymin = len-sd, ymax = len+sd),width = 0.2) +
  geom_point(size = 2)

# (2) Bar plot
ggplot(df.summary, aes(dose, len)) +
  geom_bar(stat = "identity", fill = "lightgray", 
           color = "black") +
  geom_errorbar(aes(ymin = len, ymax = len+sd), width = 0.2) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-bar-line-plot-of-means-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-bar-line-plot-of-means-2.png" width="288" /></p>
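<p>For a line plot, you might want to treat the x-axis as numeric:</p>
<pre class="r"><code>df.sum2 <- df.summary
# Convert via as.character(): as.numeric() on a factor returns level codes (1, 2, 3)
df.sum2$dose <- as.numeric(as.character(df.sum2$dose))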
ggplot(df.sum2, aes(dose, len)) +
  geom_line() +
  geom_errorbar( aes(ymin = len-sd, ymax = len+sd),width = 0.2) +
  geom_point(size = 2)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-line-plot-with-numeric-x-axis-1.png" width="316.8" /></p>
<ul>
<li>Bar and line plots + jitter points. We need the original <code>df</code> data for the jitter points and the <code>df.summary</code> data for the other <code>geom</code> layers.
<ul>
<li><ol style="list-style-type: decimal">
<li>For the line plot: First, add jitter points, then add lines + error bars + mean points on top of the jitter points.</li>
</ol></li>
<li><ol start="2" style="list-style-type: decimal">
<li>For the bar plot: First, add the bar plot, then add jitter points + error bars on top of the bars.</li>
</ol></li>
</ul></li>
</ul>
<pre class="r"><code># (1) Create a line plot of means + 
# individual jitter points + error bars 
ggplot(df, aes(dose, len)) +
  geom_jitter( position = position_jitter(0.2),
               color = "darkgray") + 
  geom_line(aes(group = 1), data = df.summary) +
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd),
    data = df.summary, width = 0.2) +
  geom_point(data = df.summary, size = 2)

# (2) Bar plots of means + individual jitter points + errors
ggplot(df, aes(dose, len)) +
  geom_bar(stat = "identity", data = df.summary,
           fill = NA, color = "black") +
  geom_jitter( position = position_jitter(0.2),
               color = "black") + 
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd),
    data = df.summary, width = 0.2) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-line-plot-with-error-bars-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-line-plot-with-error-bars-2.png" width="316.8" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Mean/median plots for multiple groups</strong>. Case of one continuous variable (<code>len</code>) and two grouping variables (<code>dose</code>, <code>supp</code>).</li>
</ol>
<ul>
<li>Compute the summary statistics of <code>len</code> grouped by <code>dose</code> and <code>supp</code>:</li>
</ul>
<pre class="r"><code>library(dplyr)
df.summary2 <- df %>%
  group_by(dose, supp) %>%
  summarise(
    sd = sd(len),
    len = mean(len)
  )
df.summary2</code></pre>
<pre><code>## # A tibble: 6 x 4
## # Groups:   dose [?]
##     dose   supp    sd   len
##   <fctr> <fctr> <dbl> <dbl>
## 1    0.5     OJ  4.46 13.23
## 2    0.5     VC  2.75  7.98
## 3      1     OJ  3.91 22.70
## 4      1     VC  2.52 16.77
## 5      2     OJ  2.66 26.06
## 6      2     VC  4.80 26.14</code></pre>
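<p>By default, <code>summarise()</code> removes only the last grouping variable, which is why the result above is still grouped by <code>dose</code>. With dplyr 1.0.0 or later, you can set <code>.groups = "drop"</code> to get an ungrouped tibble (this also silences the message dplyr prints about regrouping). A sketch:</p>
<pre class="r"><code>library(dplyr)

df <- ToothGrowth
df$dose <- as.factor(df$dose)

df.summary2 <- df %>%
  group_by(dose, supp) %>%
  summarise(
    sd = sd(len),
    len = mean(len),
    .groups = "drop"   # return an ungrouped tibble
  )
df.summary2</code></pre>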
<ul>
<li>Create error plots for multiple groups:
<ul>
<li><ol style="list-style-type: decimal">
<li>pointrange colored by groups (supp)</li>
</ol></li>
<li><ol start="2" style="list-style-type: decimal">
<li>error bars (mean +/- SD) + mean points colored by groups (supp)</li>
</ol></li>
</ul></li>
</ul>
<pre class="r"><code># (1) Pointrange: Vertical line with point in the middle
ggplot(df.summary2, aes(dose, len)) +
  geom_pointrange(
    aes(ymin = len-sd, ymax = len+sd, color = supp),
    position = position_dodge(0.3)
    )+
  scale_color_manual(values = c("#00AFBB", "#E7B800"))


# (2) Error bars (mean +/- SD) with mean points
ggplot(df.summary2, aes(dose, len)) +
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd, color = supp),
    position = position_dodge(0.3), width = 0.2
    )+
  geom_point(aes(color = supp), position = position_dodge(0.3)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-error-plot-for-multiple-groups-geom_pointrange-and-geom_errorbar-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-error-plot-for-multiple-groups-geom_pointrange-and-geom_errorbar-2.png" width="316.8" /></p>
<ul>
<li>Create simple line/bar plots for multiple groups.
<ul>
<li><ol style="list-style-type: decimal">
<li>Line plots: change linetype by groups (<code>supp</code>)</li>
</ol></li>
<li><ol start="2" style="list-style-type: decimal">
<li>Bar plots: change fill color by groups (<code>supp</code>)</li>
</ol></li>
</ul></li>
</ul>
<pre class="r"><code># (1) Line plot + error bars
ggplot(df.summary2, aes(dose, len)) +
  geom_line(aes(linetype = supp, group = supp))+
  geom_point()+
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd, group = supp),
     width = 0.2
    )

# (2) Bar plots + upper error bars.
ggplot(df.summary2, aes(dose, len)) +
  geom_bar(aes(fill = supp), stat = "identity",
           position = position_dodge(0.8), width = 0.7)+
  geom_errorbar(
    aes(ymin = len, ymax = len+sd, group = supp),
    width = 0.2, position = position_dodge(0.8)
    )+
  scale_fill_manual(values = c("grey80", "grey30"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-plots-for-multiple-groups-with-error-bars-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-plots-for-multiple-groups-with-error-bars-2.png" width="316.8" /></p>
<ul>
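<li>Easily create plots of mean +/- SD (or mean +/- SE) for multiple groups with the ggpubr package, which automatically computes the summary statistics and creates the graphs. In the code below, <code>"mean_sd"</code> adds mean +/- SD to the line plot, while <code>"mean_se"</code> adds mean +/- standard error to the bar plot.</li>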
</ul>
<pre class="r"><code>library(ggpubr)
# Create line plots of means
ggline(ToothGrowth, x = "dose", y = "len", 
       add = c("mean_sd", "jitter"),
       color = "supp", palette = c("#00AFBB", "#E7B800"))

# Create bar plots of means
ggbarplot(ToothGrowth, x = "dose", y = "len", 
          add = c("mean_se", "jitter"),
          color = "supp", palette = c("#00AFBB", "#E7B800"),
          position = position_dodge(0.8))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-mean-median-plots-for-grouped-data-with-error-bars-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-mean-median-plots-for-grouped-data-with-error-bars-2.png" width="316.8" /></p>
<ul>
<li>Use standard ggplot2 functions to reproduce the line plot above:</li>
</ul>
<pre class="r"><code># Create line plots
ggplot(df, aes(dose, len)) +
  geom_jitter(
    aes(color = supp),
    position = position_jitter(0.2)
    ) + 
  geom_line(
    aes(group = supp, color = supp),
    data = df.summary2
    ) +
  geom_errorbar(
    aes(ymin = len-sd, ymax = len+sd, color = supp),
    data = df.summary2, width = 0.2
    )+
  scale_color_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
</div>
<div id="add-p-values-and-significance-levels" class="section level3">
<h3>Add p-values and significance levels</h3>
<p>In this section, we’ll describe how to easily (i) compare the means of two or more groups and (ii) automatically add p-values and significance levels to a ggplot (such as box plots, dot plots, bar plots and line plots).</p>
<p>Key functions:</p>
<ul>
<li><code>compare_means()</code> [ggpubr package]: an easy-to-use solution to perform one and multiple mean comparisons.</li>
<li><code>stat_compare_means()</code> [ggpubr package]: an easy-to-use solution to automatically add p-values and significance levels to a ggplot.</li>
</ul>
<p>The most common <a href="https://www.sthda.com/english/wiki/comparing-means-in-r">methods for comparing means</a> include:</p>
<table>
<thead>
<tr class="header">
<th>Methods</th>
<th>R function</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>T-test</td>
<td>t.test()</td>
<td>Compare two groups (parametric)</td>
</tr>
<tr class="even">
<td>Wilcoxon test</td>
<td>wilcox.test()</td>
<td>Compare two groups (non-parametric)</td>
</tr>
<tr class="odd">
<td>ANOVA</td>
<td>aov() or anova()</td>
<td>Compare multiple groups (parametric)</td>
</tr>
<tr class="even">
<td>Kruskal-Wallis</td>
<td>kruskal.test()</td>
<td>Compare multiple groups (non-parametric)</td>
</tr>
</tbody>
</table>
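<p>For reference, the global tests in the table can also be run directly in base R. A minimal sketch on the <code>ToothGrowth</code> data, comparing <code>len</code> across the three <code>dose</code> levels; dose is converted to a factor first, because <code>aov()</code> would otherwise fit it as a numeric covariate.</p>
<pre class="r"><code>tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# One-way ANOVA (parametric)
fit <- aov(len ~ dose, data = tg)
summary(fit)

# Kruskal-Wallis rank sum test (non-parametric)
kw <- kruskal.test(len ~ dose, data = tg)
kw</code></pre>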
<ol style="list-style-type: decimal">
<li><strong>Compare two independent groups</strong>:</li>
</ol>
<ul>
<li>Compute t-test:</li>
</ul>
<pre class="r"><code>library(ggpubr)
compare_means(len ~ supp, data = ToothGrowth,
              method = "t.test")</code></pre>
<pre><code>## # A tibble: 1 x 8
##     .y. group1 group2      p  p.adj p.format p.signif method
##   <chr>  <chr>  <chr>  <dbl>  <dbl>    <chr>    <chr>  <chr>
## 1   len     OJ     VC 0.0606 0.0606    0.061       ns T-test</code></pre>
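<p>The p-value above can be cross-checked with base R's <code>t.test()</code>. By default it runs Welch's t-test, which does not assume equal variances in the two groups; <code>var.equal = TRUE</code> gives the classic Student's t-test instead.</p>
<pre class="r"><code># Welch two-sample t-test of len between the OJ and VC groups
res <- t.test(len ~ supp, data = ToothGrowth)
res$p.value</code></pre>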
<ul>
<li>Create a box plot with p-values. Use the option <code>method = "t.test"</code> or <code>method = "wilcox.test"</code>. The default is the Wilcoxon test.</li>
</ul>
<pre class="r"><code># Create a simple box plot and add p-values
p <- ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot(aes(color = supp)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))
p + stat_compare_means(method = "t.test")

# Display the significance level instead of the p-value
# (no method given, so the default Wilcoxon test is used)
# Adjust label position
p + stat_compare_means(
  label = "p.signif", label.x = 1.5, label.y = 40
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-box-plot-with-p-values-compare-means-two-independent-groups-1.png" width="307.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-box-plot-with-p-values-compare-means-two-independent-groups-2.png" width="307.2" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Compare two paired samples</strong>. Use <code>ggpaired()</code> [ggpubr] to create the paired box plot.</li>
</ol>
<pre class="r"><code>ggpaired(ToothGrowth, x = "supp", y = "len",
         color = "supp", line.color = "gray", line.size = 0.4,
         palette = "jco")+
  stat_compare_means(paired = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-box-plot-with-p-value-compare-means-paired-tests-1.png" width="307.2" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Compare more than two groups</strong>. If the grouping variable contains more than two levels, then pairwise tests will be performed automatically. The default method is “wilcox.test”. You can change this to “t.test”.</li>
</ol>
<pre class="r"><code># Perorm pairwise comparisons
compare_means(len ~ dose,  data = ToothGrowth)</code></pre>
<pre><code>## # A tibble: 3 x 8
##     .y. group1 group2        p    p.adj p.format p.signif   method
##   <chr>  <chr>  <chr>    <dbl>    <dbl>    <chr>    <chr>    <chr>
## 1   len    0.5      1 7.02e-06 1.40e-05  7.0e-06     **** Wilcoxon
## 2   len    0.5      2 8.41e-08 2.52e-07  8.4e-08     **** Wilcoxon
## 3   len      1      2 1.77e-04 1.77e-04  0.00018      *** Wilcoxon</code></pre>
<pre class="r"><code># Visualize: Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
ggboxplot(ToothGrowth, x = "dose", y = "len",
          color = "dose", palette = "jco")+ 
  stat_compare_means(comparisons = my_comparisons) + # Pairwise p-values
  stat_compare_means(label.y = 50)  # Global p-value (Kruskal-Wallis by default)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-box-plot-with-p-values-pairwise-comparisons-1.png" width="336" /></p>
<ol start="4" style="list-style-type: decimal">
<li><strong>Multiple grouping variables</strong>:</li>
</ol>
<ul>
<li>(1/2) Create multi-panel box plots faceted by a grouping variable (here, “dose”):</li>
</ul>
<pre class="r"><code># Use only p.format as label. Remove method name.
ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot(aes(color = supp))+
  facet_wrap(~dose) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  stat_compare_means(label = "p.format")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-facet-1.png" width="624" /></p>
<ul>
<li>(2/2) Create a single panel with all box plots. Plot y = “len” by x = “dose” and color by “supp”. Specify the option <code>group</code> in <code>stat_compare_means()</code> so that the two supplement groups are compared within each dose:</li>
</ul>
<pre class="r"><code>ggplot(ToothGrowth, aes(dose, len)) +
  geom_boxplot(aes(color = supp))+
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  stat_compare_means(aes(group = supp), label = "p.signif")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-compare-means-interaction-1.png" width="528" /></p>
<ul>
<li>Paired comparisons for multiple groups:</li>
</ul>
<pre class="r"><code># Box plot facetted by "dose"
p <- ggpaired(ToothGrowth, x = "supp", y = "len",
          color = "supp", palette = "jco", 
          line.color = "gray", line.size = 0.4,
          facet.by = "dose", short.panel.labs = FALSE)
# Use only p.format as label. Remove method name.
p + stat_compare_means(label = "p.format", paired = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-facet-paired-comparisons-1.png" width="624" /></p>
<p>Read more at: <a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/">Add P-values and Significance Levels to ggplots</a></p>
</div>
</div>
<div id="conclusion" class="section level2">
<h2>Conclusion</h2>
<ol style="list-style-type: decimal">
<li><strong>Visualize the distribution of a grouped continuous variable</strong>: the grouping variable on the x-axis and the continuous variable on the y-axis.</li>
</ol>
<p>The possible ggplot2 layers include:</p>
<ul>
<li><code>geom_boxplot()</code> for box plot</li>
<li><code>geom_violin()</code> for violin plot</li>
<li><code>geom_dotplot()</code> for dot plot</li>
<li><code>geom_jitter()</code> for stripchart</li>
<li><code>geom_line()</code> for line plot</li>
<li><code>geom_bar()</code> for bar plot</li>
</ul>
<p>Examples of R code: start by creating a plot, named <code>e</code>, and then finish it by adding a layer:</p>
<pre class="r"><code>ToothGrowth$dose <- as.factor(ToothGrowth$dose)
e <- ggplot(ToothGrowth, aes(x = dose, y = len))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-visualize-grouped-data-discrete-x-continuous-y-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-visualize-grouped-data-discrete-x-continuous-y-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-visualize-grouped-data-discrete-x-continuous-y-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-visualize-grouped-data-discrete-x-continuous-y-4.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-visualize-grouped-data-discrete-x-continuous-y-5.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-visualize-grouped-data-discrete-x-continuous-y-6.png" width="153.6" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Create mean and median plots with error bars</strong>: the grouping variable on the x-axis and the summarized continuous variable (mean/median) on the y-axis.</li>
</ol>
<ul>
<li>Compute summary statistics and initialize ggplot with summary data:</li>
</ul>
<pre class="r"><code># Summary statistics
library(dplyr)
df.summary <- ToothGrowth %>%
  group_by(dose) %>%
  summarise(
    sd = sd(len, na.rm = TRUE),
    len = mean(len)
  )
# Initialize ggplot with data
f <- ggplot(
  df.summary, 
  aes(x = dose, y = len, ymin = len-sd, ymax = len+sd)
  )</code></pre>
<ul>
<li>Possible error plots:</li>
</ul>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-plots-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-plots-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-plots-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/006-plot-grouped-data-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-error-plots-4.png" width="153.6" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Combine error bars with violin plots, dot plots, line and bar plots</strong>:</li>
</ol>
<pre class="r"><code># Combine with violin plots
ggplot(ToothGrowth, aes(dose, len))+
  geom_violin(trim = FALSE) +
  geom_pointrange(aes(ymin = len-sd, ymax = len + sd),
                  data = df.summary)

# Combine with dot plots
ggplot(ToothGrowth, aes(dose, len))+
  geom_dotplot(stackdir = "center", binaxis = "y",
               fill = "lightgray", dotsize = 1) +
  geom_pointrange(aes(ymin = len-sd, ymax = len + sd),
                  data = df.summary)

# Combine with line plot
ggplot(df.summary, aes(dose, len))+
  geom_line(aes(group = 1)) +
  geom_pointrange(aes(ymin = len-sd, ymax = len + sd))

# Combine with bar plots
ggplot(df.summary, aes(dose, len))+
  geom_bar(stat = "identity", fill = "lightgray") +
  geom_pointrange(aes(ymin = len-sd, ymax = len + sd))</code></pre>
</div>
<div id="see-also" class="section level2">
<h2>See also</h2>
<ul>
<li>ggpubr: Publication Ready Plots. <a href="https://goo.gl/7uySha" class="uri">https://goo.gl/7uySha</a></li>
<li>Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data. <a href="https://goo.gl/9LNsRi" class="uri">https://goo.gl/9LNsRi</a></li>
<li>Add P-values and Significance Levels to ggplots. <a href="https://goo.gl/VH7Yq7" class="uri">https://goo.gl/VH7Yq7</a></li>
<li>Plot Means/Medians and Error Bars. <a href="https://goo.gl/zRwAeV" class="uri">https://goo.gl/zRwAeV</a></li>
</ul>
</div>
<div id="references" class="section level2 unnumbered">
<h2>References</h2>
<div id="refs" class="references">
<div id="ref-sidiropoulos2015">
<p>Sidiropoulos, Nikos, Sina Hadi Sohi, Nicolas Rapin, and Frederik Otzen Bagger. 2015. “SinaPlot: An Enhanced Chart for Simple and Truthful Representation of Single Observations over Multiple Classes.” <em>bioRxiv</em>. Cold Spring Harbor Laboratory. doi:<a href="https://doi.org/10.1101/028191">10.1101/028191</a>.</p>
</div>
</div>
</div>


</div><!--end rdoc-->



<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 18:16:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Plot Two Continuous Variables: Scatter Graph and Alternatives]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/131-plot-two-continuous-variables-scatter-graph-and-alternatives/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/131-plot-two-continuous-variables-scatter-graph-and-alternatives/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">

<p><strong>Scatter plots</strong> are used to display the relationship between two continuous variables x and y. In this article, we’ll start by showing how to create beautiful scatter plots in R. </p>
<p>We’ll use helper functions in the <a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/">ggpubr R package</a> to automatically display the <strong>correlation coefficient</strong> and the <strong>significance level</strong> on the plot.</p>
<p>We’ll also describe how to color points by groups and add concentration ellipses around each group. Additionally, we’ll show how to create <strong>bubble charts</strong>, as well as how to add <strong>marginal plots</strong> (histogram, density or box plot) to a scatter plot.</p>
<p>We continue by showing some alternatives to the standard scatter plot, including rectangular binning, hexagonal binning and 2D density estimation. These plot types are useful when you have a large data set containing thousands of records, where overlapping points hide the structure of an ordinary scatter plot.</p>
<p><strong>R code for zooming</strong> in on a scatter plot is also provided. Finally, you’ll learn how to add fitted <strong>regression trend lines</strong> and <strong>equations</strong> to a scatter graph.</p>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#basic-scatter-plots">Basic scatter plots</a></li>
<li><a href="#multiple-groups">Multiple groups</a></li>
<li><a href="#add-point-text-labels">Add point text labels</a></li>
<li><a href="#bubble-chart">Bubble chart</a></li>
<li><a href="#color-by-a-continuous-variable">Color by a continuous variable</a></li>
<li><a href="#add-marginal-density-plots">Add marginal density plots</a></li>
<li><a href="#continuous-bivariate-distribution">Continuous bivariate distribution</a></li>
<li><a href="#zoom-in-a-scatter-plot">Zoom in a scatter plot</a></li>
<li><a href="#add-trend-lines-and-equations">Add trend lines and equations</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<ol style="list-style-type: decimal">
<li><strong>Install the cowplot package</strong>, which is used to arrange multiple plots. Here, it will be used to create a scatter plot with marginal density plots. Install the latest development version as follows:</li>
</ol>
<pre class="r"><code># Requires the devtools package
devtools::install_github("wilkelab/cowplot")
# Alternatively, install the released version from CRAN:
# install.packages("cowplot")</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><strong>Install ggpmisc</strong> for adding the equation of a fitted regression line on a scatter plot:</li>
</ol>
<pre class="r"><code>install.packages("ggpmisc")</code></pre>
<ol start="3" style="list-style-type: decimal">
<li><strong>Load required packages and set ggplot themes</strong>:</li>
</ol>
<ul>
<li>Load ggplot2 and ggpubr R packages</li>
<li>Set the default theme to <code>theme_minimal()</code> [in ggplot2]</li>
</ul>
<pre class="r"><code>library(ggplot2)
library(ggpubr)
theme_set(
  theme_minimal() +
    theme(legend.position = "top")
  )</code></pre>
<ol start="4" style="list-style-type: decimal">
<li><strong>Prepare demo data sets</strong>:</li>
</ol>
<p>Dataset: <a href="https://www.sthda.com/english/wiki/r-built-in-data-sets#mtcars-motor-trend-car-road-tests">mtcars</a>. The variable <code>cyl</code> is used as a grouping variable.</p>
<pre class="r"><code># Load data
data("mtcars")
df <- mtcars

# Convert cyl to a factor so it can be used as a grouping variable
df$cyl <- as.factor(df$cyl)

# Inspect the data
head(df[, c("wt", "mpg", "cyl", "qsec")], 4)</code></pre>
<pre><code>##                  wt  mpg cyl qsec
## Mazda RX4      2.62 21.0   6 16.5
## Mazda RX4 Wag  2.88 21.0   6 17.0
## Datsun 710     2.32 22.8   4 18.6
## Hornet 4 Drive 3.21 21.4   6 19.4</code></pre>
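<p>To check the conversion, you can list the levels of the new factor and count the cars in each group. This short check is a sketch added for illustration; it recreates <code>df</code> so that it runs on its own.</p>
<pre class="r"><code># Recreate the demo data so this snippet is self-contained
df <- mtcars
df$cyl <- as.factor(df$cyl)

# The grouping variable now has three levels: "4", "6", "8"
levels(df$cyl)

# Number of cars per cylinder group: 11, 7 and 14
table(df$cyl)</code></pre>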
</div>
<div id="basic-scatter-plots" class="section level2">
<h2>Basic scatter plots</h2>
<p>Key functions:</p>
<ul>
<li><code>geom_point()</code>: Create scatter plots. Key arguments: <code>color</code>, <code>size</code> and <code>shape</code> to change point color, size and shape.</li>
<li><code>geom_smooth()</code>: Add smoothed conditional means / regression line. Key arguments:
<ul>
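<li><code>color</code>, <code>size</code> and <code>linetype</code>: Change the line color, size and type. In ggplot2 3.4.0 and later, line width is set with <code>linewidth</code> instead of <code>size</code>.</li>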
<li><code>fill</code>: Change the fill color of the confidence region.</li>
</ul></li>
</ul>
<pre class="r"><code>b <- ggplot(df, aes(x = wt, y = mpg))

# Scatter plot with regression line
b + geom_point()+
  geom_smooth(method = "lm") 
     
# Add a loess smoothed fit curve
b + geom_point()+
  geom_smooth(method = "loess") </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-basic-scatter-plots-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-basic-scatter-plots-2.png" width="316.8" /></p>
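<p>The line drawn by <code>geom_smooth(method = "lm")</code> is the ordinary least-squares fit of <code>mpg</code> on <code>wt</code>. The sketch below checks this by extracting the plotted values with <code>layer_data()</code> and comparing them with predictions from <code>lm()</code>; the object names (<code>p_lm</code>, <code>smooth_df</code>, <code>fit</code>) are illustrative.</p>
<pre class="r"><code>library(ggplot2)

b <- ggplot(mtcars, aes(x = wt, y = mpg))
p_lm <- b + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Values computed for the smoothing layer (the 2nd layer):
# 80 evenly spaced x values by default
smooth_df <- layer_data(p_lm, 2)

# The same model fitted directly: mpg = 37.29 - 5.34 * wt
fit <- lm(mpg ~ wt, data = mtcars)
pred <- predict(fit, newdata = data.frame(wt = smooth_df$x))

# The plotted line and the lm() predictions agree
all.equal(unname(pred), smooth_df$y)</code></pre>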
<div class="warning">
<p>
To remove the confidence region around the regression line, specify the argument <code>se = FALSE</code> in the function <code>geom_smooth()</code>.
</p>
</div>
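<p>For example, the sketch below draws the same regression line without the confidence band. With <code>se = FALSE</code>, the computed smoothing layer no longer contains the <code>ymin</code>/<code>ymax</code> columns that define the band (checked here with <code>layer_data()</code>).</p>
<pre class="r"><code>library(ggplot2)

b <- ggplot(mtcars, aes(x = wt, y = mpg))

# Regression line only, no confidence region
p_nose <- b + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The computed layer has no ribbon bounds
"ymin" %in% names(layer_data(p_nose, 2))</code></pre>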
<p>Change the point shape by specifying the argument <code>shape</code>, for example:</p>
<pre class="r"><code>b + geom_point(shape = 18)</code></pre>
<p>To see the different point shapes commonly used in R, type this:</p>
<pre class="r"><code>ggpubr::show_point_shapes()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-point-shapes-1.png" width="259.2" /></p>
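<p>Shapes 21 to 25 are filled shapes: their border takes the <code>color</code> value and their interior the <code>fill</code> value. A small sketch (the colors chosen here are arbitrary):</p>
<pre class="r"><code>library(ggplot2)

b <- ggplot(mtcars, aes(x = wt, y = mpg))

# Shape 21 is a circle with a separate border (color) and interior (fill)
p_shape <- b + geom_point(shape = 21, color = "black",
                          fill = "lightgray", size = 3)

# Inspect the values actually used for each point
pts <- layer_data(p_shape)
unique(pts$shape)
unique(pts$fill)</code></pre>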
<p>Easily create a scatter plot using <code>ggscatter()</code> [in ggpubr]. Use <code>stat_cor()</code> [ggpubr] to add the correlation coefficient and the significance level.</p>
<pre class="r"><code># Add regression line and confidence interval
# Add correlation coefficient: stat_cor()
ggscatter(df, x = "wt", y = "mpg",
          add = "reg.line", conf.int = TRUE,    
          add.params = list(fill = "lightgray"),
          ggtheme = theme_minimal()
          )+
  stat_cor(method = "pearson", 
           label.x = 3, label.y = 30) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-basic-scatter-plot-1.png" width="480" /></p>
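<p><code>stat_cor(method = "pearson")</code> reports Pearson's correlation coefficient and its p-value. The same numbers can be obtained with <code>cor.test()</code> from base R, which is handy when you want to quote them in the text:</p>
<pre class="r"><code># Pearson correlation between weight and fuel consumption
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")
res$estimate  # r, about -0.87
res$p.value   # far below 0.001</code></pre>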
</div>
<div id="multiple-groups" class="section level2">
<h2>Multiple groups</h2>
<ul>
<li>Change point colors and shapes by groups.</li>
<li>Add marginal rug: <code>geom_rug()</code>.</li>
</ul>
<pre class="r"><code># Change color and shape by groups (cyl)
b + geom_point(aes(color = cyl, shape = cyl))+
  geom_smooth(aes(color = cyl, fill = cyl), method = "lm") +
  geom_rug(aes(color =cyl)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Remove confidence region (se = FALSE)
# Extend the regression lines: fullrange = TRUE
b + geom_point(aes(color = cyl, shape = cyl)) +
  geom_rug(aes(color =cyl)) +
  geom_smooth(aes(color = cyl), method = lm, 
              se = FALSE, fullrange = TRUE)+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  ggpubr::stat_cor(aes(color = cyl), label.x = 3)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-multiple-groups-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-multiple-groups-2.png" width="316.8" /></p>
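<p>When <code>color</code> is mapped to <code>cyl</code>, <code>geom_smooth()</code> and <code>stat_cor()</code> work group by group, so each line is a separate least-squares fit. The sketch below computes the per-group slope and correlation in base R, then checks that the slopes match a single model with group-specific intercepts and slopes, <code>lm(mpg ~ cyl / wt - 1)</code>, an equivalent way to fit one line per group. The names <code>by_group</code> and <code>fit_int</code> are illustrative.</p>
<pre class="r"><code>df <- mtcars
df$cyl <- as.factor(df$cyl)

# One regression slope and one Pearson correlation per cylinder group
by_group <- sapply(split(df, df$cyl), function(d) {
  c(slope = coef(lm(mpg ~ wt, data = d))[["wt"]],
    r     = cor(d$wt, d$mpg))
})
by_group  # columns "4", "6", "8"

# Same slopes from one model: an intercept and a wt slope for each cyl level
fit_int <- lm(mpg ~ cyl / wt - 1, data = df)
coef(fit_int)[c("cyl4:wt", "cyl6:wt", "cyl8:wt")]</code></pre>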
<ul>
<li>Split the plot into multiple panels. Use the function <code>facet_wrap()</code>:</li>
</ul>
<pre class="r"><code>b + geom_point(aes(color = cyl, shape = cyl))+
  geom_smooth(aes(color = cyl, fill = cyl), 
              method = "lm", fullrange = TRUE) +
  facet_wrap(~cyl) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  theme_bw()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-facet-multi-panel-1.png" width="624" /></p>
<ul>
<li>Add concentration ellipses around groups using the function <code>stat_ellipse()</code>. Key arguments:
<ul>
<li><code>type</code>: The type of ellipse. The default “t” assumes a multivariate t-distribution, and “norm” assumes a multivariate normal distribution. “euclid” draws a circle with a radius equal to <code>level</code>, representing the Euclidean distance from the center.</li>
<li><code>level</code>: The confidence level at which to draw an ellipse (default is 0.95), or, if type=“euclid”, the radius of the circle to be drawn.</li>
</ul></li>
</ul>
<pre class="r"><code>b + geom_point(aes(color = cyl, shape = cyl))+
  stat_ellipse(aes(color = cyl), type = "t")+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-concentration-ellipses-1.png" width="384" /></p>
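<p>To see what a normal-theory ellipse (<code>type = "norm"</code>) represents, the sketch below builds one by hand for the 4-cylinder cars: the set of points at a fixed Mahalanobis distance from the group mean. It is not ggplot2's source code. We assume the F-based radius <code>sqrt(2 * qf(level, 2, n - 1))</code> used by <code>car::dataEllipse()</code>, which the <code>stat_ellipse()</code> documentation cites as the basis of its method; the helper <code>norm_ellipse()</code> is hypothetical.</p>
<pre class="r"><code># Hypothetical helper: normal-theory data ellipse for one group.
# Assumption: radius = sqrt(2 * qf(level, 2, n - 1)), as in car::dataEllipse()
norm_ellipse <- function(x, y, level = 0.95, segments = 51) {
  xy <- cbind(x, y)
  centre <- colMeans(xy)
  shape <- cov(xy)
  radius <- sqrt(2 * qf(level, 2, nrow(xy) - 1))
  angles <- seq(-pi, pi, length.out = segments + 1)
  unit_circle <- cbind(cos(angles), sin(angles))
  # Stretch the unit circle with the Cholesky factor of the covariance,
  # scale it by the radius, then shift it to the group mean
  pts <- sweep(radius * unit_circle %*% chol(shape), 2, centre, "+")
  colnames(pts) <- c("x", "y")
  pts
}

g4 <- mtcars[mtcars$cyl == 4, ]
ell <- norm_ellipse(g4$wt, g4$mpg)

# Every ellipse point has the same squared Mahalanobis distance
# from the group mean: 2 * qf(0.95, 2, n - 1)
d2 <- mahalanobis(ell, colMeans(g4[, c("wt", "mpg")]), cov(g4[, c("wt", "mpg")]))
range(d2)</code></pre>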
<p>Instead of drawing the concentration ellipse, you can: i) plot a convex hull of a set of points; ii) add the mean points and the confidence ellipse of each group. Key R functions: <code>stat_chull()</code>, <code>stat_conf_ellipse()</code> and <code>stat_mean()</code> [in ggpubr]:</p>
<pre class="r"><code># Convex hull of groups
b + geom_point(aes(color = cyl, shape = cyl)) +
  stat_chull(aes(color = cyl, fill = cyl), 
             alpha = 0.1, geom = "polygon") +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) 

# Add mean points and confidence ellipses
b + geom_point(aes(color = cyl, shape = cyl)) +
  stat_conf_ellipse(aes(color = cyl, fill = cyl), 
                    alpha = 0.1, geom = "polygon") +
  stat_mean(aes(color = cyl, shape = cyl), size = 2) + 
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-group-convex-hull-and-confidence-ellipses-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-group-convex-hull-and-confidence-ellipses-2.png" width="316.8" /></p>
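<p>A convex hull is the smallest convex polygon that encloses a set of points. Base R computes its vertices with <code>grDevices::chull()</code>; the sketch below does so for each cylinder group and draws the hulls with plain ggplot2. We have not verified how <code>stat_chull()</code> is implemented internally, so treat this as an independent reproduction; the names <code>hulls</code>, <code>hull_df</code> and <code>p_hull</code> are illustrative.</p>
<pre class="r"><code>library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Hull vertices (in drawing order) for each cylinder group
hulls <- lapply(split(df, df$cyl), function(d) {
  idx <- grDevices::chull(d$wt, d$mpg)
  d[idx, c("wt", "mpg")]
})

# Number of hull vertices per group
sapply(hulls, nrow)

# Stack the hulls, tagging each with its group, and draw them as polygons
hull_df <- do.call(rbind, Map(function(h, g) cbind(h, cyl = g), hulls, names(hulls)))
p_hull <- ggplot(df, aes(wt, mpg, color = cyl)) +
  geom_point() +
  geom_polygon(data = hull_df, aes(fill = cyl), alpha = 0.1)
# Print p_hull to display the plot</code></pre>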
<ul>
<li>Easy alternative using <code>ggpubr</code>. See this article: <a href="https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/79-plot-meansmedians-and-error-bars/">Perfect Scatter Plots with Correlation and Marginal Histograms</a></li>
</ul>
<pre class="r"><code># Add group mean points and stars
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "npg",
          shape = "cyl", ellipse = TRUE, 
          mean.point = TRUE, star.plot = TRUE,
          ggtheme = theme_minimal())

# Change the ellipse type to 'convex'
ggscatter(df, x = "wt", y = "mpg",
          color = "cyl", palette = "npg",
          shape = "cyl",
          ellipse = TRUE, ellipse.type = "convex",
          ggtheme = theme_minimal())</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-concentration-ellipses-1.png" width="307.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-concentration-ellipses-2.png" width="307.2" /></p>
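<p>The star plot connects each point to its group mean. Without ggpubr, a similar picture can be built in plain ggplot2 by computing the group centroids with <code>aggregate()</code> and drawing segments with <code>geom_segment()</code>. A minimal sketch; the column names <code>wt_mean</code> and <code>mpg_mean</code> are our own choice.</p>
<pre class="r"><code>library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Group centroids: mean weight and mean mpg for each cylinder group
centroids <- aggregate(cbind(wt, mpg) ~ cyl, data = df, FUN = mean)
names(centroids) <- c("cyl", "wt_mean", "mpg_mean")
centroids

# Attach each car's group centroid, then draw a segment from car to centroid
star_df <- merge(df, centroids, by = "cyl")
p_star <- ggplot(star_df, aes(wt, mpg, color = cyl)) +
  geom_segment(aes(xend = wt_mean, yend = mpg_mean), alpha = 0.5) +
  geom_point() +
  geom_point(data = centroids, aes(x = wt_mean, y = mpg_mean), size = 4)</code></pre>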
</div>
<div id="add-point-text-labels" class="section level2">
<h2>Add point text labels</h2>

<p>Key functions:</p>
<ul>
<li><code>geom_text()</code> and <code>geom_label()</code>: ggplot2 standard functions to add text to a plot.</li>
<li><code>geom_text_repel()</code> and <code>geom_label_repel()</code> [in ggrepel package]: repulsive text annotations that avoid overlapping labels.</li>
</ul>
<p>First install <code>ggrepel</code> (<code>install.packages("ggrepel")</code>), then type this:</p>
<pre class="r"><code>library(ggrepel)

# Add text to the plot
.labs <- rownames(df)
b + geom_point(aes(color = cyl)) +
  geom_text_repel(aes(label = .labs,  color = cyl), size = 3)+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-point-text-1.png" width="576" /></p>
<pre class="r"><code># Draw a rectangle underneath the text, making it easier to read.
b + geom_point(aes(color = cyl)) +
  geom_label_repel(aes(label = .labs,  color = cyl), size = 3)+
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-with-point-text-2.png" width="576" /></p>
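<p>With many points, labeling only a subset keeps the plot readable. The sketch below labels the heaviest cars (<code>wt > 5</code>, an arbitrary threshold) with the standard <code>geom_text()</code>, so ggrepel is not required. It stores the car names as a column rather than a separate vector, which keeps labels aligned with rows when the data are filtered.</p>
<pre class="r"><code>library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)
df$name <- rownames(df)  # labels stored as a column of the data

# Label only the heaviest cars to limit clutter
heavy <- subset(df, wt > 5)
p_lab <- ggplot(df, aes(wt, mpg, color = cyl)) +
  geom_point() +
  geom_text(data = heavy, aes(label = name), vjust = -0.8, size = 3,
            show.legend = FALSE)

nrow(heavy)  # 3 cars are labeled</code></pre>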
</div>
<div id="bubble-chart" class="section level2">
<h2>Bubble chart</h2>
<p>In a bubble chart, point <code>size</code> is controlled by a continuous variable, here <code>qsec</code>. In the R code below, the argument <code>alpha</code> controls color transparency; it takes values between 0 (fully transparent) and 1 (fully opaque).</p>
<pre class="r"><code>b + geom_point(aes(color = cyl, size = qsec), alpha = 0.5) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_size(range = c(0.5, 12))  # Adjust the range of points size</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-bubble-chart-1.png" width="528" /></p>
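<p><code>scale_size()</code> maps the variable to point area, whereas <code>scale_radius()</code> maps it to the radius. Both use the full <code>range</code>, but intermediate values get larger points with <code>scale_size()</code>. The sketch below compares the sizes computed for the two scales:</p>
<pre class="r"><code>library(ggplot2)

b <- ggplot(mtcars, aes(x = wt, y = mpg))

# qsec mapped to point area
p_area <- b + geom_point(aes(size = qsec), alpha = 0.5) +
  scale_size(range = c(0.5, 12))

# qsec mapped to point radius
p_radius <- b + geom_point(aes(size = qsec), alpha = 0.5) +
  scale_radius(range = c(0.5, 12))

# Same extremes, different intermediate sizes
range(layer_data(p_area)$size)
range(layer_data(p_radius)$size)</code></pre>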
</div>
<div id="color-by-a-continuous-variable" class="section level2">
<h2>Color by a continuous variable</h2>
<ul>
<li>Color points according to the values of the continuous variable: “mpg”.</li>
<li>Change the default blue gradient color using the function <code>scale_color_gradientn()</code> [in ggplot2], by specifying two or more colors.</li>
</ul>
<pre class="r"><code>b + geom_point(aes(color = mpg), size = 3) +
  scale_color_gradientn(colors = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-color-by-continuous-variable-1.png" width="384" /></p>
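<p>When the values have a meaningful center, <code>scale_color_gradient2()</code> [in ggplot2] builds a diverging gradient around a <code>midpoint</code>. Using the median mpg as the midpoint below is our choice, for illustration only.</p>
<pre class="r"><code>library(ggplot2)

b <- ggplot(mtcars, aes(x = wt, y = mpg))

# Diverging palette centered on the median mpg: values below the median
# shade toward "#00AFBB", values above it toward "#FC4E07"
p_div <- b + geom_point(aes(color = mpg), size = 3) +
  scale_color_gradient2(low = "#00AFBB", mid = "white", high = "#FC4E07",
                        midpoint = median(mtcars$mpg))</code></pre>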
</div>
<div id="add-marginal-density-plots" class="section level2">
<h2>Add marginal density plots</h2>
<p>The function <code>ggMarginal()</code> [in ggExtra package] <span class="citation">(Attali 2017)</span> can be used to easily add a marginal histogram, density or box plot to a scatter plot.</p>
<p>First, install the ggExtra package as follows: <code>install.packages("ggExtra")</code>; then type the following R code:</p>
<pre class="r"><code># Create a scatter plot
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species), size = 3, alpha = 0.6) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Add density distribution as marginal plot
library("ggExtra")
ggMarginal(p, type = "density")

# Change marginal plot type
ggMarginal(p, type = "boxplot")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggextra-marginal-plot-1.png" width="672" /></p>
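<p>One limitation of ggExtra, at least in the version used here, is that <code>ggMarginal()</code> cannot display multiple groups in the marginal plots. More recent ggExtra releases add the <code>groupColour</code> and <code>groupFill</code> arguments for this purpose.</p>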
<p>A solution is provided in the function <code>ggscatterhist()</code> [ggpubr]:</p>
<pre class="r"><code>library(ggpubr)
# Grouped Scatter plot with marginal density plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.params = list(fill = "Species", color = "black", size = 0.2)
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-marginal-distributions-1.png" width="384" /></p>
<pre class="r"><code># Use box plot as marginal plots
ggscatterhist(
  iris, x = "Sepal.Length", y = "Sepal.Width",
  color = "Species", size = 3, alpha = 0.6,
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  margin.plot = "boxplot",
  ggtheme = theme_bw()
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggpubr-marginal-distributions-2.png" width="384" /></p>
</div>
<div id="continuous-bivariate-distribution" class="section level2">
<h2>Continuous bivariate distribution</h2>
<p>In this section, we’ll present some alternatives to the standard scatter plot. These include:</p>
<ul>
<li>Rectangular binning: rectangular heatmap of 2d bin counts.</li>
<li>Hexagonal binning: hexagonal heatmap of 2d bin counts.</li>
<li>2d density estimation</li>
</ul>
<ol style="list-style-type: decimal">
<li><strong>Rectangular binning</strong>:</li>
</ol>
<p>Rectangular binning is a useful alternative to the standard scatter plot when you have a large data set containing thousands of records.</p>
<p>Rectangular binning helps to handle overplotting. Rather than plotting each point, which would produce a dense, unreadable cloud, it divides the plane into rectangles, counts the number of cases in each rectangle, and then plots a heatmap of 2d bin counts. In this plot, many small rectangles are drawn, each with a color intensity corresponding to the number of cases in that bin.</p>
<p>Key function: <code>geom_bin2d()</code>: Creates a heatmap of 2d bin counts. Key argument: <code>bins</code>, a numeric vector giving the number of bins in both the vertical and horizontal directions. Set to 30 by default.</p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Hexagonal binning</strong>: Similar to rectangular binning, but divides the plane into regular hexagons. Hexagon bins avoid the visual artefacts sometimes generated by the very regular alignment of <code>geom_bin2d()</code>.</li>
</ol>
<p>Key function: <code>geom_hex()</code> (requires the <code>hexbin</code> package to be installed)</p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Contours of a 2d density estimate</strong>. Perform a 2D kernel density estimation and display the results as contours overlaid on the scatter plot. This can also be useful for dealing with overplotting.</li>
</ol>
<p>Key function: <code>geom_density_2d()</code></p>
<ul>
<li><strong>Create a scatter plot with rectangular and hexagonal binning</strong>:</li>
</ul>
<pre class="r"><code># Rectangular binning
ggplot(diamonds, aes(carat, price)) +
  geom_bin2d(bins = 20, color ="white")+
  scale_fill_gradient(low =  "#00AFBB", high = "#FC4E07")+
  theme_minimal()

# Hexagonal binning
ggplot(diamonds, aes(carat, price)) +
  geom_hex(bins = 20, color = "white")+
  scale_fill_gradient(low =  "#00AFBB", high = "#FC4E07")+
  theme_minimal()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_bin2d-heatmap-of-2d-bin-counts-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_bin2d-heatmap-of-2d-bin-counts-2.png" width="316.8" /></p>
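<p>The counting step behind rectangular binning can be reproduced in base R: cut each axis into intervals, then cross-tabulate. This is a sketch of the idea only; <code>cut()</code> places its breaks differently from <code>geom_bin2d()</code>, so the counts will not match the plotted tiles cell by cell.</p>
<pre class="r"><code>library(ggplot2)  # provides the diamonds data set

# Split carat and price into 20 equal-width intervals each
carat_bin <- cut(diamonds$carat, breaks = 20)
price_bin <- cut(diamonds$price, breaks = 20)

# 2d bin counts: one cell per rectangle
counts <- table(carat_bin, price_bin)
dim(counts)   # a 20 x 20 grid of rectangles
sum(counts)   # every diamond falls in exactly one rectangle
max(counts)   # the count of the darkest tile</code></pre>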
<ul>
<li><strong>Create a scatter plot with 2d density estimation</strong>:</li>
</ul>
<pre class="r"><code># Add 2d density estimation
sp <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(color = "lightgray")
sp + geom_density_2d()
    

# Use different geometry and change the gradient color
sp + stat_density_2d(aes(fill = ..level..), geom = "polygon") +
  scale_fill_gradientn(colors = c("#FFEDA0", "#FEB24C", "#F03B20"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_density_contours-of-a-2d-density-estimate-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_density_contours-of-a-2d-density-estimate-2.png" width="316.8" /></p>
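<p>According to the ggplot2 documentation, the 2d density stat estimates the density with <code>MASS::kde2d()</code>, with bandwidths from <code>MASS::bandwidth.nrd()</code> by default. You can compute the same kind of grid yourself, for example to locate the densest region. A sketch; the 50 x 50 grid size is our choice.</p>
<pre class="r"><code>library(MASS)  # kde2d(); MASS ships with R as a recommended package

# Kernel density estimate on a 50 x 50 grid
dens <- kde2d(iris$Sepal.Length, iris$Sepal.Width, n = 50)

# Grid location of the highest estimated density
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
c(Sepal.Length = dens$x[peak[1, 1]], Sepal.Width = dens$y[peak[1, 2]])

# Base R contour plot of the same estimate
contour(dens, xlab = "Sepal.Length", ylab = "Sepal.Width")</code></pre>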
</div>
<div id="zoom-in-a-scatter-plot" class="section level2">
<h2>Zoom in a scatter plot</h2>

<ul>
<li>Key function: <code>facet_zoom()</code> [in ggforce] <span class="citation">(Pedersen 2016)</span>.</li>
<li>Demo data set: <code>iris</code>. The R code below zooms in on the points where <code>Species == "versicolor"</code>.</li>
</ul>
<pre class="r"><code>library(ggforce)
ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  ggpubr::color_palette("jco") + 
  facet_zoom(x = Species == "versicolor")+
  theme_bw()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggforce-zoom-in-a-ggplot-1.png" width="499.2" /></p>
<p>To zoom in on the points where <code>Petal.Length < 2.5</code>, type this:</p>
<pre class="r"><code>ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  ggpubr::color_palette("jco") + 
  facet_zoom(x = Petal.Length < 2.5)+
  theme_bw()</code></pre>
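<p><code>facet_zoom()</code> shows the full view next to the zoomed panel. When only the zoomed view is needed, ggplot2's <code>coord_cartesian()</code> zooms without removing any data, whereas scale limits (for example <code>xlim()</code>) turn out-of-range observations into missing values before statistics are computed, which matters for layers such as <code>geom_smooth()</code>. The sketch below shows the difference in the data passed to the layer; <code>p_zoom</code> and <code>p_limit</code> are illustrative names.</p>
<pre class="r"><code>library(ggplot2)

base <- ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point()

# Zoom: all 150 points are kept; only the visible window changes
p_zoom <- base + coord_cartesian(xlim = c(0, 2.5))

# Scale limits: points outside [0, 2.5] become NA (dropped when drawn)
p_limit <- base + xlim(0, 2.5)

sum(is.na(layer_data(p_zoom)$x))   # 0
sum(is.na(layer_data(p_limit)$x))  # 100 (all versicolor and virginica)</code></pre>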
</div>
<div id="add-trend-lines-and-equations" class="section level2">
<h2>Add trend lines and equations</h2>
<p>In this section, we’ll describe how to add trend lines to a scatter plot, along with labels (equation, R2, BIC, AIC) for the fitted linear model.</p>
<ol style="list-style-type: decimal">
<li><strong>Load packages and create a basic scatter plot facetted by groups</strong>:</li>
</ol>
<pre class="r"><code># Load packages and set theme
library(ggpubr)
library(ggpmisc)

theme_set(
  theme_bw() +
    theme(legend.position = "top")
  )

# Scatter plot
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species), size = 3, alpha = 0.6) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  facet_wrap(~Species)</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><strong>Add regression line, correlation coefficient and equations of the fitted line</strong>. Key functions:
<ul>
<li><code>stat_smooth()</code> [ggplot2]</li>
<li><code>stat_cor()</code> [ggpubr]</li>
<li><code>stat_poly_eq()</code> [ggpmisc]</li>
</ul></li>
</ol>
<pre class="r"><code>formula <- y ~ x
p + 
  stat_smooth( aes(color = Species, fill = Species), method = "lm") +
  stat_cor(aes(color = Species), label.y = 4.4)+
  stat_poly_eq(
    aes(color = Species, label = ..eq.label..),
    formula = formula, label.y = 4.2, parse = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-regression-line-and-correlation-coefficient-and-equation-scatter-plot-1.png" width="672" /></p>
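<p>The equation printed by <code>stat_poly_eq()</code> is that of the least-squares line fitted in each panel, and <code>stat_cor()</code> reports Pearson's r. Both can be reproduced with <code>lm()</code> and <code>cor()</code>; for a straight-line fit, the model R² is simply r². A sketch; <code>fits</code>, <code>r2</code> and <code>r</code> are illustrative names.</p>
<pre class="r"><code># Per-species fit of Sepal.Width on Sepal.Length, as drawn in each facet
fits <- lapply(split(iris, iris$Species), function(d)
  lm(Sepal.Width ~ Sepal.Length, data = d))

# Intercept and slope behind each fitted-line equation
sapply(fits, coef)

# For simple linear regression, R^2 equals the squared Pearson correlation
r2 <- sapply(fits, function(f) summary(f)$r.squared)
r  <- sapply(split(iris, iris$Species),
             function(d) cor(d$Sepal.Length, d$Sepal.Width))
all.equal(r2, r^2)</code></pre>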
<ol start="3" style="list-style-type: decimal">
<li><strong>Fit polynomial equation</strong>:</li>
</ol>
<ul>
<li>Create some data:</li>
</ul>
<pre class="r"><code>set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y, group = c("A", "B"), 
                      y2 = y * c(0.5,2), block = c("a", "a", "b", "b"))</code></pre>
<ul>
<li>Fit polynomial regression line and add labels:</li>
</ul>
<pre class="r"><code># Polynomial regression. Sow equation and adjusted R2
formula <- y ~ poly(x, 3, raw = TRUE)
p <- ggplot(my.data, aes(x, y2, color = group)) +
  geom_point() +
  geom_smooth(aes(fill = group), method = "lm", formula = formula) +
  stat_poly_eq(
    aes(label =  paste(..eq.label.., ..adj.rr.label.., sep = "~~~~")),
    formula = formula, parse = TRUE
    )
ggpar(p, palette = "jco")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-polynomial-regression-line-with-equation-and-adjusted-r2-1.png" width="528" /></p>
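<p>The example uses <code>poly(x, 3, raw = TRUE)</code> so that the coefficients multiply x, x² and x³ directly, which is what a readable equation needs. The default orthogonal polynomials describe exactly the same curve with coefficients that are not directly interpretable. The sketch below checks this on group "A" of the simulated data (the choice of group is arbitrary).</p>
<pre class="r"><code># Same simulated data as above
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y, group = c("A", "B"),
                      y2 = y * c(0.5, 2), block = c("a", "a", "b", "b"))
grp_A <- subset(my.data, group == "A")

# raw = TRUE: coefficients of 1, x, x^2 and x^3
fit_raw  <- lm(y2 ~ poly(x, 3, raw = TRUE), data = grp_A)
# raw = FALSE (default): orthogonal polynomial basis
fit_orth <- lm(y2 ~ poly(x, 3), data = grp_A)

coef(fit_raw)
# Same fitted curve and same adjusted R2 either way
all.equal(fitted(fit_raw), fitted(fit_orth))
summary(fit_raw)$adj.r.squared</code></pre>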
<div class="notice">
<p>
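Note that you can also display the AIC and BIC values with the computed variables <code>..AIC.label..</code> and <code>..BIC.label..</code> (<code>after_stat(AIC.label)</code> and <code>after_stat(BIC.label)</code> in current ggplot2 syntax) in the above code.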
</p>
<p>
Other arguments (label.x, label.y) are available in the function <code>stat_poly_eq()</code> to adjust label positions.
</p>
<p>
For more examples, type this R code: <code>browseVignettes("ggpmisc")</code>.
</p>
</div>
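<p>AIC and BIC can also be computed directly with <code>AIC()</code> and <code>BIC()</code> from the stats package. The sketch below fits the cubic model to all 100 simulated points (both groups pooled, a simplification of ours) and checks the defining formulas: AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k counts the 4 coefficients plus the residual variance.</p>
<pre class="r"><code># Same simulated data as above, both groups pooled
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
fit <- lm(y ~ poly(x, 3, raw = TRUE))

# Information criteria of the cubic fit
AIC(fit)
BIC(fit)

# Number of estimated parameters and observations
k <- attr(logLik(fit), "df")   # 5
n <- nobs(fit)                 # 100

# The two criteria differ only in their penalty term
all.equal(BIC(fit) - AIC(fit), k * (log(n) - 2))</code></pre>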
</div>
<div id="conclusion" class="section level2">
<h2>Conclusion</h2>
<ol style="list-style-type: decimal">
<li>Create a basic scatter plot:</li>
</ol>
<pre class="r"><code>b <- ggplot(mtcars, aes(x = wt, y = mpg))</code></pre>
<p>Possible layers include:</p>
<ul>
<li><code>geom_point()</code> for scatter plot</li>
<li><code>geom_smooth()</code> for adding smoothed line such as regression line</li>
<li><code>geom_rug()</code> for adding a marginal rug</li>
<li><code>geom_text()</code> for adding textual annotations</li>
</ul>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-two-continuous-variable-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-two-continuous-variable-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-two-continuous-variable-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-two-continuous-variable-4.png" width="153.6" /></p>
<ol start="2" style="list-style-type: decimal">
<li>Continuous bivariate distribution:</li>
</ol>
<pre class="r"><code>c <- ggplot(diamonds, aes(carat, price))</code></pre>
<p>Possible layers include:</p>
<ul>
<li><code>geom_bin2d()</code>: Rectangular binning.</li>
<li><code>geom_hex()</code>: Hexagonal binning.</li>
<li><code>geom_density_2d()</code>: Contours from a 2d density estimate</li>
</ul>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-Continuous-bivariate-distribution-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-Continuous-bivariate-distribution-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/007-plot-two-continuous-variables-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-graphics-Continuous-bivariate-distribution-3.png" width="153.6" /></p>
</div>
<div id="see-also" class="section level2">
<h2>See also</h2>
<ul>
<li>ggpubr: Publication Ready Plots. <a href="https://goo.gl/7uySha" class="uri">https://goo.gl/7uySha</a></li>
<li>Perfect Scatter Plots with Correlation and Marginal Histograms. <a href="https://goo.gl/3o4ddg" class="uri">https://goo.gl/3o4ddg</a></li>
</ul>
</div>
<div id="references" class="section level2 unnumbered">
<h2>References</h2>
<div id="refs" class="references">
<div id="ref-R-ggExtra">
<p>Attali, Dean. 2017. <em>ggExtra: Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements</em>. <a href="https://github.com/daattali/ggExtra" class="uri">https://github.com/daattali/ggExtra</a>.</p>
</div>
<div id="ref-R-ggforce">
<p>Pedersen, Thomas Lin. 2016. <em>ggforce: Accelerating 'ggplot2'</em>. <a href="https://github.com/thomasp85/ggforce" class="uri">https://github.com/thomasp85/ggforce</a>.</p>
</div>
</div>
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 16:48:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Plot Multivariate Continuous Data]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/130-plot-multivariate-continuous-data/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/130-plot-multivariate-continuous-data/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">
<p>When you have bivariate data, you can easily visualize the relationship between the two variables by plotting a simple scatter plot.</p>
<p>For a data set containing three continuous variables, you can create a <strong>3d scatter plot</strong>.</p>
<p>For a small data set with more than three variables, it’s possible to visualize the relationship between each pair of variables by creating a <strong>scatter plot matrix</strong>. You can also compute a correlation analysis between each pair of variables.</p>
<p>For a large multivariate data set, it is more difficult to visualize their relationships. Discovering knowledge from these data requires specific statistical techniques. <strong>Multivariate analysis</strong> (MVA) refers to a set of approaches used for analyzing a data set containing multiple variables.</p>
<p>Among these techniques, there are:</p>
<ul>
<li><a href="https://www.sthda.com/english/articles/25-cluster-analysis-in-r-practical-guide/">Cluster analysis</a> for identifying groups of observations with similar profile according to a specific criteria.</li>
<li><a href="https://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/">Principal component methods</a>, which consist of summarizing and visualizing the most important information contained in a multivariate data set.</li>
</ul>
<p>In this chapter we provide an overview of methods for visualizing multivariate data sets containing only continuous variables.</p>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#demo-data-set-and-r-package">Demo data set and R package</a></li>
<li><a href="#create-a-3d-scatter-plot">Create a 3d scatter plot</a></li>
<li><a href="#create-a-scatter-plot-matrix">Create a scatter plot matrix</a></li>
<li><a href="#correlation-analysis">Correlation analysis</a></li>
<li><a href="#principal-component-analysis">Principal component analysis</a></li>
<li><a href="#cluster-analysis">Cluster analysis</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="demo-data-set-and-r-package" class="section level2">
<h2>Demo data set and R package</h2>
<pre class="r"><code>library("magrittr") # for piping %>%
head(iris, 3)</code></pre>
<pre><code>##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa</code></pre>
</div>
<div id="create-a-3d-scatter-plot" class="section level2">
<h2>Create a 3d scatter plot</h2>
<p>You can create a 3d scatter plot using the R package <strong>scatterplot3d</strong> <span class="citation">(Ligges, Maechler, and Schnackenberg 2017)</span>, which contains a function of the same name. </p>
<ul>
<li><p>Install: <code>install.packages("scatterplot3d")</code></p></li>
<li><p>Create a basic 3d scatter plot:</p></li>
</ul>
<pre class="r"><code>library(scatterplot3d)
scatterplot3d(
  iris[,1:3], pch = 19, color = "steelblue",
   grid = TRUE, box = FALSE,
   mar = c(3, 3, 0.5, 3)        
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-r-3d-scatter-plot-scatterplot3d-1.png" width="480" /></p>
<ul>
<li>See more examples at: <a href="https://www.sthda.com/english/wiki/3d-graphics" class="uri">https://www.sthda.com/english/wiki/3d-graphics</a></li>
</ul>
</div>
<div id="create-a-scatter-plot-matrix" class="section level2">
<h2>Create a scatter plot matrix</h2>
<p>To create a scatter plot of each possible pair of variables, you can use the function <strong>ggpairs</strong>() [in the <code>GGally</code> package, an extension of ggplot2] <span class="citation">(Schloerke et al. 2016)</span>. It produces a pairwise comparison of multivariate data.</p>
<ul>
<li><p>Install: <code>install.packages("GGally")</code></p></li>
<li>Create a simple scatter plot matrix. The plot contains:
<ul>
<li>Scatter plot and the correlation coefficient between each pair of variables</li>
<li>Density distribution of each variable</li>
</ul></li>
</ul>
<pre class="r"><code>library(GGally)
library(ggplot2)
ggpairs(iris[,-5])+ theme_bw()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-matrix-ggpairs-1.png" width="547.2" /></p>
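<p>The numbers in the upper panels of <code>ggpairs()</code> are pairwise Pearson correlations. They can be computed in one call with base R's <code>cor()</code>, and <code>pairs()</code> gives a dependency-free scatter plot matrix. A sketch:</p>
<pre class="r"><code># Correlation matrix of the four numeric iris variables
r_mat <- cor(iris[, -5])
round(r_mat, 2)   # e.g. Petal.Length vs Petal.Width: 0.96

# Base R scatter plot matrix
pairs(iris[, -5], pch = 19, cex = 0.5)</code></pre>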
<ul>
<li>Create a scatter plot matrix by groups. The plot contains:
<ul>
<li>Scatter plot and the correlation coefficient, between each pair of variables, colored by groups</li>
<li>Density distribution and the box plot, of each continuous variable, colored by groups</li>
</ul></li>
</ul>
<pre class="r"><code>p <- ggpairs(iris, aes(color = Species))+ theme_bw()
# Change color manually.
# Loop through each plot changing relevant scales
for(i in 1:p$nrow) {
  for(j in 1:p$ncol){
    p[i,j] <- p[i,j] + 
        scale_fill_manual(values=c("#00AFBB", "#E7B800", "#FC4E07")) +
        scale_color_manual(values=c("#00AFBB", "#E7B800", "#FC4E07"))  
  }
}
p</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-matrix-by-groups-ggpairs-1.png" width="672" /></p>
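<p>When a color grouping is mapped, <code>ggpairs()</code> also reports the correlation within each group. The base-R sketch below computes those within-species correlations directly, so you can compare them with the labels in the plot; it assumes Pearson correlation, as above.</p>
<pre class="r"><code># One correlation matrix per species
cor_by_species <- lapply(split(iris[, 1:4], iris$Species), cor)
# Sepal.Length vs Sepal.Width correlation within each species
sapply(cor_by_species, function(m) round(m["Sepal.Length", "Sepal.Width"], 2))</code></pre>
<p>Correlations computed within groups can differ a lot from the overall correlation, which is one reason the grouped matrix is worth drawing.</p>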
<p>An alternative to <code>ggpairs()</code> is the function <code>chart.Correlation()</code> [in the PerformanceAnalytics package], which is built on R base graphics. It displays the correlation coefficients and marks their significance levels with stars.</p>
<p>For example, type the following R code, after installing the <code>PerformanceAnalytics</code> package:</p>
<pre class="r"><code># install.packages("PerformanceAnalytics")
library("PerformanceAnalytics")
my_data <- mtcars[, c(1,3,4,5,6,7)]
chart.Correlation(my_data, histogram=TRUE, pch=19)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-scatter-plot-matrix-chart-Correlation-1.png" width="576" /></p>
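<p>The stars are a compact encoding of p-values from correlation tests. The base-R sketch below reproduces the idea for the same <code>mtcars</code> columns using <code>cor.test()</code> and <code>symnum()</code>; the cutpoints are the conventional R significance codes, which we assume match those used by <code>chart.Correlation()</code>.</p>
<pre class="r"><code>my_data <- mtcars[, c(1, 3, 4, 5, 6, 7)]
# All pairs of variable names (a 2 x 15 character matrix)
pairs_idx <- combn(colnames(my_data), 2)
res <- data.frame(var1 = pairs_idx[1, ], var2 = pairs_idx[2, ],
                  r = NA_real_, p = NA_real_, stringsAsFactors = FALSE)
# Pearson correlation test for each pair
for (k in seq_len(ncol(pairs_idx))) {
  ct <- cor.test(my_data[[pairs_idx[1, k]]], my_data[[pairs_idx[2, k]]])
  res$r[k] <- unname(ct$estimate)
  res$p[k] <- ct$p.value
}
# Map p-values to significance stars
res$stars <- as.character(symnum(res$p, corr = FALSE, na = FALSE,
                                 cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                                 symbols = c("***", "**", "*", ".", " ")))
head(res)</code></pre>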
</div>
<div id="correlation-analysis" class="section level2">
<h2>Correlation analysis</h2>
<p>Recall that correlation analysis is used to investigate the association between two or more variables. Read more at: <a href="https://www.sthda.com/english/wiki/correlation-analyses-in-r">Correlation analyses in R</a>.</p>
<ol style="list-style-type: decimal">
<li>Compute the correlation matrix between pairs of variables using the R base function <code>cor()</code></li>
<li>Visualize the output. Two possibilities:
<ul>
<li>Use the function <code>ggcorrplot()</code> [in the ggcorrplot package], an extension of the ggplot2 system. See more examples at: <a href="https://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2" class="uri">https://www.sthda.com/english/wiki/ggcorrplot-visualization-of-a-correlation-matrix-using-ggplot2</a>.</li>
<li>Use the function <code>corrplot()</code> [in the corrplot package], based on the R base plotting system. See examples at: <a href="https://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram" class="uri">https://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram</a>.</li>
</ul></li>
</ol>
<p>Here, we’ll present only the <code>ggcorrplot</code> package <span class="citation">(Kassambara 2016)</span>, which can be installed as follows: <code>install.packages("ggcorrplot")</code>.</p>
<pre class="r"><code>library("ggcorrplot")
# Compute a correlation matrix
my_data <- mtcars[, c(1,3,4,5,6,7)]
corr <- round(cor(my_data), 1)
# Visualize
ggcorrplot(corr, p.mat = cor_pmat(my_data),
           hc.order = TRUE, type = "lower",
           color = c("#FC4E07", "white", "#00AFBB"),
           outline.col = "white", lab = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-correlation-analysis-1.png" width="384" /></p>
<p>In the plot above:</p>
<ul>
<li>Positive correlations are shown in blue and negative correlations in red.</li>
<li>Because <code>hc.order = TRUE</code>, strongly associated variables are placed next to each other.</li>
<li>Non-significant correlations (at the default 0.05 level) are marked by a cross (X).</li>
</ul>
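<p>Two things happen behind the scenes in the <code>ggcorrplot()</code> call above: the p-value matrix computed by <code>cor_pmat()</code>, and the reordering of variables requested by <code>hc.order = TRUE</code>. The base-R sketch below reproduces both for inspection. It assumes Pearson correlation tests and a hierarchical clustering on the distance (1 - r) / 2, which is our reading of how ggcorrplot orders the variables, so treat the exact order as an approximation.</p>
<pre class="r"><code>my_data <- mtcars[, c(1, 3, 4, 5, 6, 7)]
corr <- cor(my_data)
n_var <- ncol(my_data)
# Symmetric matrix of correlation-test p-values (0 on the diagonal)
p_mat <- matrix(0, n_var, n_var, dimnames = dimnames(corr))
for (i in seq_len(n_var - 1)) {
  for (j in (i + 1):n_var) {
    p <- cor.test(my_data[[i]], my_data[[j]])$p.value
    p_mat[i, j] <- p
    p_mat[j, i] <- p
  }
}
# Order variables by hierarchical clustering of the (1 - r) / 2 distance
hc <- hclust(as.dist((1 - corr) / 2))
var_order <- colnames(corr)[hc$order]
var_order
round(p_mat[var_order, var_order], 4)</code></pre>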
</div>
<div id="principal-component-analysis" class="section level2">
<h2>Principal component analysis</h2>
<p>Principal component analysis (PCA) is a multivariate data analysis approach that allows us to summarize and visualize the most important information contained in a multivariate data set. </p>
<p>PCA reduces the data to a few new dimensions (or axes), each of which is a linear combination of the original variables. You can visualize multivariate data by drawing a scatter plot of the first two dimensions, which capture the largest share of the variation in the data. Read more at: <a href="https://goo.gl/kabVHq" class="uri">https://goo.gl/kabVHq</a></p>
<ul>
<li>Demo data set: <code>iris</code></li>
<li>Compute PCA using the R base function <code>prcomp()</code></li>
<li>Visualize the output using the <code>factoextra</code> R package (an extension to ggplot2) <span class="citation">(Kassambara and Mundt 2017)</span></li>
</ul>
<pre class="r"><code>library("factoextra")
my_data <- iris[, -5] # Remove the grouping variable
res.pca <- prcomp(my_data, scale. = TRUE) # PCA on standardized variables
fviz_pca_biplot(res.pca, col.ind = iris$Species,
                palette = "jco", geom = "point")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-principal-component-analysis-1.png" width="480" /></p>
<p>In the plot above:</p>
<ul>
<li>Dimensions (Dim.) 1 and 2 together retain about 96% (73% + 22.9%) of the total information (variance) contained in the data set.</li>
<li>Individuals with similar profiles are grouped together.</li>
<li>Positively correlated variables point to the same side of the plot; negatively correlated variables point to opposite sides.</li>
</ul>
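<p>The percentages on the axes come from the eigenvalues of the PCA, that is, the squared standard deviations stored in <code>res.pca$sdev</code>. A short sketch showing how the roughly 96% figure can be recomputed from the <code>prcomp()</code> output:</p>
<pre class="r"><code>res.pca <- prcomp(iris[, -5], scale. = TRUE)
eig <- res.pca$sdev^2              # eigenvalues (variance of each component)
prop_var <- eig / sum(eig)         # proportion of total variance per component
round(100 * prop_var, 1)           # percentage per dimension
round(100 * sum(prop_var[1:2]), 1) # share kept by Dim.1 + Dim.2</code></pre>
<p>The same proportions are also reported in the "Proportion of Variance" row of <code>summary(res.pca)</code>.</p>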
</div>
<div id="cluster-analysis" class="section level2">
<h2>Cluster analysis</h2>
<p>Cluster analysis is an important data mining method for discovering knowledge in multidimensional data. The goal of clustering is to identify patterns or groups of similar objects within a data set of interest. Read more at: <a href="https://www.sthda.com/english/articles/25-cluster-analysis-in-r-practical-guide/" class="uri">https://www.sthda.com/english/articles/25-cluster-analysis-in-r-practical-guide/</a>.</p>
<p>This section describes how to compute and visualize hierarchical clustering, whose output is a tree, called a dendrogram, showing groups of similar individuals.</p>
<ul>
<li>Computation: R function <code>hclust()</code>. It takes as input a dissimilarity matrix, which is computed using the function <code>dist()</code>.</li>
<li>Visualization: <code>fviz_dend()</code> [in factoextra]</li>
<li>Demo data set: <code>USArrests</code></li>
</ul>
<p>Before cluster analysis, it’s recommended to scale (or standardize) the data to make the variables comparable. The R function <code>scale()</code> centers each column (variable) to mean 0 and rescales it to standard deviation 1.</p>
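<p>A quick base-R check of what <code>scale()</code> does to <code>USArrests</code>: after scaling, every column has mean 0 and standard deviation 1, so no single variable dominates the distance computation.</p>
<pre class="r"><code>scaled <- scale(USArrests)
round(colMeans(scaled), 10)  # column means, all (numerically) zero
apply(scaled, 2, sd)         # column standard deviations, all equal to 1</code></pre>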
<pre class="r"><code>library(factoextra)
library(magrittr) # for the pipe operator %>%
USArrests %>%
  scale() %>%                           # Scale the data
  dist() %>%                            # Compute distance matrix
  hclust(method = "ward.D2") %>%        # Hierarchical clustering
  fviz_dend(cex = 0.5, k = 4, palette = "jco") # Visualize and cut 
                                              # into 4 groups</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/008-plot-multivariate-continuous-data-r-graphics-cookbook-and-examples-for-great-data-visualization-cluster-analysis-1.png" width="480" /></p>
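<p>The colors in the dendrogram correspond to cutting the tree into <code>k = 4</code> groups. If you need the group membership itself, for example to summarize each cluster, the base function <code>cutree()</code> returns it. A minimal sketch using the same pipeline steps:</p>
<pre class="r"><code># Same steps as above, written without the pipe
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
groups <- cutree(hc, k = 4)   # cluster number for each state
table(groups)                 # cluster sizes
head(groups)
# Mean of each original (unscaled) variable per cluster
aggregate(USArrests, by = list(cluster = groups), FUN = mean)</code></pre>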
<p>A heatmap is another way to visualize hierarchical clustering. It’s also called a false-color image, where data values are mapped to a color scale. Heatmaps allow us to visualize groups of samples and features simultaneously. You can easily create a pretty heatmap using the R package <code>pheatmap</code>.</p>
<p>In a heatmap, columns are generally samples and rows are variables. Therefore, we start by scaling the data and then transpose it before creating the heatmap.</p>
<pre class="r"><code>library(pheatmap)
library(magrittr) # for the pipe operator %>%
USArrests %>%
  scale() %>%                  # Scale variables
  t() %>%                      # Transpose 
  pheatmap(cutree_cols = 4)    # Create the heatmap</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/images/r-graphics-essentials/plot-multivariate-data-heatmap.png" alt="Multivariate data Heatmap" /></p>
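<p>If pheatmap isn’t available, the base function <code>stats::heatmap()</code> gives a similar picture. The sketch below shows the transposed matrix layout (4 variables as rows, 50 states as columns). Note that base <code>heatmap()</code> reorders the dendrogram branches by row and column means, so the column order may differ from the pheatmap figure.</p>
<pre class="r"><code># Scale the variables, then transpose: rows = variables, columns = states
m <- t(scale(USArrests))
dim(m)
# scale = "none" because the data are already standardized
hm <- heatmap(m, scale = "none")
# Column order used in the plot (states)
head(colnames(m)[hm$colInd])</code></pre>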
</div>
<div id="conclusion" class="section level2">
<h2>Conclusion</h2>
<p>For multivariate continuous data, you can perform the following analyses or visualizations, depending on the complexity of your data:</p>
<ul>
<li>3D scatter plot: <code>scatterplot3d()</code> [scatterplot3d]</li>
<li>Scatter plot matrix: <code>ggpairs()</code> [GGally]</li>
<li>Correlation matrix analysis and visualization: <code>cor()</code> [stats] for the computation and <code>ggcorrplot()</code> [ggcorrplot] for the visualization</li>
<li>Principal component analysis: <code>prcomp()</code> [stats] and <code>fviz_pca()</code> [factoextra]</li>
<li>Cluster analysis: <code>hclust()</code> [stats] and <code>fviz_dend()</code> [factoextra]</li>
</ul>
</div>
<div id="see-also" class="section level2">
<h2>See also</h2>
<p><a href="https://www.sthda.com/english/articles/32-r-graphics-essentials/129-visualizing-multivariate-categorical-data/">Visualizing Multivariate Categorical Data</a></p>
</div>
<div id="references" class="section level2 unnumbered">
<h2>References</h2>
<div id="refs" class="references">
<div id="ref-R-ggcorrplot">
<p>Kassambara, Alboukadel. 2016. <em>Ggcorrplot: Visualization of a Correlation Matrix Using ’Ggplot2’</em>. <a href="https://www.sthda.com/english/wiki/ggcorrplot" class="uri">https://www.sthda.com/english/wiki/ggcorrplot</a>.</p>
</div>
<div id="ref-R-factoextra">
<p>Kassambara, Alboukadel, and Fabian Mundt. 2017. <em>Factoextra: Extract and Visualize the Results of Multivariate Data Analyses</em>. <a href="https://www.sthda.com/english/rpkgs/factoextra" class="uri">https://www.sthda.com/english/rpkgs/factoextra</a>.</p>
</div>
<div id="ref-R-scatterplot3d">
<p>Ligges, Uwe, Martin Maechler, and Sarah Schnackenberg. 2017. <em>Scatterplot3d: 3D Scatter Plot</em>. <a href="https://CRAN.R-project.org/package=scatterplot3d" class="uri">https://CRAN.R-project.org/package=scatterplot3d</a>.</p>
</div>
<div id="ref-R-GGally">
<p>Schloerke, Barret, Jason Crowley, Di Cook, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg, and Joseph Larmarange. 2016. <em>GGally: Extension to ’Ggplot2’</em>. <a href="https://CRAN.R-project.org/package=GGally" class="uri">https://CRAN.R-project.org/package=GGally</a>.</p>
</div>
</div>
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 16:29:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Visualizing Multivariate Categorical Data]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/129-visualizing-multivariate-categorical-data/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/129-visualizing-multivariate-categorical-data/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">





<p>To visualize a small data set containing multiple categorical (or qualitative) variables, you can create either a bar plot, a balloon plot or a mosaic plot.</p>
<p>For large multivariate categorical data sets, you need specialized statistical techniques dedicated to categorical data analysis, such as <a href="https://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/">simple and multiple correspondence analysis</a>. These methods make it possible to analyze and visualize the association (i.e. correlation) between a large number of qualitative variables.</p>
<p>Here, you’ll learn, through examples in the R programming language, how to visualize the frequency distribution of categorical variables contained in small contingency tables. We also provide the R code for computing simple correspondence analysis.</p>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#bar-plots-of-contingency-tables">Bar plots of contingency tables</a></li>
<li><a href="#balloon-plot">Balloon plot</a></li>
<li><a href="#mosaic-plot">Mosaic plot</a></li>
<li><a href="#correspondence-analysis">Correspondence analysis</a></li>
</ul>
</div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<p>Load required R packages and set the default theme:</p>
<pre class="r"><code>library(ggplot2)
library(ggpubr)
theme_set(theme_pubr())</code></pre>
</div>
<div id="bar-plots-of-contingency-tables" class="section level2">
<h2>Bar plots of contingency tables</h2>
<p>Demo data set: <code>HairEyeColor</code> (distribution of hair and eye color and sex in 592 statistics students)</p>
<ul>
<li>Prepare and inspect the data:</li>
</ul>
<pre class="r"><code>data("HairEyeColor")
df <- as.data.frame(HairEyeColor)
head(df)</code></pre>
<pre><code>##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50</code></pre>
<ul>
<li>Create the bar graph:
<ul>
<li>Hair color on the x-axis</li>
<li>Bars filled by eye color</li>
<li>The graph split into multiple panels by sex</li>
</ul></li>
</ul>
<pre class="r"><code>ggplot(df, aes(x = Hair, y = Freq))+
  geom_bar(
    aes(fill = Eye), stat = "identity", color = "white",
    position = position_dodge(0.9)
    )+
  facet_wrap(~Sex) + 
  fill_palette("jco")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/009-visualizing-multivariate-categorical-data-r-graphics-cookbook-and-examples-for-great-data-visualization-bar-plots-of-contingency-table-1.png" width="672" /></p>
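<p>The data frame used above is the "long" version of the 3-way table <code>HairEyeColor</code>: one row per cell, with the count in <code>Freq</code>. The sketch below shows that the conversion loses nothing. It rebuilds a two-way Hair x Sex table from the data frame with <code>xtabs()</code> and compares it with the margin of the original table.</p>
<pre class="r"><code>df <- as.data.frame(HairEyeColor)
sum(df$Freq)   # total number of students
# Collapse over Eye color, starting from the long data frame
hair_by_sex <- xtabs(Freq ~ Hair + Sex, data = df)
hair_by_sex
# Same counts computed directly from the original table
margin.table(HairEyeColor, c(1, 3))</code></pre>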
</div>
<div id="balloon-plot" class="section level2">
<h2>Balloon plot</h2>
<p>A balloon plot is an alternative to a bar plot for visualizing large categorical data. We’ll use the function <code>ggballoonplot()</code> [in ggpubr], which draws a graphical matrix of a contingency table, where each cell contains a dot whose size reflects the relative magnitude of the corresponding component.</p>
<p>Demo data set: <code>housetasks</code> (a contingency table containing the frequency with which 13 household tasks are performed within couples).</p>
<pre class="r"><code>housetasks <- read.delim(
  system.file("demo-data/housetasks.txt", package = "ggpubr"),
  row.names = 1
  )
head(housetasks, 4)</code></pre>
<pre><code>##            Wife Alternating Husband Jointly
## Laundry     156          14       2       4
## Main_meal   124          20       5       4
## Dinner       77          11       7      13
## Breakfeast   82          36      15       7</code></pre>
<ul>
<li>Create a simple balloon plot of the contingency table, coloring each dot by its cell value.</li>
</ul>
<pre class="r"><code>ggballoonplot(housetasks, fill = "value")+
  scale_fill_viridis_c(option = "C")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/009-visualizing-multivariate-categorical-data-r-graphics-cookbook-and-examples-for-great-data-visualization-balloonplot-of-contingency-table-1.png" width="364.8" /></p>
<ul>
<li>Visualize a grouped frequency table. Demo data set: <code>HairEyeColor</code>. Create a multi-panel plot by Sex.</li>
</ul>
<pre class="r"><code>df <- as.data.frame(HairEyeColor)
ggballoonplot(df, x = "Hair", y = "Eye", size = "Freq",
              fill = "Freq", facet.by = "Sex",
              ggtheme = theme_bw()) +
  scale_fill_viridis_c(option = "C")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/009-visualizing-multivariate-categorical-data-r-graphics-cookbook-and-examples-for-great-data-visualization-balloon-plot-of-a-grouped-frequency-table-1.png" width="528" /></p>
</div>
<div id="mosaic-plot" class="section level2">
<h2>Mosaic plot</h2>
<p>A mosaic plot is an area-proportional visualization of observed frequencies, composed of tiles (corresponding to the cells) created by recursive vertical and horizontal splits of a rectangle. The area of each tile is proportional to the corresponding cell entry, given the dimensions of previous splits.</p>
<p>A mosaic plot can be created using either the function <code>mosaicplot()</code> [in graphics] or the function <code>mosaic()</code> [in the vcd package]. Read more at: <a href="https://cran.r-project.org/web/packages/vcd/vignettes/strucplot.pdf">Visualizing Multi-way Contingency Tables with vcd</a>.</p>
<p>Example of mosaic plot:</p>
<pre class="r"><code>library(vcd)
mosaic(HairEyeColor, shade = TRUE, legend = TRUE) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/009-visualizing-multivariate-categorical-data-r-graphics-cookbook-and-examples-for-great-data-visualization-mosaic-plot-1.png" width="576" /></p>
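<p>With <code>shade = TRUE</code>, tiles are colored by Pearson residuals, which measure how far each observed count is from the count expected under a model of independence. The base-R sketch below computes these residuals under the assumption of mutual independence of Hair, Eye and Sex (the default model, as far as we know, for both <code>mosaic()</code> and <code>mosaicplot()</code>), and then draws the base-graphics version of the plot.</p>
<pre class="r"><code>obs <- HairEyeColor
n <- sum(obs)
# Expected counts under mutual independence: n * p(hair) * p(eye) * p(sex)
expected <- outer(outer(margin.table(obs, 1), margin.table(obs, 2)),
                  margin.table(obs, 3)) / n^2
# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_res <- (obs - expected) / sqrt(expected)
round(pearson_res[, , "Female"], 1)
# Their sum of squares is the Pearson chi-square statistic
sum(pearson_res^2)
# Base-graphics mosaic plot with residual-based shading
mosaicplot(HairEyeColor, shade = TRUE, main = "HairEyeColor")</code></pre>
<p>Large positive residuals (blue tiles) flag combinations that occur more often than independence would predict; large negative ones (red tiles) flag combinations that occur less often.</p>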
</div>
<div id="correspondence-analysis" class="section level2">
<h2>Correspondence analysis</h2>
<p>Correspondence analysis can be used to summarize and visualize the information contained in a large contingency table formed by two categorical variables. </p>
<p>Required packages: FactoMineR for the analysis and factoextra for the visualization.</p>
<pre class="r"><code>library(FactoMineR)
library(factoextra)
res.ca <- CA(housetasks, graph = FALSE)
fviz_ca_biplot(res.ca, repel = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/009-visualizing-multivariate-categorical-data-r-graphics-cookbook-and-examples-for-great-data-visualization-correspondence-analysis-1.png" width="480" /></p>
<p>From the plot above, it appears that:</p>
<ul>
<li>Household tasks such as dinner, breakfast and laundry are done more often by the wife</li>
<li>Driving and repairs are done more frequently by the husband</li>
</ul>
<p>Read more at: <a href="https://goo.gl/7CnpXq">Correspondence analysis in R</a></p>
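<p>To see what <code>CA()</code> computes, here is a minimal base-R sketch of the classical correspondence analysis decomposition: a singular value decomposition of the standardized residuals of a two-way table. It uses the Hair x Eye table (summed over Sex) rather than <code>housetasks</code>, so that it runs without extra packages; the eigenvalues it produces should agree with those reported by FactoMineR for the same table, up to numerical precision.</p>
<pre class="r"><code># Hair x Eye contingency table, summed over Sex (plain matrix)
tab <- unclass(margin.table(HairEyeColor, c(1, 2)))
P <- tab / sum(tab)                 # correspondence matrix
row_mass <- rowSums(P)
col_mass <- colSums(P)
E <- outer(row_mass, col_mass)      # expected proportions under independence
S <- (P - E) / sqrt(E)              # standardized residuals
dec <- svd(S)
inertia <- dec$d^2                  # principal inertias (eigenvalues)
round(100 * inertia / sum(inertia), 1)  # % of total inertia per dimension
# Row principal coordinates on the first two dimensions
row_coord <- sweep(dec$u, 1, sqrt(row_mass), "/") %*% diag(dec$d)
rownames(row_coord) <- rownames(tab)
round(row_coord[, 1:2], 3)</code></pre>
<p>The total inertia equals the chi-square statistic divided by the grand total, which is why CA is often described as a decomposition of the chi-square association between rows and columns.</p>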
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 15:50:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Plot Time Series Data Using GGPlot]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/128-plot-time-series-data-using-ggplot/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/128-plot-time-series-data-using-ggplot/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">





<p>In this chapter, we start by describing how to plot simple and multiple time series data using the R function <code>geom_line()</code> [in ggplot2].</p>
<p>Next, we show how to set date axis limits and add a smoothed trend line to time series graphs. Finally, we introduce some extensions to the ggplot2 package for easily handling and analyzing time series objects.</p>
<p>Additionally, you’ll learn how to detect peaks (maxima) and valleys (minima) in time series data.</p>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#basic-ggplot-of-time-series">Basic ggplot of time series</a></li>
<li><a href="#plot-multiple-time-series-data">Plot multiple time series data</a></li>
<li><a href="#set-date-axis-limits">Set date axis limits</a></li>
<li><a href="#format-date-axis-labels">Format date axis labels</a></li>
<li><a href="#add-trend-smoothed-line">Add trend smoothed line</a></li>
<li><a href="#ggplot2-extensions-for-ts-objects">ggplot2 extensions for ts objects</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>


<div id="basic-ggplot-of-time-series" class="section level2">
<h2>Basic ggplot of time series</h2>
<ul>
<li>Plot type: line plot with dates on the x-axis</li>
<li>Demo data set: <code>economics</code> [ggplot2] time series data</li>
</ul>
<p>In this section, we’ll plot the variable <code>pop</code> (total population, in thousands) by <code>date</code> (x-axis). The variables <code>psavert</code> (personal savings rate) and <code>uempmed</code> (median duration of unemployment, in weeks) are used in the following sections.</p>
<ul>
<li>Load required packages and set the default theme:</li>
</ul>
<pre class="r"><code>library(ggplot2)
theme_set(theme_minimal())
# Demo dataset
head(economics)</code></pre>
<pre><code>## # A tibble: 6 x 6
##         date   pce    pop psavert uempmed unemploy
##       <date> <dbl>  <int>   <dbl>   <dbl>    <int>
## 1 1967-07-01   507 198712    12.5     4.5     2944
## 2 1967-08-01   510 198911    12.5     4.7     2945
## 3 1967-09-01   516 199113    11.7     4.6     2958
## 4 1967-10-01   513 199311    12.5     4.9     3143
## 5 1967-11-01   518 199498    12.5     4.7     3066
## 6 1967-12-01   526 199657    12.1     4.8     3018</code></pre>
<ul>
<li>Create basic line plots</li>
</ul>
<pre class="r"><code># Basic line plot
ggplot(data = economics, aes(x = date, y = pop))+
  geom_line(color = "#00AFBB", linewidth = 2)  # linewidth: ggplot2 >= 3.4.0

# Plot a subset of the data
ss <- subset(economics, date > as.Date("2006-01-01"))
ggplot(data = ss, aes(x = date, y = pop)) + 
  geom_line(color = "#FC4E07", linewidth = 2)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_line-line-plot-time-series-data-visualization-1.png" width="307.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-geom_line-line-plot-time-series-data-visualization-2.png" width="307.2" /></p>
<ul>
<li>Control line width by the value of a continuous variable:</li>
</ul>
<pre class="r"><code>ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(aes(linewidth = unemploy/pop), color = "#FC4E07")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-line-plot-time-serie-data-line-size-data-visualization-1.png" width="480" /></p>
</div>
<div id="plot-multiple-time-series-data" class="section level2">
<h2>Plot multiple time series data</h2>
<p>Here, we’ll plot the variables <code>psavert</code> and <code>uempmed</code> by date. You should first reshape the data using the <code>tidyr</code> package:</p>
<ul>
<li>Collapse the <code>psavert</code> and <code>uempmed</code> values into the same (new) column. R function: <code>gather()</code> [tidyr]</li>
<li>Create a grouping variable whose levels are <code>psavert</code> and <code>uempmed</code></li>
</ul>
<pre class="r"><code>library(tidyr)
library(dplyr)
df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)
head(df, 3)</code></pre>
<pre><code>## # A tibble: 3 x 3
##         date variable value
##       <date>    <chr> <dbl>
## 1 1967-07-01  psavert  12.5
## 2 1967-08-01  psavert  12.5
## 3 1967-09-01  psavert  11.7</code></pre>
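<p>In current versions of tidyr, <code>gather()</code> is superseded by <code>pivot_longer()</code>. The sketch below performs the same reshaping with <code>pivot_longer()</code>. One difference to be aware of: the rows come out interleaved by date (psavert, uempmed, psavert, ...) rather than stacked variable by variable, which does not matter for ggplot2.</p>
<pre class="r"><code>library(ggplot2)  # provides the economics data set
library(dplyr)
library(tidyr)
df_long <- economics %>%
  select(date, psavert, uempmed) %>%
  pivot_longer(cols = c(psavert, uempmed),
               names_to = "variable", values_to = "value")
head(df_long, 3)</code></pre>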
<pre class="r"><code># Multiple line plot
ggplot(df, aes(x = date, y = value)) + 
  geom_line(aes(color = variable), linewidth = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  theme_minimal()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-multiple-time-series-1.png" width="576" /></p>
<pre class="r"><code># Area plot
ggplot(df, aes(x = date, y = value)) + 
  geom_area(aes(color = variable, fill = variable), 
            alpha = 0.5, position = "identity") +  # overlap instead of stacking
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-multiple-time-series-2.png" width="576" /></p>
</div>
<div id="set-date-axis-limits" class="section level2">
<h2>Set date axis limits</h2>
<p>Key R function: <code>scale_x_date()</code></p>
<pre class="r"><code># Base plot with date axis
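p <- ggplot(data = economics, aes(x = date, y = psavert)) + 
     geom_line(color = "#00AFBB", linewidth = 1)
p

# Set axis limits c(min, max); NA keeps the default upper limit
# (named min_date/max_date to avoid masking the base functions min() and max())
min_date <- as.Date("2002-01-01")
max_date <- NA
p + scale_x_date(limits = c(min_date, max_date))</code></pre>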
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-set-date-axis-limits-scale_x_date-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-set-date-axis-limits-scale_x_date-2.png" width="316.8" /></p>
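<p>Setting limits on the scale removes the observations outside the range before anything is drawn (ggplot2 reports them as removed rows). If you only want to zoom in while keeping all the data, for example so that a smoothing line is still fitted on the full series, an alternative is <code>coord_cartesian()</code>. A minimal sketch, assuming the same start date as above:</p>
<pre class="r"><code>library(ggplot2)
p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB", linewidth = 1)
start_date <- as.Date("2002-01-01")
# Zoom: only the visible window changes, no data are dropped
p_zoom <- p + coord_cartesian(xlim = c(start_date, max(economics$date)))
p_zoom</code></pre>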
</div>
<div id="format-date-axis-labels" class="section level2">
<h2>Format date axis labels</h2>
<p>Key function: <code>scale_x_date()</code>.</p>
<p>To format date axis labels, you can use different combinations of days, weeks, months and years:</p>
<ul>
<li>Weekday name: use <code>%a</code> and <code>%A</code> for abbreviated and full weekday name, respectively</li>
<li>Month name: use <code>%b</code> and <code>%B</code> for abbreviated and full month name, respectively</li>
<li><code>%d</code>: day of the month as a decimal number (01-31)</li>
<li><code>%Y</code>: Year with century.</li>
<li>See more options in the documentation of the function <code>?strptime</code></li>
</ul>
<pre class="r"><code># Format : month/year
p + scale_x_date(date_labels = "%b/%Y")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-format-date-axis-scale_x_date-1.png" width="480" /></p>
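<p>The <code>date_labels</code> codes are the same ones used by the base function <code>format()</code>, so you can try a format string on a single date before putting it on an axis. The sketch below also uses the <code>date_breaks</code> argument of <code>scale_x_date()</code> to place one tick every 10 years (an arbitrary interval chosen for illustration).</p>
<pre class="r"><code>d <- as.Date("2017-11-17")
format(d, "%d/%m/%Y")     # "17/11/2017"
format(d, "%Y")           # "2017"
format(d, "%A %d %B %Y")  # weekday and month names follow the current locale

library(ggplot2)
p_years <- ggplot(economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB") +
  scale_x_date(date_breaks = "10 years", date_labels = "%Y")
p_years</code></pre>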
</div>
<div id="add-trend-smoothed-line" class="section level2">
<h2>Add trend smoothed line</h2>
<p>Key function: <code>stat_smooth()</code></p>
<pre class="r"><code>p + stat_smooth(
  color = "#FC4E07", fill = "#FC4E07",
  method = "loess"
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-trend-smoothed-line-1.png" width="480" /></p>
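<p><code>stat_smooth(method = "loess")</code> fits a local regression of y on x; on a date axis, x is the number of days since 1970-01-01. The sketch below fits the same kind of model directly with <code>stats::loess()</code> so that the trend is available as numbers. It assumes the default <code>span = 0.75</code> of <code>stat_smooth()</code>, so the values should closely follow, though not necessarily match exactly, the plotted curve.</p>
<pre class="r"><code>library(ggplot2)  # economics data set
econ <- as.data.frame(economics)
econ$days <- as.numeric(econ$date)  # Date stored as days since 1970-01-01
fit <- loess(psavert ~ days, data = econ, span = 0.75)
econ$trend <- predict(fit)          # fitted trend at each observed date
head(econ[, c("date", "psavert", "trend")])</code></pre>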
</div>
<div id="ggplot2-extensions-for-ts-objects" class="section level2">
<h2>ggplot2 extensions for ts objects</h2>
<p>The <code>ggfortify</code> package is an extension to ggplot2 that makes it easy to plot time series objects <span class="citation">(Horikoshi and Tang 2017)</span>. It can handle the output of many time series packages, including: <code>zoo::zooreg()</code>, <code>xts::xts()</code>, <code>timeSeries::timeSeries()</code>, <code>tseries::irts()</code>, <code>forecast::forecast()</code> and <code>vars::VAR()</code>.</p>
<p>Another interesting package is <code>ggpmisc</code> <span class="citation">(Aphalo 2017)</span>, which provides two useful statistics for time series objects:</p>
<ul>
<li><code>stat_peaks()</code> finds at which x positions local y maxima are located, and</li>
<li><code>stat_valleys()</code> finds at which x positions local y minima are located.</li>
</ul>
<p>Here, we’ll show how to easily:</p>
<ul>
<li>Visualize a time series object, using the data set <code>AirPassengers</code> (monthly airline passenger numbers 1949-1960).</li>
<li>Identify shifts in mean and/or variance in a time series using the <code>changepoint</code> package.</li>
<li>Detect jumps in the data using the <code>strucchange</code> package and the data set <code>Nile</code> (measurements of the annual flow of the river Nile at Aswan).</li>
<li>Detect peaks and valleys using the <code>ggpmisc</code> package and the data set <code>lynx</code> (Annual Canadian Lynx trappings 1821–1934).</li>
</ul>
<p>First, install required R packages:</p>
<pre class="r"><code>install.packages(
  c("ggfortify", "changepoint",
    "strucchange", "ggpmisc")
)</code></pre>
<p>Then use the <code>autoplot.ts()</code> function to visualize time series objects, as follows:</p>
<pre class="r"><code>library(ggfortify)
library(magrittr) # for piping %>%

# Plot ts objects
autoplot(AirPassengers)

# Identify change points in mean and variance
AirPassengers %>%
  changepoint::cpt.meanvar() %>%  # Identify change points
  autoplot()

# Detect jumps in the data
strucchange::breakpoints(Nile ~ 1) %>%
  autoplot()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-visualize-time-serie-objects-ggfortify-1.png" width="336" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-visualize-time-serie-objects-ggfortify-2.png" width="336" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-visualize-time-serie-objects-ggfortify-3.png" width="336" /></p>
<p>Detect peaks and valleys:</p>
<pre class="r"><code>library(ggpmisc)
ggplot(lynx, as.numeric = FALSE) + geom_line() + 
  stat_peaks(colour = "red") +
  stat_peaks(geom = "text", colour = "red", 
             vjust = -0.5, x.label.fmt = "%Y") +
  stat_valleys(colour = "blue") +
  stat_valleys(geom = "text", colour = "blue", angle = 45,
               vjust = 1.5, hjust = 1,  x.label.fmt = "%Y")+
  ylim(-500, 7300)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/010-plot-time-series-data-r-graphics-cookbook-and-examples-for-great-data-visualization-detect-peaks-and-valleys-1.png" width="576" /></p>
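<p>The idea behind <code>stat_peaks()</code> and <code>stat_valleys()</code> is a local comparison: a point is a peak if it is the largest value within a window of neighboring points (ggpmisc exposes this window as a <code>span</code> parameter). The base-R sketch below implements that rule with a hypothetical helper, <code>is_local_max()</code>, and a window of 5 points; edge handling and ties may differ from ggpmisc, so treat it as an illustration rather than a drop-in replacement.</p>
<pre class="r"><code># Hypothetical helper: TRUE where x[i] is the largest value within a
# centered window of `span` points (the window is truncated at the ends)
is_local_max <- function(x, span = 5) {
  half <- span %/% 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    neighborhood <- x[max(1, i - half):min(n, i + half)]
    x[i] == max(neighborhood)
  }, logical(1))
}

y <- as.numeric(lynx)
years <- as.numeric(time(lynx))
peak_years   <- years[is_local_max(y)]   # local maxima
valley_years <- years[is_local_max(-y)]  # local minima = maxima of -y
peak_years
valley_years</code></pre>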
</div>
<div id="references" class="section level2 unnumbered">
<h2>References</h2>
<div id="refs" class="references">
<div id="ref-R-ggpmisc">
<p>Aphalo, Pedro J. 2017. <em>Ggpmisc: Miscellaneous Extensions to ’Ggplot2’</em>. <a href="https://CRAN.R-project.org/package=ggpmisc" class="uri">https://CRAN.R-project.org/package=ggpmisc</a>.</p>
</div>
<div id="ref-R-ggfortify">
<p>Horikoshi, Masaaki, and Yuan Tang. 2017. <em>Ggfortify: Data Visualization Tools for Statistical Analysis Results</em>. <a href="https://CRAN.R-project.org/package=ggfortify" class="uri">https://CRAN.R-project.org/package=ggfortify</a>.</p>
</div>
</div>
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 15:28:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[GGPlot Facet: Quick Reference]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/127-ggplot-facet-quick-reference/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/127-ggplot-facet-quick-reference/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">





<p>This chapter provides a quick reference to <strong>facet_wrap</strong>() and <strong>facet_grid</strong>() for faceting a ggplot into multiple panels.</p>
<p>Facets divide a ggplot into subplots based on the values of one or more categorical variables. There are two main functions for faceting:</p>
<ol style="list-style-type: decimal">
<li><code>facet_grid()</code>, which lays out panels in a grid. It creates a matrix of panels defined by row and column faceting variables.</li>
<li><code>facet_wrap()</code>, which wraps a 1d sequence of panels into 2d. This is generally a better use of screen space than <code>facet_grid()</code> because most displays are roughly rectangular.</li>
</ol>
<p>Here, you’ll learn how to:</p>
<ul>
<li>Create facet wrap and facet grid panels.</li>
<li>Make the scales of facets free (independent).</li>
<li>Change facet labels text and appearance.</li>
</ul>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#key-r-functions-facet_grid-and-facet_wrap">Key R functions: facet_grid and facet_wrap</a></li>
<li><a href="#using-facet_grid">Using facet_grid</a></li>
<li><a href="#using-facet_wrap">Using facet_wrap</a></li>
<li><a href="#facet-scales">Facet scales</a></li>
<li><a href="#change-facet-labels">Change facet labels</a></li>
<li><a href="#see-also">See also</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<p>Load required packages and set the theme function <code>theme_light()</code> [ggplot2] as the default theme:</p>
<pre class="r"><code>library(ggplot2)
theme_set(
  theme_light() + theme(legend.position = "top")
  )</code></pre>
<p>Create a box plot filled by groups:</p>
<pre class="r"><code># Load data and convert dose to a factor variable
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Box plot
p <- ggplot(ToothGrowth, aes(x = dose, y = len)) + 
  geom_boxplot(aes(fill = supp), position = position_dodge(0.9)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))
p</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-basic-boxplot-facets-1.png" width="288" /></p>
</div>
<div id="key-r-functions-facet_grid-and-facet_wrap" class="section level2">
<h2>Key R functions: facet_grid and facet_wrap</h2>
<p>The following functions can be used for facets:</p>
<ul>
<li>p + <strong>facet_grid</strong>(supp ~ .): Facet in the vertical direction (one row per level of the <em>supp</em> variable).</li>
<li>p + <strong>facet_grid</strong>(. ~ supp): Facet in the horizontal direction (one column per level of the <em>supp</em> variable).</li>
<li>p + <strong>facet_grid</strong>(dose ~ supp): Facet in both directions based on two variables: <em>dose</em> (rows) and <em>supp</em> (columns).</li>
<li>p + <strong>facet_wrap</strong>(~ dose): Place facets side by side and wrap them into a rectangular layout.</li>
</ul>
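<p>Since ggplot2 3.0.0, the same layouts can also be written with the <code>rows</code> and <code>cols</code> arguments of <code>facet_grid()</code>, wrapping the faceting variables in <code>vars()</code>. A minimal sketch, assuming ggplot2 3.0.0 or later; it re-creates the box plot from a local copy of the data, and the names <code>tg</code>, <code>p_base</code>, <code>p_rows</code> and <code>p_both</code> are illustrative:</p>
<pre class="r"><code>library(ggplot2)

# Local copy of the data with dose as a factor, as in the Prerequisites
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

p_base <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot(aes(fill = supp))

# Same layout as p_base + facet_grid(supp ~ .): one row per level of supp
p_rows <- p_base + facet_grid(rows = vars(supp))

# Same layout as p_base + facet_grid(dose ~ supp)
p_both <- p_base + facet_grid(rows = vars(dose), cols = vars(supp))
p_both</code></pre>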
</div>
<div id="using-facet_grid" class="section level2">
<h2>Using facet_grid</h2>
<ol style="list-style-type: decimal">
<li><strong>Facet with one discrete variable</strong>. Split by the levels of the group “supp”</li>
</ol>
<pre class="r"><code># Split in vertical direction
p + facet_grid(supp ~ .)

# Split in horizontal direction
p + facet_grid(. ~ supp)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-facet_grid-facet-with-one-variable-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-facet_grid-facet-with-one-variable-2.png" width="316.8" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Facet with multiple variables</strong>. Split by the levels of two grouping variables: “dose” and “supp”</li>
</ol>
<pre class="r"><code># Facet by two variables: dose and supp.
# Rows are dose and columns are supp
p + facet_grid(dose ~ supp)

# Facet by two variables: reverse the order of the 2 variables
# Rows are supp and columns are dose
p + facet_grid(supp ~ dose)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-facet_grid-with-two-variable-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-facet_grid-with-two-variable-2.png" width="316.8" /></p>
<p>Note that you can use the argument <code>margins</code> to add extra facets that contain all the data for each of the possible values of the faceting variables. These pooled panels are labelled “(all)”.</p>
<pre class="r"><code>p + facet_grid(dose ~ supp, margins=TRUE)</code></pre>
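<p>The <code>margins</code> argument also accepts a character vector naming only the variables that should get a pooled “(all)” panel. The sketch below switches to a histogram of <code>len</code> purely for illustration (the objects <code>tg</code>, <code>h</code>, <code>h_all</code> and <code>h_supp</code> are hypothetical names) and counts the resulting panels with <code>layer_data()</code>:</p>
<pre class="r"><code>library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Histogram of tooth length, to be faceted by dose (rows) and supp (columns)
h <- ggplot(tg, aes(x = len)) + geom_histogram(bins = 10)

# margins = TRUE: an extra "(all)" row and column, 4 x 3 = 12 panels
h_all <- h + facet_grid(dose ~ supp, margins = TRUE)

# margins = "supp": only an "(all)" column pooling OJ and VC, 3 x 3 = 9 panels
h_supp <- h + facet_grid(dose ~ supp, margins = "supp")

length(unique(layer_data(h_all)$PANEL))
length(unique(layer_data(h_supp)$PANEL))</code></pre>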
</div>
<div id="using-facet_wrap" class="section level2">
<h2>Using facet_wrap</h2>
<p><strong>facet_wrap</strong>: Facets can be placed side by side using the function <code>facet_wrap()</code> as follows:</p>
<pre class="r"><code>p + facet_wrap(~ dose)

p + facet_wrap(~ dose, ncol=2)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-facet-wrap-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-facet-wrap-2.png" width="316.8" /></p>
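<p><code>facet_wrap()</code> also accepts several faceting variables, creating one panel per combination of their levels. A minimal sketch, assuming the same ToothGrowth preparation; the histogram and the names <code>tg</code> and <code>p_wrap2</code> are illustrative:</p>
<pre class="r"><code>library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# One panel per supp/dose combination (2 x 3 = 6 panels), wrapped into 2 rows;
# each strip shows the values of both variables
p_wrap2 <- ggplot(tg, aes(x = len)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ supp + dose, nrow = 2)
p_wrap2</code></pre>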
</div>
<div id="facet-scales" class="section level2">
<h2>Facet scales</h2>
<p>By default, all the panels have the same scales (<code>scales="fixed"</code>). They can be made independent by setting scales to <code>free</code>, <code>free_x</code>, or <code>free_y</code>.</p>
<pre class="r"><code>p + facet_grid(dose ~ supp, scales = "free")</code></pre>
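<p>With <code>facet_grid()</code>, the <code>space</code> argument complements free scales: with <code>space = "free_x"</code>, each panel's width becomes proportional to the length of its x scale, which helps when panels contain different numbers of categories. The sketch below uses the <code>mpg</code> data set shipped with ggplot2 instead of ToothGrowth, because not every vehicle class occurs for every drive train (<code>drv</code>); the name <code>p_space</code> is illustrative:</p>
<pre class="r"><code>library(ggplot2)

# Each drv panel shows only the classes present for that drive train,
# and panel widths follow the number of classes shown
p_space <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  facet_grid(. ~ drv, scales = "free_x", space = "free_x")
p_space</code></pre>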
</div>
<div id="change-facet-labels" class="section level2">
<h2>Change facet labels</h2>
<p><strong>Change facet labels</strong>. The argument <code>labeller</code> can be used to change facet labels. It should be a labeller function, such as <code>label_both</code> or one built with <code>labeller()</code>.</p>
<ul>
<li>In the following R code, facets are labelled by combining the name of the grouping variable with group levels. The labeller function <code>label_both</code> is used.</li>
</ul>
<pre class="r"><code>p + facet_grid(dose ~ supp, labeller = label_both)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-modify-facet-labels-1.png" width="336" /></p>
<ul>
<li>A simple way to modify facet label text is to provide new labels as named character vectors:</li>
</ul>
<pre class="r"><code># New facet label names for dose variable
dose.labs <- c("D0.5", "D1", "D2")
names(dose.labs) <- c("0.5", "1", "2")

# New facet label names for supp variable
supp.labs <- c("Orange Juice", "Vitamin C")
names(supp.labs) <- c("OJ", "VC")

# Create the plot
p + facet_grid(
  dose ~ supp, 
  labeller = labeller(dose = dose.labs, supp = supp.labs)
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-change-facet-labels-1.png" width="336" /></p>
<ul>
<li>An alternative way to change the facet labels is to modify the data:</li>
</ul>
<pre class="r"><code>df <- ToothGrowth
# Modify the data
df$dose <- factor(df$dose, levels = c("0.5", "1", "2"), 
                  labels = c("D0.5", "D1", "D2"))
df$supp <- factor(df$supp, levels = c("OJ", "VC"),
                  labels = c("Orange Juice", "Vitamin C")
                  )
# Create the plot
ggplot(df, aes(x = dose, y = len)) + 
  geom_boxplot(aes(fill = supp)) +
  facet_grid(dose ~ supp)</code></pre>
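<p>Besides a named character vector, each argument of <code>labeller()</code> can be a function that takes and returns a character vector, which is convenient when the new labels follow a pattern. A minimal sketch; <code>dose_labeller</code> is a hypothetical helper and <code>tg</code> and <code>p_lab</code> are illustrative names:</p>
<pre class="r"><code>library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Hypothetical helper: turns a dose level such as "0.5" into "Dose: 0.5 mg"
dose_labeller <- function(x) paste("Dose:", x, "mg")

p_lab <- ggplot(tg, aes(x = supp, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_grid(. ~ dose, labeller = labeller(dose = dose_labeller))
p_lab</code></pre>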
<p><strong>Change facet labels appearance</strong>:</p>
<pre class="r"><code># Change facet text font. Possible values for the font face:
  # "plain", "italic", "bold", "bold.italic".
p + facet_grid(dose ~ supp)+
    theme(
      strip.text.x = element_text(
        size = 12, color = "red", face = "bold.italic"
        ),
      strip.text.y = element_text(
        size = 12, color = "red", face = "bold.italic"
        )
      )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-change-facet-labels-appearance-1.png" width="316.8" /></p>
<p><strong>Change facet background color</strong>. The rectangle around facet labels can be modified using the function <code>element_rect()</code>.</p>
<pre class="r"><code>p + facet_grid(dose ~ supp)+
 theme(
   strip.background = element_rect(
     # linewidth replaces the size argument, deprecated in ggplot2 3.4.0
     color="black", fill="#FC4E07", linewidth=1.5, linetype="solid"
     )
   )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/011-ggplot-facet-r-graphics-cookbook-and-examples-for-great-data-visualization-change-facet-background-color-1.png" width="316.8" /></p>
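<p>Because <code>strip.text</code> is the parent element of <code>strip.text.x</code> and <code>strip.text.y</code>, the text and background settings can be applied to column and row strips in a single call, provided the active theme does not override the child elements. A sketch combining both; the colors and the name <code>p_strips</code> are chosen for illustration:</p>
<pre class="r"><code>library(ggplot2)

tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

p_strips <- ggplot(tg, aes(x = dose, y = len)) +
  geom_boxplot(aes(fill = supp)) +
  facet_grid(dose ~ supp) +
  theme(
    # strip.text is inherited by strip.text.x (columns) and strip.text.y (rows)
    strip.text = element_text(size = 12, color = "white", face = "bold"),
    # color = NA removes the border around the strip
    strip.background = element_rect(fill = "#2E9FDF", color = NA)
  )
p_strips</code></pre>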
</div>
<div id="see-also" class="section level2">
<h2>See also</h2>
<ul>
<li>Create and Customize Multi-panel ggplots: Easy Guide to Facet. <a href="https://goo.gl/eRKHV7" class="uri">https://goo.gl/eRKHV7</a></li>
</ul>
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 15:11:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Combine Multiple GGPlots in One Graph]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/126-combine-multiple-ggplots-in-one-graph/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/126-combine-multiple-ggplots-in-one-graph/</guid>
			<description><![CDATA[<!-- START HTML -->


  <div id="rdoc">





<p>This chapter describes, step by step, how to combine <strong>multiple ggplots</strong> in one graph, as well as over multiple pages, using helper functions available in the <em>ggpubr</em> R package. We’ll also describe how to save the arranged plots and how to save multiple ggplots in one pdf file.</p>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#arrange-on-one-page">Arrange on one page</a></li>
<li><a href="#annotate-the-arranged-figure">Annotate the arranged figure</a></li>
<li><a href="#change-column-and-row-span-of-a-plot">Change column and row span of a plot</a></li>
<li><a href="#use-shared-legend-for-combined-ggplots">Use shared legend for combined ggplots</a></li>
<li><a href="#mix-table-text-and-ggplot">Mix table, text and ggplot2 graphs</a></li>
<li><a href="#arrange-over-multiple-pages">Arrange over multiple pages</a></li>
<li><a href="#export-the-arranged-plots">Export the arranged plots</a></li>
<li><a href="#see-also">See also</a></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<p>Load required packages and set the theme function <code>theme_pubr()</code> [in ggpubr] as the default theme:</p>
<pre class="r"><code>library(ggplot2)
library(ggpubr)
theme_set(theme_pubr())</code></pre>
</div>
<div id="arrange-on-one-page" class="section level2">
<h2>Arrange on one page</h2>
<ul>
<li><strong>Create some basic plots</strong> as follows:</li>
</ul>
<pre class="r"><code># 0. Define custom color palette and prepare the data
my3cols <- c("#E7B800", "#2E9FDF", "#FC4E07")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# 1. Create a box plot (bxp)
p <- ggplot(ToothGrowth, aes(x = dose, y = len))
bxp <- p + geom_boxplot(aes(color = dose)) +
  scale_color_manual(values = my3cols)

# 2. Create a dot plot (dp)
dp <- p + geom_dotplot(aes(color = dose, fill = dose), 
                       binaxis = "y", stackdir = "center") +
  scale_color_manual(values = my3cols) + 
  scale_fill_manual(values = my3cols)

# 3. Create a line plot (lp) of the personal savings rate
lp <- ggplot(economics, aes(x = date, y = psavert)) + 
  geom_line(color = "#E46726")</code></pre>
<ul>
<li><strong>Combine multiple ggplots on one page</strong>. Use the function <code>ggarrange()</code> [ggpubr package], a wrapper around the function <code>plot_grid()</code> [cowplot package]. Unlike <code>plot_grid()</code>, <code>ggarrange()</code> can also arrange multiple ggplots over multiple pages.</li>
</ul>
<pre class="r"><code>figure <- ggarrange(bxp, dp, lp,
                    labels = c("A", "B", "C"),
                    ncol = 2, nrow = 2)
figure</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/012-combine-multiple-ggplots-in-one-graph-r-graphics-cookbook-and-examples-for-great-data-visualization-multiple-ggplot-figure-1.png" width="480" /></p>
</div>
<div id="annotate-the-arranged-figure" class="section level2">
<h2>Annotate the arranged figure</h2>
<p>Key R function: <code>annotate_figure()</code> [in ggpubr].</p>
<pre class="r"><code>annotate_figure(
  figure,
  top = text_grob("Visualizing len",
                  color = "red", face = "bold", size = 14),
  bottom = text_grob("Data source: \n ToothGrowth", color = "blue",
                     hjust = 1, x = 1, face = "italic", size = 10),
  left = text_grob("Fig arranged using ggpubr",
                   color = "green", rot = 90),
  right = "I'm done, thanks :-)!",
  fig.lab = "Figure 1", fig.lab.face = "bold"
  )</code></pre>
</div>
<div id="change-column-and-row-span-of-a-plot" class="section level2">
<h2>Change column and row span of a plot</h2>
<p>We’ll use nested <code>ggarrange()</code> functions to change column/row span of plots. For example, using the R code below:</p>
<ul>
<li>the line plot (lp) occupies the first row and spans two columns</li>
<li>the box plot (bxp) and the dot plot (dp) are first arranged side by side, then placed in the second row, one per column</li>
</ul>
<pre class="r"><code>ggarrange(
  lp,                # First row with line plot
  # Second row with box and dot plots
  ggarrange(bxp, dp, ncol = 2, labels = c("B", "C")), 
  nrow = 2, 
  labels = "A"       # Label of the line plot
  ) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/012-combine-multiple-ggplots-in-one-graph-r-graphics-cookbook-and-examples-for-great-data-visualization-multiple-ggplot-figure-column-row-span-1.png" width="480" /></p>
</div>
<div id="use-shared-legend-for-combined-ggplots" class="section level2">
<h2>Use shared legend for combined ggplots</h2>
<p>To place a single shared legend in the margin of the arranged plots, the function <code>ggarrange()</code> [in ggpubr] can be used with the following arguments:</p>
<ul>
<li><code>common.legend = TRUE</code>: place a common legend in a margin</li>
<li><code>legend</code>: specify the legend position. Allowed values include one of c(“top”, “bottom”, “left”, “right”)</li>
</ul>
<pre class="r"><code>ggarrange(
  bxp, dp, labels = c("A", "B"),
  common.legend = TRUE, legend = "bottom"
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/012-combine-multiple-ggplots-in-one-graph-r-graphics-cookbook-and-examples-for-great-data-visualization-shared-legend-for-multiple-ggplots-1.png" width="480" /></p>
</div>
<div id="mix-table-text-and-ggplot" class="section level2">
<h2>Mix table, text and ggplot2 graphs</h2>
<p>In this section, we’ll show how to plot a table and text alongside a chart. The iris data set will be used.</p>
<p>We start by creating the following plots:</p>
<ol style="list-style-type: decimal">
<li>a <strong>density plot</strong> of the variable “Sepal.Length”. R function: <strong>ggdensity</strong>() [in ggpubr]</li>
<li>a plot of the <strong>summary table</strong> containing the descriptive statistics (mean, sd, … ) of Sepal.Length.
<ul>
<li>R function for computing descriptive statistics: <strong>desc_statby</strong>() [in ggpubr].</li>
<li>R function to draw a textual table: <strong>ggtexttable</strong>() [in ggpubr].</li>
</ul></li>
<li>a plot of a text <strong>paragraph</strong>. R function: <strong>ggparagraph</strong>() [in ggpubr].</li>
</ol>
<p>We finish by arranging/combining the three plots using the function <strong>ggarrange</strong>() [in ggpubr].</p>
<pre class="r"><code># Density plot of "Sepal.Length"
#::::::::::::::::::::::::::::::::::::::
density.p <- ggdensity(iris, x = "Sepal.Length", 
                       fill = "Species", palette = "jco")

# Draw the summary table of Sepal.Length
#::::::::::::::::::::::::::::::::::::::
# Compute descriptive statistics by groups
stable <- desc_statby(iris, measure.var = "Sepal.Length",
                      grps = "Species")
stable <- stable[, c("Species", "length", "mean", "sd")]
# Summary table plot, medium orange theme
stable.p <- ggtexttable(stable, rows = NULL, 
                        theme = ttheme("mOrange"))

# Draw text
#::::::::::::::::::::::::::::::::::::::
text <- paste("iris data set gives the measurements in cm",
              "of the variables sepal length and width",
              "and petal length and width, respectively,",
              "for 50 flowers from each of 3 species of iris.",
             "The species are Iris setosa, versicolor, and virginica.",
             sep = " ")
text.p <- ggparagraph(text = text, face = "italic", size = 11, color = "black")

# Arrange the plots on the same page
ggarrange(density.p, stable.p, text.p, 
          ncol = 1, nrow = 3,
          heights = c(1, 0.5, 0.3))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/012-combine-multiple-ggplots-in-one-graph-r-graphics-cookbook-and-examples-for-great-data-visualization-add-table-add-text-data-visualization-1.png" width="528" /></p>
</div>
<div id="arrange-over-multiple-pages" class="section level2">
<h2>Arrange over multiple pages</h2>
<p>If you have a long list of ggplots, say n = 20 plots, you may want to arrange the plots and to place them on multiple pages. With 4 plots per page, you need 5 pages to hold the 20 plots.</p>
<p>The function <code>ggarrange()</code> [ggpubr] provides a convenient solution to arrange multiple ggplots over multiple pages. Once you specify the arguments <code>nrow</code> and <code>ncol</code>, <code>ggarrange()</code> automatically computes the number of pages required to hold the list of plots and returns a list of arranged ggplots, one per page.</p>
<p>For example, the following R code</p>
<pre class="r"><code>multi.page <- ggarrange(bxp, dp, lp, bxp,
                        nrow = 1, ncol = 2)</code></pre>
<p>returns a list of two pages with two plots per page. You can visualize each page as follows:</p>
<pre class="r"><code>multi.page[[1]] # Visualize page 1
multi.page[[2]] # Visualize page 2</code></pre>
<p>You can also export the arranged plots to a pdf file using the function <code>ggexport()</code> [ggpubr]:</p>
<pre class="r"><code>ggexport(multi.page, filename = "multi.page.ggplot2.pdf")</code></pre>
<p>See the PDF file: <a href="//www.slideshare.net/kassambara/multipageggplot2">Multi.page.ggplot2</a></p>
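<p>When the plots are generated programmatically, the <code>plotlist</code> argument of <code>ggarrange()</code> accepts a list of ggplots directly. A sketch, assuming one histogram per numeric column of <code>iris</code>; the objects <code>num_vars</code>, <code>plots</code> and <code>pages</code> are illustrative:</p>
<pre class="r"><code>library(ggplot2)
library(ggpubr)

# Build one histogram per numeric column of iris (4 plots in total)
num_vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
plots <- lapply(num_vars, function(v) {
  ggplot(iris, aes(x = .data[[v]])) +
    geom_histogram(bins = 20, fill = "#2E9FDF") +
    labs(x = v)
})

# 2 plots per page (1 row x 2 columns): 4 plots are spread over 2 pages
pages <- ggarrange(plotlist = plots, nrow = 1, ncol = 2)
length(pages)</code></pre>
<p>Like <code>multi.page</code>, the resulting list of pages can be passed to <code>ggexport()</code>.</p>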
</div>
<div id="export-the-arranged-plots" class="section level2">
<h2>Export the arranged plots</h2>
<p>R function: <code>ggexport()</code> [in ggpubr].</p>
<ul>
<li>Export the arranged figure to a pdf, eps or png file (one figure per page).</li>
</ul>
<pre class="r"><code>ggexport(figure, filename = "figure1.pdf")</code></pre>
<ul>
<li>It’s also possible to arrange the plots (2 plots per page) when exporting them.</li>
</ul>
<p>Export individual plots to a pdf file (one plot per page):</p>
<pre class="r"><code>ggexport(bxp, dp, lp, bxp, filename = "test.pdf")</code></pre>
<p>Arrange and export. Specify the nrow and ncol arguments to display multiple plots on the same page:</p>
<pre class="r"><code>ggexport(bxp, dp, lp, bxp, filename = "test.pdf",
         nrow = 2, ncol = 1)</code></pre>
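<p>Without ggpubr, base R's <code>pdf()</code> device gives a similar multi-page result: each <code>print()</code> of a ggplot while the device is open starts a new page. A minimal sketch; the plot list and the output file in a temporary directory are illustrative:</p>
<pre class="r"><code>library(ggplot2)

plots <- list(
  ggplot(cars, aes(x = speed, y = dist)) + geom_point(),
  ggplot(ToothGrowth, aes(x = supp, y = len)) + geom_boxplot()
)

# Open a PDF device (onefile = TRUE by default), print one plot per page, close it
out_file <- file.path(tempdir(), "plots.pdf")
pdf(out_file, width = 7, height = 5)
for (p in plots) print(p)
invisible(dev.off())</code></pre>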
</div>
<div id="see-also" class="section level2">
<h2>See also</h2>
<ul>
<li>ggplot2 - Easy Way to Mix Multiple Graphs on The Same Page. <a href="https://goo.gl/WrieY4" class="uri">https://goo.gl/WrieY4</a></li>
</ul>
</div>


</div><!--end rdoc-->



<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 11:26:00 +0100</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[GGPlot Cheat Sheet for Great Customization]]></title>
			<link>https://www.sthda.com/english/articles/32-r-graphics-essentials/125-ggplot-cheat-sheet-for-great-customization/</link>
			<guid>https://www.sthda.com/english/articles/32-r-graphics-essentials/125-ggplot-cheat-sheet-for-great-customization/</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">





<p>This chapter provides a cheat sheet to change the global appearance of a ggplot.</p>
<p>You will learn how to:</p>
<ul>
<li>Add title, subtitle, caption and change axis labels</li>
<li>Change the appearance - color, size and face - of titles</li>
<li>Set the axis limits</li>
<li>Set a logarithmic axis scale</li>
<li>Rotate axis text labels</li>
<li>Change the legend title and position, as well as the color and size of the legend</li>
<li>Change a ggplot theme and modify the background color</li>
<li>Add a background image to a ggplot</li>
<li>Use different color palettes: custom color palettes, color-blind friendly palettes, RColorBrewer palettes, viridis color palettes and scientific journal color palettes.</li>
<li>Change point shapes (plotting symbols) and line types</li>
<li>Rotate a ggplot</li>
<li>Annotate a ggplot by adding straight lines, arrows, rectangles and text.</li>
</ul>
<p>Contents:</p>
<div id="TOC">
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#titles-and-axis-labels">Titles and axis labels</a></li>
<li><a href="#axes-limits-ticks-and-log">Axes: Limits, Ticks and Log</a><ul>
<li><a href="#axis-limits-and-scales">Axis limits and scales</a></li>
<li><a href="#log-scale">Log scale</a></li>
<li><a href="#axis-ticks-set-and-rotate-text-labels">Axis Ticks: Set and Rotate Text Labels</a></li>
</ul></li>
<li><a href="#legends-title-position-and-appearance">Legends: Title, Position and Appearance</a><ul>
<li><a href="#change-legend-title-and-position">Change legend title and position</a></li>
<li><a href="#change-the-appearance-of-legends">Change the appearance of legends</a></li>
<li><a href="#rename-legend-labels-and-change-the-order-of-items">Rename legend labels and change the order of items</a></li>
</ul></li>
<li><a href="#themes-gallery">Themes gallery</a><ul>
<li><a href="#use-themes-in-ggplot2-package">Use themes in ggplot2 package</a></li>
</ul></li>
<li><a href="#background-color-and-grid-lines">Background color and grid lines</a></li>
<li><a href="#add-background-image-to-ggplot2-graphs">Add background image to ggplot2 graphs</a></li>
<li><a href="#colors">Colors</a></li>
<li><a href="#points-shape-color-and-size">Points shape, color and size</a></li>
<li><a href="#line-types">Line types</a></li>
<li><a href="#rotate-a-ggplot">Rotate a ggplot</a></li>
<li><a href="#plot-annotation">Plot annotation</a><ul>
<li><a href="#add-straight-lines">Add straight lines</a></li>
<li><a href="#text-annotation">Text annotation</a></li>
</ul></li>
</ul>
</div>
<br/>
<div class = "small-block content-privileged-friends r-graphics-essentials-book">
  <p>The Book:</p>
        <a href = "https://www.sthda.com/english/web/5-bookadvisor/52-r-graphics-essentials-for-great-data-visualization-200-practical-examples-you-want-to-know-for-data-science/">
          <img src = "https://www.sthda.com/english/upload/r-graphics-essentials-cookbook-200.png" /><br/>
      R Graphics Essentials for Great Data Visualization: +200 Practical Examples You Want to Know for Data Science
      </a>
</div>
<div class="spacer"></div>


<div id="prerequisites" class="section level2">
<h2>Prerequisites</h2>
<ol style="list-style-type: decimal">
<li>Load packages and set the default theme:</li>
</ol>
<pre class="r"><code>library(ggplot2)
library(ggpubr)
theme_set(
  theme_pubr() +
    theme(legend.position = "right")
  )</code></pre>
<ol start="2" style="list-style-type: decimal">
<li>Create a box plot (bxp) and a scatter plot (sp) that we’ll customize in the next section:</li>
</ol>
<ul>
<li>Box plot using the <code>ToothGrowth</code> dataset:</li>
</ul>
<pre class="r"><code># Convert the variable dose from numeric to factor variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
bxp <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
  geom_boxplot(aes(color = dose)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<ul>
<li>Scatter plot using the <code>cars</code> dataset</li>
</ul>
<pre class="r"><code>sp <- ggplot(cars, aes(x = speed, y = dist)) + 
  geom_point()</code></pre>
</div>
<div id="titles-and-axis-labels" class="section level2">
<h2>Titles and axis labels</h2>
<p>Key function: <code>labs()</code>. It is used to change the main title, the subtitle, the axis labels and the caption.</p>
<ol style="list-style-type: decimal">
<li><strong>Add a title, subtitle, caption and change axis labels</strong></li>
</ol>
<pre class="r"><code>bxp <- bxp + labs(title = "Effect of Vitamin C on Tooth Growth",
              subtitle = "Plot of length by dose",
              caption = "Data source: ToothGrowth",
              x = "Dose (mg)", y = "Teeth length")
bxp</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-add-titles-and-axis-labels-1.png" width="384" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Change the appearance of titles</strong></li>
</ol>
<ul>
<li>Key functions: <code>theme()</code> and <code>element_text()</code>:</li>
</ul>
<pre class="r"><code>theme(
  plot.title = element_text(),
  plot.subtitle = element_text(),
  plot.caption = element_text()
)</code></pre>
<ul>
<li>Arguments of the function <code>element_text()</code> include:
<ul>
<li><code>color</code>, <code>size</code>, <code>face</code>, <code>family</code>: to change the text font color, size, face (“plain”, “italic”, “bold”, “bold.italic”) and family.</li>
<li><code>lineheight</code>: spacing between lines of text, as a multiple of the font size (the default is 0.9). Useful for multi-line plot titles.</li>
<li><code>hjust</code> and <code>vjust</code>: number in [0, 1], for horizontal and vertical adjustment of titles, respectively.
<ul>
<li><code>hjust = 0.5</code>: Center the plot titles.</li>
<li><code>hjust = 1</code>: Place the plot title on the right</li>
<li><code>hjust = 0</code>: Place the plot title on the left</li>
</ul></li>
</ul></li>
<li>Examples of R code:
<ul>
<li>Center main title and subtitle (<code>hjust = 0.5</code>)</li>
<li>Change color, size and face</li>
</ul></li>
</ul>
<pre class="r"><code>bxp + theme(
  plot.title = element_text(color = "red", size = 12, 
                            face = "bold", hjust = 0.5),
  plot.subtitle = element_text(color = "blue", hjust = 0.5),
  plot.caption = element_text(color = "green", face = "italic")
)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-the-appearance-of-titles-and-axis-labels-1.png" width="384" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Case of long titles</strong>. If the title is too long, you can split it into multiple lines using \n. In this case you can adjust the space between text lines by specifying the argument <code>lineheight</code> in the theme function <code>element_text()</code>:</li>
</ol>
<pre class="r"><code>bxp + labs(title = "Effect of Vitamin C on Tooth Growth \n in Guinea Pigs")+
  theme(plot.title = element_text(lineheight = 0.9))</code></pre>
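<p>Instead of inserting \n by hand, base R's <code>strwrap()</code> can break a long title at word boundaries. A sketch, assuming a target width of 40 characters (an arbitrary choice) and a hypothetical title string; the names <code>long_title</code>, <code>wrapped_title</code> and <code>p_title</code> are illustrative:</p>
<pre class="r"><code>library(ggplot2)

long_title <- "Effect of Vitamin C on Tooth Growth in Guinea Pigs by Supplement Type and Dose"

# strwrap() splits the text into lines of at most about 40 characters;
# pasting them back together with "\n" gives a multi-line title
wrapped_title <- paste(strwrap(long_title, width = 40), collapse = "\n")

p_title <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot() +
  labs(title = wrapped_title) +
  theme(plot.title = element_text(lineheight = 0.9))
p_title</code></pre>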
</div>
<div id="axes-limits-ticks-and-log" class="section level2">
<h2>Axes: Limits, Ticks and Log</h2>
<div id="axis-limits-and-scales" class="section level3">
<h3>Axis limits and scales</h3>
<p><strong>Three key functions to set the axis limits and scales</strong>:</p>
<ol style="list-style-type: decimal">
<li>Without clipping (preferred): <code>coord_cartesian()</code>. The Cartesian coordinate system is the most common type of coordinate system. Setting the limits here zooms the plot without clipping the data.</li>
</ol>
<pre class="r"><code>sp + coord_cartesian(xlim = c(5, 20), ylim = c(0, 50))</code></pre>
<ol start="2" style="list-style-type: decimal">
<li>With clipping (removes data points outside the limits). Observations not in this range are dropped completely and not passed to any other layers.</li>
</ol>
<pre class="r"><code># Use this
sp + scale_x_continuous(limits = c(5, 20)) + 
  scale_y_continuous(limits = c(0, 50))

# Or these shorthand functions
sp + xlim(5, 20) + ylim(0, 50)</code></pre>
<div class="warning">
<p>
Note that <code>scale_x_continuous()</code> and <code>scale_y_continuous()</code> remove all data points outside the given range, whereas the <code>coord_cartesian()</code> function only adjusts the visible area.
</p>
<p>
In most cases you would not see the difference, but if you fit anything to the data (for example, a regression line with <code>geom_smooth()</code>), the functions <code>scale_x_continuous()</code> / <code>scale_y_continuous()</code> can change the fitted values, because the fit is computed only on the points that remain.
</p>
</div>
<ol start="3" style="list-style-type: decimal">
<li>Expand the plot limits to ensure that a given value is included in all panels or all plots.</li>
</ol>
<pre class="r"><code># set the intercept of x and y axes at (0,0)
sp + expand_limits(x = 0, y = 0)

# Expand plot limits
sp + expand_limits(x = c(5, 50), y = c(0, 150))</code></pre>
<p><strong>Examples of R code</strong>:</p>
<pre class="r"><code># Default plot
print(sp)

# Change axis limits using coord_cartesian()
sp + coord_cartesian(xlim =c(5, 20), ylim = c(0, 50))

# Make sure that 0 is included in both the x and y axes
sp + expand_limits(x = 0, y = 0)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-axis-limits-and-scales-1.png" width="211.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-axis-limits-and-scales-2.png" width="211.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-axis-limits-and-scales-3.png" width="211.2" /></p>
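<p>The difference between zooming and clipping matters as soon as a statistic is computed. The sketch below adds a linear regression line (<code>geom_smooth(method = "lm")</code>, added here for illustration) to the cars scatter plot and uses <code>layer_data()</code> to inspect the x-range covered by the fitted line under each approach; the names <code>sp_fit</code>, <code>p_zoom</code> and <code>p_clip</code> are illustrative:</p>
<pre class="r"><code>library(ggplot2)

sp_fit <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Zoom only: the line is still fitted on all 50 cars (speed 4 to 25)
p_zoom <- sp_fit + coord_cartesian(xlim = c(5, 20))

# Clip: cars with speed outside [5, 20] become NA and are removed before
# the fit, so ggplot2 warns about the removed rows
p_clip <- sp_fit + xlim(5, 20)

# Layer 2 is the smooth layer: compare the x-range of each fitted line
range(layer_data(p_zoom, 2)$x)
range(layer_data(p_clip, 2)$x)</code></pre>
<p>The first range spans the full data, while the second stays within 5 to 20, so the two regression lines are fitted on different subsets of the data.</p>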
</div>
<div id="log-scale" class="section level3">
<h3>Log scale</h3>
<p><strong>Key functions to set a logarithmic axis scale</strong>:</p>
<ol style="list-style-type: decimal">
<li>Scale functions. Common values for the argument <code>trans</code> include <code>"log2"</code>, <code>"log10"</code> and <code>"sqrt"</code>.</li>
</ol>
<pre class="r"><code>sp + scale_x_continuous(trans = "log2")

sp + scale_y_continuous(trans = "log2")</code></pre>
<ol start="2" style="list-style-type: decimal">
<li>Transformed Cartesian coordinate system. Possible values for x and y are “log2”, “log10”, “sqrt”, …</li>
</ol>
<pre class="r"><code>sp + coord_trans(x = "log2", y = "log2")</code></pre>
<ol start="3" style="list-style-type: decimal">
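<li>Display log-scale tick marks with <code>annotation_logticks()</code>. By default it assumes a base-10 log scale (argument <code>base = 10</code>), so it pairs naturally with <code>scale_x_log10()</code> / <code>scale_y_log10()</code>:</li>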
</ol>
<pre class="r"><code># Only the y axis is log-transformed, so draw log ticks on the left side only
sp + scale_y_log10() + annotation_logticks(sides = "l")</code></pre>
<div class="warning">
<p>
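Note that the scale functions transform the data before any statistics are computed. If you fit anything to the data (e.g. with <code>geom_smooth()</code>), the fit is therefore computed on the transformed values, which will probably change the fitted values.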
</p>
<p>
An alternative is to use the function <em>coord_trans</em>(). It is applied after the statistical transformation and affects only the visual appearance of the geoms.
</p>
</div>
<p><strong>Example of R code</strong></p>
<pre class="r"><code># Put both axes on a log2 scale
# Possible values for trans: "log2", "log10", "sqrt"
sp + scale_x_continuous(trans = "log2") +
  scale_y_continuous(trans = "log2")

# Format y axis tick mark labels to show exponents (2^x)
library(scales)
sp + scale_y_continuous(
  trans = log2_trans(),
  breaks = trans_breaks("log2", function(x) 2^x),
  labels = trans_format("log2", math_format(2^.x))
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-log2-scale-scale_x_continuous-and-scale_y_continuous-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-log2-scale-scale_x_continuous-and-scale_y_continuous-2.png" width="288" /></p>
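<p>The sketch below shows concretely how a scale transformation differs from a coordinate transformation. It plots the mean of <code>len</code> per dose on a log10 axis in both ways. With <code>scale_y_log10()</code>, the mean is computed on the log values, which gives a geometric mean once back-transformed. With <code>coord_trans()</code>, it is the ordinary arithmetic mean. The sketch assumes ggplot2 3.3.0 or later for the <code>fun</code> argument of <code>stat_summary()</code>.</p>
<pre class="r"><code>library(ggplot2)

# Mean tooth length per dose, drawn as points
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  stat_summary(fun = mean, geom = "point", size = 3)

# Scale transformation: len is log10-transformed BEFORE the mean is computed
p_scale <- p + scale_y_log10()

# Coordinate transformation: the mean is computed on the raw len values,
# and only the drawing of the y axis is log10-transformed
p_coord <- p + coord_trans(y = "log10")

# Extract the computed summaries, sorted by x position
d_scale <- layer_data(p_scale)
d_scale <- d_scale[order(d_scale$x), ]
d_coord <- layer_data(p_coord)
d_coord <- d_coord[order(d_coord$x), ]

10^d_scale$y  # back-transformed: geometric means of len per dose
d_coord$y     # arithmetic means of len per dose</code></pre>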
</div>
<div id="axis-ticks-set-and-rotate-text-labels" class="section level3">
<h3>Axis Ticks: Set and Rotate Text Labels</h3>
<p>Start by creating a box plot:</p>
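<pre class="r"><code># Make sure dose is treated as a categorical variable
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
bxp <- ggplot(ToothGrowth, aes(x = dose, y = len)) +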
  geom_boxplot(aes(color = dose)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  theme(legend.position = "none")</code></pre>
<ol style="list-style-type: decimal">
<li><strong>Change the style and the orientation angle of axis tick labels</strong>. For a vertical rotation of x axis labels use <code>angle = 90</code>.</li>
</ol>
<pre class="r"><code># Rotate x and y axis text by 45 degrees
# face can be "plain", "italic", "bold" or "bold.italic"
bxp + theme(axis.text.x = element_text(face = "bold", color = "#993333", 
                           size = 12, angle = 45),
          axis.text.y = element_text(face = "bold", color = "blue", 
                           size = 12, angle = 45))

# Remove axis ticks and tick mark labels
bxp + theme(
  axis.text.x = element_blank(), # Remove x axis tick labels
  axis.text.y = element_blank(), # Remove y axis tick labels
  axis.ticks = element_blank()   # Remove ticks 
  ) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-set-axis-ticks-mark-labels-1.png" width="220.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-set-axis-ticks-mark-labels-2.png" width="220.8" /></p>
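<p>To adjust the position of the axis text, use the arguments <code>hjust</code> (horizontal justification) and <code>vjust</code> (vertical justification) of <code>element_text()</code>. Their values usually range from 0 to 1: for example, <code>hjust = 0</code> left-aligns the text and <code>hjust = 1</code> right-aligns it.</p>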
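<p>A common use of these arguments is to right-align rotated labels so that each label ends under its tick mark. Here is a minimal sketch that rebuilds a simplified version of <code>bxp</code>:</p>
<pre class="r"><code>library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
bxp <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_boxplot(aes(color = dose)) +
  theme(legend.position = "none")

# Rotate the x tick labels by 45 degrees and right-align them,
# so that the end of each label sits under its tick mark
p <- bxp + theme(
  axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)
)
p</code></pre>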
<ol start="2" style="list-style-type: decimal">
<li><strong>Change axis lines</strong>:
<ul>
<li>Remove the y-axis line</li>
<li>Change the color, the width and the line type of the x-axis line:</li>
</ul></li>
</ol>
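<pre class="r"><code># Note: ggplot2 >= 3.4.0 uses `linewidth` for line widths
# (older versions use `size` instead)
bxp + theme( 
  axis.line.y = element_blank(),
  axis.line = element_line(
    color = "gray", linewidth = 1, linetype = "solid"
    )
  )</code></pre>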
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-axis-line-1.png" width="288" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Customize discrete axis</strong>. Use the function <code>scale_x_discrete()</code> or <code>scale_y_discrete()</code> depending on the axis you want to change.</li>
</ol>
<p>Here, we’ll customize the x-axis of the box plot:</p>
<pre class="r"><code># Change x axis label and the order of items
bxp + scale_x_discrete(name ="Dose (mg)", 
                    limits = c("2","1","0.5"))

# Rename / Change tick mark labels
bxp + scale_x_discrete(breaks = c("0.5","1","2"),
        labels = c("D0.5", "D1", "D2"))

# Choose which items to display
bxp + scale_x_discrete(limits = c("0.5", "2"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-customize-discrete-x-axis-order-of-items-1.png" width="192" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-customize-discrete-x-axis-order-of-items-2.png" width="192" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-customize-discrete-x-axis-order-of-items-3.png" width="192" /></p>
<ol start="4" style="list-style-type: decimal">
<li><strong>Customize continuous axis</strong>. Change axis ticks interval.</li>
</ol>
<pre class="r"><code># Default scatter plot
sp <- ggplot(cars, aes(x = speed, y = dist)) + 
  geom_point()
sp

# Break y axis by a specified value
# a tick mark is shown on every 50
sp + scale_y_continuous(breaks=seq(0, 150, 50))

# Tick marks can also be placed at irregular positions
sp + scale_y_continuous(breaks=c(0, 50, 65, 75, 150))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-axis-ticks-interval-scale_y_continuous-breaks-1.png" width="192" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-axis-ticks-interval-scale_y_continuous-breaks-2.png" width="192" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-axis-ticks-interval-scale_y_continuous-breaks-3.png" width="192" /></p>
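<p>Instead of typing the breaks by hand, you can derive them from the data or generate them with a helper. The sketch below shows both options. <code>breaks_width()</code> comes from the <em>scales</em> package and assumes a recent version of it.</p>
<pre class="r"><code>library(ggplot2)
library(scales)

sp <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()

# Option 1: build the break vector from the data range (one tick every 25)
y_breaks <- seq(0, ceiling(max(cars$dist) / 25) * 25, by = 25)
p1 <- sp + scale_y_continuous(breaks = y_breaks)

# Option 2: let scales compute breaks spaced 25 units apart
p2 <- sp + scale_y_continuous(breaks = breaks_width(25))</code></pre>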
</div>
</div>
<div id="legends-title-position-and-appearance" class="section level2">
<h2>Legends: Title, Position and Appearance</h2>
<p>Start by creating a box plot using the <code>ToothGrowth</code> data set. Change the box plot fill color according to the grouping variable <code>dose</code>.</p>
<pre class="r"><code>library(ggplot2)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
bxp <- ggplot(ToothGrowth, aes(x = dose, y = len))+ 
  geom_boxplot(aes(fill = dose)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<div id="change-legend-title-and-position" class="section level3">
<h3>Change legend title and position</h3>
<ol style="list-style-type: decimal">
<li><strong>Legend title</strong>. Use <code>labs()</code> to change the legend title for a given aesthetic (fill, color, size, shape, etc.). For example:</li>
</ol>
<ul>
<li>Use <code>p + labs(fill = "dose")</code> for geom_boxplot(aes(fill = dose))</li>
<li>Use <code>p + labs(color = "dose")</code> for geom_boxplot(aes(color = dose))</li>
<li>and so on for linetype, shape, size, etc.</li>
</ul>
<ol start="2" style="list-style-type: decimal">
<li><strong>Legend position</strong>. The default legend position is “right”. Use the function <code>theme()</code> with the argument <code>legend.position</code> to specify the legend position.</li>
</ol>
<p>Allowed values for the legend position include: “left”, “top”, “right”, “bottom”, “none”.</p>
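<p>The legend position can also be a numeric vector <code>c(x, y)</code>, where x and y are the relative coordinates of the legend box, with values between 0 and 1. <code>c(0, 0)</code> corresponds to the “bottom left” position and <code>c(1, 1)</code> to the “top right” position. This makes it possible to place the legend inside the plot. Note that in ggplot2 3.5.0 and later, numeric values for <code>legend.position</code> are deprecated. Use <code>legend.position = "inside"</code> together with <code>legend.position.inside = c(x, y)</code> instead.</p>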
<p>Examples:</p>
<pre class="r"><code># Default plot
bxp

# Change legend title and position
bxp +
  labs(fill = "Dose (mg)") +
  theme(legend.position = "top")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-legend-title-and-position-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-legend-title-and-position-2.png" width="316.8" /></p>
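<p>With ggplot2 3.5.0 or later, you can place a legend inside the panel as sketched below. The coordinates <code>c(0.95, 0.05)</code> are an arbitrary choice for illustration. <code>legend.justification</code> sets which corner of the legend box is anchored at that point.</p>
<pre class="r"><code>library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
bxp <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_boxplot(aes(fill = dose)) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Legend inside the panel, near the bottom-right corner
# (requires ggplot2 >= 3.5.0)
p <- bxp + theme(
  legend.position = "inside",
  legend.position.inside = c(0.95, 0.05),
  legend.justification = c(1, 0)  # anchor the legend by its bottom-right corner
)
p</code></pre>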
<div class="warning">
<p>
To remove the legend, use <code>p + theme(legend.position = "none")</code>.
</p>
</div>
</div>
<div id="change-the-appearance-of-legends" class="section level3">
<h3>Change the appearance of legends</h3>
<ul>
<li>Change legend text color and size</li>
<li>Change the legend box background color</li>
</ul>
<pre class="r"><code># Change the appearance of legend title and text labels
bxp + theme(
  legend.title = element_text(color = "blue", size = 10),
  legend.text = element_text(color = "red")
  )

# Change legend background color, key size and width
bxp + theme(
  # Change legend background color
  legend.background = element_rect(fill = "darkgray"),
  legend.key = element_rect(fill = "lightblue", color = NA),
  # Change legend key size and key width
  legend.key.size = unit(1.5, "cm"),
  legend.key.width = unit(0.5,"cm") 
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-appearance-of-legend-text-and-background-color-1.png" width="307.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-appearance-of-legend-text-and-background-color-2.png" width="307.2" /></p>
</div>
<div id="rename-legend-labels-and-change-the-order-of-items" class="section level3">
<h3>Rename legend labels and change the order of items</h3>
<pre class="r"><code># Change the order of the boxes on the x axis
# (the legend order is controlled by the fill scale, e.g. its breaks argument)
bxp + scale_x_discrete(limits=c("2", "0.5", "1"))

# Edit legend title and labels for the fill aesthetics
bxp + scale_fill_manual(
  values = c("#00AFBB", "#E7B800", "#FC4E07"),
  name = "Dose", 
  breaks = c("0.5", "1", "2"),
  labels = c("D0.5", "D1", "D2")
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-legend-item-order-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-legend-item-order-2.png" width="316.8" /></p>
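<p>To change the order of the legend items themselves, set the <code>breaks</code> of the fill scale. In this sketch, naming the <code>values</code> vector keeps each color attached to the same dose, whatever the legend order:</p>
<pre class="r"><code>library(ggplot2)

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
bxp <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_boxplot(aes(fill = dose))

# Legend items follow the order of `breaks`; the x-axis order is unchanged
p <- bxp + scale_fill_manual(
  values = c("0.5" = "#00AFBB", "1" = "#E7B800", "2" = "#FC4E07"),
  breaks = c("2", "0.5", "1")
)
p</code></pre>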
<p>Other manual scales to set legends for a given aesthetic. Each of them also takes a <code>values</code> argument giving the colors, line types, shapes, sizes or alpha levels to use:</p>
<pre class="r"><code># Argument overview (not runnable as-is)
# Color of lines and points
scale_color_manual(values, name, labels, limits, breaks)
# For linetypes
scale_linetype_manual(values, name, labels, limits, breaks)
# For point shapes
scale_shape_manual(values, name, labels, limits, breaks)
# For point size
scale_size_manual(values, name, labels, limits, breaks)
# Opacity/transparency
scale_alpha_manual(values, name, labels, limits, breaks)</code></pre>
</div>
</div>
<div id="themes-gallery" class="section level2">
<h2>Themes gallery</h2>
<p>Start by creating a simple box plot:</p>
<pre class="r"><code>bxp <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + 
  geom_boxplot()</code></pre>
<div id="use-themes-in-ggplot2-package" class="section level3">
<h3>Use themes in ggplot2 package</h3>
<p>ggplot2 provides several built-in functions to easily set a ggplot theme. These include:</p>
<ul>
<li><code>theme_gray()</code>: Gray background and white grid lines. Designed to put the data forward while making comparisons easy.</li>
<li><code>theme_bw()</code>: White background and gray grid lines. May work better for presentations displayed with a projector.</li>
<li><code>theme_linedraw()</code>: A theme with black lines of various widths on a white background, reminiscent of a line drawing.</li>
<li><code>theme_light()</code>: A theme similar to <code>theme_linedraw()</code> but with light grey lines and axes, to direct more attention towards the data.</li>
</ul>
<pre class="r"><code>bxp + theme_gray(base_size = 14) 

bxp + theme_bw()

bxp + theme_linedraw()

bxp + theme_light()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-gray-bw-linedraw-light-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-gray-bw-linedraw-light-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-gray-bw-linedraw-light-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-gray-bw-linedraw-light-4.png" width="153.6" /></p>
<ul>
<li><code>theme_dark()</code>: Same as <code>theme_light()</code> but with a dark background. Useful to make thin colored lines pop out.</li>
<li><code>theme_minimal()</code>: A minimal theme with no background annotations.</li>
<li><code>theme_classic()</code>: A classic theme, with x and y axis lines and no grid lines.</li>
<li><code>theme_void()</code>: a completely empty theme, useful for plots with non-standard coordinates or for drawings.</li>
</ul>
<pre class="r"><code>bxp + theme_dark() 

bxp + theme_minimal()

bxp + theme_classic()

bxp + theme_void()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-minimal-dark-classic-void-1.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-minimal-dark-classic-void-2.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-minimal-dark-classic-void-3.png" width="153.6" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-theme-minimal-dark-classic-void-4.png" width="153.6" /></p>
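<p>To apply one theme to every plot in a session, rather than adding it to each plot, use <code>theme_set()</code>. Here is a minimal sketch. It restores the previous default theme at the end, so later plots are not affected:</p>
<pre class="r"><code>library(ggplot2)

# theme_set() changes the default theme used by all later plots
# and returns the theme that was active before
old_theme <- theme_set(theme_bw(base_size = 12))

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot()
print(p)  # drawn with theme_bw()

# Restore the previous default theme
theme_set(old_theme)</code></pre>
<p>The default theme is looked up when a plot is printed. Printing <code>p</code> again after restoring the old theme therefore draws it with that old theme.</p>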
<div class="warning">
<p>
Note that, additional themes are available in the <a href="https://cran.r-project.org/web/packages/ggthemes/vignettes/ggthemes.html">ggthemes R package</a>.
</p>
</div>
</div>
</div>
<div id="background-color-and-grid-lines" class="section level2">
<h2>Background color and grid lines</h2>
<ul>
<li>Create a simple box plot:</li>
</ul>
<pre class="r"><code>p <- ggplot(ToothGrowth, aes(factor(dose), len)) +
  geom_boxplot()</code></pre>
<ul>
<li>Change the panel background (1) and the plot background (2) colors:</li>
</ul>
<pre class="r"><code># 1. Change the panel background color to light blue
# and the color of the major/minor grid lines to white
# (ggplot2 >= 3.4.0 uses `linewidth`; older versions use `size`)
p + theme(
  panel.background = element_rect(fill = "#BFD5E3", colour = "#6D9EC1",
                                linewidth = 2, linetype = "solid"),
  panel.grid.major = element_line(linewidth = 0.5, linetype = "solid",
                                colour = "white"), 
  panel.grid.minor = element_line(linewidth = 0.25, linetype = "solid",
                                colour = "white")
  )

# 2. Change the plot background color (not the panel)
p + theme(plot.background = element_rect(fill = "#BFD5E3"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-theme-background-colors-and-grid-lines-1.png" width="240" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-theme-background-colors-and-grid-lines-2.png" width="240" /></p>
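<p>A related tweak is to remove some grid lines altogether with <code>element_blank()</code>. For example, the following sketch keeps only the horizontal major grid lines on the box plot:</p>
<pre class="r"><code>library(ggplot2)

p <- ggplot(ToothGrowth, aes(factor(dose), len)) +
  geom_boxplot()

# element_blank() removes the targeted elements: here the vertical
# major grid lines and all minor grid lines
p_grid <- p + theme(
  panel.grid.major.x = element_blank(),
  panel.grid.minor = element_blank()
)
p_grid</code></pre>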
</div>
<div id="add-background-image-to-ggplot2-graphs" class="section level2">
<h2>Add background image to ggplot2 graphs</h2>
<ol style="list-style-type: decimal">
<li><p><strong>Import the background image</strong>. Use either the function <code>readJPEG()</code> [in <em>jpeg</em> package] or the function <code>readPNG()</code> [in <em>png</em> package], depending on the format of the background image.</p></li>
<li><p><strong>Combine a ggplot with the background image</strong>. R function: <code>background_image()</code> [in ggpubr].</p></li>
</ol>
<pre class="r"><code>library(ggpubr)  # also loads ggplot2
library(png)

# Import the example image shipped with ggpubr
# (the same image is also available at
# https://www.sthda.com/english/sthda-upload/images/r-graphics-essentials/background-image.png)
img.file <- system.file(file.path("images", "background-image.png"),
                        package = "ggpubr")
img <- png::readPNG(img.file)

# Combine with ggplot
ggplot(iris, aes(Species, Sepal.Length))+
  background_image(img)+
  geom_boxplot(aes(fill = Species), color = "white", alpha = 0.5)+
  fill_palette("jco")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-overlay-ggplot2-with-background-image-1.png" width="432" /></p>
</div>
<div id="colors" class="section level2">
<h2>Colors</h2>
<p>A color can be specified either by name (e.g. “red”) or by hexadecimal code (e.g. “#FF1234”). In this section, you will learn how to change ggplot colors by groups and how to set gradient colors.</p>
<ol start="0" style="list-style-type: decimal">
<li><strong>Set ggplot theme</strong> to <code>theme_minimal()</code>:</li>
</ol>
<pre class="r"><code>theme_set(
  theme_minimal() +
    theme(legend.position = "top")
  )</code></pre>
<ol style="list-style-type: decimal">
<li><strong>Initialize ggplots</strong> using the <code>iris</code> data set:</li>
</ol>
<pre class="r"><code># Box plot
bp <- ggplot(iris, aes(Species, Sepal.Length))

# Scatter plot
sp <- ggplot(iris, aes(Sepal.Length, Sepal.Width))</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><strong>Specify a single color</strong>. Change the fill color (in box plots) and points color (in scatter plots).</li>
</ol>
<pre class="r"><code># Box plot
bp + geom_boxplot(fill = "#FFDB6D", color = "#C4961A") 

# Scatter plot
sp + geom_point(color = "#00AFBB")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-unique-color-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-unique-color-2.png" width="316.8" /></p>
<ol start="3" style="list-style-type: decimal">
<li><strong>Change colors by groups</strong>.</li>
</ol>
<p>You can change colors according to a grouping variable by:</p>
<ul>
<li>Mapping the argument <code>color</code> to the variable of interest. This applies to points, lines and text.</li>
<li>Mapping the argument <code>fill</code> to the variable of interest. This changes the fill color of areas, such as in box plots, bar plots, histograms, density plots, etc.</li>
</ul>
<p>You can specify color palettes manually using the functions:</p>
<ul>
<li><code>scale_fill_manual()</code> for box plots, bar plots, violin plots, dot plots, etc.</li>
<li><code>scale_color_manual()</code> or <code>scale_colour_manual()</code> for lines and points</li>
</ul>
<pre class="r"><code># Box plot
bp <- bp + geom_boxplot(aes(fill = Species)) 
bp + scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))

# Scatter plot
sp <- sp + geom_point(aes(color = Species)) 
sp + scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-custom-color-fill-color-manual-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-custom-color-fill-color-manual-2.png" width="316.8" /></p>
<p>Below are two colorblind-friendly palettes, one with gray and one with black (source: <a href="http://jfly.iam.u-tokyo.ac.jp/color/" class="uri">http://jfly.iam.u-tokyo.ac.jp/color/</a>).</p>
<pre class="r"><code># The palette with grey:
cbp1 <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# The palette with black:
cbp2 <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-colorblind-frindly-color-palette-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-colorblind-frindly-color-palette-2.png" width="316.8" /></p>
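<p>To use one of these palettes, pass it to a manual scale. The following minimal sketch applies <code>cbp1</code> to the iris scatter plot. Unnamed values are assigned to the groups in level order, and <code>Species</code> has three levels, so only the first three colors are used:</p>
<pre class="r"><code>library(ggplot2)

cbp1 <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Extra colors in `values` are simply ignored
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species)) +
  scale_color_manual(values = cbp1)
p</code></pre>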
<ol start="4" style="list-style-type: decimal">
<li><strong>Use viridis color palettes</strong>. The <code>viridis</code> R package provides color palettes that are printer-friendly, perceptually uniform and easy to read by people with colorblindness. Key functions: <code>scale_color_viridis()</code> and <code>scale_fill_viridis()</code>.</li>
</ol>
<pre class="r"><code>library(viridis)
# Gradient color
ggplot(iris, aes(Sepal.Length, Sepal.Width))+
  geom_point(aes(color = Sepal.Length)) +
  scale_color_viridis(option = "D")

# Discrete color. use the argument discrete = TRUE
ggplot(iris, aes(Sepal.Length, Sepal.Width))+
  geom_point(aes(color = Species)) +
  geom_smooth(aes(color = Species, fill = Species), method = "lm") + 
  scale_color_viridis(discrete = TRUE, option = "D")+
  scale_fill_viridis(discrete = TRUE) </code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-viridis-discrete-color-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot2-viridis-discrete-color-2.png" width="316.8" /></p>
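<p>Since version 3.0.0, ggplot2 also ships its own viridis scales, so the <em>viridis</em> package is optional. Use <code>scale_color_viridis_c()</code> / <code>scale_fill_viridis_c()</code> for continuous variables and <code>scale_color_viridis_d()</code> / <code>scale_fill_viridis_d()</code> for discrete ones. Here is a sketch along the lines of the examples above:</p>
<pre class="r"><code>library(ggplot2)

# Continuous variable: scale_color_viridis_c()
p_cont <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Sepal.Length)) +
  scale_color_viridis_c(option = "D")

# Discrete variable: scale_color_viridis_d()
p_disc <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species)) +
  scale_color_viridis_d(option = "D")</code></pre>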
<ol start="5" style="list-style-type: decimal">
<li><strong>Use RColorBrewer palettes</strong>. Two color scale functions are available in ggplot2 for using the ColorBrewer palettes:</li>
</ol>
<ul>
<li><code>scale_fill_brewer()</code> for box plot, bar plot, violin plot, dot plot, etc</li>
<li><code>scale_color_brewer()</code> for lines and points</li>
</ul>
<p>For example:</p>
<pre class="r"><code># Box plot
bp + scale_fill_brewer(palette = "Dark2")

# Scatter plot
sp + scale_color_brewer(palette = "Dark2")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-group-color-rcolorbrewer-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-group-color-rcolorbrewer-2.png" width="316.8" /></p>
<p>To display colorblind-friendly brewer palettes, use this R code:</p>
<pre class="r"><code>library(RColorBrewer)
display.brewer.all(colorblindFriendly = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-rcolorbrewer-palettes-colorblind-friendly-1.png" width="288" /></p>
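<p>You can also extract the hex codes of a brewer palette with <code>brewer.pal()</code> and pass them to a manual scale. This is handy when the same colors must be reused across several plots. Note that <code>brewer.pal()</code> returns at least 3 colors and warns if <code>n</code> is smaller.</p>
<pre class="r"><code>library(ggplot2)
library(RColorBrewer)

# Extract 3 colors from the "Dark2" palette
dark2 <- brewer.pal(n = 3, name = "Dark2")
dark2

# Reuse them as a manual palette
p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot(aes(fill = Species)) +
  scale_fill_manual(values = dark2)</code></pre>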
<ol start="6" style="list-style-type: decimal">
<li><strong>Other discrete color palettes</strong>:
<ul>
<li><strong>Scientific journal color palettes in the ggsci R package</strong>. Contains a collection of high-quality color palettes inspired by colors used in scientific journals, data visualization libraries, and more. For example:
<ul>
<li><code>scale_color_npg()</code> and <code>scale_fill_npg()</code>: Nature Publishing Group</li>
<li><code>scale_color_aaas()</code> and <code>scale_fill_aaas()</code>: American Association for the Advancement of Science</li>
<li><code>scale_color_lancet()</code> and <code>scale_fill_lancet()</code>: Lancet journal</li>
<li><code>scale_color_jco()</code> and <code>scale_fill_jco()</code>: Journal of Clinical Oncology</li>
</ul></li>
<li><strong>Wes Anderson color palettes in the wesanderson R package</strong>. Contains color palettes inspired by Wes Anderson movies.</li>
</ul></li>
</ol>
<p>For example:</p>
<pre class="r"><code># jco color palette from the ggsci package
bp + ggsci::scale_fill_jco()

# Discrete color from wesanderson package
library(wesanderson)
bp + scale_fill_manual(
  values = wes_palette("GrandBudapest1", n = 3)
  )</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggsci-and-wesanderson-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggsci-and-wesanderson-2.png" width="316.8" /></p>
<p>You can find more examples in the <a href="https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html">ggsci package vignette</a> and on the <a href="https://github.com/karthik/wesanderson">wesanderson GitHub page</a>.</p>
<ol start="7" style="list-style-type: decimal">
<li><strong>Set gradient colors</strong>. For gradient colors, map the argument <code>color</code> and/or <code>fill</code> to a continuous variable. In the following example, we color points according to the variable <code>Sepal.Length</code>.</li>
</ol>
<pre class="r"><code>ggplot(iris, aes(Sepal.Length, Sepal.Width))+
  geom_point(aes(color = Sepal.Length)) +
  scale_color_gradientn(colours = c("blue", "yellow", "red"))+
  theme(legend.position = "right")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-gradient-colors-1.png" width="480" /></p>
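<p>For two-color or diverging gradients, ggplot2 also provides <code>scale_color_gradient()</code> and <code>scale_color_gradient2()</code>, with <code>scale_fill_*</code> equivalents. In the sketch below, the mean sepal length is used as the <code>midpoint</code> only for illustration:</p>
<pre class="r"><code>library(ggplot2)

# Two-color gradient from low to high values
p_two <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Sepal.Length)) +
  scale_color_gradient(low = "blue", high = "red")

# Diverging gradient: values below the midpoint shade toward blue,
# values above it toward red
mid <- mean(iris$Sepal.Length)
p_div <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Sepal.Length)) +
  scale_color_gradient2(low = "blue", mid = "white", high = "red",
                        midpoint = mid)</code></pre>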
<ol start="8" style="list-style-type: decimal">
<li><strong>Learn more about designing and using color palettes</strong> at <a href="https://goo.gl/F5g3Lb" class="uri">https://goo.gl/F5g3Lb</a></li>
</ol>
</div>
<div id="points-shape-color-and-size" class="section level2">
<h2>Points shape, color and size</h2>
<ol style="list-style-type: decimal">
<li><strong>Common point shapes available in R</strong>:</li>
</ol>
<pre class="r"><code>ggpubr::show_point_shapes()+
  theme_void()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-point-shapes-available-inr-r-1.png" width="288" /></p>
<div class="warning">
<p>
Note that point shapes 21 to 25 have both a border and an interior that can be filled with a color. The border is set by the <code>color</code> argument and the interior by the <code>fill</code> argument of <code>geom_point()</code>.
</p>
</div>
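<p>For example, the following sketch draws filled circles (shape 21) whose interior color is mapped to <code>Species</code>, while the border stays white:</p>
<pre class="r"><code>library(ggplot2)

# Shape 21: the interior follows `fill`, the border follows `color`
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(fill = Species), shape = 21, color = "white", size = 3) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))
p</code></pre>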
<ol start="2" style="list-style-type: decimal">
<li><strong>Change ggplot point shapes</strong>. Use the argument <code>shape</code> of the function <code>geom_point()</code> [ggplot2] to specify point shapes.</li>
</ol>
<p>You can also change point shapes and colors by groups. In this case, ggplot2 automatically uses a default color palette and a default set of point shapes. You can manually change the appearance of points using the following functions:</p>
<ul>
<li><code>scale_shape_manual()</code> : to change manually point shapes</li>
<li><code>scale_color_manual()</code> : to change manually point colors</li>
<li><code>scale_size_manual()</code> : to change manually the size of points</li>
</ul>
<p>Create a scatter plot and change points shape, color and size:</p>
<pre class="r"><code># Create a simple scatter plot
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(shape = 18, color = "#FC4E07", size = 3)+
  theme_minimal()

# Change point shapes and colors by groups
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(shape = Species, color = Species), size = 3) +
  scale_shape_manual(values = c(5, 16, 17)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
  theme_minimal() +
  theme(legend.position = "top")</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-ggplot-point-shapes-1.png" width="316.8" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-ggplot-point-shapes-2.png" width="316.8" /></p>
</div>
<div id="line-types" class="section level2">
<h2>Line types</h2>
<ol style="list-style-type: decimal">
<li><strong>Common line types available in R</strong>:</li>
</ol>
<pre class="r"><code>ggpubr::show_line_types()+
  theme_gray()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-ggplot-line-types-in-linetype-1.png" width="336" /></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Change line types</strong>. To change a single line, use for example <code>linetype = "dashed"</code>.</li>
</ol>
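<p>For example, here is a minimal sketch of a single dashed line, assuming ggplot2 is installed and using the built-in <code>pressure</code> data set purely for illustration. Because the line type is a fixed value rather than a mapping, it goes outside <code>aes()</code>:</p>
<pre class="r"><code>library(ggplot2)

# One line, so linetype and color are constants, not mappings
p <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line(linetype = "dashed", color = "steelblue") +
  theme_minimal()
p</code></pre>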
<p>In the following R code, we’ll change line types and colors by groups. To modify the default colors and line types, use the functions <code>scale_color_manual()</code> and <code>scale_linetype_manual()</code>.</p>
<pre class="r"><code># Create some data:
# compute the mean of `len` grouped by dose and supp
library(dplyr)
df2 <- ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(len.mean = mean(len))
df2</code></pre>
<pre><code>## # A tibble: 6 x 3
## # Groups:   dose [?]
##     dose   supp len.mean
##   <fctr> <fctr>    <dbl>
## 1    0.5     OJ    13.23
## 2    0.5     VC     7.98
## 3      1     OJ    22.70
## 4      1     VC    16.77
## 5      2     OJ    26.06
## 6      2     VC    26.14</code></pre>
<pre class="r"><code># Change line types and colors manually
ggplot(df2, aes(x = dose, y = len.mean, group = supp)) +
  geom_line(aes(linetype = supp, color = supp))+
  geom_point(aes(color = supp))+
  scale_linetype_manual(values=c("solid", "dashed"))+
  scale_color_manual(values=c("#00AFBB","#FC4E07"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-change-line-types-by-groups-1.png" width="384" /></p>
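<p>Unnamed <code>values</code> are matched to the factor levels in order. As an alternative, here is a sketch assuming ggplot2 is installed: named vectors tie each level of <code>supp</code> to a specific line type and color, so the mapping does not silently change if the levels are reordered. The summary is computed with base R’s <code>aggregate()</code> instead of dplyr so that the block runs on its own; <code>df3</code> is just an illustrative name.</p>
<pre class="r"><code>library(ggplot2)

# Mean tooth length per dose and supplement (base R, no dplyr needed)
df3 <- aggregate(len ~ dose + supp, data = ToothGrowth, FUN = mean)

# Named values: each level of supp gets an explicit line type and color
p <- ggplot(df3, aes(x = dose, y = len, group = supp)) +
  geom_line(aes(linetype = supp, color = supp)) +
  geom_point(aes(color = supp)) +
  scale_linetype_manual(values = c(OJ = "solid", VC = "dashed")) +
  scale_color_manual(values = c(OJ = "#00AFBB", VC = "#FC4E07"))
p</code></pre>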
</div>
<div id="rotate-a-ggplot" class="section level2">
<h2>Rotate a ggplot</h2>
<p>Key functions:</p>
<ul>
<li><code>coord_flip()</code>: swaps the x and y axes, for example to turn a vertical box plot into a horizontal one</li>
<li><code>scale_x_reverse()</code> and <code>scale_y_reverse()</code>: reverse the direction of the x or y axis</li>
</ul>
<pre class="r"><code># Horizontal box plot
ggplot(ToothGrowth, aes(factor(dose), len)) +
  geom_boxplot(fill = "lightgray") +
  theme_bw() +
  coord_flip()

# Reverse y axis
ggplot(mtcars, aes(mpg))+
  geom_density(fill = "lightgray") +
  xlim(0, 40) +
  theme_bw()+
  scale_y_reverse()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-rotate-a-ggplot-1.png" width="288" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-rotate-a-ggplot-2.png" width="288" /></p>
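<p>Since ggplot2 3.3.0, many geoms, including <code>geom_boxplot()</code>, can also be drawn horizontally without <code>coord_flip()</code>: map the continuous variable to <code>x</code> and the grouping variable to <code>y</code>, and the orientation is inferred from the mapping. A minimal sketch, assuming ggplot2 3.3.0 or later:</p>
<pre class="r"><code>library(ggplot2)

# Continuous variable on x, grouping factor on y: horizontal boxes
p <- ggplot(ToothGrowth, aes(x = len, y = factor(dose))) +
  geom_boxplot(fill = "lightgray") +
  theme_bw()
p</code></pre>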
</div>
<div id="plot-annotation" class="section level2">
<h2>Plot annotation</h2>
<div id="add-straight-lines" class="section level3">
<h3>Add straight lines</h3>
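<p>Key R functions (line width is set with <code>linewidth</code> in ggplot2 3.4.0 and later, and with <code>size</code> in earlier versions):</p>
<ul>
<li><strong>geom_hline</strong>(yintercept, linetype, color, linewidth): add horizontal lines</li>
<li><strong>geom_vline</strong>(xintercept, linetype, color, linewidth): add vertical lines</li>
<li><strong>geom_abline</strong>(intercept, slope, linetype, color, linewidth): add straight lines with a given intercept and slope, such as a regression line</li>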
<li><strong>geom_segment()</strong>: add segments</li>
</ul>
<ul>
<li><strong>Creating a simple scatter plot</strong></li>
</ul>
<pre class="r"><code>sp <- ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
  geom_point()+theme_bw()</code></pre>
<ul>
<li><strong>Add straight lines and segments</strong></li>
</ul>
<pre class="r"><code># Add a horizontal line at y = 20 and a vertical line at x = 3
# (linewidth requires ggplot2 >= 3.4.0; use size in older versions)
sp + geom_hline(yintercept = 20, linetype = "dashed", color = "red") + 
  geom_vline(xintercept = 3, color = "blue", linewidth = 1)

# Add a line with intercept 37 and slope -5,
# close to the least-squares fit lm(mpg ~ wt)
sp + geom_abline(intercept = 37, slope = -5, color = "blue") +
  labs(title = "y = -5x + 37")

# Add a vertical line segment from
# point A(4, 15) to point B(4, 27)
sp + geom_segment(x = 4, y = 15, xend = 4, yend = 27)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-straight-line-1.png" width="211.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-straight-line-2.png" width="211.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-straight-line-3.png" width="211.2" /></p>
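<p>The intercept arguments also accept values computed from the data. A minimal sketch, assuming ggplot2 is installed, that marks the mean of each variable with a reference line:</p>
<pre class="r"><code>library(ggplot2)

sp <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() + theme_bw()

# Reference lines at data-derived positions:
# mean mpg (horizontal) and mean weight (vertical)
p <- sp +
  geom_hline(yintercept = mean(mtcars$mpg), linetype = "dashed", color = "red") +
  geom_vline(xintercept = mean(mtcars$wt), linetype = "dotted", color = "blue")
p</code></pre>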
<ul>
<li><strong>Add arrows, curves and rectangles</strong>:</li>
</ul>
<pre class="r"><code># Add arrow at the end of the segment
library(grid) # provides arrow() and unit()
sp + geom_segment(x = 5, y = 30, xend = 3.5, yend = 25,
                 arrow = arrow(length = unit(0.5, "cm")))

# Add curves
sp + geom_curve(aes(x = 2, y = 15, xend = 3, yend = 15))

# Add rectangles
ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
  geom_rect(xmin = 3, ymin = -Inf, xmax = 4, ymax = Inf,
            fill = "lightgray") +
  geom_point() + theme_bw()</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-add-arrows-curves-and-rectangles-1.png" width="211.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-add-arrows-curves-and-rectangles-2.png" width="211.2" /><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-add-arrows-curves-and-rectangles-3.png" width="211.2" /></p>
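<p>In the examples above, <code>geom_segment()</code>, <code>geom_curve()</code> and <code>geom_rect()</code> inherit the 32-row <code>mtcars</code> data, so each shape is drawn once per row, stacked on top of itself. This is invisible with opaque colors but makes semi-transparent fills look solid. <code>annotate()</code> creates a one-row layer instead, so each shape is drawn exactly once. A minimal sketch, assuming ggplot2 is installed:</p>
<pre class="r"><code>library(ggplot2)

# annotate() builds a one-row layer, so each shape is drawn once;
# the rectangle is added first so that it sits behind the points
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  annotate("rect", xmin = 3, xmax = 4, ymin = -Inf, ymax = Inf,
           fill = "lightgray", alpha = 0.4) +
  geom_point() +
  annotate("segment", x = 5, y = 30, xend = 3.5, yend = 25,
           arrow = arrow(length = unit(0.5, "cm"))) +
  theme_bw()
p</code></pre>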
</div>
<div id="text-annotation" class="section level3">
<h3>Text annotation</h3>
<p>Key ggplot2 functions:</p>
<ul>
<li><strong>geom_text</strong>(): adds text directly to the plot</li>
<li><strong>geom_label</strong>(): draws a rectangle underneath the text, making it easier to read.</li>
<li><strong>annotate</strong>(): useful for adding small text annotations at a particular location on the plot</li>
<li><strong>annotation_custom</strong>(): adds static annotations, such as images or other grid graphical objects (grobs), that are the same in every panel</li>
</ul>
<pre class="r"><code># Add text at a particular coordinate
sp + annotate("text", x = 3, y = 30, 
              label = "Scatter plot",
              color = "red", fontface = 2)</code></pre>
<p><img src="https://www.sthda.com/english/sthda-upload/figures/r-graphics-essentials/013-ggplot-cheatsheet-for-great-customization-r-graphics-cookbook-and-examples-for-great-data-visualization-text-annotation-1.png" width="240" /></p>
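<p>When the labels come from a column of the data, <code>geom_text()</code> or <code>geom_label()</code> is used with a <code>label</code> aesthetic. A minimal sketch, assuming ggplot2 is installed; the cut-off <code>mpg > 30</code> and the helper names <code>cars_df</code>, <code>name</code> and <code>top</code> are illustrative choices, not part of the original example:</p>
<pre class="r"><code>library(ggplot2)

# Copy mtcars and keep the car names (stored as row names) in a column
cars_df <- mtcars
cars_df$name <- rownames(mtcars)

# Label only the most fuel-efficient cars
top <- subset(cars_df, mpg > 30)

p <- ggplot(cars_df, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_label(data = top, aes(label = name), nudge_x = 0.5, size = 3) +
  theme_bw()
p</code></pre>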
</div>
</div>


</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Fri, 17 Nov 2017 10:55:00 +0100</pubDate>
			
		</item>
		
	</channel>
</rss>
