Plot Means/Medians and Error Bars

This article describes how to easily plot means or medians with error bars, using the ggplot2-based helper functions available in the ggpubr R package.

Prerequisites

Required R package

You need to install the R package ggpubr to easily create ggplot2-based, publication-ready plots.

We recommend installing the latest development version from GitHub as follows:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

If the installation from GitHub fails, install the package from CRAN as follows:

install.packages("ggpubr")

Load ggpubr:

library(ggpubr)

Demo data sets

Data: ToothGrowth and mtcars data sets.

# ToothGrowth
data("ToothGrowth")
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
# mtcars 
data("mtcars")
head(mtcars[, c("wt", "mpg", "cyl")])
##                     wt  mpg cyl
## Mazda RX4         2.62 21.0   6
## Mazda RX4 Wag     2.88 21.0   6
## Datsun 710        2.32 22.8   4
## Hornet 4 Drive    3.21 21.4   6
## Hornet Sportabout 3.44 18.7   8
## Valiant           3.46 18.1   6

Error plots

R function: ggerrorplot() [in ggpubr].

Simplified format:

ggerrorplot(data, x, y, desc_stat = "mean_se")
  • data: a data frame
  • x, y: x and y variables for plotting
  • desc_stat: descriptive statistics to be used for visualizing errors. Default value is “mean_se”. Allowed values are one of “mean”, “mean_se”, “mean_sd”, “mean_ci”, “mean_range”, “median”, “median_iqr”, “median_mad”, “median_range”
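
Each desc_stat value names a centre (mean or median) and a measure of spread. As a rough illustration of the numbers behind “mean_sd” and “mean_se”, the per-dose summaries of ToothGrowth can be computed by hand in base R. This is only a sketch of the statistics, not ggpubr’s internal code; the object names groups and desc are made up for the example.

```r
# Base R sketch: the numbers behind "mean_sd" and "mean_se" for len by dose.
# Illustration only; this is not how ggpubr computes them internally.
data("ToothGrowth")

# One numeric vector of tooth lengths per dose level
groups <- split(ToothGrowth$len, ToothGrowth$dose)

desc <- data.frame(
  dose = names(groups),
  n    = sapply(groups, length),
  mean = sapply(groups, mean),
  sd   = sapply(groups, sd),
  row.names = NULL
)
desc$se <- desc$sd / sqrt(desc$n)   # standard error of the mean

print(desc)
```

Because the standard error divides the standard deviation by the square root of the group size, “mean_se” bars are always narrower than “mean_sd” bars for the same data.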

For example, the following R code uses the ToothGrowth data set and plots y = “len” by x = “dose”.

# Mean +/- standard deviation
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd")
# Change error plot type and add mean points
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd",
            error.plot = "errorbar",            # Change error plot type
            add = "mean"                        # Add mean points
            )
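
For readers who want to see roughly what such an error plot amounts to in plain ggplot2, here is a minimal sketch using stat_summary(). It assumes ggplot2 version 3.3.0 or later (for the fun argument); the helper mean_sd_range is a hypothetical name introduced for this example, and the result is not guaranteed to match the ggerrorplot() output pixel for pixel.

```r
# Plain ggplot2 sketch of a mean +/- sd error plot (assumes ggplot2 >= 3.3.0).
library(ggplot2)
data("ToothGrowth")

# Hypothetical helper: returns the y, ymin and ymax columns
# that stat_summary(fun.data = ...) expects
mean_sd_range <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  stat_summary(fun.data = mean_sd_range, geom = "errorbar", width = 0.2) +  # error bars
  stat_summary(fun = mean, geom = "point", size = 2) +                      # mean points
  labs(x = "dose", y = "len")
p
```

The ggpubr helper wraps this kind of summary layer behind a single call, which is why the examples in this article stay short.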

It’s also possible to add jitter points (representing individual points), dot plots and violin plots:

# Add jittered points
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", color = "black",
            add = "jitter", add.params = list(color = "darkgray")
            )
# Add dot plots
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", color = "black",
            add = "dotplot", add.params = list(color = "darkgray")
            )
# Add violin plots
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", color = "black",
            add = "violin", add.params = list(color = "darkgray")
            )

To add p-values comparing the groups, use stat_compare_means():

# Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
ggerrorplot(ToothGrowth, x = "dose", y = "len",
            desc_stat = "mean_sd", color = "black",
            add = "violin", add.params = list(color = "darkgray"))+ 
  stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
  stat_compare_means(label.y = 50)                  # Add global p-value

Read more at: Add P-values and Significance Levels to ggplots.
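
To the best of our reading of the ggpubr documentation, stat_compare_means() defaults to non-parametric tests: a Wilcoxon test for two groups and a Kruskal-Wallis test for more than two. Under that assumption, the base R sketch below computes comparable p-values directly, which can be handy for checking the annotations on the plot. The pairwise p-values here are unadjusted.

```r
# Base R sketch of the tests assumed to sit behind the plotted p-values.
data("ToothGrowth")

# Global test across the three doses (Kruskal-Wallis)
global <- kruskal.test(len ~ dose, data = ToothGrowth)

# Pairwise Wilcoxon tests for the same pairs of doses
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))
pairwise_p <- sapply(my_comparisons, function(pair) {
  sub <- ToothGrowth[as.character(ToothGrowth$dose) %in% pair, ]
  # exact = FALSE: use the normal approximation, since the data contain ties
  wilcox.test(len ~ dose, data = sub, exact = FALSE)$p.value
})
names(pairwise_p) <- sapply(my_comparisons, paste, collapse = " vs ")

global$p.value
pairwise_p
```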

Color by a grouping variable:

# Color by "dose" (same variable used on x-axis)
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", 
            color = "dose", palette = "jco")
# Color by another grouping variable "supp"
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", 
            color = "supp", palette = "jco",
            position = position_dodge(0.3)     # Adjust the space between bars
            )
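
When a second grouping variable is used for color, each x position carries one error bar per group, and position_dodge() shifts them sideways so they do not overlap. As a sketch of the values being drawn in that case, the mean and standard deviation for every dose and supp combination can be tabulated in base R (the object name by_group is made up for this example):

```r
# Base R sketch: mean and sd of len for each dose x supp combination
data("ToothGrowth")

by_group <- aggregate(len ~ dose + supp, data = ToothGrowth,
                      FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() returns the two statistics as a matrix column;
# flatten it into ordinary columns len.mean and len.sd
by_group <- do.call(data.frame, by_group)
by_group
```

The table has six rows, one per error bar in the dodged plot.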

Line plots

You can create a line plot of mean +/- error using the function ggline() [in ggpubr]. The format is as follows:

# Basic line plots of means +/- se with jittered points
ggline(ToothGrowth, x = "dose", y = "len", 
       add = c("mean_se", "jitter"))

Color by groups:

ggline(ToothGrowth, x = "dose", y = "len", 
       add = c("mean_se", "jitter"),
       color = "supp", palette = "jco")

Bar plots

R function: ggbarplot() [in ggpubr]. The format is as follows:

# Basic bar plots of means +/- se with jittered points
ggbarplot(ToothGrowth, x = "dose", y = "len",
          add = c("mean_se", "jitter"))

Color by groups:

ggbarplot(ToothGrowth, x = "dose", y = "len", 
          add = c("mean_se", "jitter"),
          color = "supp", palette = "jco",
          position = position_dodge(0.8))

Add labels

In this section we’ll plot group means with individual information.

Data: mtcars.

# Prepare the data set
# Use row names as individual names
df <- as.data.frame(mtcars[, c("am", "hp")])
df$name <- rownames(df)
head(df)
##                   am  hp              name
## Mazda RX4          1 110         Mazda RX4
## Mazda RX4 Wag      1 110     Mazda RX4 Wag
## Datsun 710         1  93        Datsun 710
## Hornet 4 Drive     0 110    Hornet 4 Drive
## Hornet Sportabout  0 175 Hornet Sportabout
## Valiant            0 105           Valiant

Create a bar plot with individual labels.

set.seed(123)
# Bar plot of mean +/- se, add individual points
ggbarplot(df, x = "am", y = "hp",
          add = c("mean_se", "point"),
          color = "am", fill = "am", alpha = 0.5,
          palette = "jco")+
   ggrepel::geom_text_repel(aes(label = name))

Application to gene expression data

In our previous article - Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data - we described how to visualize gene expression data using box plots, violin plots, dot plots and stripcharts. We also demonstrated how to combine the plots of multiple variables (genes) in the same figure.

Here we provide some R code to visualize the mean expression profile of one or multiple genes. We’ll use the gene expression data set described in our previous tutorial: Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data.

expr <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/expr_tcga.txt",
                   stringsAsFactors = FALSE)

The data set contains the mRNA expression for five genes of interest - GATA3, PTEN, XBP1, ESR1 and MUC1 - from 3 different data sets:

  • Breast invasive carcinoma (BRCA),
  • Ovarian serous cystadenocarcinoma (OV) and
  • Lung squamous cell carcinoma (LUSC)

The R code below displays the mean expression of three genes - “GATA3”, “PTEN” and “XBP1”.

ggline(expr, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      add = "mean_sd")
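
Passing several y variables conceptually amounts to reshaping the data from wide format (one column per gene) to long format (one row per sample and gene) before summarising. The sketch below shows that idea in base R on a small toy data frame. The values are made up and are not the TCGA expression data, and the code is an illustration rather than ggpubr’s internal implementation.

```r
# Toy data with made-up values (NOT the TCGA expression data)
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 2),
  GATA3   = c(2.9, 2.1, -1.2, -0.8, -1.5, -1.1),
  PTEN    = c(1.4, 1.1, 1.0, 0.7, 0.9, 1.2),
  XBP1    = c(2.9, 2.5, -0.3, 0.1, -0.9, -0.4)
)

genes <- c("GATA3", "PTEN", "XBP1")

# Wide -> long: stack the gene columns on top of each other
long <- data.frame(
  dataset    = rep(toy$dataset, times = length(genes)),
  gene       = rep(genes, each = nrow(toy)),
  expression = unlist(toy[genes], use.names = FALSE)
)

# Mean expression per gene and dataset: the values a mean plot would show
gene_means <- aggregate(expression ~ gene + dataset, data = long, FUN = mean)
gene_means
```

In the long table, gene plays the role of the panel variable when the plots are combined, or of the color variable when they are merged.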

You can also add other geometries on the mean plot such as jitter points, dotplot or violin. To add a violin plot, type this:

ggline(expr, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      color = "gray",                                     # Line color
      add = c("mean_sd", "violin"),                     
      add.params = list(color = "dataset"),
      palette = "jco"
      )

To add jitter points, we’ll use a small subset of data for readability:

# Subset 50 random rows
set.seed(123)
random_rows <- sample(seq_len(nrow(expr)), 50)
expr2 <- expr[random_rows, ]
# Visualize
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      color = "gray",                           
      add = c("mean_sd", "jitter"),                     
      add.params = list(color = "dataset", size = 0.5),
      palette = "jco"
      )

As previously shown, you can merge the three plots as follows:

# Merge the three plots
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = "mean_sd",
      palette = "jco")
# Add  jitter
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = c("mean_sd", "jitter"),              # Add jitter points
      palette = "jco")

Show line labels:

ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression",                   
      add = "mean",   
      show.line.label = TRUE,
      repel = TRUE,
      legend = "none",
      palette = rep("black", 3)   # Black color for each line
      )

Let’s create a more complex plot with point labels:

# Line plot of mean +/- se with jittered, labeled points
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = c("mean_se", "jitter"),              # Add mean_se and jitter points
      add.params = list(size = 0.7),             # Add point size
      label = "bcr_patient_barcode",             # Add point labels
      label.select = list(top.up = 2),           # show only labels for the top 2 points
      font.label = list(color = ".y."),          # Color labels by .y., here gene names
      repel = TRUE,                              # Use repel to avoid labels overplotting
      palette = "jco")

Plot a bar plot of means:

# Create bar plots
ggbarplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      add = "mean_se")

# Merge bar plots
ggbarplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = "mean_se", palette = "jco")

Error plots:

# Create error plots
ggerrorplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      desc_stat = "mean_sd")

# Merge error plots
ggerrorplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      desc_stat = "mean_sd", palette = "jco",
      position = position_dodge(0.3)
      )