Plot Means/Medians and Error Bars

kassambara | 01/09/2017 | ggpubr: Publication Ready Plots

In this article, we’ll describe how to easily plot means or medians with error bars. We’ll use the ggplot2-based helper functions available in the ggpubr R package.

Contents:

- Prerequisites
  - Required R package
  - Demo data sets
- Error plots
- Line plots
- Bar plots
- Add labels
- Application to gene expression data

Prerequisites

Required R package

You need to install the R package ggpubr, which makes it easy to create ggplot2-based, publication-ready plots.

We recommend installing the latest development version from GitHub, as follows:

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

If the installation from GitHub fails, install the released version from CRAN instead:

install.packages("ggpubr")

Load ggpubr:

library(ggpubr)

Demo data sets

Data: ToothGrowth and mtcars data sets.

# ToothGrowth
data("ToothGrowth")
head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

# mtcars 
data("mtcars")
head(mtcars[, c("wt", "mpg", "cyl")])

##                     wt  mpg cyl
## Mazda RX4         2.62 21.0   6
## Mazda RX4 Wag     2.88 21.0   6
## Datsun 710        2.32 22.8   4
## Hornet 4 Drive    3.21 21.4   6
## Hornet Sportabout 3.44 18.7   8
## Valiant           3.46 18.1   6
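
Before plotting, it can help to check how the grouping variables are coded. The base R sketch below is an addition to the tutorial's own code; the object names `dose_values` and `cell_counts` are introduced here. It checks that dose is stored as a number with three distinct values and counts the observations in each supp/dose combination.

```r
data("ToothGrowth")

# dose is numeric in the raw data, with three distinct values
dose_values <- sort(unique(ToothGrowth$dose))
print(dose_values)

# Number of observations in each supp x dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

The comparison list used later in this article refers to these dose values as strings ("0.5", "1" and "2").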

Error plots

R function: ggerrorplot() [in ggpubr].

Simplified format:

ggerrorplot(data, x, y, desc_stat = "mean_se")

data: a data frame
x, y: x and y variables for plotting
desc_stat: descriptive statistic used to draw the errors. The default value is “mean_se”. Allowed values are “mean”, “mean_se”, “mean_sd”, “mean_ci”, “mean_range”, “median”, “median_iqr”, “median_mad” and “median_range”.
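
As a rough guide to what these options summarise, the sketch below computes the per-group mean, standard deviation (sd) and standard error of the mean (se) in base R. It is an illustration only and not ggpubr's internal code; the object name `dose_stats` is introduced here. Under this reading, “mean_sd” corresponds to mean +/- sd and “mean_se” to mean +/- se.

```r
data("ToothGrowth")

# One summary row per dose: group size, mean, sd and standard error
dose_stats <- do.call(rbind, lapply(
  split(ToothGrowth$len, ToothGrowth$dose),
  function(v) {
    data.frame(n = length(v),
               mean = mean(v),
               sd = sd(v),
               se = sd(v) / sqrt(length(v)))
  }
))

# Row names hold the dose values; store them as a regular column
dose_stats$dose <- as.numeric(rownames(dose_stats))
print(dose_stats)
```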

For example, the following R code uses the ToothGrowth data set and plots y = “len” by x = “dose”.

# Mean +/- standard deviation
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd")
# Change error plot type and add mean points
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd",
            error.plot = "errorbar",            # Change error plot type
            add = "mean"                        # Add mean points
            )

It’s also possible to add jittered points (representing the individual observations), dot plots and violin plots:

# Add jittered points
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", color = "black",
            add = "jitter", add.params = list(color = "darkgray")
            )
# Add dot plots
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", color = "black",
            add = "dotplot", add.params = list(color = "darkgray")
            )
# Add violin plots
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", color = "black",
            add = "violin", add.params = list(color = "darkgray")
            )

To add p-values comparing means, use this:

# Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
ggerrorplot(ToothGrowth, x = "dose", y = "len",
            desc_stat = "mean_sd", color = "black",
            add = "violin", add.params = list(color = "darkgray"))+ 
  stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
  stat_compare_means(label.y = 50)                  # Add global p-value
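
According to the ggpubr documentation, stat_compare_means() uses non-parametric tests by default: a Wilcoxon test for each pair of groups and a Kruskal-Wallis test for the global p-value when there are more than two groups. As a cross-check, the base R sketch below runs the same kind of tests directly. The helper `pairwise_p()` is a hypothetical name introduced here, and the p-values may differ slightly from the plot labels depending on how ties and exactness are handled.

```r
data("ToothGrowth")

# Global test across the three doses (Kruskal-Wallis)
global <- kruskal.test(len ~ dose, data = ToothGrowth)

# Wilcoxon rank-sum test for one pair of doses
pairwise_p <- function(a, b) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% c(a, b), ]
  # exact = FALSE avoids warnings caused by ties in the data
  wilcox.test(len ~ dose, data = pair_data, exact = FALSE)$p.value
}

p_values <- c(
  "0.5 vs 1" = pairwise_p(0.5, 1),
  "1 vs 2"   = pairwise_p(1, 2),
  "0.5 vs 2" = pairwise_p(0.5, 2)
)

print(global$p.value)
print(p_values)
```

To show a parametric test on the plot instead, stat_compare_means() accepts a method argument, for example method = "t.test".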

Color by a grouping variable:

# Color by "dose" (same variable used on x-axis)
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", 
            color = "dose", palette = "jco")
# Color by another grouping variable "supp"
ggerrorplot(ToothGrowth, x = "dose", y = "len", 
            desc_stat = "mean_sd", 
            color = "supp", palette = "jco",
            position = position_dodge(0.3)     # Adjust the space between bars
            )

Line plots

You can create a line plot of mean +/- error using the function ggline() [in ggpubr]. For example:

# Basic line plots of means +/- se with jittered points
ggline(ToothGrowth, x = "dose", y = "len", 
       add = c("mean_se", "jitter"))

Color by groups:

ggline(ToothGrowth, x = "dose", y = "len", 
       add = c("mean_se", "jitter"),
       color = "supp", palette = "jco")
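
If a grouped line plot does not show one error bar per line, a plain ggplot2 version can serve as a cross-check. The sketch below is an assumed equivalent written with stat_summary() and mean_se() from ggplot2, not the code that ggline() runs internally. It draws one line, one point and one error bar per supp group at each dose.

```r
library(ggplot2)
data("ToothGrowth")

# group = supp makes each summary layer work per supplement type
p <- ggplot(ToothGrowth,
            aes(x = factor(dose), y = len, colour = supp, group = supp)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1) +
  stat_summary(fun.data = mean_se, geom = "line") +
  stat_summary(fun.data = mean_se, geom = "point") +
  labs(x = "dose", y = "len")

print(p)
```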

Bar plots

R function: ggbarplot() [in ggpubr]. For example:

# Basic bar plots of means +/- se with jittered points
ggbarplot(ToothGrowth, x = "dose", y = "len", 
          add = c("mean_se", "jitter"))

Color by groups:

ggbarplot(ToothGrowth, x = "dose", y = "len", 
          add = c("mean_se", "jitter"),
          color = "supp", palette = "jco",
          position = position_dodge(0.8))
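
To check that a grouped plot shows the statistics you expect, the mean and standard error can be computed for each dose and supp combination in base R. This is a sketch for verification only; the helper `se()` and the object `grouped` are names introduced here, not part of ggpubr.

```r
data("ToothGrowth")

# Standard error of the mean
se <- function(v) sd(v) / sqrt(length(v))

# Mean and se of len for every supp x dose combination
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
ses   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = se)

# Join the two summaries; columns become len_mean and len_se
grouped <- merge(means, ses, by = c("supp", "dose"),
                 suffixes = c("_mean", "_se"))
print(grouped)
```

Each bar and error bar in the grouped plot should match one row of `grouped`. If the error bars instead match the per-dose values pooled over supp, the grouping variable is not being used for the summary.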

Add labels

In this section, we’ll plot group means and label the individual observations.

Data: mtcars.

# Prepare the data set
# Use row names as individual names
df <- as.data.frame(mtcars[, c("am", "hp")])
df$name <- rownames(df)
head(df)

##                   am  hp              name
## Mazda RX4          1 110         Mazda RX4
## Mazda RX4 Wag      1 110     Mazda RX4 Wag
## Datsun 710         1  93        Datsun 710
## Hornet 4 Drive     0 110    Hornet 4 Drive
## Hornet Sportabout  0 175 Hornet Sportabout
## Valiant            0 105           Valiant

Create a bar plot with individual labels. The code below calls geom_text_repel() from the ggrepel package, which must be installed:

set.seed(123)
# Bar plot of mean +/- se, add individual points
ggbarplot(df, x = "am", y = "hp",
          add = c("mean_se", "point"),
          color = "am", fill = "am", alpha = 0.5,
          palette = "jco")+
   ggrepel::geom_text_repel(aes(label = name))
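
With many observations, labelling every point can clutter the plot. One option, sketched below in base R, is to keep a label only for the cars with the highest hp in each am group and leave the others empty. The columns `rank_in_group` and `label` are names introduced here for illustration.

```r
data("mtcars")
df <- mtcars[, c("am", "hp")]
df$name <- rownames(df)

# Rank cars by decreasing hp within each am group (1 = highest hp)
df$rank_in_group <- ave(-df$hp, df$am,
                        FUN = function(v) rank(v, ties.method = "first"))

# Keep the name for the top 2 cars per group, blank out the rest
df$label <- ifelse(df$rank_in_group <= 2, df$name, "")

print(df[df$label != "", c("am", "hp", "label")])
```

Using aes(label = label) in place of aes(label = name) in geom_text_repel() should then draw only the non-empty labels, since ggrepel does not draw empty strings.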

Application to gene expression data

In our previous article - Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data - we described how to visualize gene expression data using box plots, violin plots, dot plots and stripcharts. We also demonstrated how to combine the plots of multiple variables (genes) in the same figure.

Here we provide some R code to visualize the mean expression profile of one or multiple genes. We’ll use the gene expression data set described in our previous tutorial: Facilitating Exploratory Data Visualization: Application to TCGA Genomic Data.

expr <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/expr_tcga.txt",
                   stringsAsFactors = FALSE)

The data set contains the mRNA expression for five genes of interest - GATA3, PTEN, XBP1, ESR1 and MUC1 - from 3 different data sets:

Breast invasive carcinoma (BRCA),
Ovarian serous cystadenocarcinoma (OV) and
Lung squamous cell carcinoma (LUSC)

The R code below displays the mean expression of three genes - “GATA3”, “PTEN” and “XBP1”.

ggline(expr, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      add = "mean_sd")

You can also add other geometries to the mean plot, such as jittered points, dot plots or violin plots. To add a violin plot, type this:

ggline(expr, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      color = "gray",                                     # Line color
      add = c("mean_sd", "violin"),                     
      add.params = list(color = "dataset"),
      palette = "jco"
      )

To add jitter points, we’ll use a small subset of data for readability:

# Subset 50 random rows
set.seed(123)
random_rows <- sample(1:nrow(expr), 50)
expr2 <- expr[random_rows, ]
# Visualize
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      color = "gray",                           
      add = c("mean_sd", "jitter"),                     
      add.params = list(color = "dataset", size = 0.5),
      palette = "jco"
      )

As previously shown, you can merge the three plots as follows:

# Merge the three plots
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = "mean_sd",
      palette = "jco")
# Add jitter
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = c("mean_sd", "jitter"),              # Add jitter points
      palette = "jco")
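
Conceptually, plotting several y variables at once amounts to reshaping the data from wide to long format, so that one column holds the gene name and another the expression value. With combine = TRUE, each gene gets its own panel; with merge = TRUE, the genes are overlaid in the same plotting area. The base R sketch below illustrates the reshaping on a small simulated data frame. The data (`toy`), the column names `gene` and `value`, and the reshaping code are all assumptions made for illustration; they are neither the TCGA data nor ggpubr's internal implementation.

```r
set.seed(1)

# Simulated wide-format data: one row per sample, one column per gene
toy <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 4),
  GATA3 = rnorm(12),
  PTEN  = rnorm(12),
  XBP1  = rnorm(12)
)
genes <- c("GATA3", "PTEN", "XBP1")

# Long format: one row per sample and gene
long <- data.frame(
  dataset = rep(toy$dataset, times = length(genes)),
  gene    = rep(genes, each = nrow(toy)),
  value   = unlist(toy[genes], use.names = FALSE)
)

# Mean expression per dataset and gene: the points joined by each line
gene_means <- aggregate(value ~ dataset + gene, data = long, FUN = mean)
print(gene_means)
```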

Show line labels:

ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression",                   
      add = "mean",   
      show.line.label = TRUE,
      repel = TRUE,
      legend = "none",
      palette = rep("black", 3)   # Black color for each line
      )

Let’s create a more complex plot with point labels:

# Mean +/- se with jittered points and labels for selected points
ggline(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = c("mean_se", "jitter"),              # Add mean_se and jitter points
      add.params = list(size = 0.7),             # Add point size
      label = "bcr_patient_barcode",             # Add point labels
      label.select = list(top.up = 2),           # show only labels for the top 2 points
      font.label = list(color = ".y."),          # Color labels by .y., here gene names
      repel = TRUE,                              # Use repel to avoid labels overplotting
      palette = "jco")

Plot a bar plot of means:

# Create bar plots
ggbarplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      add = "mean_se")

# Merge bar plots
ggbarplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      add = "mean_se", palette = "jco")

Error plots:

# Create error plots
ggerrorplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      combine = TRUE,
      ylab = "Expression", 
      desc_stat = "mean_sd")       # In ggerrorplot(), desc_stat sets the plotted statistic

# Merge error plots
ggerrorplot(expr2, x = "dataset",
      y = c("GATA3", "PTEN", "XBP1"),
      merge = TRUE,
      ylab = "Expression", 
      desc_stat = "mean_sd", palette = "jco",
      position = position_dodge(0.3)
      )

Comments

Visitor

#620 10/06/2018 at 06h40

Hi, super useful codes.

However, when I use the function add = "mean_se" to build multi grouped bar plot, it always shows the SE of the whole group. (Like SE of the same dose value)

how to solve it?

Philippe

#600 09/12/2018 at 10h01

Hi Kassambara,
Very useful pages ! However, the code in "Line Plots, colour by groups" section only produces one error bar in between the lines instead of two error bars on both lines. I used the command "facet.by" and then the two plots produced are ok! Note that my ggplot2 and ggpubr are up-to-date. Could you please help?
Thanks a lot!

#598 09/05/2018 at 14h41

Can't thank you enough, this is incredibly helpful, especially for a lost PhD student new to R.
