Plot Time Series Data Using GGPlot

In this chapter, we start by describing how to plot simple and multiple time series data using the R function geom_line() [in ggplot2].

Next, we show how to set date axis limits and add a smoothed trend line to a time series graph. Finally, we introduce some extensions to the ggplot2 package for easily handling and analyzing time series objects.

Additionally, you’ll learn how to detect peaks (maxima) and valleys (minima) in time series data.

Basic ggplot of time series

  • Plot types: line plot with dates on x-axis
  • Demo data set: the economics time series data set [in ggplot2].

In this section, we’ll plot the variable pop (total population, in thousands) by date (x-axis). The variables psavert (personal savings rate) and uempmed (median duration of unemployment, in weeks) are used in the following sections.

  • Load required packages and set the default theme:
library(ggplot2)
theme_set(theme_minimal())
# Demo dataset
head(economics)
## # A tibble: 6 x 6
##         date   pce    pop psavert uempmed unemploy
## 1 1967-07-01   507 198712    12.5     4.5     2944
## 2 1967-08-01   510 198911    12.5     4.7     2945
## 3 1967-09-01   516 199113    11.7     4.6     2958
## 4 1967-10-01   513 199311    12.5     4.9     3143
## 5 1967-11-01   518 199498    12.5     4.7     3066
## 6 1967-12-01   526 199657    12.1     4.8     3018
  • Create basic line plots:
# Basic line plot
ggplot(data = economics, aes(x = date, y = pop))+
  geom_line(color = "#00AFBB", size = 2)
# Plot a subset of the data
ss <- subset(economics, date > as.Date("2006-1-1"))
ggplot(data = ss, aes(x = date, y = pop)) + 
  geom_line(color = "#FC4E07", size = 2)

  • Control line size by the value of a continuous variable:
ggplot(data = economics, aes(x = date, y = pop)) +
  geom_line(aes(size = unemploy/pop), color = "#FC4E07")
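
The economics data set already stores date as a Date column, which is why ggplot2 draws a date axis automatically. If your own data holds dates as text, a conversion step is needed first. The following is a minimal sketch, assuming a hypothetical toy data frame named sales whose day column uses the "YYYY-MM-DD" layout:

```r
library(ggplot2)

# Hypothetical toy data (not part of the original tutorial):
# dates are stored as character strings
sales <- data.frame(
  day   = c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"),
  value = c(10, 12, 9, 15)
)

# Convert the text column to class Date so ggplot2 uses a date scale
sales$day <- as.Date(sales$day, format = "%Y-%m-%d")

p_sales <- ggplot(sales, aes(x = day, y = value)) +
  geom_line(color = "#00AFBB")
p_sales
```

If the strings use another layout, adjust the format argument accordingly (see ?strptime).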

Plot multiple time series data

Here, we’ll plot the variables psavert and uempmed by date. You should first reshape the data into long format using the tidyr package:

  • Collapse the psavert and uempmed values into a single new column. R function: gather() [in tidyr].
  • Create a grouping variable with the levels psavert and uempmed.

library(tidyr)
library(dplyr)
df <- economics %>%
  select(date, psavert, uempmed) %>%
  gather(key = "variable", value = "value", -date)
head(df, 3)
## # A tibble: 3 x 3
##         date variable value
## 1 1967-07-01  psavert  12.5
## 2 1967-08-01  psavert  12.5
## 3 1967-09-01  psavert  11.7
# Multiple line plot
ggplot(df, aes(x = date, y = value)) + 
  geom_line(aes(color = variable), size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  theme_minimal()

# Area plot
ggplot(df, aes(x = date, y = value)) + 
  geom_area(aes(color = variable, fill = variable), 
            alpha = 0.5, position = position_dodge(0.8)) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))
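
The gather() call above still works, but recent tidyr releases document it as superseded by pivot_longer(). As a hedged alternative, the same reshaping could be sketched as follows, assuming tidyr 1.0.0 or later; the name df_long is only illustrative:

```r
library(ggplot2)
library(tidyr)

# Reshape psavert and uempmed into long format:
# one row per (date, variable) pair
df_long <- pivot_longer(
  economics[, c("date", "psavert", "uempmed")],
  cols      = c(psavert, uempmed),
  names_to  = "variable",
  values_to = "value"
)

# Same multiple line plot as before, built from df_long
p_long <- ggplot(df_long, aes(x = date, y = value, color = variable)) +
  geom_line() +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))
p_long
```

The rows come out in a different order than with gather(), but the plot is the same because ggplot2 groups the lines by variable.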

Set date axis limits

Key R function: scale_x_date()

# Base plot with date axis
p <- ggplot(data = economics, aes(x = date, y = psavert)) + 
     geom_line(color = "#00AFBB", size = 1)
p
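# Set axis limits c(min, max); NA keeps the default for that end
# (variable names avoid masking the base functions min() and max())
min_date <- as.Date("2002-1-1")
max_date <- NA
p + scale_x_date(limits = c(min_date, max_date))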
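
Note that scale limits discard observations outside the range before anything is drawn or computed. If you only want to zoom in while keeping all the data (for example, so that a smoother is still fitted to the full series), coord_cartesian() is one option. A minimal sketch, where the zoom window is an arbitrary choice made for illustration:

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB")

# Zoom window: from 2002 to the last date in the data (illustrative choice)
zoom <- c(as.Date("2002-01-01"), max(economics$date))

# coord_cartesian() changes the visible region only; no rows are dropped
p_zoom <- p + coord_cartesian(xlim = zoom)
p_zoom
```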

Format date axis labels

Key function: scale_x_date().

To format date axis labels, you can use different combinations of days, weeks, months and years:

  • Weekday name: use %a and %A for abbreviated and full weekday name, respectively
  • Month name: use %b and %B for abbreviated and full month name, respectively
  • %d: day of the month as decimal number
  • %Y: Year with century.
  • See more options in the documentation of the function ?strptime
# Format : month/year
p + scale_x_date(date_labels = "%b/%Y")
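
To preview what a format string produces before using it on an axis, you can pass it to format() on a single date. The sketch below also shows the date_breaks argument of scale_x_date(), which controls the spacing of the labels; the 10-year spacing is an arbitrary choice for illustration.

```r
library(ggplot2)

# Preview format codes on one date
d <- as.Date("2002-03-15")
format(d, "%d/%m/%Y")   # "15/03/2002"
format(d, "%Y")         # "2002"
format(d, "%b %Y")      # abbreviated month name; depends on your locale

# One label every 10 years, showing the year only
p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB")
p_years <- p + scale_x_date(date_breaks = "10 years", date_labels = "%Y")
p_years
```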

Add trend smoothed line

Key function: stat_smooth()

p + stat_smooth(
  color = "#FC4E07", fill = "#FC4E07",
  method = "loess"
  )
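
With method = "loess", the amount of smoothing is controlled by the span argument: smaller values follow the data more closely, larger values give a smoother curve. A minimal sketch, where span = 0.3 is an arbitrary value picked for illustration and se = FALSE hides the confidence band:

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = date, y = psavert)) +
  geom_line(color = "#00AFBB")

# Less smoothing than the default, without the confidence band
p_smooth <- p + stat_smooth(
  color = "#FC4E07",
  method = "loess", formula = y ~ x,
  span = 0.3, se = FALSE
)
p_smooth
```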

ggplot2 extensions for ts objects

The ggfortify package is an extension to ggplot2 that makes it easy to plot time series objects (Horikoshi and Tang 2017). It can handle the output of many time series packages, including: zoo::zooreg(), xts::xts(), timeSeries::timeSeries(), tseries::irts(), forecast::forecast() and predictions from models fitted with the vars package.

Another interesting package is the ggpmisc package (Aphalo 2017), which provides two useful statistics for time series objects:

  • stat_peaks() finds at which x positions local y maxima are located, and
  • stat_valleys() finds at which x positions local y minima are located.
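
To make the idea of a local maximum concrete, here is a small base R sketch. It is not the ggpmisc implementation: is_local_max() is a hypothetical helper written for illustration, which flags a point as a peak when it is strictly larger than every other value in a centered window of span observations.

```r
# Hypothetical helper (not from ggpmisc): TRUE where y[i] is strictly
# greater than all other values in a centered window of odd width `span`
is_local_max <- function(y, span = 3) {
  stopifnot(span %% 2 == 1, span >= 3)
  half <- (span - 1) / 2
  n <- length(y)
  out <- logical(n)
  if (n < span) return(out)
  for (i in (half + 1):(n - half)) {
    window <- y[(i - half):(i + half)]
    # Compare the center value against its neighbors only
    out[i] <- all(y[i] > window[-(half + 1)])
  }
  out
}

y <- c(1, 3, 2, 5, 4, 4, 6, 1)
which(is_local_max(y))    # 2 4 7  (peaks)
which(is_local_max(-y))   # 3      (valleys are peaks of -y)

# Years of local maxima in the lynx series, using a 5-point window
peak_years <- time(lynx)[is_local_max(as.numeric(lynx), span = 5)]
peak_years
```

Points closer than half a window to either end are never flagged, and ties (such as the two consecutive 4s above) do not count as peaks or valleys under this strict definition.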

Here, we’ll show how to easily:

  • Visualize a time series object, using the data set AirPassengers (monthly airline passenger numbers 1949-1960).
  • Identify shifts in mean and/or variance in a time series using the changepoint package.
  • Detect jumps (structural breaks) in a time series using the strucchange package and the data set Nile (measurements of the annual flow of the river Nile at Aswan).
  • Detect peaks and valleys using the ggpmisc package and the data set lynx (Annual Canadian Lynx trappings 1821–1934).

First, install required R packages:

install.packages(
  c("ggfortify", "changepoint",
    "strucchange", "ggpmisc")
)

Then use the autoplot.ts() function to visualize time series objects, as follows:

library(ggfortify)
library(magrittr) # for piping %>%
# Plot ts objects
autoplot(AirPassengers)
# Identify change points in mean and variance
AirPassengers %>%
  changepoint::cpt.meanvar() %>%  # Identify change points
  autoplot()
# Detect jumps (structural breaks) in the data
strucchange::breakpoints(Nile ~ 1) %>%
  autoplot()
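
The plots above mark the detected positions visually. If you also need them as numbers, both packages provide accessors. The following is a sketch using default settings only; the object names are illustrative, and the detected positions depend on the method and penalty options of each package.

```r
# Change points in mean and variance (changepoint package)
cp <- changepoint::cpt.meanvar(AirPassengers)
cp_idx <- changepoint::cpts(cp)          # positions (observation indices)
cp_dates <- time(AirPassengers)[cp_idx]  # corresponding time values
cp_dates

# Structural breaks in the mean level (strucchange package)
bp <- strucchange::breakpoints(Nile ~ 1)
bp_idx <- bp$breakpoints                 # positions (observation indices)
bp_dates <- strucchange::breakdates(bp)  # corresponding years
bp_dates
```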

Detect peaks and valleys:

library(ggpmisc)
ggplot(lynx, as.numeric = FALSE) + geom_line() + 
  stat_peaks(colour = "red") +
  stat_peaks(geom = "text", colour = "red", 
             vjust = -0.5, x.label.fmt = "%Y") +
  stat_valleys(colour = "blue") +
  stat_valleys(geom = "text", colour = "blue", angle = 45,
               vjust = 1.5, hjust = 1,  x.label.fmt = "%Y")+
  ylim(-500, 7300)

References

Aphalo, Pedro J. 2017. ggpmisc: Miscellaneous Extensions to 'ggplot2'. https://CRAN.R-project.org/package=ggpmisc.

Horikoshi, Masaaki, and Yuan Tang. 2017. ggfortify: Data Visualization Tools for Statistical Analysis Results. https://CRAN.R-project.org/package=ggfortify.