GRanges and GRangesList

This analysis was performed using R (ver. 3.1.0).

GRanges

The GenomicRanges package, which is an extension of IRanges to the genomic space. GRanges object contains the sequence names (here chromosome Z), the Iranges, the strand information and the sequence lengths. If we print out the GRanges object, we see that we have two ranges, zero metadata columns. And it gives the sequence names as an rle which we’ll discuss later. It gives the IRanges, and the strand also as an rle, and the bottom it prints the sequence lengths. We’ve specified that chromosome z is 100 base pairs long.

library(GenomicRanges)
gr <- GRanges(seqnames="chrZ", ranges=IRanges(start=c(5,10),end=c(35,45)),
              strand="+", seqlengths=c(chrZ=100L))
gr
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   ---
##   seqlengths:
##    chrZ
##     100

Like with IRanges, we can shift the GRanges, and it will move the starts and ends by 10 base pairs to the right. We can also shift by 80. But notice that if we shift by 80, these will go off the end of the chromosome. GenomicRanges package gives us an error that says that the ranges contain values outside of the sequence bounds. If we wrap this in a trim function it will make sure that these end at the chromosome end and then don’t go over it.

shift(gr, 10)
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [15, 45]      +
##   [2]     chrZ  [20, 55]      +
##   ---
##   seqlengths:
##    chrZ
##     100
shift(gr, 80)
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ [85, 115]      +
##   [2]     chrZ [90, 125]      +
##   ---
##   seqlengths:
##    chrZ
##     100
trim(shift(gr, 80))
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ [85, 100]      +
##   [2]     chrZ [90, 100]      +
##   ---
##   seqlengths:
##    chrZ
##     100

The metadata columns we mentioned previously, can be accessed by using the function mcols for metadata columns. Before we have zero columns here. We can add columns by using mcols plus the dollar sign. Now we have an additional column which is a numeric and has two values.

mcols(gr)
## DataFrame with 2 rows and 0 columns
mcols(gr)$value <- c(-1,4)
gr
## GRanges with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |     value
##              | 
##   [1]     chrZ  [ 5, 35]      + |        -1
##   [2]     chrZ  [10, 45]      + |         4
##   ---
##   seqlengths:
##    chrZ
##     100

GRangesList

There’s an additional class in the GRanges package, which is called GRangesList. GRangesList is an object which groups GRanges together. The most obvious example of a GRangesList would be grouping exons by gene, or grouping exons by transcript.

#Create a second GRanges
gr2 <- GRanges("chrZ",IRanges(11:13,51:53))
mcols(gr)$value <- NULL
#Create GRangesList : This object contains two GRanges
grl <- GRangesList(gr,gr2)
grl
## GRangesList of length 2:
## [[1]] 
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
## 
## [[2]] 
## GRanges with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
## 
## ---
## seqlengths:
##  chrZ
##   100
#It's a GRanges list of length two, 
#where the first GRanges has two ranges and the second GRanges has three ranges.
length(grl)
## [1] 2
#Ask for the first element
grl[[1]]
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   ---
##   seqlengths:
##    chrZ
##     100
#If you specify metadata columns to the GRanges list,
#these will be assigned to each GRanges object in the list.
mcols(grl)$value <- c(5,7)
grl
## GRangesList of length 2:
## [[1]] 
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
## 
## [[2]] 
## GRanges with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
## 
## ---
## seqlengths:
##  chrZ
##   100
mcols(grl)
## DataFrame with 2 rows and 1 column
##       value
##   
## 1         5
## 2         7

findOverlaps and %over%

Once we’ve created sets of GRanges or GRangesList objects, one common thing we might need to do is to find overlaps between objects. Let’s create two GRanges objects. The first one, will have five ranges. So 1 to 5, 11 to 15, 21 to 2. And the second object will have two ranges. Both GRanges objects are on the same sequence, chromosome z. We’ll use the findOverlaps function to find the overlaps. The first two arguments of this function, query and subject, are the most important. If you want to count overlaps, you can use the countOverlaps function.

#Creating two GRanges objects
gr1 <- GRanges("chrZ",IRanges(c(1,11,21,31,41),width=5))
gr2 <- GRanges("chrZ",IRanges(c(19,33),c(38,35)))
gr1
## GRanges with 5 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [ 1,  5]      *
##   [2]     chrZ  [11, 15]      *
##   [3]     chrZ  [21, 25]      *
##   [4]     chrZ  [31, 35]      *
##   [5]     chrZ  [41, 45]      *
##   ---
##   seqlengths:
##    chrZ
##      NA
gr2
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [19, 38]      *
##   [2]     chrZ  [33, 35]      *
##   ---
##   seqlengths:
##    chrZ
##      NA
#Find overlaps
fo <- findOverlaps(query=gr1, subject=gr2)
fo
## Hits of length 3
## queryLength: 5
## subjectLength: 2
##   queryHits subjectHits 
##        
##  1         3           1 
##  2         4           1 
##  3         4           2

The output of the findOverlaps function is a hits object, which has length three, and this gives us the three different overlaps which occurred. The table here tells us that the third element of the query (gr1) intersected with the first element of the subject (gr2). These are given as integer vectors.

Another way to get the overlaps is to use the over function, %over%. It gives a logical vector. For the five ranges in gr1, it gives a logical vector describing which of these have any overlap with the ranges in the second, so gr2. If we use that as a subset, so a logical sub-setting, we returned those ranges in gr1, which had some overlap with gr2.

queryHits(fo)
## [1] 3 4 4
subjectHits(fo)
## [1] 1 1 2
gr1 %over% gr2
## [1] FALSE FALSE  TRUE  TRUE FALSE
gr1[gr1 %over% gr2]
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##             
##   [1]     chrZ  [21, 25]      *
##   [2]     chrZ  [31, 35]      *
##   ---
##   seqlengths:
##    chrZ
##      NA

Rle and Views

Rle is an object which is defined by IRanges. But also there’s a similar object in base r which is a run length encoding. The meaning of this is that if you have a vector, which repeats certain values, you can save memory. By instead of storing each value, you save the number, and then the number of repeats.

If we have such an rle object, and we want to peer into it in different regions, we can construct a views object. Views is a virtual class, which contains the subject, and then a number of views, which are essentially IRanges into that object. You can also use the views constructor for FASTA files, for example, if you want to look into genome sequence or other objects.

r <- Rle(c(1,1,1,0,0,-2,-2,-2,rep(-1,20)))
r
## numeric-Rle of length 28 with 4 runs
##   Lengths:  3  2  3 20
##   Values :  1  0 -2 -1
#Structure of r
str(r)
## Formal class 'Rle' [package "IRanges"] with 4 slots
##   ..@ values         : num [1:4] 1 0 -2 -1
##   ..@ lengths        : int [1:4] 3 2 3 20
##   ..@ elementMetadata: NULL
##   ..@ metadata       : list()
as.numeric(r)
##  [1]  1  1  1  0  0 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
## [24] -1 -1 -1 -1 -1
#creating 2 views of the rle
Views(r, start=c(4,2), end=c(7,6))
## Views on a 28-length Rle subject
## 
## views:
##     start end width
## [1]     4   7     4 [ 0  0 -2 -2]
## [2]     2   6     5 [ 1  1  0  0 -2]

Licence

Licence


Enjoyed this article? I’d be very grateful if you’d help it spread by emailing it to a friend, or sharing it on Twitter, Facebook or Linked In.

Show me some love with the like buttons below... Thank you and please don't forget to share and comment below!!
Avez vous aimé cet article? Je vous serais très reconnaissant si vous aidiez à sa diffusion en l'envoyant par courriel à un ami ou en le partageant sur Twitter, Facebook ou Linked In.

Montrez-moi un peu d'amour avec les like ci-dessous ... Merci et n'oubliez pas, s'il vous plaît, de partager et de commenter ci-dessous!





This page has been seen 27754 times