# GRanges and GRangesList

`This analysis was performed using R (ver. 3.1.0).`

## GRanges

The GenomicRanges package extends IRanges to genomic coordinates. A **GRanges** object contains the **sequence names** (here, chromosome Z), the **IRanges**, the **strand** information, and the **sequence lengths**. Printing the GRanges object below shows two ranges and zero metadata columns. The sequence names are stored as an Rle (run-length encoding, discussed later), followed by the IRanges and the strand, which is also an Rle. The sequence lengths are printed at the bottom; here we have specified that chrZ is 100 base pairs long.

```
library(GenomicRanges)
gr <- GRanges(seqnames="chrZ", ranges=IRanges(start=c(5,10),end=c(35,45)),
strand="+", seqlengths=c(chrZ=100L))
gr
```

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   ---
##   seqlengths:
##    chrZ
##     100
```
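Each component listed above has its own accessor function. The following is a minimal sketch that rebuilds the same `gr` object and reads its parts back out; it assumes the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Same object as in the text above
gr <- GRanges(seqnames = "chrZ",
              ranges = IRanges(start = c(5, 10), end = c(35, 45)),
              strand = "+", seqlengths = c(chrZ = 100L))

seqnames(gr)    # sequence names, stored as an Rle
ranges(gr)      # the underlying IRanges
strand(gr)      # strand, stored as an Rle
seqlengths(gr)  # named vector of sequence lengths

# Coordinates can also be read directly as plain integer vectors
starts <- start(gr)
ends   <- end(gr)
widths <- width(gr)   # end - start + 1 for each range
```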

As with IRanges, we can shift a GRanges object: `shift(gr, 10)` moves the starts and ends 10 base pairs to the right. We can also shift by 80, but then the ranges run off the end of the chromosome, which is only 100 base pairs long. GenomicRanges still returns the shifted object but issues a warning that the ranges contain values outside the sequence bounds. Wrapping the call in **trim** cuts the ranges so that they end at the chromosome end instead of extending past it.

`shift(gr, 10)`

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [15, 45]      +
##   [2]     chrZ  [20, 55]      +
##   ---
##   seqlengths:
##    chrZ
##     100
```

`shift(gr, 80)`

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ [85, 115]      +
##   [2]     chrZ [90, 125]      +
##   ---
##   seqlengths:
##    chrZ
##     100
```

`trim(shift(gr, 80))`

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ [85, 100]      +
##   [2]     chrZ [90, 100]      +
##   ---
##   seqlengths:
##    chrZ
##     100
```
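To see exactly what `trim` changes, it can help to compare range ends against the chromosome length by hand. The sketch below is one possible way to do that; the variable names `shifted`, `chr_len`, and `out_of_bounds` are illustrative, and `suppressWarnings` is used only to silence the out-of-bounds message.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "chrZ",
              ranges = IRanges(start = c(5, 10), end = c(35, 45)),
              strand = "+", seqlengths = c(chrZ = 100L))

# Shifting by 80 pushes both ranges past position 100
shifted <- suppressWarnings(shift(gr, 80))

# Look up the chromosome length for each range and compare it with the range end
chr_len <- seqlengths(shifted)[as.character(seqnames(shifted))]
out_of_bounds <- end(shifted) > chr_len

# trim() cuts the ranges back to the chromosome end
trimmed <- trim(shifted)
width(shifted)  # shift() alone leaves the widths unchanged
width(trimmed)  # trimming shortens the ranges that overran
```

Note that trimming changes the widths, so any downstream calculation that depends on range length will see shorter ranges after `trim`.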

The metadata columns mentioned earlier can be accessed with the function `mcols` (short for metadata columns). Initially there are zero columns. We can add a column by combining `mcols` with the dollar sign operator. The object then has one additional numeric column with two values.

`mcols(gr)`

`## DataFrame with 2 rows and 0 columns`

```
mcols(gr)$value <- c(-1,4)
gr
```

```
## GRanges with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |     value
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chrZ  [ 5, 35]      + |        -1
##   [2]     chrZ  [10, 45]      + |         4
##   ---
##   seqlengths:
##    chrZ
##     100
```
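Metadata columns are often used to filter ranges. The sketch below shows one way to subset a GRanges by a metadata value and then clear the metadata again; the `label` column is a hypothetical addition for illustration and is not part of the original example.

```r
library(GenomicRanges)

gr <- GRanges(seqnames = "chrZ",
              ranges = IRanges(start = c(5, 10), end = c(35, 45)),
              strand = "+", seqlengths = c(chrZ = 100L))

mcols(gr)$value <- c(-1, 4)
mcols(gr)$label <- c("a", "b")   # hypothetical second column
n_cols <- ncol(mcols(gr))

# Keep only the ranges whose 'value' is positive
positive <- gr[mcols(gr)$value > 0]
positive

# Assigning NULL to mcols() drops all metadata columns at once
mcols(gr) <- NULL
```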

## GRangesList

The GenomicRanges package provides an additional class called **GRangesList**, which groups GRanges objects together. The most obvious use of a GRangesList is grouping exons by gene or by transcript.

```
#Create a second GRanges
gr2 <- GRanges("chrZ",IRanges(11:13,51:53))
mcols(gr)$value <- NULL
#Create GRangesList : This object contains two GRanges
grl <- GRangesList(gr,gr2)
grl
```

```
## GRangesList of length 2:
## [[1]]
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##
## [[2]]
## GRanges with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
##
## ---
## seqlengths:
##  chrZ
##   100
```

```
#It's a GRanges list of length two,
#where the first GRanges has two ranges and the second GRanges has three ranges.
length(grl)
```

`## [1] 2`

```
#Ask for the first element
grl[[1]]
```

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   ---
##   seqlengths:
##    chrZ
##     100
```

```
#Metadata columns assigned to the GRangesList are attached to the list elements,
#one value per GRanges object; the ranges inside each GRanges keep 0 metadata columns.
mcols(grl)$value <- c(5,7)
grl
```

```
## GRangesList of length 2:
## [[1]]
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##
## [[2]]
## GRanges with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
##
## ---
## seqlengths:
##  chrZ
##   100
```

`mcols(grl)`

```
## DataFrame with 2 rows and 1 column
##       value
##   <numeric>
## 1         5
## 2         7
```
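The exons-by-gene use case mentioned at the start of this section can be sketched as follows. The gene names and exon coordinates are made up for illustration. Many range accessors work element-wise on a GRangesList, which makes per-gene summaries short to write.

```r
library(GenomicRanges)

# Hypothetical genes and exon coordinates, for illustration only
exons_by_gene <- GRangesList(
  geneA = GRanges("chrZ", IRanges(start = c(5, 40), end = c(20, 60)), strand = "+"),
  geneB = GRanges("chrZ", IRanges(start = c(70, 90), width = 5), strand = "-")
)

# Number of exons in each gene
n_exons <- sapply(exons_by_gene, length)

# width() returns one integer vector per gene;
# sum() then adds the exon widths within each gene
exon_bp <- sum(width(exons_by_gene))

# unlist() flattens the list back into a single GRanges of all exons
all_exons <- unlist(exons_by_gene)
```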

## findOverlaps and %over%

Once we have created sets of GRanges or GRangesList objects, a common task is to find overlaps between them. Let's create two GRanges objects. The first has five ranges, each 5 base pairs wide: 1 to 5, 11 to 15, 21 to 25, 31 to 35, and 41 to 45. The second has two ranges. Both objects are on the same sequence, chrZ. We'll use the **findOverlaps** function to find the overlaps; its first two arguments, query and subject, are the most important. To count overlaps instead, use the **countOverlaps** function.

```
#Creating two GRanges objects
gr1 <- GRanges("chrZ",IRanges(c(1,11,21,31,41),width=5))
gr2 <- GRanges("chrZ",IRanges(c(19,33),c(38,35)))
gr1
```

```
## GRanges with 5 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 1,  5]      *
##   [2]     chrZ  [11, 15]      *
##   [3]     chrZ  [21, 25]      *
##   [4]     chrZ  [31, 35]      *
##   [5]     chrZ  [41, 45]      *
##   ---
##   seqlengths:
##    chrZ
##      NA
```

`gr2`

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [19, 38]      *
##   [2]     chrZ  [33, 35]      *
##   ---
##   seqlengths:
##    chrZ
##      NA
```

```
#Find overlaps
fo <- findOverlaps(query=gr1, subject=gr2)
fo
```

```
## Hits of length 3
## queryLength: 5
## subjectLength: 2
##   queryHits subjectHits
##   <integer>   <integer>
## 1         3           1
## 2         4           1
## 3         4           2
```

The output of findOverlaps is a Hits object of length three, one entry per overlap. Each row of the table pairs a query index with a subject index: the third element of the query (gr1) overlaps the first element of the subject (gr2), and the fourth element of the query overlaps both the first and second elements of the subject. Both columns are integer vectors, which can be extracted with queryHits and subjectHits.
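Because the Hits object stores plain indices, it can be used to pull out the overlapping pairs themselves. The sketch below is one possible approach: it uses `pintersect` to compute the shared interval for each pair. The names `q`, `s`, and `shared` are illustrative.

```r
library(GenomicRanges)

gr1 <- GRanges("chrZ", IRanges(c(1, 11, 21, 31, 41), width = 5))
gr2 <- GRanges("chrZ", IRanges(c(19, 33), c(38, 35)))
fo <- findOverlaps(query = gr1, subject = gr2)

# One element per overlapping pair, in the same order as the Hits rows
q <- gr1[queryHits(fo)]
s <- gr2[subjectHits(fo)]

# pintersect() intersects the two objects pairwise (element i with element i)
shared <- pintersect(q, s)

# Summarise each overlap: which pair, and which bases they share
overlap_table <- data.frame(query = queryHits(fo),
                            subject = subjectHits(fo),
                            overlap_start = start(shared),
                            overlap_end = end(shared),
                            overlap_width = width(shared))
overlap_table
```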

Another way to get the overlaps is the `%over%` operator. For the five ranges in gr1 it returns a logical vector indicating which of them overlap any range in gr2. Using that vector for logical subsetting returns the ranges in gr1 that overlap gr2.

`queryHits(fo)`

`## [1] 3 4 4`

`subjectHits(fo)`

`## [1] 1 1 2`

`gr1 %over% gr2`

`## [1] FALSE FALSE TRUE TRUE FALSE`

`gr1[gr1 %over% gr2]`

```
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [21, 25]      *
##   [2]     chrZ  [31, 35]      *
##   ---
##   seqlengths:
##    chrZ
##      NA
```
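The **countOverlaps** function mentioned at the start of this section is not demonstrated above. Here is a minimal sketch using the same two objects. It also shows `subsetByOverlaps`, which should give the same ranges as the `%over%` subsetting.

```r
library(GenomicRanges)

gr1 <- GRanges("chrZ", IRanges(c(1, 11, 21, 31, 41), width = 5))
gr2 <- GRanges("chrZ", IRanges(c(19, 33), c(38, 35)))

# For each range in gr1, the number of ranges in gr2 that it overlaps
counts <- countOverlaps(gr1, gr2)
counts

# A count above zero is the same information that %over% reports
has_overlap <- gr1 %over% gr2

# subsetByOverlaps() keeps the query ranges that overlap the subject
kept <- subsetByOverlaps(gr1, gr2)
kept
```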

## Rle and Views

Rle is a class defined by IRanges; base R has a similar run-length encoding object. The idea is that a vector with long runs of repeated values can be stored compactly: instead of storing every element, you store each value once together with the number of times it repeats.
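The base R counterpart makes the idea easy to see without any extra packages. This small sketch encodes a vector with `rle` and then decodes it again with `inverse.rle`:

```r
# A vector with long runs of repeated values
x <- c(1, 1, 1, 0, 0, -2, -2, -2, rep(-1, 20))

# Base R run-length encoding: each distinct run is stored once
enc <- rle(x)
enc$values    # the value of each run
enc$lengths   # how many times each value repeats

# The encoding is lossless: decoding restores the original vector
decoded <- inverse.rle(enc)
identical(decoded, x)
```

Here 28 elements are represented by just 4 values and 4 run lengths.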

If we have such an Rle object and want to look at particular regions of it, we can construct a Views object. Views is a virtual class that contains the subject plus a number of views, which are essentially IRanges into that subject. The Views constructor also works on other objects, for example genome sequence loaded from FASTA files.

```
r <- Rle(c(1,1,1,0,0,-2,-2,-2,rep(-1,20)))
r
```

```
## numeric-Rle of length 28 with 4 runs
## Lengths: 3 2 3 20
## Values : 1 0 -2 -1
```

```
#Structure of r
str(r)
```

```
## Formal class 'Rle' [package "IRanges"] with 4 slots
## ..@ values : num [1:4] 1 0 -2 -1
## ..@ lengths : int [1:4] 3 2 3 20
## ..@ elementMetadata: NULL
## ..@ metadata : list()
```

`as.numeric(r)`

```
## [1] 1 1 1 0 0 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
## [24] -1 -1 -1 -1 -1
```

```
#creating 2 views of the rle
Views(r, start=c(4,2), end=c(7,6))
```

```
## Views on a 28-length Rle subject
##
## views:
## start end width
## [1] 4 7 4 [ 0 0 -2 -2]
## [2] 2 6 5 [ 1 1 0 0 -2]
```
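Once views are defined, a common next step is to summarise the values inside each one. The sketch below assumes the IRanges package and applies its view summary functions to the same two views. `runValue` and `runLength` are shown as accessors for the underlying encoding.

```r
library(IRanges)

r <- Rle(c(1, 1, 1, 0, 0, -2, -2, -2, rep(-1, 20)))
v <- Views(r, start = c(4, 2), end = c(7, 6))

# Accessors for the run-length encoding itself
runValue(r)    # value of each run
runLength(r)   # length of each run

# Per-view summaries, one result per view
view_sums  <- viewSums(v)
view_means <- viewMeans(v)
view_maxs  <- viewMaxs(v)
view_mins  <- viewMins(v)
```

The summaries are computed on the views rather than on extracted copies of the data, so the subject is never expanded to a full vector by this code.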
