<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Mon, 27 Apr 2026 06:05:01 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Easy Guides]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/wiki/15" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles of the category: Visualizing next geration sequencing data]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[IGV - Integrative Genomics Viewer]]></title>
			<link>https://www.sthda.com/english/wiki/igv-integrative-genomics-viewer</link>
			<guid>https://www.sthda.com/english/wiki/igv-integrative-genomics-viewer</guid>
			<description><![CDATA[<!-- START HTML -->
           
  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#load-packages-and-data">Load packages and data</a></li>
<li><a href="#igv---integrative-genomics-viewer">IGV - Integrative Genomics Viewer</a></li>
<li><a href="#video-visualizing-ngs-data-with-igv">Video : Visualizing NGS data with IGV</a></li>
<li><a href="#footnotes">Footnotes</a></li>
<li><a href="#licence">Licence</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>

<hr />
<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<div id="load-packages-and-data" class="section level2">
<h2>Load packages and data</h2>
<p>We’re going to use the RNA sequencing experiment in the passilaBamSubset package.</p>
<pre class="r"><code>#biocLite("pasillaBamSubset")
#biocLite("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
library(pasillaBamSubset)
#Genes annotated in transcript database
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
#The 2 files of interest
fl1 <- untreated1_chr4()
fl2 <- untreated3_chr4()</code></pre>
</div>
<div id="igv---integrative-genomics-viewer" class="section level2">
<h2>IGV - Integrative Genomics Viewer</h2>
<p>IGV is freely available for download <a href="https://www.broadinstitute.org/igv/home">here</a>.</p>
<p>Copy fl1 and fl2 from the R library directory to the current working directory.</p>
<pre class="success"><code>We need to use the `Rsamtools` library to index the BAM files for using IGV.</code></pre>
<pre class="r"><code>file.copy(from=fl1,to=basename(fl1))</code></pre>
<pre><code>## [1] FALSE</code></pre>
<pre class="r"><code>file.copy(from=fl2,to=basename(fl2))</code></pre>
<pre><code>## [1] FALSE</code></pre>
<pre class="r"><code>library(Rsamtools)
indexBam(basename(fl1))</code></pre>
<pre><code>##       untreated1_chr4.bam 
## "untreated1_chr4.bam.bai"</code></pre>
<pre class="r"><code>indexBam(basename(fl2))</code></pre>
<pre><code>##       untreated3_chr4.bam 
## "untreated3_chr4.bam.bai"</code></pre>
<pre class="warning"><code>Note that if you have trouble downloading IGV, another option for visualization is the UCSC Genome Browser: http://genome.ucsc.edu/cgi-bin/hgTracks 
  
The UCSC Genome Browser is a great resource, having many tracks involving gene annotations, conservation over multiple species, and the ENCODE epigentic tracks is already available. However, the UCSC Genome Browser requires that you upload your genomic files to their server, or put your data on a publicly available server. This is not always possible if you are working with confidential data.</code></pre>
<p>Using IGV, look the gene <strong>lgs</strong>.</p>
<p>In this example the data is an RNA sequencing experiment of Drosophila. From IGV, we need to use the Drosophila melanogaster genome, and specifically the dm3 genome.</p>
<p><strong>Load the 2 bam files</strong>: File -> load from file -> select the 2 bam files. Two new tracks are created (one track for each file).</p>
<p>The passilaBamSubset package is a subset of the reads, which map to chromosome 4. Select chromosome 4, type the gene name <strong>lgs</strong> in the search field, and click Go. We can see that the coverage only is on the exons, mostly.</p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/genomics/igv.png" alt="igv lgs gene" /></p>
<p>And if we zoom in (hold Shift, and drag on the top part to zoom) you can see the individual reads. There are reads on the plus strand and minus strand. IGV is a very useful program for visualizing quickly reads from sequencing experiments.</p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/genomics/igv-zoom.png" alt="igv lgs gene zoom" /></p>
<p><strong>Legend:</strong> 1 = Reads on the plus strand
2 = Reads on the minus strand
3 = Mismatch base compared to the reference genome</p>
</div>
<div id="video-visualizing-ngs-data-with-igv" class="section level2">
<h2>Video : Visualizing NGS data with IGV</h2>
<p>(French)
<iframe title="Visualizing NGS data with IGV" width="480" height="390" src="https://www.youtube.com/embed/zrjkNJWofDo" frameborder="0" allowfullscreen></iframe></p>
<pre class="success"><code>In the next unit, I&amp;#39;m going to show how to generate coverage plots-- like the ones in here-- but within Bioconductor.</code></pre>
</div>
<div id="footnotes" class="section level2">
<h2>Footnotes</h2>
<ul>
<li><p>IGV <a href="https://www.broadinstitute.org/igv/home">https://www.broadinstitute.org/igv/home</a></p></li>
<li><p>UCSC Genome Browser: zooms and scrolls over chromosomes, showing the work of annotators worldwide <a href="http://genome.ucsc.edu/">http://genome.ucsc.edu/</a></p></li>
</ul>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p><a href="https://github.com/genomicsclass">https://github.com/genomicsclass</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Tue, 07 Oct 2014 16:13:55 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Gviz - Visualize genomic data]]></title>
			<link>https://www.sthda.com/english/wiki/gviz-visualize-genomic-data</link>
			<guid>https://www.sthda.com/english/wiki/gviz-visualize-genomic-data</guid>
			<description><![CDATA[<!-- START HTML -->

        
            
  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#introduction-to-gviz-package">Introduction to Gviz package</a><ul>
<li><a href="#plot-annotation-track">Plot annotation track</a></li>
<li><a href="#add-genome-axis-track">Add genome axis track</a></li>
<li><a href="#add-chromosome-ideogram">Add chromosome ideogram</a></li>
<li><a href="#add-gene-model">Add gene model</a></li>
<li><a href="#zoom-the-plot">Zoom the plot</a></li>
<li><a href="#add-sequence-track-and-zoom-to-view-sequence">Add sequence track and zoom to view sequence</a></li>
<li><a href="#add-data-track">Add data track</a></li>
</ul></li>
<li><a href="#plotting-parameters">Plotting parameters</a><ul>
<li><a href="#setting-parameters">setting parameters</a></li>
<li><a href="#plotting-direction">Plotting direction</a></li>
</ul></li>
<li><a href="#track-classes">Track classes</a><ul>
<li><a href="#display-parameters-for-genomeaxistrack-objects">Display parameters for GenomeAxisTrack objects</a></li>
<li><a href="#ideogramtrack">IdeogramTrack</a></li>
<li><a href="#datatrack">DataTrack</a></li>
<li><a href="#annotationtrack">AnnotationTrack</a></li>
<li><a href="#generegiontrack">GeneRegionTrack</a></li>
<li><a href="#sequence-track">Sequence Track</a></li>
<li><a href="#alignmentstrack">AlignmentsTrack</a></li>
</ul></li>
<li><a href="#track-highlighting-and-overlays">Track highlighting and overlays</a><ul>
<li><a href="#highlighting">Highlighting</a></li>
<li><a href="#overlays">Overlays</a></li>
</ul></li>
<li><a href="#footnotes">Footnotes</a></li>
<li><a href="#licence">Licence</a></li>
</ul>
</div>

<hr />
<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<div id="introduction-to-gviz-package" class="section level2">
<h2>Introduction to Gviz package</h2>
<p>The Gviz package aims to provide a structured visualization framework to plot any type of data along genomic coordinates. It also allows to integrate publicly available genomic annotation data from sources like UCSC or ENSEMBL.</p>
<p>Individual types of genomic features or data are represented by separate tracks, like most of genome browsers.</p>
<p><span class="warning"> By default, <strong>Gviz</strong> checks all supplied chromosome names for validity in the sense of the UCSC definition (<strong>chromosomes have to start with the chr string</strong>). You may decide to turn this feature off by calling options(ucscChromosomeNames=FALSE) </span></p>
<p>In the following examples, we will make use of the UCSC genome and chromosome 7 (chr7) on mouse mm9 genome.</p>
<div id="plot-annotation-track" class="section level3">
<h3>Plot annotation track</h3>
<p>Please note that the <strong>AnnotationTrack</strong> constructor can accommodate many different types of inputs. For instance, the start and end coordinates of the annotation features could be passed in as individual arguments start and end, as a data.frame or even as an IRanges or GRangesList object.</p>
<pre class="r"><code>library(Gviz)
library(GenomicRanges)
#Load data : class = GRanges
data(cpgIslands)
cpgIslands</code></pre>
<pre><code>## GRanges with 10 ranges and 0 metadata columns:
##        seqnames               ranges strand
##           <Rle>            <IRanges>  <Rle>
##    [1]     chr7 [26549019, 26550183]      *
##    [2]     chr7 [26564119, 26564500]      *
##    [3]     chr7 [26585667, 26586158]      *
##    [4]     chr7 [26591772, 26593309]      *
##    [5]     chr7 [26594192, 26594570]      *
##    [6]     chr7 [26623835, 26624150]      *
##    [7]     chr7 [26659284, 26660352]      *
##    [8]     chr7 [26721294, 26721717]      *
##    [9]     chr7 [26821518, 26823297]      *
##   [10]     chr7 [26991322, 26991841]      *
##   ---
##   seqlengths:
##    chr7
##      NA</code></pre>
<pre class="r"><code>#Annotation track, title ="CpG"
atrack <- AnnotationTrack(cpgIslands, name = "CpG")
plotTracks(atrack)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-annotation-track.png" title="plot of chunk annotation-track" alt="plot of chunk annotation-track" width="672" /></p>
</div>
<div id="add-genome-axis-track" class="section level3">
<h3>Add genome axis track</h3>
<p>This step is to indicate the genomic coordinates we are currently looking at.</p>
<pre class="r"><code>## genomic coordinates
gtrack <- GenomeAxisTrack()
plotTracks(list(gtrack, atrack))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-genome-axis-track.png" title="plot of chunk genome-axis-track" alt="plot of chunk genome-axis-track" width="672" /></p>
</div>
<div id="add-chromosome-ideogram" class="section level3">
<h3>Add chromosome ideogram</h3>
<p>To add chromosome ideogram, we have to indicate a valid UCSC genome (e.g : “hg19”) chromosome name (e.g : “chr7”).</p>
<pre class="warning"><code>Internet connection is required as the function fetches data from UCSC and it can take quite long time.</code></pre>
<pre class="r"><code>#genome : "hg19" 
gen<-genome(cpgIslands)
#Chromosme name : "chr7"
chr <- as.character(unique(seqnames(cpgIslands)))
#Ideogram track
itrack <- IdeogramTrack(genome = gen, chromosome = chr)
plotTracks(list(itrack, gtrack, atrack))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-chromosome-ideogram.png" title="plot of chunk chromosome-ideogram" alt="plot of chunk chromosome-ideogram" width="672" /></p>
<pre class="success"><code>Ideogram tracks are the one exception in all of Gviz &amp;#39;s track objects in the sense that they are not really displayed on the same coordinate system like all the other tracks. Instead, the current genomic location is indicated on the chromosome by a red box (or, as in this case, a red line if the width is too small to fit a box).</code></pre>
</div>
<div id="add-gene-model" class="section level3">
<h3>Add gene model</h3>
<p>We can utilize gene model information from an existing local source. Alternatively, we could download such data from one of the available online resources like UCSC or ENSEBML, and there are constructor functions to handle these tasks.</p>
<p>For this example we are going to load gene model data from a stored data.frame.</p>
<pre class="r"><code>#Load data
data(geneModels)
head(geneModels)</code></pre>
<pre><code>##   chromosome    start      end width strand feature            gene            exon      transcript     symbol
## 1       chr7 26591441 26591829   389      + lincRNA ENSG00000233760 ENSE00001693369 ENST00000420912 AC004947.2
## 2       chr7 26591458 26591829   372      + lincRNA ENSG00000233760 ENSE00001596777 ENST00000457000 AC004947.2
## 3       chr7 26591515 26591829   315      + lincRNA ENSG00000233760 ENSE00001601658 ENST00000430426 AC004947.2
## 4       chr7 26594428 26594538   111      + lincRNA ENSG00000233760 ENSE00001792454 ENST00000457000 AC004947.2
## 5       chr7 26594428 26596819  2392      + lincRNA ENSG00000233760 ENSE00001618328 ENST00000420912 AC004947.2
## 6       chr7 26594641 26594733    93      + lincRNA ENSG00000233760 ENSE00001716169 ENST00000457000 AC004947.2</code></pre>
<pre class="r"><code>#Plot
grtrack <- GeneRegionTrack(geneModels, genome = gen,
                           chromosome = chr, name = "Gene Model")
plotTracks(list(itrack, gtrack, atrack, grtrack))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-gene-model.png" title="plot of chunk gene-model" alt="plot of chunk gene-model" width="672" /></p>
</div>
<div id="zoom-the-plot" class="section level3">
<h3>Zoom the plot</h3>
<p>We often want to zoom in or out on a particular plotting region to see more details or to get a broader overview.</p>
<pre class="r"><code>#Use from and to arguments to zoom
plotTracks(list(itrack, gtrack, atrack, grtrack),
           from = 26700000, to = 26750000)

# Use extend.left and extend.right to zoom
#those arguments are relative to the currently displayed ranges, 
#and can be used to quickly extend the view on one or both ends of the plot.
plotTracks(list(itrack, gtrack, atrack, grtrack),
           extend.left = 0.5, extend.right = 1000000)

# to drop the bounding borders of the exons and 
# to have a nice plot
plotTracks(list(itrack, gtrack, atrack, grtrack),
           extend.left = 0.5, extend.right = 1000000, col = NULL)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-zoom1.png" title="plot of chunk zoom" alt="plot of chunk zoom" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-zoom2.png" title="plot of chunk zoom" alt="plot of chunk zoom" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-zoom3.png" title="plot of chunk zoom" alt="plot of chunk zoom" width="288" /></p>
<pre class="success"><code>A value of 0.5 will cause zooming in to half the currently displayed range.</code></pre>
</div>
<div id="add-sequence-track-and-zoom-to-view-sequence" class="section level3">
<h3>Add sequence track and zoom to view sequence</h3>
<p>The necessary sequence information is drawn from one of the BSgenome packages.</p>
<pre class="r"><code>library(BSgenome.Hsapiens.UCSC.hg19)
strack <- SequenceTrack(Hsapiens, chromosome = chr)
plotTracks(list(itrack, gtrack, atrack, grtrack,
                strack), from = 26591822, to = 26591852, cex = 0.8)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-sequence-track.png" title="plot of chunk sequence-track" alt="plot of chunk sequence-track" width="432" /></p>
</div>
<div id="add-data-track" class="section level3">
<h3>Add data track</h3>
<p>DataTrack object are essentially run-length encoded (Rle) numeric vectors or matrices, and we can use them to add all sorts of numeric data to our genomic coordinate plots. Different visualization options for these tracks are available including dot plots, histograms and box-and-whisker plots.</p>
<pre class="r"><code>#For demonstration purposes we can create a simple DataTrack object from
#randomly sampled data.
set.seed(255)
lim <- c(26700000, 26750000)
coords <- sort(c(lim[1], sample(seq(from = lim[1], 
                                    to = lim[2]), 99), lim[2]))
dat <- runif(100, min = -10, max = 10)
head(dat)</code></pre>
<pre><code>## [1]  3.5382 -9.3984 -2.8372 -1.9871  0.2722  8.2508</code></pre>
<pre class="r"><code>##data track
dtrack <- DataTrack(data = dat, start = coords[-length(coords)],
                    end = coords[-1], chromosome = chr, genome = gen,
                    name = "Uniform")
##Plot data track
plotTracks(list(itrack, gtrack, atrack, grtrack,
                dtrack), from = lim[1], to = lim[2])

#Change plot type to histogram
plotTracks(list(itrack, gtrack, atrack, grtrack,dtrack),
           from = lim[1], to = lim[2], type = "histogram")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-data-track1.png" title="plot of chunk data-track" alt="plot of chunk data-track" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-data-track2.png" title="plot of chunk data-track" alt="plot of chunk data-track" width="384" /></p>
<pre class="warning"><code>Such a visualization can be particularly helpful when displaying for instance the coverage of NGS reads along a chromosome, or to show the measurement values of mapped probes from a micro array experiment.</code></pre>
</div>
</div>
<div id="plotting-parameters" class="section level2">
<h2>Plotting parameters</h2>
<div id="setting-parameters" class="section level3">
<h3>setting parameters</h3>
<pre class="r"><code>#Annotation of transcript
#Change panel and title background color
grtrack <- GeneRegionTrack(geneModels, genome = gen,
                           chromosome = chr, name = "Gene Model", 
                           transcriptAnnotation = "symbol",
                           background.panel = "#FFFEDB",
                           background.title = "darkblue")
plotTracks(list(itrack, gtrack, atrack, grtrack))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-setting-parameters.png" title="plot of chunk setting-parameters" alt="plot of chunk setting-parameters" width="384" /></p>
</div>
<div id="plotting-direction" class="section level3">
<h3>Plotting direction</h3>
<p>By default all tracks will be plotted in a 5’ -> 3’ direction. It sometimes can be useful to actually show the data relative to the opposite strand.</p>
<pre class="r"><code>plotTracks(list(itrack, gtrack, atrack, grtrack),
           reverseStrand = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-plotting-direction.png" title="plot of chunk plotting-direction" alt="plot of chunk plotting-direction" width="432" /></p>
<pre class="success"><code>As you can see, the fact that the data has been plotted on the reverse strand is also reflected in the GenomeAxis track.</code></pre>
</div>
</div>
<div id="track-classes" class="section level2">
<h2>Track classes</h2>
<p>Several parameters can be used to change the appearance of the different tracks.</p>
<div id="display-parameters-for-genomeaxistrack-objects" class="section level3">
<h3>Display parameters for GenomeAxisTrack objects</h3>
<p>Set the position of labels to below, show IDs, change color</p>
<pre class="r"><code>axisTrack <- GenomeAxisTrack(range = IRanges(start = c(2000000,4000000), 
                                             end = c(3000000, 7000000),
                                             names = rep("N-stretch", 2))
                             )
plotTracks(axisTrack, from = 1000000, to = 9000000, 
           labelPos = "below",showId=TRUE, col="red")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-GenomeAxisTrack-display-parameters.png" title="plot of chunk GenomeAxisTrack-display-parameters" alt="plot of chunk GenomeAxisTrack-display-parameters" width="432" /></p>
</div>
<div id="ideogramtrack" class="section level3">
<h3>IdeogramTrack</h3>
<pre class="r"><code>#Ideogram
ideoTrack <- IdeogramTrack(genome = "hg19", chromosome = "chrX")
plotTracks(ideoTrack, from = 85000000, to = 129000000)

#Show chromosome band ID
plotTracks(ideoTrack, from = 85000000, to = 129000000,
           showId = FALSE, showBandId = TRUE, cex.bands = 0.4)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-11.png" title="plot of chunk unnamed-chunk-1" alt="plot of chunk unnamed-chunk-1" width="432" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-12.png" title="plot of chunk unnamed-chunk-1" alt="plot of chunk unnamed-chunk-1" width="432" /></p>
</div>
<div id="datatrack" class="section level3">
<h3>DataTrack</h3>
<p>Essentially they constitute run-length encoded numeric vectors or matrices associated to a particular genomic coordinate range. There can be multiple samples in a single data set, and the plotting method provides tools to incorporate sample group information.</p>
<p>Thus the starting point for creating DataTrack objects will always be a set of ranges, either in the form of an IRanges or GRanges object, or individually as start and end coordinates or widths. The second ingredient is a numeric vector of the same length as the number of ranges, or a numeric matrix with the same number of columns.</p>
<p>We will load our sample data from an GRanges object that comes as part of the Gviz package.</p>
<pre class="r"><code>#Load data
 data(twoGroups)
head(twoGroups)</code></pre>
<pre><code>## GRanges with 6 ranges and 6 metadata columns:
##       seqnames     ranges strand |           control         control.1          control.2            treated
##          <Rle>  <IRanges>  <Rle> |         <numeric>         <numeric>          <numeric>          <numeric>
##   [1]     chrX [  1,  30]      * | -8.96125989500433 -7.65790161676705   9.87956526689231  -5.84375557024032
##   [2]     chrX [ 42,  71]      * |  -4.2114706709981   4.6882571419701   -1.0533055011183   1.03083667811006
##   [3]     chrX [ 84, 113]      * |  2.28711236733943  8.01326935179532   -7.1219984581694  -4.46718293242157
##   [4]     chrX [125, 154]      * |  9.20983788557351 -6.23242623638362   8.59682233538479  -6.32041404955089
##   [5]     chrX [167, 196]      * | 0.406841854564846 -7.05442394595593 -0.551973707042634   9.36362744309008
##   [6]     chrX [209, 238]      * |  5.90989288408309 -5.10347711388022   1.56467542983592 -0.488725560717285
##               treated.1         treated.2
##               <numeric>         <numeric>
##   [1]  9.71352839842439  9.99328563921154
##   [2] -6.77430204115808 0.593712376430631
##   [3] -4.05887754634023  8.05319488979876
##   [4] -1.56806231010705   3.5114610241726
##   [5] -4.88056596834213  1.55288028530777
##   [6]  6.99816173873842 -2.03484911937267
##   ---
##   seqlengths:
##    chrX
##      NA</code></pre>
<pre class="r"><code>#Plot data track
 dTrack <- DataTrack(twoGroups, name = "uniform")
 plotTracks(dTrack)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-2.png" title="plot of chunk unnamed-chunk-2" alt="plot of chunk unnamed-chunk-2" width="432" /></p>
<p>The default visualization for DataTrack is a dot plot.</p>
<div id="the-different-plot-types" class="section level4">
<h4>The different plot types</h4>
<p>The possible plot types are :</p>
<pre class="r"><code>#dotplot
plotTracks(DataTrack(twoGroups, name = "p"), type="p")
#lines plot
plotTracks(DataTrack(twoGroups, name = "l"), type="l")
#line and dot plot
plotTracks(DataTrack(twoGroups, name = "b"), type="b")
#lines plot of average
plotTracks(DataTrack(twoGroups, name = "a"), type="a")
#histogram lines
plotTracks(DataTrack(twoGroups, name = "h"), type="h")
#histogram histogram (bar width equal to range with)
plotTracks(DataTrack(twoGroups, name = "histogram"), type="histogram")
#&amp;#39;polygon-type&amp;#39; plot relative to a baseline
plotTracks(DataTrack(twoGroups, name = "polygon"), type="polygon")
#box and whisker plot
plotTracks(DataTrack(twoGroups, name = "boxplot"), type="boxplot")
#false color image of the individual values
plotTracks(DataTrack(twoGroups, name = "heatmap"), type="heatmap")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type1.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type2.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type3.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type4.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type5.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type6.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type7.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type8.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-datatrack-plot-type9.png" title="plot of chunk datatrack-plot-type" alt="plot of chunk datatrack-plot-type" width="288" /></p>
<p><strong>Example of DataTrack plots :</strong></p>
<pre class="r"><code>#Combine a boxplot with an average line and a data grid (g):
plotTracks(dTrack, type = c("boxplot", "a", "g"))

#Heatmap and show sample names
plotTracks(dTrack, type = c("heatmap"), showSampleNames = TRUE,
           cex.sampleNames = 0.6)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-31.png" title="plot of chunk unnamed-chunk-3" alt="plot of chunk unnamed-chunk-3" width="432" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-32.png" title="plot of chunk unnamed-chunk-3" alt="plot of chunk unnamed-chunk-3" width="432" /></p>
</div>
<div id="data-grouping" class="section level4">
<h4>Data grouping</h4>
<p>The individual samples can be grouped together using a factor variable.</p>
<pre class="r"><code>plotTracks(dTrack, groups = rep(c("control", "treated"), each = 3),
           type = c("a", "p"), legend=TRUE)
#Boxplot
plotTracks(dTrack, groups = rep(c("control", "treated"), each = 3),
           type = "boxplot")

#Aggregate group. aggregation can be  mean, median, extreme,
#sum, min and max
plotTracks(dTrack, groups = rep(c("control", "treated"),each = 3),
           type = c("b"), aggregateGroups = TRUE,
           aggregation = "max")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-data-grouping1.png" title="plot of chunk data-grouping" alt="plot of chunk data-grouping" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-data-grouping2.png" title="plot of chunk data-grouping" alt="plot of chunk data-grouping" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-data-grouping3.png" title="plot of chunk data-grouping" alt="plot of chunk data-grouping" width="384" /></p>
</div>
<div id="building-datatrack-objects-from-files" class="section level4">
<h4>Building DataTrack objects from files</h4>
<p>The DataTrack class supports the most common file types like <strong>wig</strong>, <strong>bigWig</strong>, <strong>bedGraph</strong> and <strong>bam</strong> files.</p>
<pre class="r"><code>bgFile <- system.file("extdata/test.bedGraph", package = "Gviz")
dTrack2 <- DataTrack(range = bgFile, genome = "hg19",
                     type = "l", chromosome = "chr19", name = "bedGraph")
plotTracks(dTrack2)</code></pre>
<pre class="notice"><code>Note that the
Gviz package is using functionality from the rtracklayer package for most of the file import operations</code></pre>
<p>The real power of the file support in the Gviz package comes with streaming from indexed files. Only the relevant part of the data has to be loaded during the plotting operation, so the underlying data files may be quite large without decreasing the performance or causing too big of a memory footprint.</p>
<p>We will exemplify this feature here using a small bam file that is provided with the package. bam files contain alignments of sequences (typically from a next generation sequencing experiment) to a common reference. The most natural representation of such data in a DataTrack is to look at the alignment coverage at a given position only and to encode this in a single elementMetadata column.</p>
<pre class="r"><code>bamFile <- system.file("extdata/test.bam", package = "Gviz")
dTrack4 <- DataTrack(range = bamFile, genome = "hg19",
                     type = "l", name = "Coverage", window = -1, chromosome = "chr1")
plotTracks(dTrack4, from = 189990000, to = 190000000)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-5.png" title="plot of chunk unnamed-chunk-5" alt="plot of chunk unnamed-chunk-5" width="432" /></p>
</div>
</div>
<div id="annotationtrack" class="section level3">
<h3>AnnotationTrack</h3>
<p>Essentially they consist of one or several genomic ranges that can be grouped into composite annotation elements if needed. The necessary building blocks are the range coordinates, a chromosome and a genome identifier. Information can be passed to the function, either in the form of separate function arguments, as IRanges, GRanges or data.frame objects.</p>
<pre class="r"><code>aTrack <- AnnotationTrack(start = c(10, 40, 120),
                          width = 15, chromosome = "chrX", strand = c("+","*", "-"),
                          id = c("Huey", "Dewey", "Louie"),
                          genome = "hg19", name = "foo")
plotTracks(aTrack)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-6.png" title="plot of chunk unnamed-chunk-6" alt="plot of chunk unnamed-chunk-6" width="432" /></p>
<div id="building-annotationtrack-objects-from-files" class="section level4">
<h4>Building AnnotationTrack objects from files</h4>
<p>The default import function reads the coordinates of all the sequence alignments from the bam file.</p>
<pre class="r"><code>#Annotation track
aTrack2 <- AnnotationTrack(range = bamFile, genome = "hg19",
                           name = "Reads", chromosome = "chr1")
plotTracks(aTrack2, from = 189995000, to = 190000000)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-annotation-track-from-file.png" title="plot of chunk annotation-track-from-file" alt="plot of chunk annotation-track-from-file" width="480" /></p>
<p>We can now plot both the DataTrack representation as well as the AnnotationTrack representation of the bam file together to prove that the underlying data are indeed identical.</p>
<pre class="r"><code>plotTracks(list(dTrack4, aTrack2), from = 189990000,
           to = 190000000)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-7.png" title="plot of chunk unnamed-chunk-7" alt="plot of chunk unnamed-chunk-7" width="480" /></p>
</div>
</div>
<div id="generegiontrack" class="section level3">
<h3>GeneRegionTrack</h3>
<p>GeneRegionTrack objects are in principle very similar to AnnotationTrack objects. The only difference is that they are a little more gene/transcript centric. We need to pass start and end positions (or the width) of each annotation feature in the track and also supply the exon, transcript and gene identifiers for each item which will be used to create the transcript groupings.</p>
<pre class="success"><code>A somewhat special case is to build a GeneRegionTrack object directly from one of the popular TranscriptDb objects, an option that is treated in more detail below.</code></pre>
<pre class="r"><code>data(geneModels)
grtrack <- GeneRegionTrack(geneModels, genome = gen,
                           chromosome = chr, name = "foo", 
                           transcriptAnnotation = "symbol")</code></pre>
<div id="building-generegiontrack-objects-from-transcriptdbs" class="section level4">
<h4>Building GeneRegionTrack objects from TranscriptDbs</h4>
<p>The GenomicFeatures packages provides an elegant framework to download gene model information from online sources and to store it locally in a SQLite data base.</p>
<p>A nice bonus when building GeneRegionTracks from TranscriptDb objects is that we get additional information about coding and non-coding regions of the transcripts, i.e., coordinates of the 5’ and 3’ UTRs and of the CDS regions.</p>
<pre class="r"><code>library(GenomicFeatures)
samplefile <- system.file("extdata", "UCSC_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
txdb <- loadDb(samplefile)
txTr <- GeneRegionTrack(txdb, chromosome = "chr6", start = 300000, end = 350000)
#feature(txTr)
plotTracks(txTr)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-9.png" title="plot of chunk unnamed-chunk-9" alt="plot of chunk unnamed-chunk-9" width="384" /></p>
</div>
<div id="biomartgeneregiontrack" class="section level4">
<h4>BiomartGeneRegionTrack</h4>
<p>BiomartGeneRegionTrack class, provides a direct interface to the ENSEMBL Biomart service. We just enter a genome, chromosome and a start and end position on this chromosome, and the constructor function BiomartGeneRegionTrack will automatically contact ENSEMBL, fetch the necessary information and build the gene model on the fly.</p>
<pre class="notice"><code>Please note that you will need an internet connection for this to work, and that contacting Biomart can take a significant amount of time depending on usage and network trafic. </code></pre>
<pre class="r"><code>biomTrack <- BiomartGeneRegionTrack(genome = "hg19",
                                    chromosome = chr, start = 20000000, end = 21000000,
                                    name = "ENSEMBL")
plotTracks(biomTrack, col.line = NULL, col = NULL)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-10.png" title="plot of chunk unnamed-chunk-10" alt="plot of chunk unnamed-chunk-10" width="480" /></p>
</div>
</div>
<div id="sequence-track" class="section level3">
<h3>Sequence Track</h3>
<pre class="r"><code>library(BSgenome.Hsapiens.UCSC.hg19)
sTrack <- SequenceTrack(Hsapiens)
#sequence track : add 5&amp;#39;->3&amp;#39;
plotTracks(sTrack, chromosome = 1, from = 20000,to = 20050,
           add53=TRUE)
#The complement
plotTracks(sTrack, chromosome = 1, from = 20000,to = 20050,
           add53=TRUE, complement = TRUE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-111.png" title="plot of chunk unnamed-chunk-11" alt="plot of chunk unnamed-chunk-11" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-112.png" title="plot of chunk unnamed-chunk-11" alt="plot of chunk unnamed-chunk-11" width="384" /></p>
</div>
<div id="alignmentstrack" class="section level3">
<h3>AlignmentsTrack</h3>
<p>Plots of aligned sequences, typically from next generation sequencing experiments can be quite helpful, for instance when visually inspecting the validity of a called SNP. Those alignments are usually stored in BAM files.</p>
<div id="rnaseq-experiment" class="section level4">
<h4>RNAseq experiment</h4>
<p>For this demonstration let’s use a small BAM file for which paired NGS reads have been mapped to an extract of the human hg19 genome. The data originate from an RNASeq experiment, and the alignments have been performed using the STAR aligner allowing for gaps. We also download some gene annotation data for that region from Biomart.</p>
<pre class="r"><code>afrom=2960000
ato=3160000
#bam file
alTrack <- AlignmentsTrack(system.file(package = "Gviz", "extdata", "gapped.bam"), isPaired = TRUE)
bmt <- BiomartGeneRegionTrack(genome = "hg19", chromosome = "chr12",
                              start = afrom, end = ato, filter = list(with_ox_refseq_mrna = TRUE),
                              stacking = "dense")
plotTracks(c(bmt, alTrack), from = afrom, to = ato, chromosome = "chr12")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-alignementtrack.png" title="plot of chunk alignementtrack" alt="plot of chunk alignementtrack" width="480" /></p>
<pre class="success"><code>Now this already shows us the general layout of the track: on top we have a panel with the read coverage information in the form of a histogram, and below that a pile-up view of the individual reads. There is
only a certain amount of vertical space available for the plotting, and not the whole depth of the pile-up
can be displayed here. This fact is indicated by the white downward-pointing arrows in the title panel. We
could address this issue by playing around with the max.height, min.height or stackHeight display parameters which all control the height or the vertical spacing of the stacked reads. Or we could reduce the size of the coverage section by setting the coverageHeight or the minCoverageHeight parameters.</code></pre>
<pre class="r"><code>plotTracks(c(bmt, alTrack), from = afrom, to = ato,
           chromosome = "chr12", min.height = 0, coverageHeight = 0.08,
           minCoverageHeight = 0)</code></pre>
<p>From that far out the pile-ups are not particularly useful, and we can turn those off by setting the type display parameter accordingly.</p>
<pre class="r"><code>plotTracks(c(alTrack, bmt), from = afrom, to = ato, chromosome = "chr12", type = "coverage")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-13.png" title="plot of chunk unnamed-chunk-13" alt="plot of chunk unnamed-chunk-13" width="480" /></p>
<p><strong>Let’s zoom in a bit further to check out the details of the pile-ups section:</strong></p>
<pre class="r"><code>plotTracks(c(bmt, alTrack), from = afrom + 12700,
           to = afrom + 15200, chromosome = "chr12")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-14.png" title="plot of chunk unnamed-chunk-14" alt="plot of chunk unnamed-chunk-14" width="480" /></p>
<pre class="success"><code>The direction of the individual reads is indicated by the arrow head, and read pairs are connect by a bright gray line. Gaps in the alignments are show by the connecting dark gray lines. On devices that support transparancy we can also see that some of the read pairs are actually overlapping. </code></pre>
<p>As mentioned before we can control whether the data should be treated as paired end or single end data by setting the isPaired argument in the constructor. Here is how we could take a look at the data in the same file, but in single end mode.</p>
<pre class="r"><code>alTrack <- AlignmentsTrack(system.file(package = "Gviz",
                                       "extdata", "gapped.bam"), isPaired = FALSE)
plotTracks(c(bmt, alTrack), from = afrom + 12700,
           to = afrom + 15200, chromosome = "chr12")</code></pre>
</div>
<div id="dnaseq-experiment" class="section level4">
<h4>DNAseq experiment</h4>
<p>To better show the features of the AlignmentsTrack for sequence variants we will load a different data set, this time from a whole genome DNASeq SNP calling experiment. Again the reference genome is hg19 and the alignments have been performed using Bowtie2.</p>
<p>We need to tell the AlignmentsTrack about the reference genome (sequenceTrack).</p>
<pre class="r"><code>afrom <- 44945200
ato <- 44947200
alTrack <- AlignmentsTrack(system.file(package = "Gviz","extdata", "snps.bam"), isPaired = TRUE)
plotTracks(c(alTrack, sTrack), chromosome = "chr21", from = afrom,to = ato)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-dnaseq-experiment.png" title="plot of chunk dnaseq-experiment" alt="plot of chunk dnaseq-experiment" width="480" /></p>
<pre class="success"><code>The mismatched bases are indicated on both the individual reads in the pileup section and also in the coverage plot in the form of a stacked histogram.</code></pre>
<p>When zooming in to one of the obvious heterozygous SNP positions we can reveal even more details.</p>
<pre class="r"><code>#Zoom
plotTracks(c(alTrack, sTrack), chromosome = "chr21", from = 44946590, to = 44946660)

#show individual letters
plotTracks(c(alTrack, sTrack), chromosome = "chr21",
           from = 44946590, to = 44946660, cex = 0.5, min.height = 8)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-161.png" title="plot of chunk unnamed-chunk-16" alt="plot of chunk unnamed-chunk-16" width="432" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-162.png" title="plot of chunk unnamed-chunk-16" alt="plot of chunk unnamed-chunk-16" width="432" /></p>
</div>
</div>
</div>
<div id="track-highlighting-and-overlays" class="section level2">
<h2>Track highlighting and overlays</h2>
<div id="highlighting" class="section level3">
<h3>Highlighting</h3>
<pre class="r"><code>#highlight
ht <- HighlightTrack(trackList = list(atrack, grtrack,dtrack),
                     start = c(26705000, 26720000), width = 7000, chromosome = 7)
plotTracks(list(itrack, gtrack, ht), from = lim[1], to = lim[2])</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-highlighting.png" title="plot of chunk highlighting" alt="plot of chunk highlighting" width="432" /></p>
</div>
<div id="overlays" class="section level3">
<h3>Overlays</h3>
<p>For certain applications it can make sense to overlay multiple tracks on the same area of the plot. For the purpose of an instructive example we will generate a second DataTrack object and combine it with the existing one from the second chapter.</p>
<pre class="r"><code>#create data
dat <- runif(100, min = -2, max = 22)
dtrack2 <- DataTrack(data = dat, start = coords[-length(coords)],
                     end = coords[-1], chromosome = chr, genome = gen,
                     name = "Uniform2", groups = factor("sample 2",levels = c("sample 1", "sample 2")),
                     legend = TRUE)
displayPars(dtrack) <- list(groups = factor("sample 1",levels = c("sample 1", "sample 2")), legend = TRUE)
ot <- OverlayTrack(trackList = list(dtrack2, dtrack))
ylims <- extendrange(range(c(values(dtrack), values(dtrack2))))
plotTracks(list(itrack, gtrack, ot), from = lim[1], to = lim[2], ylim = ylims, type = c("smooth", "p"))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-overlays.png" title="plot of chunk overlays" alt="plot of chunk overlays" width="480" /></p>
<p>On devices that support it, alpha blending can be a useful tool to tease out even more information out of track overlays, at least when comparing just a small number of samples. The resulting transparency effectively eliminates the problem of overplotting. The following example will only work if this vignette has been built on a system with alpha blending support.</p>
<pre class="r"><code>displayPars(dtrack) <- list(alpha.title = 1, alpha = 0.5)
displayPars(dtrack2) <- list(alpha.title = 1, alpha = 0.5)
ot <- OverlayTrack(trackList = list(dtrack, dtrack2))
plotTracks(list(itrack, gtrack, ot), from = lim[1],
           to = lim[2], ylim = ylims, type = c("hist"), window = 30)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/Gviz-visualize-genomic-data-unnamed-chunk-17.png" title="plot of chunk unnamed-chunk-17" alt="plot of chunk unnamed-chunk-17" width="480" /></p>
</div>
</div>
<div id="footnotes" class="section level2">
<h2>Footnotes</h2>
<ul>
<li>Gviz <a href="http://www.bioconductor.org/packages/release/bioc/html/Gviz.html">http://www.bioconductor.org/packages/release/bioc/html/Gviz.html</a></li>
</ul>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Mon, 29 Sep 2014 11:11:36 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[ggbio - Visualize genomic data]]></title>
			<link>https://www.sthda.com/english/wiki/ggbio-visualize-genomic-data</link>
			<guid>https://www.sthda.com/english/wiki/ggbio-visualize-genomic-data</guid>
			<description><![CDATA[<!-- START HTML -->


  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#building-your-first-track">Building your first track</a><ul>
<li><a href="#add-ideogram-track-plot-single-chromosome-with-cytoband">Add ideogram track : Plot single chromosome with cytoband</a></li>
<li><a href="#add-gene-model-track">Add gene model track</a></li>
<li><a href="#add-a-reference-track">Add a reference track</a></li>
<li><a href="#add-an-alignement-track">Add an alignement track</a></li>
<li><a href="#add-a-variants-track-vizualize-vcf-file">Add a variants track : vizualize vcf file</a></li>
</ul></li>
<li><a href="#building-your-tracks">Building your tracks</a></li>
<li><a href="#simple-navigation">Simple navigation</a></li>
<li><a href="#overview-plots">Overview plots</a><ul>
<li><a href="#circular-plots">Circular plots</a></li>
</ul></li>
<li><a href="#footnotes">Footnotes</a></li>
<li><a href="#licence">Licence</a></li>
</ul>
</div>

<hr />
<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<p><strong>ggbio</strong> is a package build on top of <strong>ggplot2()</strong> to visualize easily genomic data.</p>
<div id="building-your-first-track" class="section level2">
<h2>Building your first track</h2>
<p>In this chapter, you will learn : &amp;#136;1.How to add ideogram track. &amp;#136;2. How to add gene model track. 4. How to add reference track 3. How to add track from bam files to visualize coverage and mismatch summary. 4. How to add track for vcf file to visualize variants</p>
<div id="add-ideogram-track-plot-single-chromosome-with-cytoband" class="section level3">
<h3>Add ideogram track : Plot single chromosome with cytoband</h3>
<pre class="success"><code>hg19, hg18, mm10, mm9  as been built inside, so you don&amp;#39;t have download it on the fly.</code></pre>
<pre class="r"><code>library(ggbio)
#chr 1 is automatically drawn by default (subchr="chr1")
p.ideo <- Ideogram(genome = "hg19")
p.ideo
#Highlights a region on "chr2"
library(GenomicRanges)
p.ideo + xlim(GRanges("chr2", IRanges(1e8, 1e8+10000000)))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-ideogram1.png" title="plot of chunk ideogram" alt="plot of chunk ideogram" width="432" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-ideogram2.png" title="plot of chunk ideogram" alt="plot of chunk ideogram" width="432" /></p>
<p><span class="warning"> <strong>color</strong> and <strong>fill</strong> arguments can be used to change the color and the fill color of highlight region </span></p>
</div>
<div id="add-gene-model-track" class="section level3">
<h3>Add gene model track</h3>
<p>Gene model is composed of genetic features CDS, UTR, introns, exons and non-genetic region. ggbio supports three methods to make gene model track:</p>
<p><strong>OrganismDb object</strong>: recommended, support gene symbols and other combination of columns as label. <strong>TranscriptDb object</strong>: don’t support gene symbol labeling. <strong>GRangesList object</strong>: flexible, if you don’t have annotation package available for the first two methods, you could prepare a data set parsed from gtf file, you can simply use it and plot it as gene model track.</p>
<p><span class="success"> In this section, we’ll show, how to make gene model from OrganismDb and from TranscriptDb objects. To make a gene model from GRangesList object, see the <a href="http://www.bioconductor.org/packages/release/bioc/vignettes/ggbio/inst/doc/ggbio.pdf">vignette of ggbio package</a>. </span></p>
<div id="gene-model-from-organismdb-object" class="section level4">
<h4>Gene model from OrganismDb object</h4>
<p><strong>OrganismDb</strong> object has a simpler API to retrieve data from different annotation resources, so we could label our transcripts in different ways.</p>
<pre class="r"><code>library(ggbio)
library(Homo.sapiens)

#load gene symbol : GRanges, one gene/row
data(genesymbol, package = "biovizBase")
#retrieve information of the gene of interest
wh <- genesymbol[c("BRCA1", "NBR1")]
wh <- range(wh, ignore.strand = TRUE)

#Plot the different transcripts  for our gene of interest
p.txdb <- autoplot(Homo.sapiens, which = wh)
p.txdb

#Change inton geometry, use gap.geom
autoplot(Homo.sapiens, which = wh, gap.geom = "chevron")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-gene-model-from-organismdb1.png" title="plot of chunk gene-model-from-organismdb" alt="plot of chunk gene-model-from-organismdb" width="432" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-gene-model-from-organismdb2.png" title="plot of chunk gene-model-from-organismdb" alt="plot of chunk gene-model-from-organismdb" width="432" /></p>
<p>Different arguments to change colors : <strong>label.color</strong> (color of the label), <strong>color</strong>(line color) and <strong>fill</strong> (fill color of exons):</p>
<pre class="r"><code>autoplot(Homo.sapiens, which = wh, label.color = "black", color = "brown",
fill = "brown")</code></pre>
<pre class="success"><code>Label could be turned off by setting it to FALSE, you could also use expression to make a flexible label combination from column names.</code></pre>
<pre class="r"><code>columns(Homo.sapiens)</code></pre>
<pre><code>##  [1] "GOID"         "TERM"         "ONTOLOGY"     "DEFINITION"  
##  [5] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"     
##  [9] "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"      
## [13] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"        
## [17] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
## [21] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"    
## [25] "UNIPROT"      "GO"           "EVIDENCE"     "GOALL"       
## [29] "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"         "UCSCKG"      
## [33] "CDSID"        "CDSNAME"      "CDSCHROM"     "CDSSTRAND"   
## [37] "CDSSTART"     "CDSEND"       "EXONID"       "EXONNAME"    
## [41] "EXONCHROM"    "EXONSTRAND"   "EXONSTART"    "EXONEND"     
## [45] "GENEID"       "TXID"         "EXONRANK"     "TXNAME"      
## [49] "TXCHROM"      "TXSTRAND"     "TXSTART"      "TXEND"</code></pre>
<pre class="r"><code>#Flexible label
autoplot(Homo.sapiens, which = wh, columns = c("GENENAME", "GO"), names.expr = "GENENAME::GO")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-flexible-label.png" title="plot of chunk flexible-label" alt="plot of chunk flexible-label" width="432" /></p>
</div>
<div id="gene-model-from-transcriptdb-object" class="section level4">
<h4>Gene model from TranscriptDb object</h4>
<pre class="warning"><code>TranscriptDb doesn&amp;#39;t contain any gene symbol information, so we use tx id as default for label.</code></pre>
<pre class="r"><code>library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
autoplot(txdb, which = wh)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-gene-model-from-transcriptdb.png" title="plot of chunk gene-model-from-transcriptdb" alt="plot of chunk gene-model-from-transcriptdb" width="432" /></p>
</div>
</div>
<div id="add-a-reference-track" class="section level3">
<h3>Add a reference track</h3>
<p>To add a reference track, we need to load a <strong>BSgenome object</strong> from the annotation package. You can choose to plot the sequence as text, rect, segment.</p>
<pre class="warning"><code>You can pass a zoom in factor into zoom function, if it&amp;#39;s over 1 it&amp;#39;s zooming out, if it&amp;#39;s smaller than 1 it&amp;#39;s zooming in.</code></pre>
<pre class="r"><code>library(BSgenome.Hsapiens.UCSC.hg19)
bg <- BSgenome.Hsapiens.UCSC.hg19
p.bg <- autoplot(bg, which = wh)
## no geom
p.bg
## segment
p.bg + zoom(1/100)
## rectangle
p.bg + zoom(1/1000)
## text
p.bg + zoom(1/2500)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-reference-track1.png" title="plot of chunk reference-track" alt="plot of chunk reference-track" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-reference-track2.png" title="plot of chunk reference-track" alt="plot of chunk reference-track" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-reference-track3.png" title="plot of chunk reference-track" alt="plot of chunk reference-track" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-reference-track4.png" title="plot of chunk reference-track" alt="plot of chunk reference-track" width="384" /></p>
<p>To override a zemantic zoom threshold, you simply provide a geom explicitly.</p>
<pre class="r"><code>library(BSgenome.Hsapiens.UCSC.hg19)
bg <- BSgenome.Hsapiens.UCSC.hg19
## force to use geom &amp;#39;segment&amp;#39; at this level
autoplot(bg, which = resize(wh, width = width(wh)/2000), geom = "segment")</code></pre>
</div>
<div id="add-an-alignement-track" class="section level3">
<h3>Add an alignement track</h3>
<div id="create-a-bam-object" class="section level4">
<h4>Create a bam object</h4>
<p>RSamtools package is required. The following code is just an example to create a bam object. This bam object can be used in autoplot function</p>
<pre class="r"><code>bam<-BamFile(file="file.bam", index="file.bai")
#use it for autoplot
autoplot(bam, which = wh)</code></pre>
</div>
<div id="visualize-bam-file" class="section level4">
<h4>Visualize bam file</h4>
<p><strong>ggbio</strong> supports visualization of alignments file stored in bam, <strong>autoplot</strong> method accepts :</p>
<ul>
<li>bam file path (indexed)</li>
<li>BamFile object</li>
</ul>
<pre class="success"><code>It&amp;#39;s simple to just pass a file path to autoplot function, you can stream a chunk of region by providing &amp;#39;which&amp;#39; parameter. Otherwise please use method &amp;#39;estiamte&amp;#39; to show overall estiamted coverage.</code></pre>
<pre class="r"><code>fl.bam <- system.file("extdata", "wg-brca1.sorted.bam", package = "biovizBase")
#keeps only the seqlevels in value and removes all others
wh <- keepSeqlevels(wh, "chr17")
autoplot(fl.bam, which = wh)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-view-bam-file.png" title="plot of chunk view-bam-file" alt="plot of chunk view-bam-file" width="384" /></p>
</div>
<div id="mismatch-proportion" class="section level4">
<h4>Mismatch proportion</h4>
<p>To show mismatch proportion, you have to provide reference sequence, the mismatched proportion is color coded in the bar chart.</p>
<pre class="r"><code>library(BSgenome.Hsapiens.UCSC.hg19)
bg <- BSgenome.Hsapiens.UCSC.hg19
p.mis <- autoplot(fl.bam, bsgenome = bg, which = wh, stat = "mismatch")
p.mis</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-mismatch-proportion.png" title="plot of chunk mismatch-proportion" alt="plot of chunk mismatch-proportion" width="432" /></p>
</div>
<div id="view-all-coverage-distribution" class="section level4">
<h4>View all coverage distribution</h4>
<p>To view overall estimated coverage distribution, please use method ‘estiamte’. ‘which’ parameter also accept characters. And there is a hidden value called ‘..coverage..’ to let you do simple transformation in aes().</p>
<pre class="r"><code>#View all coverage distribution
autoplot(fl.bam, method = "estimate")
#Select chromosomes of interest
#Log transformation of coverage
autoplot(fl.bam, method = "estimate", 
         which = paste0("chr", 17:18),
         aes(y = log(..coverage..)))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-coverage-distribution1.png" title="plot of chunk coverage-distribution" alt="plot of chunk coverage-distribution" width="432" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-coverage-distribution2.png" title="plot of chunk coverage-distribution" alt="plot of chunk coverage-distribution" width="432" /></p>
</div>
</div>
<div id="add-a-variants-track-vizualize-vcf-file" class="section level3">
<h3>Add a variants track : vizualize vcf file</h3>
<pre class="success"><code>This track is supported by semantic zoom.</code></pre>
<p>To view your variants file, you could : - Import it using package** VariantAnntoation as VCF object<strong>, then use autoplot - Convert it in </strong>VRanges** object and use autoplot - Simply provide vcf file path in <strong>autoplot()</strong></p>
<pre class="r"><code>library(VariantAnnotation)
fl.vcf <- system.file("extdata", "17-1409-CEU-brca1.vcf.bgz", package="biovizBase")
vcf <- readVcf(fl.vcf, "hg19")
vr <- as(vcf[, 1:3], "VRanges")
vr <- renameSeqlevels(vr, value = c("17" = "chr17"))
## small region contains data
gr17 <- GRanges("chr17", IRanges(41234400, 41234530))
p.vr <- autoplot(vr, which = wh)

## none geom
p.vr

## rect geom
p.vr + xlim(gr17)
## text geom
p.vr + xlim(gr17) + zoom()</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-vizualize-vcf1.png" title="plot of chunk vizualize-vcf" alt="plot of chunk vizualize-vcf" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-vizualize-vcf2.png" title="plot of chunk vizualize-vcf" alt="plot of chunk vizualize-vcf" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-vizualize-vcf3.png" title="plot of chunk vizualize-vcf" alt="plot of chunk vizualize-vcf" width="288" /></p>
<p>You can simply override geom</p>
<pre class="r"><code>autoplot(vr, which = wh, geom = "rect", arrow = FALSE)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-unnamed-chunk-4.png" title="plot of chunk unnamed-chunk-4" alt="plot of chunk unnamed-chunk-4" width="432" /></p>
</div>
</div>
<div id="building-your-tracks" class="section level2">
<h2>Building your tracks</h2>
<pre class="r"><code>gr17 <- GRanges("chr17", IRanges(41234415, 41234569))
tks <- tracks(p.ideo, mismatch = p.mis, dbSNP = p.vr, ref = p.bg, gene = p.txdb,
heights = c(2, 3, 3, 1, 4)) + xlim(gr17) + theme_tracks_sunset()
tks</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-tracks.png" title="plot of chunk tracks" alt="plot of chunk tracks" width="432" /></p>
</div>
<div id="simple-navigation" class="section level2">
<h2>Simple navigation</h2>
<p>You could zoom in and zoom out, or go through view chunks one by one.
 <strong>- zoom:</strong> put a factor inside and you can zoom in or zoom out
 <strong>- nextView:</strong> switch to next view
 <strong>- prevView:</strong> switch to previous view</p>
<pre class="r"><code>## zoom in
tks + zoom()
## zoom in with scale
p.txdb + zoom(1/8)
## zoom out
p.txdb + zoom(2)
## next view page
p.txdb + nextView()
## previous view page
p.txdb + prevView()</code></pre>
<pre class="warning"><code>Don&amp;#39;t forget xlim accept GRanges object (single row), so you could simply prepare a GRanges to store the region of interests and go through them one by one.</code></pre>
</div>
<div id="overview-plots" class="section level2">
<h2>Overview plots</h2>
<p>Overview is a good way to show all events at the same time, give overall summary statistics for the whole genome. In this chapter, we will introduce three different layouts that are used a lots in genomic data visualization.</p>
<div id="circular-plots" class="section level3">
<h3>Circular plots</h3>
<p>We are going to visualize somatic mutation as segment.</p>
<p><span class="warning"> - rule of thumb seqlengths, seqlevels and chromosomes names should be exactly the same. - to use circle, you have to use ggbio constructor at the beginning instead of ggplot. </span></p>
<p>All the raw data processed and stored in GRanges ready for use, you can simply load the sample data from <strong>biovizBase</strong></p>
<div id="load-data" class="section level4">
<h4>Load data</h4>
<pre class="r"><code>#Load the data
data("CRC", package = "biovizBase")
head(hg19sub)</code></pre>
<pre><code>## GRanges with 6 ranges and 0 metadata columns:
##       seqnames         ranges strand
##          <Rle>      <IRanges>  <Rle>
##   [1]        1 [1, 249250621]      *
##   [2]        2 [1, 243199373]      *
##   [3]        3 [1, 198022430]      *
##   [4]        4 [1, 191154276]      *
##   [5]        5 [1, 180915260]      *
##   [6]        6 [1, 171115067]      *
##   ---
##   seqlengths:
##            1         2         3 ...        20        21        22
##    249250621 243199373 198022430 ...  63025520  48129895  51304566</code></pre>
</div>
<div id="create-ideogram-label-and-scale-track" class="section level4">
<h4>Create ideogram, label and scale track</h4>
<p>The function layouts the circle by the order you created from inside to outside.</p>
<pre class="r"><code>p <- ggbio() + circle(hg19sub, geom = "ideo", fill = "gray70") + #Ideogram
circle(hg19sub, geom = "scale", size = 2) + #Scale
circle(hg19sub, geom = "text", aes(label = seqnames), vjust = 0, size = 3) # label
p #print plot</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-unnamed-chunk-7.png" title="plot of chunk unnamed-chunk-7" alt="plot of chunk unnamed-chunk-7" width="384" /></p>
</div>
<div id="show-somatic-mutation" class="section level4">
<h4>Show somatic mutation</h4>
<p>We add a “rectangle” track to show somatic mutation.</p>
<pre class="r"><code>head(mut.gr)</code></pre>
<pre><code>## GRanges with 6 ranges and 10 metadata columns:
##       seqnames                 ranges strand | Hugo_Symbol Entrez_Gene_Id
##          <Rle>              <IRanges>  <Rle> |    <factor>      <integer>
##   [1]        1 [ 11003085,  11003085]      + |      TARDBP          23435
##   [2]        1 [ 62352395,  62352395]      + |       INADL          10207
##   [3]        1 [194960885, 194960885]      + |         CFH           3075
##   [4]        2 [ 10116508,  10116508]      - |        CYS1         192668
##   [5]        2 [ 33617747,  33617747]      + |     RASGRP3          25780
##   [6]        2 [ 73894280,  73894280]      + |     C2orf78         388960
##         Center NCBI_Build   Strand Variant_Classification Variant_Type
##       <factor>  <integer> <factor>               <factor>     <factor>
##   [1]    Broad         36        +               Missense          SNP
##   [2]    Broad         36        +               Missense          SNP
##   [3]    Broad         36        +               Missense          SNP
##   [4]    Broad         36        -               Missense          SNP
##   [5]    Broad         36        +               Missense          SNP
##   [6]    Broad         36        +               Missense          SNP
##       Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2
##               <factor>          <factor>          <factor>
##   [1]                G                 G                 A
##   [2]                T                 T                 G
##   [3]                G                 G                 A
##   [4]                C                 C                 T
##   [5]                C                 C                 T
##   [6]                T                 T                 C
##   ---
##   seqlengths:
##            1         2         3 ...        20        21        22
##    249250621 243199373 198022430 ...  63025520  48129895  51304566</code></pre>
<pre class="r"><code>p <- ggbio() + circle(mut.gr, geom = "rect", color = "steelblue") + #somatic mutation
circle(hg19sub, geom = "ideo", fill = "gray70") +#Ideogram
circle(hg19sub, geom = "scale", size = 2) +#Scale
circle(hg19sub, geom = "text", aes(label = seqnames), vjust = 0, size = 3)#label
p</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/ggbio-show-mutation.png" title="plot of chunk show-mutation" alt="plot of chunk show-mutation" width="432" /></p>
<p><span class="warning"> Many other examples are available in the <a href="http://www.bioconductor.org/packages/release/bioc/html/ggbio.html">ggbio package vignette</a> </span></p>
</div>
</div>
</div>
<div id="footnotes" class="section level2">
<h2>Footnotes</h2>
<ul>
<li>ggbio <a href="http://www.bioconductor.org/packages/release/bioc/html/ggbio.html">http://www.bioconductor.org/packages/release/bioc/html/ggbio.html</a></li>
</ul>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Sun, 28 Sep 2014 11:49:42 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Visualize NGS data with R and Bioconductor]]></title>
			<link>https://www.sthda.com/english/wiki/visualize-ngs-data-with-r-and-bioconductor</link>
			<guid>https://www.sthda.com/english/wiki/visualize-ngs-data-with-r-and-bioconductor</guid>
			<description><![CDATA[<!-- START HTML -->
            
  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">
<div id="TOC">
<ul>
<li><a href="#load-packages-and-data">Load packages and data</a></li>
<li><a href="#simple-plot">Simple plot</a></li>
<li><a href="#extracting-the-gene-of-interest-using-the-transcript-database">Extracting the gene of interest using the transcript database</a></li>
<li><a href="#gviz">Gviz</a></li>
<li><a href="#ggbio">ggbio</a></li>
<li><a href="#footnotes">Footnotes</a></li>
<li><a href="#licence">Licence</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>

<hr />
<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<div id="load-packages-and-data" class="section level2">
<h2>Load packages and data</h2>
<p>We’re going to use the RNA sequencing experiment in the passilaBamSubset package.</p>
<pre class="r"><code>#biocLite("pasillaBamSubset")
#biocLite("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
library(pasillaBamSubset)
#Genes annotated in transcript database
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
#The 2 files of interest
fl1 <- untreated1_chr4()
fl2 <- untreated3_chr4()</code></pre>
<p>fl1 and fl2 are bam file from RNA sequencing data of Drosophila.</p>
</div>
<div id="simple-plot" class="section level2">
<h2>Simple plot</h2>
<pre class="r"><code>library(GenomicRanges)</code></pre>
<pre class="notice"><code>Note: if you are using Bioconductor version 14, paired with R 3.1, you should also load the following library. You do not need to load this library, and it will not be available to you, if you are using Bioconductor version 13, paired with R 3.0.x.</code></pre>
<pre class="r"><code>library(GenomicAlignments)</code></pre>
<p>We read in the alignments from the file <code>fl1</code>. Then we use the <code>coverage</code> function to tally up the base pair coverage. We then extract the subset of coverage which overlaps our gene of interest (<strong>lgs</strong> gene), and convert this coverage from an <code>RleList</code> into a <code>numeric</code> vector. <code>Rle</code> objects are compressed, such that repeating numbers are stored as a number and a length.</p>
<pre class="r"><code># x is a class of GAlignments, Each row is a read
x <- readGAlignments(fl1)
#Coverage of the reads : this will generate a RleList
#Rle is a run-length encoding (week2)
xcov <- coverage(x)
xcov</code></pre>
<pre><code>## RleList of length 8
## $chr2L
## integer-Rle of length 23011544 with 1 run
##   Lengths: 23011544
##   Values :        0
## 
## $chr2R
## integer-Rle of length 21146708 with 1 run
##   Lengths: 21146708
##   Values :        0
## 
## $chr3L
## integer-Rle of length 24543557 with 1 run
##   Lengths: 24543557
##   Values :        0
## 
## $chr3R
## integer-Rle of length 27905053 with 1 run
##   Lengths: 27905053
##   Values :        0
## 
## $chr4
## integer-Rle of length 1351857 with 122061 runs
##   Lengths:  891   27    5   12   13   45 ...    3  106   75 1600   75 1659
##   Values :    0    1    2    3    4    5 ...    6    0    1    0    1    0
## 
## ...
## <3 more elements></code></pre>
<pre class="r"><code>#Extract one element of the list
#we have zero coverage for the first 891 base pairs,
#and then we have coverage of one for 27 base pairs, etc.
xcov$chr4</code></pre>
<pre><code>## integer-Rle of length 1351857 with 122061 runs
##   Lengths:  891   27    5   12   13   45 ...    3  106   75 1600   75 1659
##   Values :    0    1    2    3    4    5 ...    6    0    1    0    1    0</code></pre>
<pre class="r"><code>#Let&amp;#39;s zoom in now to range which is near this gene of interest, LGS.
z <- GRanges("chr4",IRanges(456500,466000))

# only available for Bioconductor 2.14
xcov[z]#subset the coverage of the region of interest</code></pre>
<pre><code>## RleList of length 1
## $chr4
## integer-Rle of length 9501 with 1775 runs
##   Lengths: 1252   10   52    4    7    2 ...   10    7   12  392   75 1041
##   Values :    0    2    3    4    5    6 ...    3    2    1    0    1    0</code></pre>
<pre class="r"><code># Equivalent in Bioconductor 2.13
#xcov$chr4[ranges(z)]# Works for all version of Bioconductor

#Plot of the coverage arround the region of interest
xnum <- as.numeric(xcov$chr4[ranges(z)])#Uncompress the coverage
plot(xnum)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-coverage-in-r.png" title="plot of chunk coverage-in-r" alt="plot of chunk coverage-in-r" width="384" /></p>
<p>We can do the same for another file: So because fl2 is a paired in sequencing experiment, we now have pairs of reads which corresponds to one fragment.</p>
<pre class="r"><code>y <- readGAlignmentPairs(fl2)
ycov <- coverage(y)
ynum <- as.numeric(ycov$chr4[ranges(z)])
plot(xnum, type="l", col="blue", lwd=2)
lines(ynum, col="red", lwd=2)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-coverage-in-r-2.png" title="plot of chunk coverage-in-r-2" alt="plot of chunk coverage-in-r-2" width="384" /></p>
<p>We can zoom in on a single exon, between the area of 6 000 base pairs:</p>
<pre class="r"><code>plot(xnum, type="l", col="blue", lwd=2, xlim=c(6200,6600))
lines(ynum, col="red", lwd=2)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-coverage-in-r-exon.png" title="plot of chunk coverage-in-r-exon" alt="plot of chunk coverage-in-r-exon" width="384" /></p>
</div>
<div id="extracting-the-gene-of-interest-using-the-transcript-database" class="section level2">
<h2>Extracting the gene of interest using the transcript database</h2>
<p>Extract information about gene of interest.</p>
<p>Suppose we are interested in visualizing the gene <em>lgs</em>. We can extract it from the transcript database <code>TxDb.Dmelanogaster.UCSC.dm3.ensGene</code> on Bioconductor, but first we need to look up the Ensembl gene name. We will use the functions that we learned in the previous chapter.</p>
<pre class="r"><code># biocLite("biomaRt")
library(biomaRt)
#load the drosophila ensemble gene BioMart.
m <- useMart("ensembl", dataset = "dmelanogaster_gene_ensembl")
lf <- listFilters(m)
lf[grep("name", lf$description, ignore.case=TRUE),]</code></pre>
<pre><code>##                             name
## 1                chromosome_name
## 12  with_flybasename_translation
## 15   with_flybasename_transcript
## 19         with_flybasename_gene
## 62              flybasename_gene
## 63        flybasename_transcript
## 64       flybasename_translation
## 86                 wikigene_name
## 98                go_parent_name
## 194               so_parent_name
##                                      description
## 1                                Chromosome name
## 12                with FlyBaseName protein ID(s)
## 15             with FlyBaseName transcript ID(s)
## 19                   with FlyBaseName gene ID(s)
## 62           FlyBaseName Gene ID(s) [e.g. cul-2]
## 63  FlyBaseName Transcript ID(s) [e.g. cul-2-RB]
## 64     FlyBaseName Protein ID(s) [e.g. cul-2-PB]
## 86                 WikiGene Name(s) [e.g. Ir21a]
## 98                              Parent term name
## 194                             Parent term name</code></pre>
<pre class="r"><code>#get the ensembl gene name
map <- getBM(mart = m,
  attributes = c("ensembl_gene_id", "flybasename_gene"),
  filters = "flybasename_gene", 
  values = "lgs")
map</code></pre>
<pre><code>##   ensembl_gene_id flybasename_gene
## 1     FBgn0039907              lgs</code></pre>
<p>Now we extract the exons for each gene, and then the exons for the gene <em>lgs</em>.</p>
<pre class="r"><code>#get the exons out of the transcript database.
library(GenomicFeatures)
grl <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, by="gene")
gene <- grl[[map$ensembl_gene_id[1]]]
#View the 6 exons of lgs gene
gene</code></pre>
<pre><code>## GRanges with 6 ranges and 2 metadata columns:
##       seqnames           ranges strand |   exon_id   exon_name
##          <Rle>        <IRanges>  <Rle> | <integer> <character>
##   [1]     chr4 [457583, 459544]      - |     63350        <NA>
##   [2]     chr4 [459601, 459791]      - |     63351        <NA>
##   [3]     chr4 [460074, 462077]      - |     63352        <NA>
##   [4]     chr4 [462806, 463015]      - |     63353        <NA>
##   [5]     chr4 [463490, 463780]      - |     63354        <NA>
##   [6]     chr4 [463839, 464533]      - |     63355        <NA>
##   ---
##   seqlengths:
##        chr2L     chr2R     chr3L ...   chrXHet   chrYHet chrUextra
##     23011544  21146708  24543557 ...    204112    347038  29004656</code></pre>
<p>Finally we can plot these ranges to see what it looks like:</p>
<pre class="r"><code>#Plot each exon as an arrow
rg <- range(gene)
plot(c(start(rg), end(rg)), c(0,0), type="n", xlab=seqnames(gene)[1], ylab="")
arrows(start(gene),rep(0,length(gene)),
       end(gene),rep(0,length(gene)),
       lwd=3, length=.1)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-unnamed-chunk-6.png" title="plot of chunk unnamed-chunk-6" alt="plot of chunk unnamed-chunk-6" width="288" /></p>
<p>But actually, the gene is on the minus strand. We should add a line which corrects for minus strand genes:</p>
<p>If it’s a plus strand, then use code=2 and that means put the arrow head at the end. If it’s a minus strand gene, use code=1 and that means to put an arrow at the start.</p>
<pre class="r"><code>plot(c(start(rg), end(rg)), c(0,0), type="n", xlab=seqnames(gene)[1], ylab="")
arrows(start(gene),rep(0,length(gene)),
       end(gene),rep(0,length(gene)),
       lwd=3, length=.1, 
       code=ifelse(as.character(strand(gene)[1]) == "+", 2, 1))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-unnamed-chunk-7.png" title="plot of chunk unnamed-chunk-7" alt="plot of chunk unnamed-chunk-7" width="288" /></p>
<pre class="success"><code>In the next units we&amp;#39;re going to continue. And I&amp;#39;m going to show two packages for visualizing
genomic data in Bioconductor which allow you to avoid rewriting all of this messy code every time you want to draw exons or coverage.</code></pre>
</div>
<div id="gviz" class="section level2">
<h2>Gviz</h2>
<p>We will briefly show two packages for visualizing genomic data in Bioconductor. Note that each of these have extensive vignettes for plotting many kinds of data. We will show here how to make the coverage plots as before:</p>
<pre class="r"><code>#biocLite("Gviz")
library(Gviz)
#You set up a Genome Axis Track
gtrack <- GenomeAxisTrack()

#specify an Annotation Track
atrack <- AnnotationTrack(gene, name = "Gene Model")
plotTracks(list(gtrack, atrack))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-gviz-plot-track.png" title="plot of chunk gviz-plot-track" alt="plot of chunk gviz-plot-track" width="480" /></p>
<p>The GVIZ package also allows you to draw data. We can look at the coverage, for instance. We have the coverage already as an RleList. In order to plot coverage using the GVIZ package, you need to turn the data into a GRanges object.</p>
<p><code>Gviz</code> expects that data will be provided as <code>GRanges</code> objects, so we convert the <code>RleList</code> coverage to a <code>GRanges</code> object:</p>
<pre class="r"><code>#Convert coverage to GRanges object
xgr <- as(xcov, "GRanges")
ygr <- as(ycov, "GRanges")
#So we have zero coverage for the first four chromosomes, and then it says,
#chromosome four starts out with 891 base pairs of zero coverage.
#And that was the same information that we had here, 891 base pairs of zero coverage.
xgr</code></pre>
<pre><code>## GRanges with 122068 ranges and 1 metadata column:
##            seqnames              ranges strand   |     score
##               <Rle>           <IRanges>  <Rle>   | <integer>
##        [1]    chr2L       [1, 23011544]      *   |         0
##        [2]    chr2R       [1, 21146708]      *   |         0
##        [3]    chr3L       [1, 24543557]      *   |         0
##        [4]    chr3R       [1, 27905053]      *   |         0
##        [5]     chr4       [1,      891]      *   |         0
##        ...      ...                 ...    ... ...       ...
##   [122064]     chr4 [1350124,  1350198]      *   |         1
##   [122065]     chr4 [1350199,  1351857]      *   |         0
##   [122066]     chrM [      1,    19517]      *   |         0
##   [122067]     chrX [      1, 22422827]      *   |         0
##   [122068]  chrYHet [      1,   347038]      *   |         0
##   ---
##   seqlengths:
##       chr2L    chr2R    chr3L    chr3R     chr4     chrM     chrX  chrYHet
##    23011544 21146708 24543557 27905053  1351857    19517 22422827   347038</code></pre>
<pre class="r"><code>#create two datatracks
#plot coverage which overlap z
dtrack1 <- DataTrack(xgr[xgr %over% z], name = "sample 1")
dtrack2 <- DataTrack(ygr[ygr %over% z], name = "sample 2")
plotTracks(list(gtrack, atrack, dtrack1, dtrack2))
plotTracks(list(gtrack, atrack, dtrack1, dtrack2), type="polygon")</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-gviz-plot-coverage1.png" title="plot of chunk gviz-plot-coverage" alt="plot of chunk gviz-plot-coverage" width="384" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-gviz-plot-coverage2.png" title="plot of chunk gviz-plot-coverage" alt="plot of chunk gviz-plot-coverage" width="384" /></p>
</div>
<div id="ggbio" class="section level2">
<h2>ggbio</h2>
<p>GGBIO builds off of the GGPLOT2 package, which is a whole other way of drawing plots in R. ggbio makes thing very easy. If you indicate the BAM file and the range of interest, it will read in the BAM file, parse the coverage, read the alignments, extract the information, and draw this nice plot.</p>
<pre class="r"><code>#biocLite("ggbio")
library(ggbio)
#autoplot gene model
autoplot(gene)
autoplot(fl1, which=z)
autoplot(fl2, which=z)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-ggbio1.png" title="plot of chunk ggbio" alt="plot of chunk ggbio" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-ggbio2.png" title="plot of chunk ggbio" alt="plot of chunk ggbio" width="288" /><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/visualize-ngs-data-with-r-and-bioconductor-ggbio3.png" title="plot of chunk ggbio" alt="plot of chunk ggbio" width="288" /></p>
</div>
<div id="footnotes" class="section level2">
<h2>Footnotes</h2>
<ul>
<li><p>IGV <a href="https://www.broadinstitute.org/igv/home">https://www.broadinstitute.org/igv/home</a></p></li>
<li><p>Gviz <a href="http://www.bioconductor.org/packages/release/bioc/html/Gviz.html">http://www.bioconductor.org/packages/release/bioc/html/Gviz.html</a></p></li>
<li><p>ggbio <a href="http://www.bioconductor.org/packages/release/bioc/html/ggbio.html">http://www.bioconductor.org/packages/release/bioc/html/ggbio.html</a></p></li>
<li><p>UCSC Genome Browser: zooms and scrolls over chromosomes, showing the work of annotators worldwide <a href="http://genome.ucsc.edu/">http://genome.ucsc.edu/</a></p></li>
<li><p>Ensembl genome browser: genome databases for vertebrates and other eukaryotic species <a href="http://ensembl.org">http://ensembl.org</a></p></li>
<li><p>Roadmap Epigenome browser: public resource of human epigenomic data <a href="http://www.epigenomebrowser.org">http://www.epigenomebrowser.org</a> <a href="http://genomebrowser.wustl.edu/">http://genomebrowser.wustl.edu/</a> <a href="http://epigenomegateway.wustl.edu/">http://epigenomegateway.wustl.edu/</a></p></li>
<li><p>Circos: designed for visualizing genomic data in a cirlce <a href="http://circos.ca/">http://circos.ca/</a></p></li>
<li><p>SeqMonk: a tool to visualise and analyse high throughput mapped sequence data <a href="http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/">http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/</a></p></li>
</ul>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p><a href="https://github.com/genomicsclass">https://github.com/genomicsclass</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Fri, 26 Sep 2014 04:32:38 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Visualizing next geration sequencing data]]></title>
			<link>https://www.sthda.com/english/wiki/visualizing-next-geration-sequencing-data</link>
			<guid>https://www.sthda.com/english/wiki/visualizing-next-geration-sequencing-data</guid>
			<description><![CDATA[We will try four ways to look at NGS coverage: using the standalone <strong>Java program IGV</strong>, using simple <strong>plot</strong> commands, and using the <strong>Gviz</strong> and <strong>ggbio</strong> packages in Bioconductor.<br />]]></description>
			<pubDate>Fri, 26 Sep 2014 01:54:43 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
