<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Wed, 13 May 2026 07:48:41 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Easy Guides]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/wiki/12" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles of the category: Basic Bioconductor infrastructure]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[ExpressionSet and SummarizedExperiment]]></title>
			<link>https://www.sthda.com/english/wiki/expressionset-and-summarizedexperiment</link>
			<guid>https://www.sthda.com/english/wiki/expressionset-and-summarizedexperiment</guid>
			<description><![CDATA[<!-- START HTML -->
<!-- START HTML --> <!--====================== start from here when you copy to sthda================--> <div id="rdoc"> <div id="TOC"> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#expressionset">ExpressionSet</a></li> <li><a href="#summarized-experiment">Summarized Experiment</a></li> <li><a href="#licence">Licence</a></li> <li><a href="#references">References</a></li> </ul> </div> <pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre> <div id="introduction" class="section level2"> <h2>Introduction</h2> <p>The <strong>ExpressionSet</strong> is generally used for array-based experiments, where the rows are features, and the <strong>SummarizedExperiment</strong> is generally used for sequencing-based experiments, where the rows are GenomicRanges. ExpressionSet is in the Biobase library.</p> <p>There’s a library GEOquery which lets you pull down ExpressionSets by identifier.</p> <pre class="r"><code>library(Biobase) #Download data from GEO library(GEOquery) geoq <- getGEO("GSE9514") #The list has a single element #Save it to letter e for simplicity names(geoq)</code></pre> <pre><code>## [1] "GSE9514_series_matrix.txt.gz"</code></pre> <pre class="r"><code>e <- geoq[[1]]</code></pre> </div> <div id="expressionset" class="section level2"> <h2>ExpressionSet</h2> <p>ExpressionSets are basically matrices with a lot of metadata around them. Here, we have a matrix which is 9,000 by 8. It has phenotypic data, feature and annotation information. You can use the functions dim, ncol, nrow to have the dimension, the number of columns and rows respectively.</p> <p>The <strong>matrix of the expression data</strong>, is stored in the <strong>exprs</strong> slot and you can access that with the exprs function. The <strong>phenotypic data</strong>, can be accessed using <strong>pData</strong> and this gives us a data frame, which is information about the samples, including the accession number, the submission date, etc). The <strong>feature data</strong> is accessible with <strong>fData</strong> function and this is information about genes or probe sets. The names of the data in the feature data are, for example, the gene title, the gene symbol, the ENREZ_Gene_ID or Gene Ontology information, which might be useful for downstream analysis.</p> <pre class="r"><code>#Number of dimension dim(e)</code></pre> <pre><code>## Features Samples ## 9335 8</code></pre> <pre class="r"><code>#The expression matrix exprs(e)[1:3,1:3]</code></pre> <pre><code>## GSM241146 GSM241147 GSM241148 ## 10000_at 15.33 9.459 7.985 ## 10001_at 283.47 300.729 270.016 ## 10002_i_at 2569.45 2382.815 2711.814</code></pre> <pre class="r"><code>#Phenotypic data: information about the samples pData(e)[1:3,1:6]</code></pre> <pre><code>## title ## GSM241146 hem1 strain grown in YPD with 250 uM ALA (08-15-06_Philpott_YG_S98_1) ## GSM241147 WT strain grown in YPD under Hypoxia (08-15-06_Philpott_YG_S98_10) ## GSM241148 WT strain grown in YPD under Hypoxia (08-15-06_Philpott_YG_S98_11) ## geo_accession status submission_date ## GSM241146 GSM241146 Public on Nov 06 2007 Nov 02 2007 ## GSM241147 GSM241147 Public on Nov 06 2007 Nov 02 2007 ## GSM241148 GSM241148 Public on Nov 06 2007 Nov 02 2007 ## last_update_date type ## GSM241146 Aug 14 2011 RNA ## GSM241147 Aug 14 2011 RNA ## GSM241148 Aug 14 2011 RNA</code></pre> <pre class="r"><code>dim(pData(e))</code></pre> <pre><code>## [1] 8 31</code></pre> <pre class="r"><code>#Column names of the phenotypic data names(pData(e))</code></pre> <pre><code>## [1] "title" "geo_accession" ## [3] "status" "submission_date" ## [5] "last_update_date" "type" ## [7] "channel_count" "source_name_ch1" ## [9] "organism_ch1" "characteristics_ch1" ## [11] "molecule_ch1" "extract_protocol_ch1" ## [13] "label_ch1" "label_protocol_ch1" ## [15] "taxid_ch1" "hyb_protocol" ## [17] "scan_protocol" "description" ## [19] "data_processing" "platform_id" ## [21] "contact_name" "contact_email" ## [23] "contact_department" "contact_institute" ## [25] "contact_address" "contact_city" ## [27] "contact_state" "contact_zip/postal_code" ## [29] "contact_country" "supplementary_file" ## [31] "data_row_count"</code></pre> <pre class="r"><code>#Feature data : information about genes or probe sets fData(e)[1:3,1:3]</code></pre> <pre><code>## ID ORF SPOT_ID ## 10000_at 10000_at YLR331C <NA> ## 10001_at 10001_at YLR332W <NA> ## 10002_i_at 10002_i_at YLR333C <NA></code></pre> <pre class="r"><code>dim(fData(e))</code></pre> <pre><code>## [1] 9335 17</code></pre> <pre class="r"><code>names(fData(e))</code></pre> <pre><code>## [1] "ID" "ORF" ## [3] "SPOT_ID" "Species Scientific Name" ## [5] "Annotation Date" "Sequence Type" ## [7] "Sequence Source" "Target Description" ## [9] "Representative Public ID" "Gene Title" ## [11] "Gene Symbol" "ENTREZ_GENE_ID" ## [13] "RefSeq Transcript ID" "SGD accession number" ## [15] "Gene Ontology Biological Process" "Gene Ontology Cellular Component" ## [17] "Gene Ontology Molecular Function"</code></pre> <pre class="r"><code>head(fData(e)$"Gene Symbol")</code></pre> <pre><code>## [1] JIP3 MID2 RPS25B NUP2 ## 4869 Levels: ACO1 ARV1 ATP14 BOP2 CDA1 CDA2 CDC25 CDC3 CDD1 CTS1 ... Il4</code></pre> <pre class="r"><code>head(rownames(e))</code></pre> <pre><code>## [1] "10000_at" "10001_at" "10002_i_at" "10003_f_at" "10004_at" ## [6] "10005_at"</code></pre> <pre class="r"><code>#experiment data: experimenter name, laboratory, contact, abstract. #It's empty sometimes experimentData(e)</code></pre> <pre><code>## Experiment data ## Experimenter name: ## Laboratory: ## Contact information: ## Title: ## URL: ## PMIDs: ## No abstract available.</code></pre> <pre class="r"><code>#Annotation plateform annotation(e)</code></pre> <pre><code>## [1] "GPL90"</code></pre> </div> <div id="summarized-experiment" class="section level2"> <h2>Summarized Experiment</h2> <p>I’m going to load a bioconductor annotation package, which is the parathyroid SummarizedExperiment library. The loaded data is a SummarizedExperiment, which summarizes counts of RNA sequencing reads in genes for an experiment on human cell culture. The SummarizedExperiment object has 63,000 rows, which are genes, and 27 columns, which are samples, and the matrix, in this case, is called counts. And we have the row names, which are ensemble genes, and metadata about the row data, and metadata about the column data.</p> <pre class="r"><code>library(parathyroidSE) #RNA sequencing reads data(parathyroidGenesSE) se <- parathyroidGenesSE se</code></pre> <pre><code>## class: SummarizedExperiment ## dim: 63193 27 ## exptData(1): MIAME ## assays(1): counts ## rownames(63193): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99 ## rowData metadata column names(0): ## colnames: NULL ## colData names(8): run experiment ... study sample</code></pre> <p><strong>assay function</strong> can be used get access to the counts of RNA sequencing reads. <strong>colData function</strong> , the column data, is equivalent to the pData on the ExpressionSet. Each row in this data frame corresponds to a column in the SummarizedExperiment. We can see that there are indeed 27 rows here, which give information about the columns. Each sample in this case is treated with two treatments or control and we can see the number of replicates for each, using the as.numeric function again.</p> <pre class="r"><code>#Dimension of the SummarizedExperiment dim(se)</code></pre> <pre><code>## [1] 63193 27</code></pre> <pre class="r"><code>#Get access to the counts of RNA sequencing reads, using assay function. assay(se)[1:3,1:3]</code></pre> <pre><code>## [,1] [,2] [,3] ## ENSG00000000003 792 1064 444 ## ENSG00000000005 4 1 2 ## ENSG00000000419 294 282 164</code></pre> <pre class="r"><code>#Dimensions of this assay is a matrix, which has the same dimensions as the SummarizedExperiment. dim(assay(se))</code></pre> <pre><code>## [1] 63193 27</code></pre> <pre class="r"><code>#Get information about samples colData(se)[1:3,1:6]</code></pre> <pre><code>## DataFrame with 3 rows and 6 columns ## run experiment patient treatment time submission ## <character> <factor> <factor> <factor> <factor> <factor> ## 1 SRR479052 SRX140503 1 Control 24h SRA051611 ## 2 SRR479053 SRX140504 1 Control 48h SRA051611 ## 3 SRR479054 SRX140505 1 DPN 24h SRA051611</code></pre> <pre class="r"><code>#dimension of column data dim(colData(se))</code></pre> <pre><code>## [1] 27 8</code></pre> <pre class="r"><code>#characteristics of the samples names(colData(se))</code></pre> <pre><code>## [1] "run" "experiment" "patient" "treatment" "time" ## [6] "submission" "study" "sample"</code></pre> <pre class="r"><code>#Get access to treatment column of sample characteristics colData(se)$treatment</code></pre> <pre><code>## [1] Control Control DPN DPN OHT OHT Control Control ## [9] DPN DPN DPN OHT OHT OHT Control Control ## [17] DPN DPN OHT OHT Control DPN DPN DPN ## [25] OHT OHT OHT ## Levels: Control DPN OHT</code></pre> <p>The rows in this case correspond to genes. Genes are collections of exons. The rows of the SummarizedExperiment is a GRangesList where each row corresponds to a GRanges which contains the exons, which were used to count the RNA sequencing reads. Some metadata are included in the row data and is accessible with the <strong>metadata function</strong>. This information tells us, how this GRangesList was constructed. if it was constructed from the genomic features package using a transcript database. Homo sapiens was the organism, and the database was ENSEMBL GENES number 72, and etc. In addition, there’s some more information under experiment data, using exptData and then specifying the MIAME, which is minimal information about a microarray experiment, Although we’re not using microarrays, we’ve still used the same slots to describe extra information about this object.</p> <pre class="r"><code>#Extract out a single GRanges object: 17 ranges and 2 metada columns which is #the ensembl id for the exon. rowData(se)[1]</code></pre> <pre><code>## GRangesList of length 1: ## $ENSG00000000003 ## GRanges with 17 ranges and 2 metadata columns: ## seqnames ranges strand | exon_id exon_name ## <Rle> <IRanges> <Rle> | <integer> <character> ## [1] X [99883667, 99884983] - | 664095 ENSE00001459322 ## [2] X [99885756, 99885863] - | 664096 ENSE00000868868 ## [3] X [99887482, 99887565] - | 664097 ENSE00000401072 ## [4] X [99887538, 99887565] - | 664098 ENSE00001849132 ## [5] X [99888402, 99888536] - | 664099 ENSE00003554016 ## ... ... ... ... ... ... ... ## [13] X [99890555, 99890743] - | 664106 ENSE00003512331 ## [14] X [99891188, 99891686] - | 664108 ENSE00001886883 ## [15] X [99891605, 99891803] - | 664109 ENSE00001855382 ## [16] X [99891790, 99892101] - | 664110 ENSE00001863395 ## [17] X [99894942, 99894988] - | 664111 ENSE00001828996 ## ## --- ## seqlengths: ## 1 2 ... LRG_99 ## 249250621 243199373 ... 13294</code></pre> <pre class="r"><code>#rowData is indeed GRangesList class(rowData(se))</code></pre> <pre><code>## [1] "GRangesList" ## attr(,"package") ## [1] "GenomicRanges"</code></pre> <pre class="r"><code>#length gives the number of genes length(rowData(se))</code></pre> <pre><code>## [1] 63193</code></pre> <pre class="r"><code>#length of the first GRanges : gives the number of exons for the first gene. length(rowData(se)[[1]])</code></pre> <pre><code>## [1] 17</code></pre> <pre class="r"><code>head(rownames(se))</code></pre> <pre><code>## [1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457" ## [5] "ENSG00000000460" "ENSG00000000938"</code></pre> <pre class="r"><code>metadata(rowData(se))</code></pre> <pre><code>## $genomeInfo ## $genomeInfo$`Db type` ## [1] "TranscriptDb" ## ## $genomeInfo$`Supporting package` ## [1] "GenomicFeatures" ## ## $genomeInfo$`Data source` ## [1] "BioMart" ## ## $genomeInfo$Organism ## [1] "Homo sapiens" ## ## $genomeInfo$`Resource URL` ## [1] "www.biomart.org:80" ## ## $genomeInfo$`BioMart database` ## [1] "ensembl" ## ## $genomeInfo$`BioMart database version` ## [1] "ENSEMBL GENES 72 (SANGER UK)" ## ## $genomeInfo$`BioMart dataset` ## [1] "hsapiens_gene_ensembl" ## ## $genomeInfo$`BioMart dataset description` ## [1] "Homo sapiens genes (GRCh37.p11)" ## ## $genomeInfo$`BioMart dataset version` ## [1] "GRCh37.p11" ## ## $genomeInfo$`Full dataset` ## [1] "yes" ## ## $genomeInfo$`miRBase build ID` ## [1] NA ## ## $genomeInfo$transcript_nrow ## [1] "213140" ## ## $genomeInfo$exon_nrow ## [1] "737783" ## ## $genomeInfo$cds_nrow ## [1] "531154" ## ## $genomeInfo$`Db created by` ## [1] "GenomicFeatures package from Bioconductor" ## ## $genomeInfo$`Creation time` ## [1] "2013-07-30 17:30:25 +0200 (Tue, 30 Jul 2013)" ## ## $genomeInfo$`GenomicFeatures version at creation time` ## [1] "1.13.21" ## ## $genomeInfo$`RSQLite version at creation time` ## [1] "0.11.4" ## ## $genomeInfo$DBSCHEMAVERSION ## [1] "1.0"</code></pre> <pre class="r"><code>exptData(se)$MIAME</code></pre> <pre><code>## Experiment data ## Experimenter name: Felix Haglund ## Laboratory: Science for Life Laboratory Stockholm ## Contact information: Mikael Huss ## Title: DPN and Tamoxifen treatments of parathyroid adenoma cells ## URL: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37211 ## PMIDs: 23024189 ## ## Abstract: A 251 word abstract is available. Use 'abstract' method.</code></pre> <pre class="r"><code>abstract(exptData(se)$MIAME)</code></pre> <pre><code>## [1] "Primary hyperparathyroidism (PHPT) is most frequently present in postmenopausal women. Although the involvement of estrogen has been suggested, current literature indicates that parathyroid tumors are estrogen receptor (ER) alpha negative. Objective: The aim of the study was to evaluate the expression of ERs and their putative function in parathyroid tumors. Design: A panel of 37 parathyroid tumors was analyzed for expression and promoter methylation of the ESR1 and ESR2 genes as well as expression of the ERalpha and ERbeta1/ERbeta2 proteins. Transcriptome changes in primary cultures of parathyroid adenoma cells after treatment with the selective ERbeta1 agonist diarylpropionitrile (DPN) and 4-hydroxytamoxifen were identified using next-generation RNA sequencing. Results: Immunohistochemistry revealed very low expression of ERalpha, whereas all informative tumors expressed ERbeta1 (n = 35) and ERbeta2 (n = 34). Decreased nuclear staining intensity and mosaic pattern of positive and negative nuclei of ERbeta1 were significantly associated with larger tumor size. Tumor ESR2 levels were significantly higher in female vs. male cases. In cultured cells, significantly increased numbers of genes with modified expression were detected after 48 h, compared to 24-h treatments with DPN or 4-hydroxytamoxifen, including the parathyroid-related genes CASR, VDR, JUN, CALR, and ORAI2. Bioinformatic analysis of transcriptome changes after DPN treatment revealed significant enrichment in gene sets coupled to ER activation, and a highly significant similarity to tumor cells undergoing apoptosis. Conclusions: Parathyroid tumors express ERbeta1 and ERbeta2. Transcriptional changes after ERbeta1 activation and correlation to clinical features point to a role of estrogen signaling in parathyroid function and disease."</code></pre> </div> <div id="licence" class="section level2"> <h2>Licence</h2> <p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p> </div> <div id="references" class="section level2"> <h2>References</h2> <p><a href="https://github.com/genomicsclass">https://github.com/genomicsclass</a></p> </div> <script>jQuery(document).ready(function () {jQuery(\'h1,h2,h3,h4\').addClass(\'formatter-title\');});//add phpboost class to header</script> <style>.content{padding:0px;}</style> </div><!--end rdoc--> <!--====================== stop here when you copy to sthda================--> <!-- END HTML --> 
<!-- END HTML -->]]></description>
			<pubDate>Tue, 16 Aug 2016 23:04:28 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[GRanges and GRangesList]]></title>
			<link>https://www.sthda.com/english/wiki/granges-and-grangeslist</link>
			<guid>https://www.sthda.com/english/wiki/granges-and-grangeslist</guid>
			<description><![CDATA[<!-- START HTML -->

           
  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#granges">GRanges</a></li>
<li><a href="#grangeslist">GRangesList</a></li>
<li><a href="#findoverlaps-and-over">findOverlaps and %over%</a></li>
<li><a href="#rle-and-views">Rle and Views</a></li>
<li><a href="#licence">Licence</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>

<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<div id="granges" class="section level2">
<h2>GRanges</h2>
<p>The GenomicRanges package, which is an extension of IRanges to the genomic space. <strong>GRanges</strong> object contains the <strong>sequence names</strong> (here chromosome Z), the <strong>Iranges</strong>, the <strong>strand</strong> information and the <strong>sequence lengths</strong>. If we print out the GRanges object, we see that we have two ranges, zero metadata columns. And it gives the sequence names as an rle which we’ll discuss later. It gives the IRanges, and the strand also as an rle, and the bottom it prints the sequence lengths. We’ve specified that chromosome z is 100 base pairs long.</p>
<pre class="r"><code>library(GenomicRanges)
gr <- GRanges(seqnames="chrZ", ranges=IRanges(start=c(5,10),end=c(35,45)),
              strand="+", seqlengths=c(chrZ=100L))
gr</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   ---
##   seqlengths:
##    chrZ
##     100</code></pre>
<p>Like with IRanges, we can shift the GRanges, and it will move the starts and ends by 10 base pairs to the right. We can also shift by 80. But notice that if we shift by 80, these will go off the end of the chromosome. GenomicRanges package gives us an error that says that the ranges contain values outside of the sequence bounds. If we wrap this in a <strong>trim</strong> function it will make sure that these end at the chromosome end and then don’t go over it.</p>
<pre class="r"><code>shift(gr, 10)</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [15, 45]      +
##   [2]     chrZ  [20, 55]      +
##   ---
##   seqlengths:
##    chrZ
##     100</code></pre>
<pre class="r"><code>shift(gr, 80)</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ [85, 115]      +
##   [2]     chrZ [90, 125]      +
##   ---
##   seqlengths:
##    chrZ
##     100</code></pre>
<pre class="r"><code>trim(shift(gr, 80))</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ [85, 100]      +
##   [2]     chrZ [90, 100]      +
##   ---
##   seqlengths:
##    chrZ
##     100</code></pre>
<p>The metadata columns we mentioned previously, can be accessed by using the function mcols for metadata columns. Before we have zero columns here. We can add columns by using mcols plus the dollar sign. Now we have an additional column which is a numeric and has two values.</p>
<pre class="r"><code>mcols(gr)</code></pre>
<pre><code>## DataFrame with 2 rows and 0 columns</code></pre>
<pre class="r"><code>mcols(gr)$value <- c(-1,4)
gr</code></pre>
<pre><code>## GRanges with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |     value
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chrZ  [ 5, 35]      + |        -1
##   [2]     chrZ  [10, 45]      + |         4
##   ---
##   seqlengths:
##    chrZ
##     100</code></pre>
</div>
<div id="grangeslist" class="section level2">
<h2>GRangesList</h2>
<p>There’s an additional class in the GRanges package, which is called <strong>GRangesList</strong>. GRangesList is an object which groups GRanges together. The most obvious example of a GRangesList would be grouping exons by gene, or grouping exons by transcript.</p>
<pre class="r"><code>#Create a second GRanges
gr2 <- GRanges("chrZ",IRanges(11:13,51:53))
mcols(gr)$value <- NULL
#Create GRangesList : This object contains two GRanges
grl <- GRangesList(gr,gr2)
grl</code></pre>
<pre><code>## GRangesList of length 2:
## [[1]] 
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
## 
## [[2]] 
## GRanges with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
## 
## ---
## seqlengths:
##  chrZ
##   100</code></pre>
<pre class="r"><code>#It&amp;#39;s a GRanges list of length two, 
#where the first GRanges has two ranges and the second GRanges has three ranges.
length(grl)</code></pre>
<pre><code>## [1] 2</code></pre>
<pre class="r"><code>#Ask for the first element
grl[[1]]</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   ---
##   seqlengths:
##    chrZ
##     100</code></pre>
<pre class="r"><code>#If you specify metadata columns to the GRanges list,
#these will be assigned to each GRanges object in the list.
mcols(grl)$value <- c(5,7)
grl</code></pre>
<pre><code>## GRangesList of length 2:
## [[1]] 
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
## 
## [[2]] 
## GRanges with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
## 
## ---
## seqlengths:
##  chrZ
##   100</code></pre>
<pre class="r"><code>mcols(grl)</code></pre>
<pre><code>## DataFrame with 2 rows and 1 column
##       value
##   <numeric>
## 1         5
## 2         7</code></pre>
</div>
<div id="findoverlaps-and-over" class="section level2">
<h2>findOverlaps and %over%</h2>
<p>Once we’ve created sets of GRanges or GRangesList objects, one common thing we might need to do is to find overlaps between objects. Let’s create two GRanges objects. The first one, will have five ranges. So 1 to 5, 11 to 15, 21 to 2. And the second object will have two ranges. Both GRanges objects are on the same sequence, chromosome z. We’ll use the <strong>findOverlaps</strong> function to find the overlaps. The first two arguments of this function, query and subject, are the most important. If you want to count overlaps, you can use the <strong>countOverlaps</strong> function.</p>
<pre class="r"><code>#Creating two GRanges objects
gr1 <- GRanges("chrZ",IRanges(c(1,11,21,31,41),width=5))
gr2 <- GRanges("chrZ",IRanges(c(19,33),c(38,35)))
gr1</code></pre>
<pre><code>## GRanges with 5 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 1,  5]      *
##   [2]     chrZ  [11, 15]      *
##   [3]     chrZ  [21, 25]      *
##   [4]     chrZ  [31, 35]      *
##   [5]     chrZ  [41, 45]      *
##   ---
##   seqlengths:
##    chrZ
##      NA</code></pre>
<pre class="r"><code>gr2</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [19, 38]      *
##   [2]     chrZ  [33, 35]      *
##   ---
##   seqlengths:
##    chrZ
##      NA</code></pre>
<pre class="r"><code>#Find overlaps
fo <- findOverlaps(query=gr1, subject=gr2)
fo</code></pre>
<pre><code>## Hits of length 3
## queryLength: 5
## subjectLength: 2
##   queryHits subjectHits 
##    <integer>   <integer> 
##  1         3           1 
##  2         4           1 
##  3         4           2</code></pre>
<p>The output of the findOverlaps function is a hits object, which has length three, and this gives us the three different overlaps which occurred. The table here tells us that the third element of the query (gr1) intersected with the first element of the subject (gr2). These are given as integer vectors.</p>
<p>Another way to get the overlaps is to use the over function, %over%. It gives a logical vector. For the five ranges in gr1, it gives a logical vector describing which of these have any overlap with the ranges in the second, so gr2. If we use that as a subset, so a logical sub-setting, we returned those ranges in gr1, which had some overlap with gr2.</p>
<pre class="r"><code>queryHits(fo)</code></pre>
<pre><code>## [1] 3 4 4</code></pre>
<pre class="r"><code>subjectHits(fo)</code></pre>
<pre><code>## [1] 1 1 2</code></pre>
<pre class="r"><code>gr1 %over% gr2</code></pre>
<pre><code>## [1] FALSE FALSE  TRUE  TRUE FALSE</code></pre>
<pre class="r"><code>gr1[gr1 %over% gr2]</code></pre>
<pre><code>## GRanges with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [21, 25]      *
##   [2]     chrZ  [31, 35]      *
##   ---
##   seqlengths:
##    chrZ
##      NA</code></pre>
</div>
<div id="rle-and-views" class="section level2">
<h2>Rle and Views</h2>
<p>Rle is an object which is defined by IRanges. But also there’s a similar object in base r which is a run length encoding. The meaning of this is that if you have a vector, which repeats certain values, you can save memory. By instead of storing each value, you save the number, and then the number of repeats.</p>
<p>If we have such an rle object, and we want to peer into it in different regions, we can construct a views object. Views is a virtual class, which contains the subject, and then a number of views, which are essentially IRanges into that object. You can also use the views constructor for FASTA files, for example, if you want to look into genome sequence or other objects.</p>
<pre class="r"><code>r <- Rle(c(1,1,1,0,0,-2,-2,-2,rep(-1,20)))
r</code></pre>
<pre><code>## numeric-Rle of length 28 with 4 runs
##   Lengths:  3  2  3 20
##   Values :  1  0 -2 -1</code></pre>
<pre class="r"><code>#Structure of r
str(r)</code></pre>
<pre><code>## Formal class &amp;#39;Rle&amp;#39; [package "IRanges"] with 4 slots
##   ..@ values         : num [1:4] 1 0 -2 -1
##   ..@ lengths        : int [1:4] 3 2 3 20
##   ..@ elementMetadata: NULL
##   ..@ metadata       : list()</code></pre>
<pre class="r"><code>as.numeric(r)</code></pre>
<pre><code>##  [1]  1  1  1  0  0 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
## [24] -1 -1 -1 -1 -1</code></pre>
<pre class="r"><code>#creating 2 views of the rle
Views(r, start=c(4,2), end=c(7,6))</code></pre>
<pre><code>## Views on a 28-length Rle subject
## 
## views:
##     start end width
## [1]     4   7     4 [ 0  0 -2 -2]
## [2]     2   6     5 [ 1  1  0  0 -2]</code></pre>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p><a href="https://github.com/genomicsclass">https://github.com/genomicsclass</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 25 Sep 2014 00:32:56 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Installing from github.com]]></title>
			<link>https://www.sthda.com/english/wiki/installing-from-github-com</link>
			<guid>https://www.sthda.com/english/wiki/installing-from-github-com</guid>
			<description><![CDATA[<!-- START HTML -->

 <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#installing-from-github.com">Installing from github.com</a></li>
<li><a href="#licence">Licence</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>

<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<div id="installing-from-github.com" class="section level2">
<h2>Installing from github.com</h2>
<pre class="r"><code>#install.packages("devtools")
#library("devtools")
#pack=rafalib, ririzarr=user
#install_github("rafalib", "ririzarr")
library(rafalib)
mypar</code></pre>
<pre><code>## function (a = 1, b = 1, brewer.n = 8, brewer.name = "Dark2", 
##     cex.lab = 2, cex.main = 2, cex.axis = 1.5, mar = c(5.1, 5.1, 
##         3.5, 2.1), mgp = c(3, 1, 0), ...) 
## {
##     require(RColorBrewer)
##     par(mar = mar, mgp = mgp, cex.lab = cex.lab, cex.main = cex.main, 
##         cex.axis = cex.axis)
##     par(mfrow = c(a, b), ...)
##     palette(brewer.pal(brewer.n, brewer.name))
## }
## <environment: namespace:rafalib></code></pre>
<pre class="r"><code>shist(rnorm(100))</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/installing-from-github-unnamed-chunk-1.png" title="plot of chunk unnamed-chunk-1" alt="plot of chunk unnamed-chunk-1" width="288" /></p>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p><a href="https://github.com/genomicsclass">https://github.com/genomicsclass</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Thu, 25 Sep 2014 00:29:38 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[IRanges]]></title>
			<link>https://www.sthda.com/english/wiki/iranges</link>
			<guid>https://www.sthda.com/english/wiki/iranges</guid>
			<description><![CDATA[<!-- START HTML -->
 <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#iranges">IRanges</a></li>
<li><a href="#licence">Licence</a></li>
<li><a href="#references">References</a></li>
</ul>
</div>
<br/>
<pre class="warning"><code>This analysis was performed using R (ver. 3.1.0).</code></pre>
<div id="iranges" class="section level2">
<h2>IRanges</h2>
<p>IRanges is a library for representing ranges of integers, which is useful in genomics, because we have base pair ranges that we'd like to manipulate.There's a very detailed vignette on IRanges. IRanges have a <strong>start</strong>, an <strong>end</strong> and a <strong>width</strong>.</p>
<pre class="r"><code>library(IRanges)

#IR with 6 base-pairs long : start and end are indicated
ir <- IRanges(5,10) 
ir</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     5  10     6</code></pre>
<pre class="r"><code>start(ir) #Return the start value</code></pre>
<pre><code>## [1] 5</code></pre>
<pre class="r"><code>end(ir) #Return the end value</code></pre>
<pre><code>## [1] 10</code></pre>
<pre class="r"><code>width(ir) #return the width</code></pre>
<pre><code>## [1] 6</code></pre>
<pre class="r"><code># ?IRanges</code></pre>
<p>If you specify that the start is 5, and the width should be 6, you'll get the identical IRange.</p>
<pre class="r"><code>#IR with 6 base-pairs long : start and width are indicated
ir <- IRanges(5,width=6)
ir</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     5  10     6</code></pre>
<p>You can also specify more than one range at a time. If you give three starts and three ends, you get a <strong>IRanges object</strong> of length 3. If you, again, ask for the starts, you'll get the three different starts, 3, 5, and 17.</p>
<pre class="r"><code># Miultiple ranges
ir <- IRanges(start=c(3,5,17), end=c(10,8,20))
ir</code></pre>
<pre><code>## IRanges of length 3
##     start end width
## [1]     3  10     8
## [2]     5   8     4
## [3]    17  20     4</code></pre>
<pre class="r"><code>length(ir)</code></pre>
<pre><code>## [1] 3</code></pre>
<pre class="r"><code>start(ir)</code></pre>
<pre><code>## [1]  3  5 17</code></pre>
<p>There are a number of intra range methods for IRanges. And intra-range means that the operation will occur for each range that you have. An example of this is to <strong>shift</strong> the IRange to the left by two. Before we had an IRange that started at 5 and ended at 10. Applying the shift operation produces an IRange which starts at 3 and ends at 8.</p>
<pre class="r"><code># ?"intra-range-methods"
ir <- IRanges(5,10)
shift(ir, -2)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     3   8     6</code></pre>
<pre class="success"><code>Remember, all of these commands can work on more than one range at once.</code></pre>
<p>Here we show the effects of the different methods using a single range:</p>
<pre class="r"><code>#Shift the IRange to the left by 2 base-pairs
shift(ir,-2)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     3   8     6</code></pre>
<pre class="r"><code>#narrow : Relative to the start, you should start this range at the second base pair.
narrow(ir, start=2)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     6  10     5</code></pre>
<pre class="r"><code>#narrow: Relative to 5, you should end on the fifth base pair, which means it should end at 9.
narrow(ir, end=5)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     5   9     5</code></pre>
<pre class="r"><code>#flank allows you to get flanking sequence
#here three base pairs from the start.
flank(ir, width=3, start=TRUE, both=FALSE)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     2   4     3</code></pre>
<pre class="r"><code>#flank: three base pairs from the end by specifying start equals false
flank(ir, width=3, start=FALSE, both=FALSE)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]    11  13     3</code></pre>
<pre class="r"><code>#bidirectional flanking sequence from the start by specifying both equals true.
flank(ir, width=3, start=TRUE, both=TRUE)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     2   7     6</code></pre>
<pre class="r"><code>ir * 2</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     6   8     3</code></pre>
<pre class="r"><code>ir + 2</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     3  12    10</code></pre>
<pre class="r"><code>ir - 2</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     7   8     2</code></pre>
<p>The exact same functions are shown graphically with the code below:</p>
<pre class="r"><code># set up a plotting window so we can look at range operations
plotir <- function(ir,i) { arrows(start(ir)-.5,i,end(ir)+.5,i,code=3,angle=90,lwd=3) }
plot(0,0,xlim=c(0,15),ylim=c(0,11),type="n",xlab="",ylab="",xaxt="n")
axis(1,0:15)
abline(v=0:30 + .5,col=rgb(0,0,0,.5))

# plot the original IRange
plotir(ir,1)

# draw a red shadow for the original IRange
polygon(c(start(ir)-.5,start(ir)-.5,end(ir)+.5,end(ir)+.5),c(-1,12,12,-1),col=rgb(1,0,0,.2),border=NA)
plotir(shift(ir,-2), 2)
plotir(narrow(ir, start=2), 3)
plotir(narrow(ir, end=5), 4)
plotir(flank(ir, width=3, start=TRUE, both=FALSE), 5)
plotir(flank(ir, width=3, start=FALSE, both=FALSE), 6)
plotir(flank(ir, width=3, start=TRUE, both=TRUE), 7)
plotir(ir * 2, 8)
plotir(ir + 2, 9)
plotir(ir - 2, 10)</code></pre>
<p><img src="https://www.sthda.com/english/sthda/RDoc/figure/genomics/iranges-irange-fig.png" title="plot of chunk irange-fig" alt="plot of chunk irange-fig" width="384" /></p>
<p>The inter-range methods are those functions which depend on the other ranges in the object. Let's create an IRanges object with three ranges, which starts at 3, 5, 17, ends at 10, 8, and 20.</p>
<pre class="r"><code># ?"inter-range-methods"
ir <- IRanges(start=c(3,5,17), end=c(10,8,20))
#range: returns the beginning of the IRanges to the end, including gaps in between.
range(ir)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]     3  20    18</code></pre>
<pre class="r"><code>#reduce, returns those base pairs which are covered by the original ranges.
#So we do not get the gap, so the end at 10 and the beginning at 17.
reduce(ir)</code></pre>
<pre><code>## IRanges of length 2
##     start end width
## [1]     3  10     8
## [2]    17  20     4</code></pre>
<pre class="r"><code>#gaps: returns a gap from 10 to 17.
gaps(ir)</code></pre>
<pre><code>## IRanges of length 1
##     start end width
## [1]    11  16     6</code></pre>
<pre class="r"><code>#disjoint: gives a set of ranges which has the same coverage as the original IRanges object,
#but they&amp;#39;re not overlapping in any way, and they also contain the union of all the endpoints of the original range.
disjoin(ir)</code></pre>
<pre><code>## IRanges of length 4
##     start end width
## [1]     3   4     2
## [2]     5   8     4
## [3]     9  10     2
## [4]    17  20     4</code></pre>
</div>
<div id="licence" class="section level2">
<h2>Licence</h2>
<p><a href="https://www.sthda.com/english/english/wiki/mit-licence">Licence</a></p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p><a href="https://github.com/genomicsclass">https://github.com/genomicsclass</a></p>
</div>

<script>jQuery(document).ready(function () {jQuery('h1,h2,h3,h4').addClass('formatter-title');});//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->
<!-- END HTML -->]]></description>
			<pubDate>Thu, 25 Sep 2014 00:25:46 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Basic Bioconductor infrastructure]]></title>
			<link>https://www.sthda.com/english/wiki/basic-bioconductor-infrastructure</link>
			<guid>https://www.sthda.com/english/wiki/basic-bioconductor-infrastructure</guid>
			<description><![CDATA[-=-]]></description>
			<pubDate>Wed, 24 Sep 2014 23:39:29 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
