R programming skills

Resources
R refresher
Installing Bioconductor and finding help
Licence
References

This analysis was performed using R (ver. 3.1.0).

Resources

R refresher

Summarizing, matching and grouping a dataset.

# create an example data.frame
rats <- data.frame(id = paste0("rat",1:10),  
                   sex = factor(rep(c("female","male"),each=5)),
                   weight = c(2,4,1,11,18,12,7,12,19,20),
                   length = c(100,105,115,130,95,150,165,180,190,175))
rats

##       id    sex weight length
## 1   rat1 female      2    100
## 2   rat2 female      4    105
## 3   rat3 female      1    115
## 4   rat4 female     11    130
## 5   rat5 female     18     95
## 6   rat6   male     12    150
## 7   rat7   male      7    165
## 8   rat8   male     12    180
## 9   rat9   male     19    190
## 10 rat10   male     20    175

Data summaries: summary, str

summary and str are two helpful functions for getting a first look at data. summary works on vectors and on matrix-like objects (including data.frames). str works on an arbitrary R object and compactly displays its structure.

summary(rats) #Data frame summaries

##        id        sex        weight          length   
##  rat1   :1   female:5   Min.   : 1.00   Min.   : 95  
##  rat10  :1   male  :5   1st Qu.: 4.75   1st Qu.:108  
##  rat2   :1              Median :11.50   Median :140  
##  rat3   :1              Mean   :10.60   Mean   :140  
##  rat4   :1              3rd Qu.:16.50   3rd Qu.:172  
##  rat5   :1              Max.   :20.00   Max.   :190  
##  (Other):4

summary(rats$weight) #Numeric vector summaries

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    4.75   11.50   10.60   16.50   20.00

str(rats) #Display data structure

## 'data.frame':    10 obs. of  4 variables:
##  $ id    : Factor w/ 10 levels "rat1","rat10",..: 1 3 4 5 6 7 8 9 10 2
##  $ sex   : Factor w/ 2 levels "female","male": 1 1 1 1 1 2 2 2 2 2
##  $ weight: num  2 4 1 11 18 12 7 12 19 20
##  $ length: num  100 105 115 130 95 150 165 180 190 175
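The output above shows id as a factor, which matches the R version used here (3.1.0). Since R 4.0.0, data.frame() no longer converts strings to factors by default, so a newer session would report id as character. The sketch below reproduces the factor behaviour by setting stringsAsFactors explicitly; it assumes a recent R, where the argument must be spelled out.

```r
# Same example data as above; stringsAsFactors = TRUE is set explicitly
# (assumption: R >= 4.0.0, where the default became FALSE).
rats <- data.frame(id = paste0("rat", 1:10),
                   sex = rep(c("female", "male"), each = 5),
                   weight = c(2, 4, 1, 11, 18, 12, 7, 12, 19, 20),
                   length = c(100, 105, 115, 130, 95, 150, 165, 180, 190, 175),
                   stringsAsFactors = TRUE)

str(rats)                     # id and sex are reported as factors
print(sapply(rats, class))    # one class name per column
print(summary(rats$sex))      # for a factor, summary() counts each level
```

On a factor, summary returns the number of observations per level, which is why the sex column of the data frame summary reads female:5 and male:5.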

Aligning two objects: match, merge

We create another example data frame that maps each original id to a secretID. Suppose we want to sort the original data frame by secretID.

ratsTable <- data.frame(id = paste0("rat",c(6,9,7,3,5,1,10,4,8,2)),
                        secretID = 1:10)
ratsTable

##       id secretID
## 1   rat6        1
## 2   rat9        2
## 3   rat7        3
## 4   rat3        4
## 5   rat5        5
## 6   rat1        6
## 7  rat10        7
## 8   rat4        8
## 9   rat8        9
## 10  rat2       10

match gives you, for each element in the first vector, the index of the first match in the second vector.

match(ratsTable$id, rats$id)

##  [1]  6  9  7  3  5  1 10  4  8  2

rats[match(ratsTable$id, rats$id),]

##       id    sex weight length
## 6   rat6   male     12    150
## 9   rat9   male     19    190
## 7   rat7   male      7    165
## 3   rat3 female      1    115
## 5   rat5 female     18     95
## 1   rat1 female      2    100
## 10 rat10   male     20    175
## 4   rat4 female     11    130
## 8   rat8   male     12    180
## 2   rat2 female      4    105

cbind(rats[match(ratsTable$id, rats$id),], ratsTable)

##       id    sex weight length    id secretID
## 6   rat6   male     12    150  rat6        1
## 9   rat9   male     19    190  rat9        2
## 7   rat7   male      7    165  rat7        3
## 3   rat3 female      1    115  rat3        4
## 5   rat5 female     18     95  rat5        5
## 1   rat1 female      2    100  rat1        6
## 10 rat10   male     20    175 rat10        7
## 4   rat4 female     11    130  rat4        8
## 8   rat8   male     12    180  rat8        9
## 2   rat2 female      4    105  rat2       10
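In the example above every id has a partner, so match returns no missing values. As a small sketch of what happens otherwise, the following uses a made-up ID ("rat11", not in the original data) to show that match returns NA for elements without a match, and that %in% gives the corresponding TRUE/FALSE test.

```r
known_ids <- paste0("rat", 1:10)
query_ids <- c("rat3", "rat11", "rat1")   # "rat11" is a made-up ID with no match

# Position of each query ID in known_ids; NA where there is no match
idx <- match(query_ids, known_ids)
print(idx)

# Logical test of membership, one value per query ID
present <- query_ids %in% known_ids
print(present)

# Drop the unmatched entries before using idx to subset
matched <- known_ids[idx[!is.na(idx)]]
print(matched)
```

Checking for NA before subsetting avoids silently creating rows filled with NA when the two tables do not cover the same IDs.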

Alternatively, the merge function joins the two data frames on their shared id column. The merged result is sorted by id, so we reorder it by secretID afterwards.

ratsMerged <- merge(rats, ratsTable, by.x="id", by.y="id")
ratsMerged[order(ratsMerged$secretID),]

##       id    sex weight length secretID
## 7   rat6   male     12    150        1
## 10  rat9   male     19    190        2
## 8   rat7   male      7    165        3
## 4   rat3 female      1    115        4
## 6   rat5 female     18     95        5
## 1   rat1 female      2    100        6
## 2  rat10   male     20    175        7
## 5   rat4 female     11    130        8
## 9   rat8   male     12    180        9
## 3   rat2 female      4    105       10
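By default merge keeps only the rows whose key appears in both data frames. The sketch below uses two small made-up tables (measurements and lookup are illustrative names, not part of the original example) to show how the all.x argument keeps unmatched rows from the first table and fills the missing columns with NA.

```r
# Hypothetical toy tables; "rat11" has no entry in lookup
measurements <- data.frame(id = c("rat1", "rat2", "rat11"),
                           weight = c(2, 4, 9))
lookup <- data.frame(id = c("rat2", "rat1"),
                     secretID = c(10, 6))

# Default: only ids present in both tables are kept
inner <- merge(measurements, lookup, by = "id")
print(inner)

# all.x = TRUE: keep every row of measurements, NA where lookup has no match
left <- merge(measurements, lookup, by = "id", all.x = TRUE)
print(left)
```

Comparing nrow() before and after a merge is a quick way to notice that rows were dropped.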

Analysis over groups: split, tapply, and the dplyr library

Suppose we need to calculate the average rat weight for each sex. We could start by splitting the weight vector into a list of weight vectors divided by sex. split is a useful function for breaking up a vector into groups defined by a second vector, typically a factor. We can then use the lapply function to calculate the average of each element of the list, which are vectors of weights.

sp <- split(rats$weight, rats$sex)
sp

## $female
## [1]  2  4  1 11 18
## 
## $male
## [1] 12  7 12 19 20

lapply(sp, mean)

## $female
## [1] 7.2
## 
## $male
## [1] 14

A shortcut is tapply, which takes the function to run on each group as its third argument:

tapply(rats$weight, rats$sex, mean)

## female   male 
##    7.2   14.0
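For comparison, here is a short sketch of two other base R routes to the same group means: sapply over the split list, which returns a named vector instead of a list, and aggregate with a formula, which returns a data frame. The variable names are illustrative only.

```r
rats <- data.frame(sex = rep(c("female", "male"), each = 5),
                   weight = c(2, 4, 1, 11, 18, 12, 7, 12, 19, 20))

# split + sapply: like lapply, but simplified to a named numeric vector
by_split <- sapply(split(rats$weight, rats$sex), mean)
print(by_split)

# tapply: the shortcut shown above
by_tapply <- tapply(rats$weight, rats$sex, mean)
print(by_tapply)

# aggregate with a formula: result is a data frame with columns sex and weight
by_formula <- aggregate(weight ~ sex, data = rats, FUN = mean)
print(by_formula)
```

Which form is most convenient depends on what comes next: a named vector is easy to index by group name, while a data frame is easier to merge with other tables.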

The R package dplyr can accomplish the same task. The “d” in the name stands for data.frame, and the “ply” reflects that the package aims to simplify tasks typically handled by the apply family of functions: sapply, lapply, tapply, etc. Here is the same task as before, done with the dplyr functions group_by and summarise:

library(dplyr)
sexes <- group_by(rats, sex)
summarise(sexes, ave=mean(weight))

## Source: local data frame [2 x 2]
## 
##      sex  ave
## 1 female  7.2
## 2   male 14.0

With dplyr, you can chain operations using the %>% operator (early dplyr releases used %.%, which was later deprecated and removed):

rats %>% group_by(sex) %>% summarise(ave=mean(weight))

## Source: local data frame [2 x 2]
## 
##      sex  ave
## 1 female  7.2
## 2   male 14.0
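A chained call can also compute several summaries per group at once. The sketch below assumes a dplyr version that provides the %>% pipe (current releases do); the column names n, ave_weight and max_length are chosen here for illustration.

```r
library(dplyr)

rats <- data.frame(id = paste0("rat", 1:10),
                   sex = rep(c("female", "male"), each = 5),
                   weight = c(2, 4, 1, 11, 18, 12, 7, 12, 19, 20),
                   length = c(100, 105, 115, 130, 95, 150, 165, 180, 190, 175))

# One row per sex, with a count, the mean weight and the maximum length
stats_by_sex <- rats %>%
  group_by(sex) %>%
  summarise(n = n(),
            ave_weight = mean(weight),
            max_length = max(length))
print(stats_by_sex)
```

Each argument to summarise becomes one column of the result, evaluated separately within every group.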

Installing Bioconductor and finding help

Installing Bioconductor

To install Bioconductor with the R version used here, copy the following two lines into your R console. The installation takes several minutes.

source("http://bioconductor.org/biocLite.R")
biocLite()

To install specific packages from Bioconductor, use the following code:

source("http://bioconductor.org/biocLite.R")
biocLite(c("genefilter","geneplotter"))
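The biocLite script matches the R version used for this analysis (3.1.0). On newer R versions (3.5.0 and later), Bioconductor is installed through the BiocManager package from CRAN instead. The following is a minimal sketch of that route; install_bioc_packages is a hypothetical helper name introduced here for illustration, not a Bioconductor function.

```r
# Hypothetical helper (not part of Bioconductor): installs BiocManager from
# CRAN if it is missing, then asks it to install the requested packages.
install_bioc_packages <- function(pkgs = character()) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# Usage (run interactively; both calls download from the network):
# install_bioc_packages()                                  # no package names given
# install_bioc_packages(c("genefilter", "geneplotter"))    # specific packages
```

Wrapping the calls in a function keeps the definition harmless to source; nothing is downloaded until the function is actually called.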

Finding help

Type a question mark ? followed by the function name, then press Return.

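?mean          # open the help page for mean
example(mean)  # run the examples from the help page
mean           # print the function definition (for a generic, only the dispatcher)
class(6)       # get the class of an R object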
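When you do not remember the exact function name, a few other base R tools help. The sketch below uses apropos to search visible object names by pattern, args to print a function's argument list, and class on a few values to show how the reported class differs by type.

```r
# Names of visible objects that start with "mean"
hits <- apropos("^mean")
print(hits)

# Argument list of a function, without opening the help page
print(args(sd))

# class() reports the type of the object it is given
print(class(6))       # a plain number
print(class(6L))      # the L suffix makes an integer
print(class("six"))   # a character string

# In an interactive session, ??mean (short for help.search("mean"))
# searches the installed help pages by keyword.
```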

Vignettes

Vignettes are documents that accompany R packages and are required for every Bioconductor package. To browse the vignettes of a package, use the following code:

browseVignettes(package="Biobase")
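browseVignettes opens a web browser. To list vignettes directly in the console, the vignette function can be used instead. This sketch uses the grid package only because it ships with R; the assumption is that you would substitute "Biobase" once it is installed.

```r
# List the vignettes of an installed package without opening a browser.
# "grid" is a stand-in here; replace it with "Biobase" after installation.
v <- vignette(package = "grid")

# The result holds a table with one row per vignette
print(v$results[, c("Item", "Title")])

# In an interactive session, open a single vignette by name, e.g.:
# vignette("grid", package = "grid")
```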

Licence

References

https://github.com/genomicsclass
