Aggregate multiple FastQC reports into a data frame.

qc_aggregate(qc.dir = ".", progressbar = TRUE)

# S3 method for qc_aggregate
summary(object, ...)

qc_stats(object)

Arguments

qc.dir
path to the FastQC result directory to scan.
progressbar
logical value. If TRUE, shows a progress bar.
object
an object of class qc_aggregate.
...
other arguments.

Value

  • qc_aggregate() returns an object of class qc_aggregate, which is a data frame (tibble) with the following columns:
    • sample: sample names
    • module: FastQC modules
    • status: FastQC module status for each sample
    • tot.seq: total sequences (i.e., the number of reads)
    • seq.length: sequence length
    • pct.gc: % of GC content
    • pct.dup: % of duplicate reads

  • summary(): generates a summary of a qc_aggregate object. Returns a data frame with the following columns:
    • module: FastQC modules
    • nb_samples: the number of samples tested
    • nb_pass, nb_fail, nb_warn: the number of samples that passed, failed, and raised a warning, respectively.
    • failed, warned: the names of the samples that failed and that raised a warning, respectively.

  • qc_stats(): returns a data frame containing general statistics from the FastQC reports. Columns are: sample, pct.dup, pct.gc, tot.seq and seq.length.
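
The three return values described above are ordinary data frames, so they can be explored with base R. The sketch below does not call fastqcr at all: it builds a small hypothetical stand-in for a qc_aggregate() result (only the documented column names are assumed, every value is made up) and shows how the flagged modules, a per-module tally, and per-sample statistics could be derived from it. The helper summarise_module() and the ", " separator for sample names are illustrative choices, not the package's implementation.

```r
# Hypothetical stand-in for a qc_aggregate() result: only the documented
# column names are assumed; every value below is made up.
qc <- data.frame(
  sample     = rep(c("S1", "S2", "S3"), times = 2),
  module     = rep(c("Basic Statistics", "Per sequence GC content"), each = 3),
  status     = c("PASS", "PASS", "PASS", "FAIL", "WARN", "WARN"),
  tot.seq    = rep(c("1000", "2000", "3000"), times = 2),
  seq.length = "35-76",
  pct.gc     = rep(c(48, 49, 48), times = 2),
  pct.dup    = rep(c(17.2, 15.7, 22.1), times = 2),
  stringsAsFactors = FALSE
)

# 1. Sample/module pairs that did not pass
problems <- qc[qc$status != "PASS", c("sample", "module", "status")]

# 2. Per-module tally with the same column names that summary() documents.
#    summarise_module() is a hypothetical helper written for this sketch.
summarise_module <- function(d) {
  data.frame(
    module     = d$module[1],
    nb_samples = length(unique(d$sample)),
    nb_pass    = sum(d$status == "PASS"),
    nb_fail    = sum(d$status == "FAIL"),
    nb_warn    = sum(d$status == "WARN"),
    failed     = paste(d$sample[d$status == "FAIL"], collapse = ", "),
    warned     = paste(d$sample[d$status == "WARN"], collapse = ", "),
    stringsAsFactors = FALSE
  )
}
module_summary <- do.call(rbind, lapply(split(qc, qc$module), summarise_module))
rownames(module_summary) <- NULL

# 3. One row per sample, with the columns that qc_stats() documents
sample_stats <- unique(qc[, c("sample", "pct.dup", "pct.gc", "tot.seq", "seq.length")])

print(problems)
print(module_summary)
print(sample_stats)
```

With a real qc_aggregate object, summary() and qc_stats() should be preferred; the sketch is only meant to make the shape of the three data frames concrete.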

Functions

  • qc_aggregate: aggregates FastQC reports for multiple samples.

  • qc_stats: creates general statistics from FastQC reports.

Examples

# Demo QC dir
qc.dir <- system.file("fastqc_results", package = "fastqcr")
qc.dir
#> [1] "/Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results"

# List of files in the directory
list.files(qc.dir)
#> [1] "S1_fastqc.zip" "S2_fastqc.zip" "S3_fastqc.zip" "S4_fastqc.zip"
#> [5] "S5_fastqc.zip"

# Aggregate the report
qc <- qc_aggregate(qc.dir, progressbar = FALSE)
qc
#> # A tibble: 60 × 7
#>    sample module                       status tot.seq  seq.length pct.gc
#> *  <chr>  <chr>                        <chr>  <chr>    <chr>       <dbl>
#> 1  S1     Basic Statistics             PASS   50299587 35-76          48
#> 2  S1     Per base sequence quality    PASS   50299587 35-76          48
#> 3  S1     Per tile sequence quality    PASS   50299587 35-76          48
#> 4  S1     Per sequence quality scores  PASS   50299587 35-76          48
#> 5  S1     Per base sequence content    FAIL   50299587 35-76          48
#> 6  S1     Per sequence GC content      WARN   50299587 35-76          48
#> 7  S1     Per base N content           PASS   50299587 35-76          48
#> 8  S1     Sequence Length Distribution WARN   50299587 35-76          48
#> 9  S1     Sequence Duplication Levels  PASS   50299587 35-76          48
#> 10 S1     Overrepresented sequences    PASS   50299587 35-76          48
#> # ... with 50 more rows, and 1 more variables: pct.dup <dbl>

# Generates a summary of qc_aggregate
summary(qc)
#> Source: local data frame [12 x 7]
#> Groups: module [?]
#>
#>    module                       nb_samples nb_fail nb_pass nb_warn
#>    <chr>                             <dbl>   <dbl>   <dbl>   <dbl>
#> 1  Adapter Content                       5       0       5       0
#> 2  Basic Statistics                      5       0       5       0
#> 3  Kmer Content                          5       0       5       0
#> 4  Overrepresented sequences             5       0       5       0
#> 5  Per base N content                    5       0       5       0
#> 6  Per base sequence content             5       5       0       0
#> 7  Per base sequence quality             5       0       5       0
#> 8  Per sequence GC content               5       2       0       3
#> 9  Per sequence quality scores           5       0       5       0
#> 10 Per tile sequence quality             5       0       5       0
#> 11 Sequence Duplication Levels           5       0       5       0
#> 12 Sequence Length Distribution          5       0       0       5
#> # ... with 2 more variables: failed <chr>, warned <chr>

# General statistics of fastqc reports.
qc_stats(qc)
#> # A tibble: 5 × 5
#>   sample pct.dup pct.gc tot.seq  seq.length
#>   <chr>    <dbl>  <dbl> <chr>    <chr>
#> 1 S1       17.24     48 50299587 35-76
#> 2 S2       15.70     48 50299587 35-76
#> 3 S3       22.14     49 67255341 35-76
#> 4 S4       19.89     49 67255341 35-76
#> 5 S5       18.15     48 65011962 35-76
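
In the qc_stats() output above, tot.seq is printed as a character column (<chr>), so it needs converting before any arithmetic. A minimal sketch follows, with the sample names and read counts copied by hand from that output into a plain data frame; the object name stats is only illustrative.

```r
# Read counts copied from the qc_stats() output above
stats <- data.frame(
  sample  = c("S1", "S2", "S3", "S4", "S5"),
  tot.seq = c("50299587", "50299587", "67255341", "67255341", "65011962"),
  stringsAsFactors = FALSE
)

# tot.seq is character, so convert it before doing arithmetic
stats$tot.seq <- as.numeric(stats$tot.seq)

total_reads <- sum(stats$tot.seq)   # reads across all five samples
mean_reads  <- mean(stats$tot.seq)  # average reads per sample

print(total_reads)
print(mean_reads)
```

seq.length is left untouched here because it holds a range ("35-76") rather than a single number.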