Read FastQC data into R.

qc_read(file, modules = "all", verbose = TRUE)

Arguments

file
Path to the file to be imported. Can be any of the following:
  • the zipped FastQC output (e.g. 'path/to/samplename_fastqc.zip'); there is no need to unzip it,
  • the unzipped folder (e.g. 'path/to/samplename_fastqc'),
  • the sample name (e.g. 'path/to/samplename'),
  • the fastqc_data.txt file.
modules
Character vector containing the names of the FastQC modules whose data you want to import or inspect. Default is "all". Allowed values are any one, or any combination, of:
  • "Summary",
  • "Basic Statistics",
  • "Per base sequence quality",
  • "Per tile sequence quality",
  • "Per sequence quality scores",
  • "Per base sequence content",
  • "Per sequence GC content",
  • "Per base N content",
  • "Sequence Length Distribution",
  • "Sequence Duplication Levels",
  • "Overrepresented sequences",
  • "Adapter Content",
  • "Kmer Content"
Partial matching of module names is allowed. For example, you can use modules = "GC content" instead of the full name, modules = "Per sequence GC content".
verbose
Logical value. If TRUE, the file name is printed when reading.
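
The arguments above can be combined in a single call. The following is a minimal sketch, not an official example: it assumes the fastqcr package is installed, uses the demo file shipped with the package, and assumes that full and partial module names can be mixed in one character vector.

```r
library(fastqcr)

# Demo FastQC result shipped with the package
qc.file <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")

# Request two modules: one by its full name, one by a partial name.
# verbose = FALSE suppresses the "Reading: ..." message.
qc <- qc_read(
  qc.file,
  modules = c("Basic Statistics", "GC content"),
  verbose = FALSE
)

# Inspect which modules were actually imported
names(qc)
```

Checking names(qc) after a partial match is a cheap way to confirm that the pattern selected the modules you intended.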

Value

Returns a list of tibbles containing the data for the specified modules.
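
Because the result is an ordinary list, each module can be pulled out with `$` and handled like any other tibble. The sketch below assumes the fastqcr package is installed and uses the element and column names that appear in the printed output in the Examples section (`summary`, `basic_statistics`, `status`, `Measure`, `Value`); names may differ in other package versions.

```r
library(fastqcr)

# Demo FastQC result shipped with the package
qc.file <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")
qc <- qc_read(qc.file, verbose = FALSE)

# Keep only the modules that did not pass
summary_tbl <- qc$summary
flagged <- summary_tbl[summary_tbl$status != "PASS", ]
flagged

# Look up a single value in the two-column basic statistics table
stats <- qc$basic_statistics
total_sequences <- stats$Value[stats$Measure == "Total Sequences"]
total_sequences
```

Note that `Value` is a character column, so numeric entries such as the total sequence count need `as.numeric()` before any arithmetic.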

Examples

# Demo file
qc.file <- system.file("fastqc_results", "S1_fastqc.zip", package = "fastqcr")
qc.file
#> [1] "/Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip"
# Read all modules
qc_read(qc.file)
#> Reading: /Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip
#> $summary
#> # A tibble: 12 × 3
#>    status                       module   sample
#>     <chr>                        <chr>    <chr>
#> 1    PASS             Basic Statistics S1.fastq
#> 2    PASS    Per base sequence quality S1.fastq
#> 3    PASS    Per tile sequence quality S1.fastq
#> 4    PASS  Per sequence quality scores S1.fastq
#> 5    FAIL    Per base sequence content S1.fastq
#> 6    WARN      Per sequence GC content S1.fastq
#> 7    PASS           Per base N content S1.fastq
#> 8    WARN Sequence Length Distribution S1.fastq
#> 9    PASS  Sequence Duplication Levels S1.fastq
#> 10   PASS    Overrepresented sequences S1.fastq
#> 11   PASS              Adapter Content S1.fastq
#> 12   PASS                 Kmer Content S1.fastq
#>
#> $basic_statistics
#> # A tibble: 7 × 2
#>                             Measure                   Value
#>                               <chr>                   <chr>
#> 1                          Filename                S1.fastq
#> 2                         File type Conventional base calls
#> 3                          Encoding   Sanger / Illumina 1.9
#> 4                   Total Sequences                50299587
#> 5 Sequences flagged as poor quality                       0
#> 6                   Sequence length                   35-76
#> 7                               %GC                      48
#>
#> $per_base_sequence_quality
#> # A tibble: 43 × 7
#>     Base     Mean Median `Lower Quartile` `Upper Quartile` `10th Percentile`
#>    <chr>    <dbl>  <dbl>            <dbl>            <dbl>             <dbl>
#> 1      1 31.24760     32               32               32                32
#> 2      2 31.54951     32               32               32                32
#> 3      3 31.65344     32               32               32                32
#> 4      4 31.68952     32               32               32                32
#> 5      5 31.71131     32               32               32                32
#> 6      6 35.33942     36               36               36                36
#> 7      7 35.32502     36               36               36                36
#> 8      8 35.31620     36               36               36                36
#> 9      9 35.32692     36               36               36                36
#> 10 10-11 35.33064     36               36               36                36
#> # ... with 33 more rows, and 1 more variables: `90th Percentile` <dbl>
#>
#> $per_tile_sequence_quality
#> # A tibble: 18,576 × 3
#>     Tile  Base       Mean
#>    <int> <chr>      <dbl>
#> 1  11101     1 0.17529405
#> 2  11101     2 0.04780781
#> 3  11101     3 0.06683322
#> 4  11101     4 0.05580719
#> 5  11101     5 0.04848320
#> 6  11101     6 0.01943990
#> 7  11101     7 0.10426096
#> 8  11101     8 0.06294413
#> 9  11101     9 0.10283679
#> 10 11101 10-11 0.05799534
#> # ... with 18,566 more rows
#>
#> $per_sequence_quality_scores
#> # A tibble: 34 × 2
#>    Quality Count
#>      <int> <dbl>
#> 1        2    75
#> 2        3     0
#> 3        4     0
#> 4        5     0
#> 5        6     0
#> 6        7     0
#> 7        8     0
#> 8        9     0
#> 9       10     0
#> 10      11     0
#> # ... with 24 more rows
#>
#> $per_base_sequence_content
#> # A tibble: 43 × 5
#>     Base        G        A        T        C
#>    <chr>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1      1 24.10465 27.37697 24.52457 23.99380
#> 2      2 23.51217 27.18070 25.48813 23.81900
#> 3      3 23.19071 25.81301 26.28049 24.71579
#> 4      4 23.53870 25.94238 26.18950 24.32942
#> 5      5 23.68119 26.31318 26.07065 23.93498
#> 6      6 24.28040 25.42350 25.57356 24.72254
#> 7      7 24.12291 25.69376 26.07771 24.10562
#> 8      8 23.51441 25.79722 26.23024 24.45814
#> 9      9 23.51723 25.57078 26.40905 24.50294
#> 10 10-11 23.57537 25.86219 26.42284 24.13961
#> # ... with 33 more rows
#>
#> $per_sequence_gc_content
#> # A tibble: 101 × 2
#>    `GC Content` Count
#>           <int> <dbl>
#> 1             0  81.0
#> 2             1  44.0
#> 3             2  14.0
#> 4             3  39.5
#> 5             4  58.0
#> 6             5  78.5
#> 7             6 143.0
#> 8             7 264.5
#> 9             8 342.5
#> 10            9 427.5
#> # ... with 91 more rows
#>
#> $per_base_n_content
#> # A tibble: 43 × 2
#>     Base    `N-Count`
#>    <chr>        <dbl>
#> 1      1 0.0634418728
#> 2      2 0.0003101417
#> 3      3 0.0002703800
#> 4      4 0.0001530828
#> 5      5 0.0001491066
#> 6      6 0.0093758225
#> 7      7 0.0025586691
#> 8      8 0.0002604395
#> 9      9 0.0002823085
#> 10 10-11 0.0006043787
#> # ... with 33 more rows
#>
#> $sequence_length_distribution
#> # A tibble: 42 × 2
#>    Length Count
#>     <int> <dbl>
#> 1      35  1282
#> 2      36   144
#> 3      37   160
#> 4      38   172
#> 5      39   177
#> 6      40   164
#> 7      41   174
#> 8      42   183
#> 9      43   167
#> 10     44   198
#> # ... with 32 more rows
#>
#> $sequence_duplication_levels
#> # A tibble: 16 × 3
#>    `Duplication Level` `Percentage of deduplicated` `Percentage of total`
#>                  <chr>                        <dbl>                 <dbl>
#> 1                    1                 8.383533e+01          69.383491164
#> 2                    2                 1.272876e+01          21.069059100
#> 3                    3                 2.628524e+00           6.526228605
#> 4                    4                 5.907985e-01           1.955817988
#> 5                    5                 1.519676e-01           0.628854184
#> 6                    6                 4.003142e-02           0.198783920
#> 7                    7                 1.283672e-02           0.074367175
#> 8                    8                 5.317799e-03           0.035208779
#> 9                    9                 2.433420e-03           0.018125443
#> 10                 >10                 3.934207e-03           0.041520960
#> 11                 >50                 2.980446e-05           0.002177839
#> 12                >100                 2.469404e-05           0.005240829
#> 13                >500                 2.998461e-06           0.002024965
#> 14                 >1k                 2.662684e-06           0.002604751
#> 15                 >5k                 0.000000e+00           0.000000000
#> 16               >10k+                 2.412492e-06           0.056494299
#>
#> $overrepresented_sequences
#> # A tibble: 0 × 0
#>
#> $adapter_content
#> # A tibble: 64 × 5
#>    Position `Illumina Universal Adapter` `Illumina Small RNA Adapter`
#>       <int>                        <dbl>                        <dbl>
#> 1         1                 9.940439e-06                 1.988088e-06
#> 2         2                 1.192853e-05                 1.988088e-06
#> 3         3                 1.789279e-05                 3.976176e-06
#> 4         4                 2.584514e-05                 3.976176e-06
#> 5         5                 3.379749e-05                 3.976176e-06
#> 6         6                 3.777367e-05                 3.976176e-06
#> 7         7                 4.373793e-05                 3.976176e-06
#> 8         8                 5.367837e-05                 3.976176e-06
#> 9         9                 5.566646e-05                 3.976176e-06
#> 10       10                 5.566646e-05                 3.976176e-06
#> # ... with 54 more rows, and 2 more variables: `Nextera Transposase
#> #   Sequence` <dbl>, `SOLID Small RNA Adapter` <dbl>
#>
#> $kmer_content
#> # A tibble: 0 × 0
#>
#> $total_deduplicated_percentage
#> [1] 82.76
#>
#> attr(,"class")
#> [1] "list"    "qc_read"
# Read a specified module
qc_read(qc.file, "Per base sequence quality")
#> Reading: /Users/kassambara/Documents/R/MyPackages/fastqcr/inst/fastqc_results/S1_fastqc.zip
#> $per_base_sequence_quality
#> # A tibble: 43 × 7
#>     Base     Mean Median `Lower Quartile` `Upper Quartile` `10th Percentile`
#>    <chr>    <dbl>  <dbl>            <dbl>            <dbl>             <dbl>
#> 1      1 31.24760     32               32               32                32
#> 2      2 31.54951     32               32               32                32
#> 3      3 31.65344     32               32               32                32
#> 4      4 31.68952     32               32               32                32
#> 5      5 31.71131     32               32               32                32
#> 6      6 35.33942     36               36               36                36
#> 7      7 35.32502     36               36               36                36
#> 8      8 35.31620     36               36               36                36
#> 9      9 35.32692     36               36               36                36
#> 10 10-11 35.33064     36               36               36                36
#> # ... with 33 more rows, and 1 more variables: `90th Percentile` <dbl>
#>
#> attr(,"class")
#> [1] "list"    "qc_read"