
Strategies for Analyzing Bisulfite Sequencing Data

DNA methylation, one of the main epigenetic modifications in eukaryotic genomes, has been shown to play a role in cell-type-specific regulation of gene expression, and therefore in cell-type identity.

Bisulfite sequencing is the gold standard for measuring methylation across a genome of interest, because it provides genome-wide coverage at single-base resolution.

This article describes strategies for analyzing high-throughput bisulfite sequencing data. The following steps are covered:

- Short-read alignment techniques
- Pre- and post-alignment quality checks to ensure data quality
- Analysis steps that follow alignment
- Differential methylation methods
- Methylome segmentation for identifying regulatory regions
- Annotation methods for further classifying the regions returned by segmentation and differential methylation methods

Finally, the article describes software packages and an online workflow for efficiently handling large bisulfite sequencing datasets.

The content is organized as follows:

- Introduction to high-throughput sequencing techniques based on bisulfite treatment
- Algorithms and tools for detecting differential methylation and segmenting methylation profiles
- Management of large datasets and data analysis workflows with a guided user interface


Bisulfite sequencing for detection of methylation and other base modifications




Whole genome bisulfite sequencing (WGBS) combines bisulfite conversion of DNA molecules with high-throughput sequencing.

The procedure can be summarized as follows:

- Random fragmentation of genomic DNA to the desired size (about 200 bp).

- Conversion of the fragmented DNA into a sequencing library by ligation to adapters containing 5-methylcytosines (5mC), which protects the adapter sequences from bisulfite conversion.

- Bisulfite treatment. This treatment converts unmethylated cytosines to uracil, while methylated cytosines remain protected.

- PCR amplification of the library. After PCR, uracils are represented as thymines.

- High-throughput sequencing.
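
The effect of the bisulfite and PCR steps on the sequence can be sketched in a few lines of Python. This is a toy illustration, not a real analysis pipeline: the function names `bisulfite_convert` and `call_methylation`, the example fragment and the methylated position are all hypothetical, and the read is assumed to match the reference exactly, with no sequencing errors, indels or strand issues.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate bisulfite treatment followed by PCR on a single strand.

    Unmethylated cytosines become uracil and are read as thymine after PCR;
    cytosines at the 0-based `methylated_positions` are protected and stay C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """Compare a converted read with the reference, position by position.

    Returns {position: True if the C was kept (methylated),
             False if it reads as T (unmethylated)} for each reference C.
    Assumes the read starts at reference offset 0 and has no mismatches
    other than bisulfite-induced C-to-T changes (a strong simplification).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = read_base == "C"
    return calls


# Hypothetical fragment with one methylated cytosine at position 1.
fragment = "ACGTCCGATCGA"
converted = bisulfite_convert(fragment, methylated_positions={1})
print(converted)                               # ACGTTTGATTGA
print(call_methylation(fragment, converted))   # {1: True, 4: False, 5: False, 9: False}
```

In real data the same comparison is made across many reads covering each cytosine, so that a methylation level, rather than a single yes/no call, can be estimated per position.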


Despite its advantages, WGBS remains the most expensive technique, and standard library preparation requires relatively large quantities of DNA (100 ng to 5 µg); as such, it is usually not applied to large numbers of samples. Detecting methylation differences between samples with high sensitivity requires high sequencing depth, which significantly increases sequencing cost.
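
To give a rough sense of why depth drives cost, the number of reads needed for a target average depth can be estimated as depth × genome size / read length. The sketch below uses assumed values that are not taken from the article: a genome of about 3.1 billion bases, 100-base reads and 30-fold depth. It also ignores duplicate and unmapped reads, so real experiments would need more.

```python
def reads_needed(genome_size: int, depth: int, read_length: int) -> int:
    """Estimate the reads required for a target average depth.

    Uses reads = depth * genome_size / read_length, rounded up.
    Ignores duplicates, unmapped reads and uneven coverage.
    """
    total_bases = depth * genome_size
    return -(-total_bases // read_length)  # ceiling division on integers


# Assumed, illustrative values (not from the article).
HUMAN_GENOME_SIZE = 3_100_000_000  # approximate, in bases
print(reads_needed(HUMAN_GENOME_SIZE, depth=30, read_length=100))  # 930000000
```

Because the estimate scales linearly with depth, doubling the target depth doubles the number of reads, and with it most of the sequencing cost.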


Reduced representation bisulfite sequencing (RRBS) is another technique that profiles DNA methylation at single-base resolution. It combines digestion of genomic DNA by a restriction enzyme (MspI) with bisulfite treatment and sequencing in order to enrich for regions with high CpG content.

RRBS sequences only CpG-dense regions and does not interrogate CpG-poor regions such as functional enhancers, intronic regions, intergenic regions or, more generally, lowly methylated regions (LMRs) of the genome. Depending on the sequencing depth and the RRBS variant used, it examines about 4% to 17% of the approximately 28 million CpG dinucleotides distributed throughout the human genome.
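
The enrichment step can be illustrated with an in silico digest. MspI recognizes CCGG and cuts between the two cytosines (C^CGG), so fragments from CpG-rich regions tend to be short and are retained by size selection. The sketch below is a toy example only: the function names, the sequence and the size window are hypothetical, and real protocols select much longer fragments than the few bases used here.

```python
from __future__ import annotations


def mspi_digest(seq: str) -> list[str]:
    """Cut a sequence at every CCGG site, between the first and second C."""
    seq = seq.upper()
    fragments = []
    previous = 0
    site = seq.find("CCGG")
    while site != -1:
        cut = site + 1                 # cut position: C^CGG
        fragments.append(seq[previous:cut])
        previous = cut
        site = seq.find("CCGG", site + 1)
    fragments.append(seq[previous:])   # remainder after the last cut
    return fragments


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Hypothetical toy sequence with two MspI sites.
sequence = "AACCGGTTTTCCGGAA"
fragments = mspi_digest(sequence)
print(fragments)                        # ['AAC', 'CGGTTTTC', 'CGGAA']
print(size_select(fragments, 4, 10))    # ['CGGTTTTC', 'CGGAA']
```

Each internal fragment starts with CGG, which is why every retained fragment carries at least one CpG near its end.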


Workflow for analysis of DNA methylation using data from bisulfite sequencing experiments






Read more: Katarzyna Wreczycka et al., Strategies for analyzing bisulfite sequencing data, http://www.biorxiv.org/content/early/2017/08/09/109512