MethSeq - Automation tool for high-throughput sequencing DNA methylation data

Supported sequencing protocols for identification of DNA methylation

  • WGBS
  • Pico-Methyl
  • EMSeq

Reduced-representation bisulfite sequencing (RRBS) is not supported!

Required software

  • Python 3
  • Cromwell workflow management system running in server mode
  • Docker (optional but highly recommended to improve reproducibility)

Required data

  • Paired-end raw sequencing FASTQ files (one file per read mate, R1 and R2)
  • Reference genome indexed with Bismark 0.22.2 and Bowtie 2.3.5.1
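
As an illustration of the paired-end requirement, here is a minimal sketch that groups FASTQ file names into R1/R2 pairs. The naming convention (a `_1`/`_2` or `_R1`/`_R2` suffix before the extension) and the function name `pair_fastqs` are assumptions made for this example, not MethSeq's documented behaviour.

```python
import re
from collections import defaultdict

# Assumption: mates are named <sample>_1.fastq.gz / <sample>_2.fastq.gz,
# optionally with "R" before the digit, ".fq" instead of ".fastq", or no ".gz".
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R?(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def pair_fastqs(filenames):
    """Return {sample: (r1_file, r2_file)}; raise ValueError on unpaired files."""
    mates = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"not a recognised FASTQ name: {name}")
        mates[match["sample"]][match["mate"]] = name

    # A sample is complete only when both mate 1 and mate 2 were found.
    unpaired = sorted(s for s, found in mates.items() if set(found) != {"1", "2"})
    if unpaired:
        raise ValueError(f"samples missing a mate: {', '.join(unpaired)}")

    return {s: (found["1"], found["2"]) for s, found in sorted(mates.items())}
```

Failing early on an unpaired sample avoids submitting a workflow that would only break at the alignment step.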

Software used via Docker container images

  • TrimGalore 0.6.5
  • Bismark 0.22.2
  • Bowtie 2.3.5.1
  • FastQC

If your compute environment does not support Docker, install these tools locally and override their paths using MethSeq command-line parameters.

Main data processing tasks

  • Trim and filter raw sequencing reads (TrimGalore!)
  • Align filtered reads to methylation-aware indexed reference genome (Bismark, Bowtie 2)
  • Deduplicate aligned reads to remove PCR bias (Bismark)
  • Extract genome-wide DNA methylation in CpG context (Bismark)
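
The alignment, deduplication and extraction tasks above correspond to three Bismark command-line tools. The sketch below builds plausible argument lists for them for paired-end data. The exact options MethSeq passes in its workflow are not stated here, so treat the selection as an assumption. The function name and the BAM paths supplied by the caller are hypothetical.

```python
def bismark_commands(genome_dir, reads_1, reads_2, aligned_bam, deduplicated_bam):
    """Build argv lists for align -> deduplicate -> extract (paired-end).

    aligned_bam and deduplicated_bam are the files expected from the previous
    step; they are passed in because output naming depends on the input names.
    """
    align = ["bismark", "--genome", genome_dir, "-1", reads_1, "-2", reads_2]

    # Remove reads that align to the same position and orientation.
    deduplicate = ["deduplicate_bismark", "--paired", "--bam", aligned_bam]

    # --bedGraph also writes the coverage file; --cytosine_report writes the
    # genome-wide report, which is restricted to CpG context by default.
    extract = [
        "bismark_methylation_extractor",
        "--paired-end",
        "--bedGraph",
        "--cytosine_report",
        "--genome_folder", genome_dir,
        deduplicated_bam,
    ]
    return [align, deduplicate, extract]
```

Each list can be passed to `subprocess.run(cmd, check=True)` when the tools are installed locally instead of being run through Docker.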

Quality control and report tasks

  • Quality assessment of filtered sequencing reads (FastQC)
  • Overall report (Bismark)

Expected result files for each sample, regardless of sequencing protocol

  • cpg_report Genome-wide DNA methylation in CpG context, contains strand information
  • cov Coverage file, doesn't contain strand information (only CpG with coverage)
  • bedgraph BedGraph file
  • mbias_png_1 M-Bias plot of forward strand (R1)
  • mbias_png_2 M-Bias plot of reverse strand (R2, except Pico-Methyl)
  • trim_stats_1 Trimming statistics of forward strand (R1)
  • trim_stats_2 Trimming statistics of reverse strand (R2)
  • align_stats Alignment stats
  • nucleotide_coverage Nucleotide coverage statistics
  • deduplicate_stats Deduplication statistics
  • mbias_stats M-Bias statistics (used to generate M-Bias plots)
  • splitting_stats Splitting statistics from methylation extraction
  • trim_qc_report_1 QC report of filtered reads of forward strand (R1)
  • trim_qc_report_zip_1 Zipped file containing QC statistics of filtered reads of forward strand (R1)
  • trim_qc_report_2 QC report of filtered reads of reverse strand (R2)
  • trim_qc_report_zip_2 Zipped file containing QC statistics of filtered reads of reverse strand (R2)
  • unmapped_qc_report_1 QC report of unmapped reads of forward strand (R1)
  • unmapped_qc_report_zip_1 Zipped file containing QC statistics of unmapped reads of forward strand (R1)
  • unmapped_qc_report_2 QC report of unmapped reads of reverse strand (R2, except Pico-Methyl)
  • unmapped_qc_report_zip_2 Zipped file containing QC statistics of unmapped reads of reverse strand (R2, except Pico-Methyl)
  • report Overall report
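
For downstream analysis, the cov file can be read with a few lines of Python. The sketch below assumes the standard Bismark coverage layout of six tab-separated columns (chromosome, start, end, methylation percentage, methylated count, unmethylated count). The names `CpgCoverage`, `read_coverage` and `pooled_methylation` are made up for this example.

```python
import csv
from typing import Iterable, Iterator, NamedTuple, Optional, TextIO


class CpgCoverage(NamedTuple):
    chrom: str
    start: int
    end: int
    percent_methylated: float
    count_methylated: int
    count_unmethylated: int


def read_coverage(handle: TextIO) -> Iterator[CpgCoverage]:
    """Yield one record per line of a Bismark coverage file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, start, end, pct, meth, unmeth = row
        yield CpgCoverage(chrom, int(start), int(end), float(pct), int(meth), int(unmeth))


def pooled_methylation(records: Iterable[CpgCoverage], min_depth: int = 5) -> Optional[float]:
    """Fraction of methylated calls over sites covered by at least min_depth reads."""
    meth = unmeth = 0
    for rec in records:
        if rec.count_methylated + rec.count_unmethylated >= min_depth:
            meth += rec.count_methylated
            unmeth += rec.count_unmethylated
    total = meth + unmeth
    return meth / total if total else None
```

If the coverage file is gzip-compressed, open it with `gzip.open(path, "rt")` and pass the handle to `read_coverage`.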

How to use

methseq accel|pico|emseq --fastq /path/to/fastqs [--fastq /path/to/other_fastqs] [trimming parameters] result_dir

By default, the trimming step only removes (Illumina) adapter sequences. Use the following trimming parameters to clip additional bases after adapter removal. This is useful for removing methylation bias seen in the M-Bias plot.

  • --five_prime_clip_R1 N remove N bases from the beginning (5') of forward reads (R1)
  • --three_prime_clip_R1 N remove N bases from the end (3') of forward reads (R1)
  • --five_prime_clip_R2 N remove N bases from the beginning (5') of reverse reads (R2)
  • --three_prime_clip_R2 N remove N bases from the end (3') of reverse reads (R2)

By default, the trimming step trims bases at the end (3') that have a Phred score lower than 20. Use --quality N to change the quality cutoff to N.

By default, the trimming step discards trimmed reads that are shorter than 20 bp. Use --length N to change the read length cutoff to N.
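
To make the trimming options concrete, the sketch below shows one way they could map onto a TrimGalore command line. The TrimGalore flags used (`--paired`, `--quality`, `--length`, `--clip_R1`, `--clip_R2`, `--three_prime_clip_R1`, `--three_prime_clip_R2`, `--output_dir`) are real, but the mapping from MethSeq parameters and the function name `trim_galore_args` are assumptions for illustration.

```python
def trim_galore_args(fastq_1, fastq_2, output_dir, quality=20, length=20,
                     five_prime_clip_r1=None, three_prime_clip_r1=None,
                     five_prime_clip_r2=None, three_prime_clip_r2=None):
    """Build a paired-end trim_galore argv list; clip options are added only if set."""
    args = [
        "trim_galore", "--paired",
        "--quality", str(quality),   # Phred cutoff for 3' quality trimming
        "--length", str(length),     # discard reads shorter than this after trimming
        "--output_dir", output_dir,
    ]
    # TrimGalore names its 5' clipping flags --clip_R1 / --clip_R2.
    optional = [
        ("--clip_R1", five_prime_clip_r1),
        ("--three_prime_clip_R1", three_prime_clip_r1),
        ("--clip_R2", five_prime_clip_r2),
        ("--three_prime_clip_R2", three_prime_clip_r2),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args + [fastq_1, fastq_2]
```

Leaving the clip options unset reproduces the default behaviour described above: adapter removal, quality trimming at Phred 20, and a 20 bp length filter.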

MethSeq will

  1. Check if all samples are paired FASTQ files
  2. Check if indexed reference genome files exist
  3. Generate inputs JSON file
  4. Write workflow (WDL) and JSON files to result directory
  5. Submit WDL and JSON files to Cromwell server through its API
  6. Wait until workflow execution is completed
  7. On success, collect the workflow output files by copying (or moving) them to the result directory

Extra: run MultiQC on result_dir folder!

Use cases

To process WGBS samples with trimming parameters that remove methylation bias seen in the M-Bias plot:

methseq wgbs \
    --fastq /path/to/wgbs_fastqs \
    --five_prime_clip_R1 16 \
    --three_prime_clip_R1 16 \
    --five_prime_clip_R2 16 \
    --three_prime_clip_R2 16 \
    /path/to/wgbs_result

To process a single sample (EMSeq) by passing the path to each file. This is useful for running MethSeq on files that are not in the same directory.

methseq emseq \
    --fastq_1 /path/to/emseq_1.fastq.gz \
    --fastq_2 /path/to/emseq_2.fastq.gz \
    /path/to/emseq_result
