Automation tool for high-throughput sequencing DNA methylation data
MethSeq - Automation tool for high-throughput sequencing DNA methylation data
Supported sequencing protocols for identification of DNA methylation
- WGBS
- Pico-Methyl
- EMSeq
Reduced-representation bisulfite sequencing (RRBS) is not supported!
Required software
- Python 3
- Cromwell workflow management system running in server mode
- Docker (optional but highly recommended to improve reproducibility)
Required data
- Paired-end raw sequencing FASTQ files (one file per strand)
- Reference genome indexed with Bismark 0.22.2 and Bowtie 2.3.5.1
Software used via Docker container images
- TrimGalore 0.6.5
- Bismark 0.22.2
- Bowtie 2.3.5.1
- FastQC
If your compute environment does not support Docker, install these tools locally and override their paths using MethSeq command-line parameters.
Main data processing tasks
- Trim and filter raw sequencing reads (TrimGalore!)
- Align filtered reads to the methylation-aware indexed reference genome (Bismark, Bowtie 2)
- Deduplicate aligned reads to remove PCR duplicates (Bismark)
- Extract genome-wide DNA methylation in CpG context (Bismark)
Quality control and report tasks
- Quality assessment of filtered sequencing reads (FastQC)
- Overall report (Bismark)
Expected result files for each sample, regardless of sequencing protocol (exceptions are noted)
- cpg_report: Genome-wide DNA methylation in CpG context, including strand information
- cov: Coverage file without strand information (only CpGs with coverage)
- bedgraph: BedGraph file
- mbias_png_1: M-Bias plot of forward strand (R1)
- mbias_png_2: M-Bias plot of reverse strand (R2, except PicoMethyl)
- trim_stats_1: Trimming statistics of forward strand (R1)
- trim_stats_2: Trimming statistics of reverse strand (R2)
- align_stats: Alignment statistics
- nucleotide_coverage
- deduplicate_stats: Deduplication statistics
- mbias_stats: M-Bias statistics (used to generate the M-Bias plots)
- splitting_stats
- trim_qc_report_1: QC report of filtered reads of forward strand (R1)
- trim_qc_report_zip_1: Zipped file containing QC statistics of filtered reads of forward strand (R1)
- trim_qc_report_2: QC report of filtered reads of reverse strand (R2)
- trim_qc_report_zip_2: Zipped file containing QC statistics of filtered reads of reverse strand (R2)
- unmapped_qc_report_1: QC report of unmapped reads of forward strand (R1)
- unmapped_qc_report_zip_1: Zipped file containing QC statistics of unmapped reads of forward strand (R1)
- unmapped_qc_report_2: QC report of unmapped reads of reverse strand (R2, except PicoMethyl)
- unmapped_qc_report_zip_2: Zipped file containing QC statistics of unmapped reads of reverse strand (R2, except PicoMethyl)
- report: Overall report
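Downstream analyses often start from the cov file. The sketch below reads it and computes a coverage-weighted mean methylation level. It assumes the file follows Bismark's standard tab-separated coverage layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated); the names CpGSite, read_cov and mean_methylation are hypothetical helpers, not part of MethSeq.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, TextIO


@dataclass
class CpGSite:
    chrom: str
    start: int  # position as written in the file
    end: int
    methylated: int
    unmethylated: int

    @property
    def coverage(self) -> int:
        return self.methylated + self.unmethylated

    @property
    def beta(self) -> float:
        """Fraction of methylated calls at this site (0.0 to 1.0)."""
        return self.methylated / self.coverage if self.coverage else float("nan")


def read_cov(handle: TextIO, min_coverage: int = 1) -> Iterator[CpGSite]:
    """Yield CpG sites from a coverage file (assumed Bismark layout, 6 columns)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, start, end, _percent, meth, unmeth = row[:6]
        site = CpGSite(chrom, int(start), int(end), int(meth), int(unmeth))
        if site.coverage >= min_coverage:
            yield site


def mean_methylation(sites: Iterable[CpGSite]) -> float:
    """Coverage-weighted mean methylation: total methylated / total calls."""
    meth = total = 0
    for site in sites:
        meth += site.methylated
        total += site.coverage
    return meth / total if total else float("nan")


if __name__ == "__main__":
    demo = io.StringIO(
        "chr1\t10469\t10469\t75.0\t3\t1\n"
        "chr1\t10471\t10471\t100.0\t2\t0\n"
    )
    print(f"{mean_methylation(read_cov(demo)):.3f}")  # prints 0.833
```

If the file is gzip-compressed, open it with gzip.open(path, "rt") and pass the handle to read_cov unchanged.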
How to use
methseq accel|pico|emseq --fastq /path/to/fastqs [--fastq /path/to/other_fastqs] [trimming parameters] result_dir
By default, the trimming step only removes (Illumina) adapter sequences. Use the following trimming parameters to clip additional bases after adapter removal. This is useful for removing the methylation bias seen in the M-Bias plot.
- --five_prime_clip_R1 N: remove N bases from the beginning (5') of forward reads (R1)
- --three_prime_clip_R1 N: remove N bases from the end (3') of forward reads (R1)
- --five_prime_clip_R2 N: remove N bases from the beginning (5') of reverse reads (R2)
- --three_prime_clip_R2 N: remove N bases from the end (3') of reverse reads (R2)
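The clip values are usually read off the M-Bias plot: positions near the read ends whose methylation level departs from the rest of the read are clipped. The sketch below turns that visual check into a simple heuristic and shows what a fixed clip does to a read. The tolerance rule (deviation from the read-wide median, 5 percentage points by default) is an assumption for illustration, not how MethSeq or Bismark choose values; suggest_clip and clip_read are hypothetical helpers.

```python
from statistics import median
from typing import List, Tuple


def clip_read(seq: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Remove a fixed number of bases from the 5' and 3' ends of a read."""
    end = len(seq) - three_prime
    return seq[five_prime:end] if end > five_prime else ""


def suggest_clip(meth_by_position: List[float], tolerance: float = 5.0) -> Tuple[int, int]:
    """Suggest (5' clip, 3' clip) from per-position % methylation (M-Bias values).

    Heuristic (an assumption): a position is "biased" when it deviates from the
    read-wide median by more than `tolerance` percentage points; clipping extends
    inward from each end until the first unbiased position.
    """
    mid = median(meth_by_position)
    biased = [abs(m - mid) > tolerance for m in meth_by_position]
    five = 0
    while five < len(biased) and biased[five]:
        five += 1
    three = 0
    while three < len(biased) - five and biased[len(biased) - 1 - three]:
        three += 1
    return five, three


if __name__ == "__main__":
    profile = [60, 68, 75, 76, 75, 74, 76, 75, 82, 90]  # made-up M-Bias values
    five, three = suggest_clip(profile)
    print(five, three)                          # prints 2 2
    print(clip_read("ACGTACGTAC", five, three))  # prints GTACGT
```

The suggested numbers would then be passed as, for example, --five_prime_clip_R1 and --three_prime_clip_R1 for the R1 profile, and the same for R2.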
By default, the trimming step trims bases at the end (3') of reads that have a Phred quality score lower than 20.
Use --quality N to change the quality cutoff to N.
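To our understanding, TrimGalore delegates quality trimming to Cutadapt, which uses a BWA-style rule rather than cutting at the first low-quality base. The sketch below implements that rule for Phred+33 quality strings to show how the cutoff acts; it is an illustration, not code taken from MethSeq, TrimGalore or Cutadapt.

```python
def quality_trim_index(qualities: str, cutoff: int = 20, base: int = 33) -> int:
    """Return the index at which to cut the 3' end of a read (keep seq[:index]).

    BWA-style rule: walk from the 3' end summing (cutoff - quality), stop once
    the running sum turns negative, and cut where the sum was largest.
    """
    running = 0
    best = 0
    cut = len(qualities)
    for i in reversed(range(len(qualities))):
        q = ord(qualities[i]) - base  # decode Phred score from ASCII
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


if __name__ == "__main__":
    seq, qual = "ACGTACGT", "IIIII###"  # 'I' = Phred 40, '#' = Phred 2
    print(seq[: quality_trim_index(qual)])  # prints ACGTA
```

Because the rule uses a running sum, an isolated low-quality base followed by good bases toward the 3' end is not enough on its own to trigger a cut.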
By default, the trimming step discards trimmed reads that are shorter than 20 bp.
Use --length N to change the read length cutoff to N.
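As we understand TrimGalore's paired-end mode, a pair is removed when either mate falls below the length cutoff, so R1 and R2 stay in sync. A minimal sketch of that filter, assuming reads are held as (name, sequence, qualities) tuples; filter_pairs is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, qualities)


def filter_pairs(
    pairs: Iterable[Tuple[Read, Read]], min_length: int = 20
) -> Iterator[Tuple[Read, Read]]:
    """Yield only the read pairs in which both mates are at least min_length bp."""
    for r1, r2 in pairs:
        if len(r1[1]) >= min_length and len(r2[1]) >= min_length:
            yield r1, r2
```

A read of exactly N bp is kept; only reads shorter than N are discarded.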
MethSeq will:
- Check that all samples have paired FASTQ files
- Check that the indexed reference genome files exist
- Generate the inputs JSON file
- Write the workflow (WDL) and JSON files to the result directory
- Submit the WDL and JSON files to the Cromwell server through its API
- Wait until workflow execution is completed
- If successful, collect the workflow output files by copying (or moving) them to the result directory
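The first steps above (pairing check and inputs JSON) can be sketched as follows. This is not MethSeq's code: the <sample>_1 / <sample>_2 naming convention mirrors the EMSeq example file names in this page, and the JSON key names ("methseq.genome_dir", "methseq.samples") are hypothetical placeholders for whatever the real WDL declares.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical naming convention: <sample>_1.fastq.gz and <sample>_2.fastq.gz
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.f(ast)?q(\.gz)?$")


def pair_fastqs(filenames: List[str]) -> Dict[str, Tuple[str, str]]:
    """Group FASTQ paths into (R1, R2) per sample; raise if any mate is missing."""
    mates: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        match = MATE_PATTERN.match(Path(name).name)
        if match is None:
            raise ValueError(f"Not a recognised FASTQ name: {name}")
        mates.setdefault(match["sample"], {})[match["mate"]] = name
    unpaired = sorted(s for s, m in mates.items() if set(m) != {"1", "2"})
    if unpaired:
        raise ValueError(f"Samples without both mates: {', '.join(unpaired)}")
    return {s: (m["1"], m["2"]) for s, m in sorted(mates.items())}


def build_inputs(pairs: Dict[str, Tuple[str, str]], genome_dir: str) -> str:
    """Serialise a Cromwell inputs JSON (key names are hypothetical)."""
    inputs = {
        "methseq.genome_dir": genome_dir,
        "methseq.samples": [
            {"name": s, "fastq_1": r1, "fastq_2": r2} for s, (r1, r2) in pairs.items()
        ],
    }
    return json.dumps(inputs, indent=2)
```

Submission would then be an HTTP POST of the WDL and this JSON to the Cromwell server's /api/workflows/v1 endpoint (form fields workflowSource and workflowInputs), followed by polling /api/workflows/v1/{id}/status; how MethSeq performs these calls internally is not described here.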
Extra: run MultiQC on the result_dir folder!
Use cases
To process WGBS samples with trimming parameters that remove the methylation bias seen in the M-Bias plot:
methseq wgbs \
--fastq /path/to/wgbs_fastqs \
--five_prime_clip_R1 16 \
--three_prime_clip_R1 16 \
--five_prime_clip_R2 16 \
--three_prime_clip_R2 16 \
/path/to/wgbs_result
To process a single EMSeq sample by passing the path to each FASTQ file. This is useful for running MethSeq on files that are not in the same directory:
methseq emseq \
--fastq_1 /path/to/emseq_1.fastq.gz \
--fastq_2 /path/to/emseq_2.fastq.gz \
/path/to/emseq_result
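When many EMSeq samples are scattered across directories, the single-sample form above can be generated per pair. A minimal sketch that only builds the argument lists using the documented --fastq_1 and --fastq_2 parameters; the per-sample result directory naming (derived from the R1 file name) is an assumption, and emseq_commands is a hypothetical helper.

```python
import shlex
from pathlib import PurePosixPath
from typing import List, Tuple


def emseq_commands(pairs: List[Tuple[str, str]], result_root: str) -> List[List[str]]:
    """Build one single-sample 'methseq emseq' command per FASTQ pair."""
    commands = []
    for fastq_1, fastq_2 in pairs:
        # Assumption: sample name is the R1 file name before "_1."
        sample = PurePosixPath(fastq_1).name.split("_1.", 1)[0]
        commands.append([
            "methseq", "emseq",
            "--fastq_1", fastq_1,
            "--fastq_2", fastq_2,
            f"{result_root}/{sample}",
        ])
    return commands


if __name__ == "__main__":
    pairs = [("/data/run1/s1_1.fastq.gz", "/data/run2/s1_2.fastq.gz")]
    for cmd in emseq_commands(pairs, "/path/to/emseq_results"):
        print(shlex.join(cmd))  # or subprocess.run(cmd, check=True)
```

Running the commands one after another with subprocess.run keeps each sample's submission and result collection separate.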
File details
Details for the file methseq-1.0.1.tar.gz.
File metadata
- Download URL: methseq-1.0.1.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | d1dcbcf67da76319cdfa788abc76b803e14e6ee2fa2f4e29e910642ff29d8f53
MD5 | c20b74a53aa3f4e9ad993bdb19727a7c
BLAKE2b-256 | 0b2e3074c93a0031ef86404fb7d2e9103d179195f88b510da6134e3f98b5570e
File details
Details for the file methseq-1.0.1-py3-none-any.whl.
File metadata
- Download URL: methseq-1.0.1-py3-none-any.whl
- Upload date:
- Size: 30.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0473616d5ef1f33f22d5a8a28930ef3bf7db4e4db77f7a3761ecfdd4ee18469
MD5 | e278c79a0307ef57e325be481a9dc3f6
BLAKE2b-256 | db6a53a403c2debd0cde32eb10d640d64846e92850dc7258b696460e40d565eb
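To check a downloaded distribution file against the SHA256 digests listed above, the file can be hashed in chunks with the standard library. A minimal sketch; sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for methseq 1.0.1
EXPECTED = {
    "methseq-1.0.1.tar.gz": "d1dcbcf67da76319cdfa788abc76b803e14e6ee2fa2f4e29e910642ff29d8f53",
    "methseq-1.0.1-py3-none-any.whl": "c0473616d5ef1f33f22d5a8a28930ef3bf7db4e4db77f7a3761ecfdd4ee18469",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check at install time through its hash-checking mode (a --hash=sha256:... entry in a requirements file).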