🧬 methurator
Methurator is a Python package designed to estimate sequencing saturation for reduced-representation bisulfite sequencing (RRBS) data.
Although optimized for RRBS, methurator can also be used for whole-genome bisulfite sequencing (WGBS) or other genome-wide methylation data (e.g. EMseq). However, for such data we advise using the Preseq package instead.
📑 Table of Contents
- 1. Dependencies and Notes
- 2. Pip installation
- 3. Quick Start
- 4. Command Reference
- 5. Example Workflow
- 6. How do we compute the sequencing saturation?
1. Dependencies and Notes
- methurator uses SAMtools and MethylDackel internally for BAM subsampling, thus they need to be installed.
- When --genome is provided, the corresponding FASTA file will be automatically fetched and cached.
- Temporary intermediate files are deleted by default unless --keep-temporary-files is specified.
2. Pip installation
pip install methurator
3. Quick Start
Step 1 — Downsample BAM files
The downsample command performs BAM downsampling according to the specified percentages and coverage.
methurator downsample --genome hg19 --bam test_data/SRX1631721.markdup.sorted.csorted.bam
This command generates two summary files:
- CpG summary — number of unique CpGs detected in each downsampled BAM
- Reads summary — number of reads in each downsampled BAM
Example outputs can be found in tests/data.
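For intuition, subsampling a BAM file to a fixed fraction of its reads can be done with `samtools view -s`, whose argument packs a random seed (integer part) and the fraction to keep (decimal part). The sketch below only builds such a command; the function name, the output file naming, and the seed are hypothetical, and whether methurator calls SAMtools in exactly this way is an assumption.

```python
from pathlib import Path


def samtools_subsample_cmd(bam, fraction, outdir="output", seed=42):
    """Build (but do not run) a samtools command keeping roughly `fraction` of reads.

    Hypothetical helper for illustration; not part of methurator's API.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    bam = Path(bam)
    # Output naming is an assumption made for this sketch.
    out = Path(outdir) / f"{bam.stem}.ds{fraction}.bam"
    # `samtools view -s SEED.FRACTION`: integer part is the seed,
    # decimal part is the proportion of alignments to keep.
    seed_and_fraction = f"{seed + fraction:.4f}"
    return ["samtools", "view", "-b", "-s", seed_and_fraction, "-o", str(out), str(bam)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where SAMtools is installed.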
Step 2 — Plot the sequencing saturation curve
Use the plot command to visualize sequencing saturation:
methurator plot \
--cpgs_file tests/data/cpgs_summary.csv \
--reads_file tests/data/reads_summary.csv
4. Command Reference
downsample command
| Argument | Description | Default |
|---|---|---|
| --bam | Path to a single .bam file. | — |
| --bamdir | Directory containing multiple BAM files. | — |
| --outdir | Output directory. | ./output |
| --fasta | Path to the reference genome FASTA file. If not provided, it will be automatically downloaded based on --genome. | — |
| --genome | Genome used for alignment. Available: hg19, hg38, GRCh37, GRCh38, mm10, mm39. | — |
| --downsampling-percentages, -ds | Comma-separated list of downsampling percentages between 0 and 1 (exclusive). | 0.1,0.25,0.5,0.75 |
| --minimum-coverage | Minimum CpG coverage to consider for saturation. Can be a single integer or a list (e.g. 1,3,5). | 3 |
| --keep-temporary-files | If set, temporary files will be kept after analysis. | False |
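The constraint on --downsampling-percentages (comma-separated values strictly between 0 and 1) can be checked before launching a run. The following is a minimal sketch of such a check; `parse_percentages` is a hypothetical helper written for illustration and is not taken from methurator's source.

```python
def parse_percentages(value):
    """Parse a string such as "0.1,0.25,0.5,0.75" into a sorted list of floats.

    Hypothetical helper; mirrors the documented rule that every value
    must lie strictly between 0 and 1.
    """
    fractions = [float(item) for item in value.split(",") if item.strip()]
    if not fractions:
        raise ValueError("at least one downsampling percentage is required")
    for fraction in fractions:
        if not 0 < fraction < 1:
            raise ValueError(f"{fraction} is not strictly between 0 and 1")
    return sorted(fractions)
```

Values such as `0` or `1.0` are rejected because the documented range is exclusive at both ends.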
plot command
| Argument | Description | Default |
|---|---|---|
| --cpgs_file | Path to the CpG coverage summary file. | |
| --reads_file | Path to the reads coverage summary file. | |
| --outdir | Output directory. | ./output |
5. Example Workflow
# Step 1: Downsample BAM file
methurator downsample --genome hg19 --bam my_sample.bam
# Step 2: Plot saturation curve
methurator plot \
--cpgs_file output/cpgs_summary.csv \
--reads_file output/reads_summary.csv
6. How do we compute the sequencing saturation?
To calculate the sequencing saturation of an RRBS sample, we adopt the following strategy. For each sample, we downsample it according to 4 different percentages (default: 0.1,0.25,0.5,0.75). Then, we compute the number of unique CpGs covered by at least 3 reads and the number of reads at each downsampling percentage.
We then fit the following curve using the scipy.optimize.curve_fit function:
$$ y = \beta_0 \cdot \arctan(\beta_1 \cdot x) $$
We chose the arctangent function because it exhibits an asymptotic growth similar to sequencing saturation. For large values of $x$ (as $x \to \infty$), the asymptote corresponds to the theoretical maximum number of unique CpGs covered by at least 3 reads and can be computed as:
$$ \text{asymptote} = \beta_0 \cdot \frac{\pi}{2} $$
Finally, the sequencing saturation value is calculated as follows:
$$ \text{Saturation} = \frac{\text{Number of unique CpGs (} \geq 3 \text{ counts)}}{\text{Asymptote}} $$
This approach allows estimation of the theoretical maximum number of CpGs that can be detected given an infinite sequencing depth, and quantifies how close the sample is to reaching sequencing saturation.
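The fitting procedure described in this section can be sketched in a few lines with scipy.optimize.curve_fit. This is an illustration under stated assumptions rather than methurator's actual code: the function names, the initial guess, the rescaling of read counts, and the synthetic input values are all choices made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit


def arctan_model(x, beta0, beta1):
    """Saturation model y = beta0 * arctan(beta1 * x)."""
    return beta0 * np.arctan(beta1 * x)


def estimate_saturation(reads, cpgs, observed_cpgs):
    """Fit the arctangent model and return (asymptote, saturation).

    reads, cpgs: read counts and unique CpG counts per downsampled BAM.
    observed_cpgs: unique CpGs (>= minimum coverage) in the full sample.
    """
    reads = np.asarray(reads, dtype=float)
    cpgs = np.asarray(cpgs, dtype=float)
    # Rescale reads to (0, 1] so both parameters have a comparable magnitude
    # during optimisation. This changes beta1 but leaves beta0 untouched.
    scaled_reads = reads / reads.max()
    # Initial guess (an assumption of this sketch, not a documented default).
    popt, _ = curve_fit(arctan_model, scaled_reads, cpgs, p0=[cpgs.max(), 1.0])
    beta0 = popt[0]
    asymptote = beta0 * np.pi / 2  # limit of the model as x -> infinity
    return asymptote, observed_cpgs / asymptote


# Synthetic, noise-free data generated from the model itself (not real RRBS data).
total_reads = 1.0e7
fractions = np.array([0.1, 0.25, 0.5, 0.75])
reads = fractions * total_reads
cpgs = arctan_model(reads, 1.0e6, 4.0e-7)
observed = arctan_model(total_reads, 1.0e6, 4.0e-7)

asymptote, saturation = estimate_saturation(reads, cpgs, observed)
print(f"asymptote: {asymptote:.0f} CpGs, saturation: {saturation:.2%}")
```

With only four points and two free parameters, the fit can be sensitive to noise and to the starting values, so on real data it is worth checking that the estimated asymptote is at least as large as the observed CpG count.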