mdb: population-level DNA methylation analysis toolkit

mdb

mdb builds and queries CpG-by-sample methylation matrices from ONT and PacBio BED inputs.

  • PyPI package: methdb
  • CLI command: mdb

Install

From PyPI:

pip install methdb

From a source checkout:

pip install .

Verify:

mdb --help
mdb --version

Core Concepts

  • Sample bundle (.smdb): one sample, multiple track views (assay/haplotype/strand).
  • Cohort store (.mmdb): merged sample bundles for population-scale queries.
  • Backends:
    • zarr (default, compressed, block-aligned merge writes)
    • npy (optional compatibility backend)
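To make the CpG-by-sample layout concrete, here is a minimal pure-Python sketch of how per-sample columns could be stacked into one matrix. It does not use mdb's on-disk formats or API; `merge_columns` and the in-memory lists are hypothetical stand-ins, and `None` is an assumed marker for a site without a call.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for one track view of a sample bundle:
# one value per CpG row, None where the sample has no call.
SampleColumn = List[Optional[float]]


def merge_columns(
    samples: Dict[str, SampleColumn],
) -> Tuple[List[str], List[List[Optional[float]]]]:
    """Stack per-sample columns into a CpG-by-sample matrix (rows = CpG sites)."""
    sample_ids = sorted(samples)
    lengths = {len(col) for col in samples.values()}
    if len(lengths) != 1:
        # Columns only line up if every sample used the same CpG index.
        raise ValueError("all samples must share one CpG index")
    n_rows = lengths.pop()
    matrix = [[samples[s][i] for s in sample_ids] for i in range(n_rows)]
    return sample_ids, matrix


ids, matrix = merge_columns(
    {"SAMPLE_PB": [0.9, None, 0.1], "SAMPLE_ONT": [0.8, 0.5, None]}
)
```

Each row of `matrix` is one CpG site and each column is one sample, which is the shape a population-scale query reads from.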

Quick Start

1) Build CpG index

mdb index -r GRCh38_no_alt.fa -o GRCh38.cpg_index.npz

Include chrX/chrY:

mdb index -r GRCh38_no_alt.fa -o GRCh38.cpg_index.npz --sex
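As a rough illustration of what a CpG index contains, the sketch below scans sequences for CG dinucleotides and records their offsets per contig. The function names, the 0-based offsets, and the dictionary layout are assumptions made for the example; the actual contents of the `.npz` file written by `mdb index` are not described here.

```python
from typing import Dict, List

SEX_CONTIGS = {"chrX", "chrY"}


def cpg_positions(sequence: str) -> List[int]:
    """Return the 0-based offset of the C in every CG dinucleotide."""
    seq = sequence.upper()  # treat soft-masked (lowercase) bases like any other
    positions = []
    start = seq.find("CG")
    while start != -1:
        positions.append(start)
        start = seq.find("CG", start + 1)
    return positions


def build_index(contigs: Dict[str, str], include_sex: bool = False) -> Dict[str, List[int]]:
    """Map contig name to CpG offsets, skipping chrX/chrY unless requested."""
    return {
        name: cpg_positions(seq)
        for name, seq in contigs.items()
        if include_sex or name not in SEX_CONTIGS
    }


index = build_index({"chr1": "ACGTCGcgN", "chrX": "CGCG"})
```

Passing `include_sex=True` mirrors the intent of the `--sex` flag above: the sex chromosomes are left out unless explicitly requested.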

2) Create sample bundle

ONT (modkit output file or directory):

mdb create \
  -p ont \
  -n GRCh38.cpg_index.npz \
  -b /path/to/ont_input \
  -o sample_ont.smdb \
  -c 5 \
  --sample-id SAMPLE_ONT

PacBio (prefix or directory):

mdb create \
  -p pacbio \
  -n GRCh38.cpg_index.npz \
  -b /path/to/pacbio_prefix \
  -o sample_pb.smdb \
  -c 5 \
  --sample-id SAMPLE_PB
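When many samples need bundles, the command line can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of mdb: it only arranges the flags shown in the two commands above and passes the `-c` value through unchanged.

```python
import shlex
from typing import List


def create_command(
    platform: str,
    index: str,
    bed_input: str,
    output: str,
    sample_id: str,
    c_value: int = 5,
) -> List[str]:
    """Build the argv for `mdb create` from the flags documented above."""
    if platform not in ("ont", "pacbio"):
        raise ValueError(f"unsupported platform: {platform!r}")
    return [
        "mdb", "create",
        "-p", platform,
        "-n", index,
        "-b", bed_input,
        "-o", output,
        "-c", str(c_value),
        "--sample-id", sample_id,
    ]


cmd = create_command(
    "ont", "GRCh38.cpg_index.npz", "/path/to/ont_input",
    "sample_ont.smdb", "SAMPLE_ONT",
)
print(shlex.join(cmd))  # review the command, or hand cmd to subprocess.run
```

Returning a list rather than a string keeps paths with spaces intact if the command is later executed with `subprocess.run(cmd, check=True)`.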

3) Merge sample bundles into a cohort

Default backend (zarr):

mdb merge \
  -i sample_ont.smdb sample_pb.smdb \
  -o cohort.mmdb \
  --workers 2 \
  --block-size 64 \
  --zarr-row-chunk 65536 \
  --zarr-codec zstd \
  --zarr-clevel 5 \
  --zarr-shuffle bitshuffle \
  --zarr-codec-threads 4
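One way to read "block-aligned merge writes" is that each write covers whole row chunks, so no compressed chunk has to be read back and rewritten. The sketch below shows only that arithmetic for a row chunk of 65536; the function name and the total row count are hypothetical, and mdb's actual write strategy may differ.

```python
from typing import Iterator, Tuple


def row_chunks(n_rows: int, row_chunk: int = 65536) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) row ranges, one per chunk."""
    if row_chunk <= 0:
        raise ValueError("row_chunk must be positive")
    for start in range(0, n_rows, row_chunk):
        # The last chunk may be shorter than row_chunk.
        yield start, min(start + row_chunk, n_rows)


# Hypothetical matrix with 150000 CpG rows.
ranges = list(row_chunks(150_000))
```

With 150000 rows this produces two full chunks and one partial chunk, so a writer that follows these ranges never touches a chunk twice.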

NPY backend (explicit):

mdb merge \
  -i sample_ont.smdb sample_pb.smdb \
  -o cohort_npy.mmdb \
  --cohort-backend npy \
  --workers 2 \
  --block-size 64

Build modifiedC view (5mC + 5hmC where available):

mdb merge -i sample_ont.smdb sample_pb.smdb -o cohort_modifiedc.mmdb -m
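The phrase "5mC + 5hmC where available" suggests a per-site combination along the following lines. This is a sketch under stated assumptions, not mdb's implementation: values are taken to be fractions in [0, 1], `None` marks a missing call, and a site with no 5mC call is treated as missing in the combined view.

```python
from typing import Optional


def modified_c(five_mc: Optional[float], five_hmc: Optional[float]) -> Optional[float]:
    """Combine 5mC and 5hmC fractions for one site (assumed semantics)."""
    if five_mc is None:
        return None  # assumption: no 5mC call means no combined value
    if five_hmc is None:
        return five_mc  # 5hmC not available: fall back to 5mC alone
    return five_mc + five_hmc


both = modified_c(0.5, 0.25)      # both marks called
only_5mc = modified_c(0.5, None)  # 5hmC unavailable for this sample
```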

4) Append new samples to existing cohort

mdb append \
  -c cohort.mmdb \
  -i new_sample1.smdb new_sample2.smdb

5) Query values

Point query:

mdb query \
  -i cohort.mmdb \
  --sample-id SAMPLE_PB \
  --assay 5mC \
  --haplotype combined \
  --strand combined \
  --locus chr1:10469

Range query:

mdb query \
  -i cohort.mmdb \
  --sample-id SAMPLE_PB \
  --assay 5mC \
  --haplotype combined \
  --strand combined \
  --region chr1:10469-12000
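The two target forms, `chr1:10469` for a point and `chr1:10469-12000` for a range, can be validated before calling the CLI. The parser below is a hypothetical helper, not mdb code, and it makes no claim about whether mdb treats coordinates as 0-based or 1-based or whether the range end is inclusive.

```python
import re
from typing import NamedTuple, Optional


class Target(NamedTuple):
    chrom: str
    start: int
    end: Optional[int]  # None for a point query


_TARGET = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_target(text: str) -> Target:
    """Parse 'chrom:pos' or 'chrom:start-end' into a Target."""
    match = _TARGET.match(text)
    if match is None:
        raise ValueError(f"not a locus or region: {text!r}")
    start = int(match["start"])
    end = int(match["end"]) if match["end"] is not None else None
    if end is not None and end < start:
        raise ValueError(f"region end precedes start: {text!r}")
    return Target(match["chrom"], start, end)


point = parse_target("chr1:10469")
region = parse_target("chr1:10469-12000")
```

A target with `end is None` would go to `--locus`, and one with an end would go to `--region`.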

Important Notes

  • create --reader currently defaults to scan, and the active create path uses scan-based reading.
  • merge and append require sample bundles created by the current mdb create (manifest-based .smdb layout).
  • pca is a legacy command path that expects a flat merged .npy matrix layout, not the current view-based cohort store.

License

MIT (LICENSE).
