Local-first MTase candidate + methylation motif prediction pipeline

mtase-motif

mtase-motif is a local-first Python package for finding bacterial DNA methyltransferase candidates and transferring or inferring methylation motifs from a single genome.

The package keeps the database management code in mtase_motif/, but it does not bundle downloaded Pfam, TIGRFAMs, or REBASE payloads in the published artifact.

Install

Install the Python package in a local virtual environment:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e '.[dev]'

To install a published release from PyPI instead of the editable checkout:

python -m pip install mtasemotif

The PyPI distribution is named mtasemotif, but the installed CLI command is still mtase-motif:

mtase-motif --help

Install native executables separately. The repo includes an optional environment.yml for a conda env that provides the non-Python tools:

conda env create -f environment.yml
conda activate mtase

End-To-End Walkthrough

The commands below cover the full sequence-first workflow, from a clean checkout to a completed run on an example E. coli genome. The example genome is stored on this machine at /Users/li/data/hammerhead_test/ecoli.fa; that path is machine-local and not part of the checkout, so point the commands at your own FASTA if you do not have it.

1. Prepare the environments

Activate the native-tool conda environment first, then the local Python venv:

conda activate mtase
source .venv/bin/activate

This keeps prodigal, hmmscan, hmmsearch, hmmpress, mmseqs or blastp, and fimo on PATH while the Python package stays in .venv. The tigrfams target is local-source-only: fetch it with --source, and only when you already have a local TIGRFAMs HMM library.

2. Create and populate the local database

The walkthrough below uses a project-local DB directory so the full run is self-contained:

DB_DIR=$PWD/db

mtase-motif db init --db-dir "$DB_DIR"
mtase-motif db fetch pfam --db-dir "$DB_DIR"
mtase-motif db fetch rebase --db-dir "$DB_DIR"
mtase-motif db index --db-dir "$DB_DIR"
mtase-motif db status --db-dir "$DB_DIR"

Downloaded databases live outside the package by default under ~/.cache/mtase-motif/db, but using --db-dir "$DB_DIR" keeps this walkthrough local to the repo.
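
The same sequence can be scripted so that it stops at the first failing step. The following is a minimal Python sketch, not part of mtase-motif itself: the helper names db_setup_commands and run_all are hypothetical, and the sketch only assembles the five commands shown above. It defaults to a dry run, so nothing is executed unless you ask for it.

```python
import shlex
import subprocess
from pathlib import Path


def db_setup_commands(db_dir):
    """Return the walkthrough's database commands as argument lists."""
    db = str(db_dir)
    steps = [
        ["db", "init"],
        ["db", "fetch", "pfam"],
        ["db", "fetch", "rebase"],
        ["db", "index"],
        ["db", "status"],
    ]
    return [["mtase-motif", *step, "--db-dir", db] for step in steps]


def run_all(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later steps are skipped.
            subprocess.run(cmd, check=True)


commands = db_setup_commands(Path.cwd() / "db")
run_all(commands)  # dry run: prints the commands without executing them
```

Call run_all(commands, dry_run=False) once the printed commands look right.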

Offline and local-mirror examples:

mtase-motif db fetch pfam --db-dir "$DB_DIR" --source /path/to/Pfam-A.hmm.gz
mtase-motif db fetch rebase --db-dir "$DB_DIR" --source /path/to/rebase_emboss_dir
mtase-motif db fetch tigrfams --db-dir "$DB_DIR" --source /path/to/TIGRFAMs_15.0_HMM.LIB.gz
mtase-motif db index --db-dir "$DB_DIR"

For offline REBASE, --source alone is not enough. The source directory must also contain rebase_proteins.faa or one or more REBASE *_Protein.txt protein dumps before mtase-motif db index can build the REBASE protein search index.
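
A quick pre-flight check can catch a missing protein file before db index fails. This is a sketch under two assumptions: the function name is hypothetical, and the files are expected directly inside the source directory rather than in subdirectories. The file name example_Protein.txt in the demo is a placeholder, not a real REBASE dump.

```python
import tempfile
from pathlib import Path


def rebase_source_is_indexable(source_dir):
    """True if source_dir holds protein data that `db index` can use."""
    src = Path(source_dir)
    if (src / "rebase_proteins.faa").is_file():
        return True
    # Otherwise accept one or more REBASE protein dumps (*_Protein.txt).
    # Assumption: the dumps sit at the top level of the directory.
    return any(p.is_file() for p in src.glob("*_Protein.txt"))


# Demo on a throwaway directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    empty_ok = rebase_source_is_indexable(tmp)
    (Path(tmp) / "example_Protein.txt").write_text("placeholder\n")
    dump_ok = rebase_source_is_indexable(tmp)

print(empty_ok, dump_ok)
```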

More database download and import notes are in docs/database_setup.md.

3. Run motif prediction on a genome FASTA

Use the example genome from /Users/li/data/hammerhead_test:

GENOME=/Users/li/data/hammerhead_test/ecoli.fa
OUT_DIR=$PWD/results/ecoli

mtase-motif run --genome "$GENOME" --db-dir "$DB_DIR" --out "$OUT_DIR" -j 4

To run on a different genome, set GENOME to the path of your own .fa or .fna file.

Structure-assisted runs are optional and require local candidate structures, passed via --structures-dir. Foldseek-backed steps additionally require foldseek on PATH and a Foldseek database, given either with --foldseek-db or found at the default location <db-dir>/structures/pdb/foldseek_db. --foldseek-labels only takes effect when that Foldseek database search is available.

If you already have predicted proteins, you can skip gene calling:

mtase-motif run --genome "$GENOME" --proteins /path/to/proteins.faa --db-dir "$DB_DIR" --out "$OUT_DIR" -j 4
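
When running many genomes, it can help to assemble the run command in code rather than by hand. The sketch below is illustrative only: build_run_command is a hypothetical helper, and it uses just the flags that appear in the two commands above (--genome, --proteins, --db-dir, --out, -j).

```python
import shlex


def build_run_command(genome, db_dir, out_dir, jobs=4, proteins=None):
    """Assemble an `mtase-motif run` argument list from the documented flags."""
    cmd = ["mtase-motif", "run", "--genome", str(genome)]
    if proteins is not None:
        # Supplying predicted proteins skips gene calling.
        cmd += ["--proteins", str(proteins)]
    cmd += ["--db-dir", str(db_dir), "--out", str(out_dir), "-j", str(jobs)]
    return cmd


cmd = build_run_command("ecoli.fa", "db", "results/ecoli")
cmd_with_proteins = build_run_command(
    "ecoli.fa", "db", "results/ecoli", proteins="proteins.faa"
)
print(shlex.join(cmd))
print(shlex.join(cmd_with_proteins))
```

Pass the resulting list to subprocess.run(cmd, check=True) to execute it; passing a list avoids shell quoting problems with paths that contain spaces.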

4. Inspect the outputs

Core outputs in "$OUT_DIR":

  • "$OUT_DIR"/mtase_candidates.tsv
  • "$OUT_DIR"/motif_calls.tsv
  • "$OUT_DIR"/motif_assignment.tsv
  • "$OUT_DIR"/summary.tsv
  • "$OUT_DIR"/<candidate_id>/motif/pwm.meme
  • "$OUT_DIR"/<candidate_id>/fimo/fimo.tsv
  • "$OUT_DIR"/<candidate_id>/qc/qc.json
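
A small script can confirm that a run produced the files listed above. This is a sketch, not part of the package: the function names are hypothetical, and it assumes that any directory under the output directory containing qc/qc.json is a candidate directory. The demo uses a throwaway directory with placeholder files.

```python
import tempfile
from pathlib import Path

TOP_LEVEL = (
    "mtase_candidates.tsv",
    "motif_calls.tsv",
    "motif_assignment.tsv",
    "summary.tsv",
)


def missing_top_level(out_dir):
    """Return the top-level output files that are absent from out_dir."""
    out = Path(out_dir)
    return [name for name in TOP_LEVEL if not (out / name).is_file()]


def candidates_with_qc(out_dir):
    """Return names of subdirectories that contain qc/qc.json."""
    hits = Path(out_dir).glob("*/qc/qc.json")
    return sorted(p.parent.parent.name for p in hits)


# Demo: one top-level file and one placeholder candidate directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "summary.tsv").write_text("")
    (out / "cand_demo" / "qc").mkdir(parents=True)
    (out / "cand_demo" / "qc" / "qc.json").write_text("{}")
    demo_missing = missing_top_level(out)
    demo_candidates = candidates_with_qc(out)

print(demo_missing)
print(demo_candidates)
```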

motif_calls.tsv carries derived motif semantics such as mod_position, motif_class, canonical and reverse-complement forms, and an overall assignment_state. motif_assignment.tsv separates the primary call from alternate routes (related candidates or hints) when a candidate is unresolved or ambiguous.

Quick checks:

ls "$OUT_DIR"
head "$OUT_DIR/mtase_candidates.tsv"
head "$OUT_DIR/summary.tsv"
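
Beyond head, a few lines of Python can tally one column of a TSV, for example assignment_state in motif_calls.tsv. This sketch assumes the files are tab-separated with a header row and that assignment_state is a literal column name. The sample rows, the candidate_id column, and the values state_a and state_b are placeholders, not real pipeline output.

```python
import csv
from collections import Counter


def count_column(lines, column):
    """Count the values of one column in tab-separated text with a header."""
    reader = csv.DictReader(lines, delimiter="\t")
    # Rows lacking the column are counted under the empty string.
    return Counter(row.get(column) or "" for row in reader)


# Placeholder rows standing in for a real motif_calls.tsv.
sample = [
    "candidate_id\tassignment_state\n",
    "c1\tstate_a\n",
    "c2\tstate_b\n",
    "c3\tstate_a\n",
]
counts = count_column(sample, "assignment_state")
print(counts.most_common())
```

For a real run, open the file with open(path, newline="") and pass the file handle as lines.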

Runtime Tools

These executables must be on PATH for the sequence-first workflow:

  • prodigal
  • hmmscan, hmmsearch, hmmpress
  • mmseqs or blastp plus makeblastdb
  • fimo
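
To check the list above before starting a run, a short script can look each tool up on PATH. This is a sketch with one stated assumption: it reads "mmseqs or blastp plus makeblastdb" as two alternative backends, mmseqs alone or blastp together with makeblastdb. The function name and the injectable which parameter are illustrative choices.

```python
import shutil

REQUIRED = ["prodigal", "hmmscan", "hmmsearch", "hmmpress", "fimo"]
# Assumption: either backend group is sufficient on its own.
SEARCH_BACKENDS = [["mmseqs"], ["blastp", "makeblastdb"]]


def missing_tools(which=shutil.which):
    """Return the required executables that cannot be found on PATH."""
    missing = [tool for tool in REQUIRED if which(tool) is None]
    backend_ok = any(
        all(which(tool) is not None for tool in group)
        for group in SEARCH_BACKENDS
    )
    if not backend_ok:
        missing.append("mmseqs (or blastp + makeblastdb)")
    return missing


problems = missing_tools()
print("all tools found" if not problems else "missing: " + ", ".join(problems))
```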

Development

make lint
make test
make sdist-check
make package-check

Tagged releases are set up for GitHub Releases and PyPI publishing through the GitHub Actions workflows in .github/workflows/.

Download files

Source Distribution

mtasemotif-0.1.0.tar.gz (78.5 kB)

Uploaded Source

Built Distribution

mtasemotif-0.1.0-py3-none-any.whl (72.5 kB)

Uploaded Python 3

File details

Details for the file mtasemotif-0.1.0.tar.gz.

File metadata

  • Download URL: mtasemotif-0.1.0.tar.gz
  • Size: 78.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mtasemotif-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f49a16d569284a615d89a2e9c3515c0939fa6397c02edf9e8560f80e89f17a1a
MD5 07922400201c065502ae96e01f5ad48e
BLAKE2b-256 ddb805bf62352e5b62fbbb2283b5ff32f8e2277d01e55f31ee3e7d826137b079

Provenance

The following attestation bundles were made for mtasemotif-0.1.0.tar.gz:

Publisher: release.yml on lrslab/mtasemotif

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mtasemotif-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mtasemotif-0.1.0-py3-none-any.whl
  • Size: 72.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mtasemotif-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 71431082dd99cbf51073c6c40eace18531c629f7f2b8498dd9a3fa8fdb531b2d
MD5 854d4a5fb21b9b9ac38274ac1fdc70b7
BLAKE2b-256 8f3165bbe619bb3d5270697bb94d6bb272bd7c86dac24143c74869495b6d267e

Provenance

The following attestation bundles were made for mtasemotif-0.1.0-py3-none-any.whl:

Publisher: release.yml on lrslab/mtasemotif

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
