Local-first MTase candidate + methylation motif prediction pipeline
mtase-motif
mtase-motif is a local-first Python package for finding bacterial DNA
methyltransferase candidates and transferring or inferring methylation motifs
from a single genome.
The package keeps the database management code in mtase_motif/, but it does
not bundle downloaded Pfam, TIGRFAMs, or REBASE payloads in the published
artifact.
Install
Install the Python package in a local virtual environment:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e '.[dev]'
For a PyPI release, install the distribution as:
python -m pip install mtasemotif
The installed CLI command remains:
mtase-motif --help
Install native executables separately. The repo includes an optional
environment.yml for a conda env that provides the non-Python tools:
conda env create -f environment.yml
conda activate mtase
End-To-End Walkthrough
The commands below cover the full sequence-first workflow from a clean checkout
to a completed run on the example E. coli genome stored on this machine at
/Users/li/data/hammerhead_test/ecoli.fa. That path is machine-local, not part
of the checkout.
1. Prepare the environments
Activate the native-tool conda environment first, then the local Python venv:
conda activate mtase
source .venv/bin/activate
This keeps prodigal, hmmscan, hmmsearch, hmmpress, fimo, and either mmseqs or
blastp on PATH while the Python package stays in .venv.
The conda environment provides the HMMER tools used for TIGRFAMs, but it does
not download the TIGRFAMs model library itself.
2. Create and populate the local database
The walkthrough below uses a project-local DB directory so the full run is self-contained:
DB_DIR=$PWD/db
mtase-motif db init --db-dir "$DB_DIR"
mtase-motif db fetch pfam --db-dir "$DB_DIR"
mtase-motif db fetch rebase --db-dir "$DB_DIR"
mtase-motif db index --db-dir "$DB_DIR"
mtase-motif db status --db-dir "$DB_DIR"
Downloaded databases live outside the package by default under
~/.cache/mtase-motif/db, but using --db-dir "$DB_DIR" keeps this walkthrough
local to the repo.
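For scripted setups, the same sequence can be assembled in Python. This is a minimal sketch that only builds the argument lists for the commands shown above; the helper name db_setup_commands is hypothetical and not part of mtase-motif.

```python
from pathlib import Path


def db_setup_commands(db_dir: Path) -> list[list[str]]:
    """Return the documented database setup commands, in order.

    Hypothetical helper: it only wraps the CLI calls from the walkthrough.
    """
    db_args = ["--db-dir", str(db_dir)]
    steps = [["init"], ["fetch", "pfam"], ["fetch", "rebase"], ["index"], ["status"]]
    return [["mtase-motif", "db", *step, *db_args] for step in steps]


for cmd in db_setup_commands(Path.cwd() / "db"):
    # Printing keeps the sketch side-effect free.
    # To execute a step for real: subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```

Keeping each command as a list avoids shell quoting problems when the DB path contains spaces.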
The curated Pfam subset includes MTase catalytic domains plus Type I context
models (PF02384, PF12161, PF01420, and PF04313). Pfam-only runs can
therefore recover Type I-like candidates when the MTase core/support domains or
nearby S/R context are present. TIGRFAMs remains optional, but it adds direct
HsdM/S/R and Type III Mod/Res context and can make type hints more specific.
To enable TIGRFAMs, download or otherwise obtain the TIGRFAMs HMM library
separately, then point --source at the file or at a directory containing one
of these names:
- TIGRFAMs.hmm
- TIGRFAMs.hmm.gz
- TIGRFAMs_15.0_HMM.LIB
- TIGRFAMs_15.0_HMM.LIB.gz
- another *.HMM.LIB or *.HMM.LIB.gz
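Before importing, it can help to confirm that a path would satisfy these naming rules. The sketch below is an assumption about how such a lookup could work, not the package's actual resolver: the function name find_tigrfams_library and the search order (exact names first, then the wildcard patterns) are illustrative.

```python
from pathlib import Path
from typing import Optional

# File names listed in the documentation, checked in this (assumed) order.
PREFERRED_NAMES = (
    "TIGRFAMs.hmm",
    "TIGRFAMs.hmm.gz",
    "TIGRFAMs_15.0_HMM.LIB",
    "TIGRFAMs_15.0_HMM.LIB.gz",
)
FALLBACK_PATTERNS = ("*.HMM.LIB", "*.HMM.LIB.gz")


def find_tigrfams_library(source: Path) -> Optional[Path]:
    """Return a usable TIGRFAMs library path, or None if nothing matches."""
    if source.is_file():
        # --source may point directly at the library file.
        return source
    if not source.is_dir():
        return None
    for name in PREFERRED_NAMES:
        candidate = source / name
        if candidate.is_file():
            return candidate
    for pattern in FALLBACK_PATTERNS:
        matches = sorted(p for p in source.glob(pattern) if p.is_file())
        if matches:
            return matches[0]
    return None
```

A None result means the directory would need one of the listed files before the import step is attempted.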
Then import and rebuild indexes:
mtase-motif db fetch tigrfams --db-dir "$DB_DIR" --source /path/to/TIGRFAMs_15.0_HMM.LIB.gz
mtase-motif db index --db-dir "$DB_DIR"
mtase-motif db status --db-dir "$DB_DIR"
db index needs hmmpress; mtase-motif run uses hmmsearch when the
TIGRFAMs subset is present.
Offline and local-mirror examples:
mtase-motif db fetch pfam --db-dir "$DB_DIR" --source /path/to/Pfam-A.hmm.gz
mtase-motif db fetch rebase --db-dir "$DB_DIR" --source /path/to/rebase_emboss_dir
mtase-motif db index --db-dir "$DB_DIR"
For offline REBASE, --source alone is not enough. The source directory must
also contain rebase_proteins.faa or one or more REBASE *_Protein.txt
protein dumps before mtase-motif db index can build the REBASE protein
search index.
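A quick pre-flight check for this requirement might look like the following. It is a sketch based only on the file names stated above; rebase_protein_inputs is a hypothetical helper, and it does not validate file contents.

```python
from pathlib import Path


def rebase_protein_inputs(source_dir: Path) -> list[Path]:
    """List the protein inputs that the REBASE index step is documented to need."""
    found = []
    combined = source_dir / "rebase_proteins.faa"
    if combined.is_file():
        found.append(combined)
    # One or more REBASE protein dumps are accepted as an alternative.
    found.extend(sorted(p for p in source_dir.glob("*_Protein.txt") if p.is_file()))
    return found
```

An empty list suggests that mtase-motif db index would not be able to build the REBASE protein search index from that directory.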
More database download and import notes are in docs/database_setup.md.
3. Run motif prediction on a genome FASTA
Use the example genome from /Users/li/data/hammerhead_test:
GENOME=/Users/li/data/hammerhead_test/ecoli.fa
OUT_DIR=$PWD/results/ecoli
mtase-motif run --genome "$GENOME" --db-dir "$DB_DIR" --out "$OUT_DIR" -j 4
Replace "$GENOME" with your own .fa or .fna file when running on a new
genome.
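When several genomes need the same treatment, the run command can be generated per file. This is a minimal sketch, assuming one output directory per genome named after the FASTA file stem; batch_run_commands and that layout are illustrative choices, not something the package prescribes.

```python
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fna"}


def batch_run_commands(
    genome_dir: Path, db_dir: Path, results_dir: Path, jobs: int = 4
) -> list[list[str]]:
    """Build one documented `mtase-motif run` call per .fa/.fna file."""
    commands = []
    for genome in sorted(genome_dir.iterdir()):
        if genome.is_file() and genome.suffix in FASTA_SUFFIXES:
            out_dir = results_dir / genome.stem  # assumed layout: results/<stem>
            commands.append([
                "mtase-motif", "run",
                "--genome", str(genome),
                "--db-dir", str(db_dir),
                "--out", str(out_dir),
                "-j", str(jobs),
            ])
    return commands

# To execute: for cmd in batch_run_commands(...): subprocess.run(cmd, check=True)
```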
Optional structure-assisted runs require local candidate structures via
--structures-dir. Foldseek-backed steps also require foldseek on PATH and
either --foldseek-db or the default DB at
<db-dir>/structures/pdb/foldseek_db; --foldseek-labels only applies when
that Foldseek DB-backed search path is available.
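The prerequisites in this paragraph can be checked before starting a structure-assisted run. The sketch below is an assumption-laden pre-flight check: the helper names are hypothetical, and testing that the Foldseek DB path exists is only a rough proxy for a usable database.

```python
import shutil
from pathlib import Path
from typing import Optional


def foldseek_db_path(db_dir: Path, foldseek_db: Optional[Path] = None) -> Path:
    """An explicit --foldseek-db wins; otherwise use the documented default."""
    if foldseek_db is not None:
        return foldseek_db
    return db_dir / "structures" / "pdb" / "foldseek_db"


def foldseek_problems(
    db_dir: Path, structures_dir: Path, foldseek_db: Optional[Path] = None
) -> list[str]:
    """Return human-readable reasons why a Foldseek-backed run may not work."""
    problems = []
    if not structures_dir.is_dir():
        problems.append(f"structures directory not found: {structures_dir}")
    if shutil.which("foldseek") is None:
        problems.append("foldseek is not on PATH")
    db_path = foldseek_db_path(db_dir, foldseek_db)
    if not db_path.exists():
        problems.append(f"Foldseek DB not found: {db_path}")
    return problems
```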
If you already have predicted proteins, you can skip gene calling:
mtase-motif run --genome "$GENOME" --proteins /path/to/proteins.faa --db-dir "$DB_DIR" --out "$OUT_DIR" -j 4
4. Inspect the outputs
Core outputs in "$OUT_DIR":
- "$OUT_DIR"/mtase_candidates.tsv
- "$OUT_DIR"/motif_calls.tsv
- "$OUT_DIR"/motif_assignment.tsv
- "$OUT_DIR"/summary.tsv
- "$OUT_DIR"/<candidate_id>/motif/pwm.meme
- "$OUT_DIR"/<candidate_id>/fimo/fimo.tsv
- "$OUT_DIR"/<candidate_id>/qc/qc.json
motif_calls.tsv now carries derived motif semantics such as mod_position,
motif_class, canonical/reverse-complement forms, and an overall
assignment_state. motif_assignment.tsv separates the primary call from
alternate related-candidate or hint routes when a candidate is unresolved or
ambiguous.
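To get a quick overview of how many calls landed in each state, the assignment_state column can be tallied with the standard library. This is a sketch that assumes motif_calls.tsv is tab-separated with a header row; the possible state values are not listed here, so the code makes no assumption about them.

```python
import csv
from collections import Counter
from pathlib import Path


def assignment_state_counts(motif_calls: Path) -> Counter:
    """Count rows per assignment_state value in motif_calls.tsv."""
    with motif_calls.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Rows without the column are grouped under an empty key.
        return Counter(row.get("assignment_state") or "" for row in reader)
```

For example, assignment_state_counts(Path("results/ecoli/motif_calls.tsv")).most_common() lists the states from most to least frequent.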
Quick checks:
ls "$OUT_DIR"
head "$OUT_DIR/mtase_candidates.tsv"
head "$OUT_DIR/summary.tsv"
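The same check can be automated. The sketch below uses only the output names listed in this section; the helper names are hypothetical, and an absent per-candidate file may simply mean that step did not apply to that candidate.

```python
from pathlib import Path

CORE_OUTPUTS = (
    "mtase_candidates.tsv",
    "motif_calls.tsv",
    "motif_assignment.tsv",
    "summary.tsv",
)
PER_CANDIDATE_OUTPUTS = ("motif/pwm.meme", "fimo/fimo.tsv", "qc/qc.json")


def missing_core_outputs(out_dir: Path) -> list[str]:
    """Names of the top-level result tables that are not present."""
    return [name for name in CORE_OUTPUTS if not (out_dir / name).is_file()]


def candidate_output_paths(out_dir: Path, candidate_id: str) -> list[Path]:
    """Expected per-candidate files under <out_dir>/<candidate_id>/."""
    return [out_dir / candidate_id / rel for rel in PER_CANDIDATE_OUTPUTS]
```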
Runtime Tools
These executables must be on PATH for the sequence-first workflow:
- prodigal
- hmmscan, hmmsearch, hmmpress
- mmseqs or blastp plus makeblastdb
- fimo
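A small PATH check can catch a missing tool before a long run. This sketch reads the list above as "mmseqs, or blastp together with makeblastdb", which is an interpretation; missing_tools is a hypothetical helper built on shutil.which.

```python
import shutil
from typing import Callable, Optional

REQUIRED_TOOLS = ("prodigal", "hmmscan", "hmmsearch", "hmmpress", "fimo")
# At least one complete search backend is needed (assumed reading of the list).
SEARCH_BACKENDS = (("mmseqs",), ("blastp", "makeblastdb"))


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools (or backend group) that cannot be found on PATH."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    backend_ok = any(
        all(which(tool) is not None for tool in backend)
        for backend in SEARCH_BACKENDS
    )
    if not backend_ok:
        missing.append("mmseqs or blastp+makeblastdb")
    return missing


print(missing_tools() or "all runtime tools found")
```

The which parameter exists only so the lookup can be swapped out in tests; by default the real PATH is inspected.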
Development
make lint
make test
make sdist-check
make package-check
Release notes live in CHANGELOG.md. Before tagging a release, bump the version
in mtase_motif/__init__.py, run make package-check, then push a matching tag
such as v0.1.1. Tagged releases are set up for GitHub Releases and PyPI
publishing through the GitHub Actions workflows in .github/workflows/.
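A pre-tag sanity check can confirm that the tag and the package version agree. The sketch assumes the version is stored as a `__version__ = "..."` assignment in mtase_motif/__init__.py, which is not confirmed here; both helper names are hypothetical.

```python
import re
from typing import Optional


def package_version(init_text: str) -> Optional[str]:
    """Extract the version from a `__version__ = "..."` line, if present."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', init_text, re.MULTILINE
    )
    return match.group(1) if match else None


def tag_matches_version(init_text: str, tag: str) -> bool:
    """True when the tag is exactly 'v' followed by the package version."""
    version = package_version(init_text)
    return version is not None and tag == f"v{version}"
```

In practice the text would come from Path("mtase_motif/__init__.py").read_text(), and the tag from the release being prepared.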