Generate isolate-specific genome masks for Mycobacterium tuberculosis
mtbmasker
mtbmasker is a Python command-line tool designed to generate isolate-specific conservative genome masks for Mycobacterium tuberculosis (MTB) genomes. This is particularly useful for downstream variant calling and phylogenomic analyses by masking problematic genomic regions (e.g., PE/PPE genes, IS elements, and other repetitive loci).
✨ Features
- Generates genome masks per isolate using BLASTn alignment against predefined repetitive genes.
- Supports custom isolate genome files and gene query sets.
- Automatically formats coordinates to BED, sorts, and merges overlapping masked regions.
- Outputs high-quality, isolate-specific .bed files for genome masking.
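The coordinate handling described in the features above can be sketched in pure Python. This is an illustration only, not mtbmasker's own code: the tool relies on BLAST+ and BEDTools, while the hypothetical helpers `blast_hits_to_bed` and `merge_intervals` below assume BLASTn hits in the standard 12-column tabular format (`-outfmt 6`) and mimic a sort-and-merge step. The gene and contig names in the example are made up.

```python
def blast_hits_to_bed(lines):
    """Convert BLAST tabular (outfmt 6) lines to (chrom, start, end) tuples.

    BLAST subject coordinates are 1-based and inclusive, and sstart > send
    for minus-strand hits. BED is 0-based and half-open.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom = fields[1]                      # sseqid: the isolate contig
        sstart, send = int(fields[8]), int(fields[9])
        start = min(sstart, send) - 1          # shift to 0-based start
        end = max(sstart, send)                # inclusive end == half-open end
        intervals.append((chrom, start, end))
    return intervals


def merge_intervals(intervals):
    """Sort intervals and merge those that overlap or touch on the same contig."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical hits: two overlapping (one on the minus strand) and one separate.
hits = [
    "geneA\tcontig_1\t99.0\t100\t1\t0\t1\t100\t101\t200\t1e-50\t180",
    "geneB\tcontig_1\t100.0\t100\t0\t0\t1\t100\t250\t151\t1e-50\t185",
    "geneC\tcontig_1\t98.0\t20\t0\t0\t1\t20\t501\t520\t1e-5\t40",
]
for chrom, start, end in merge_intervals(blast_hits_to_bed(hits)):
    print(f"{chrom}\t{start}\t{end}")
```

Taking the minimum and maximum of the subject coordinates keeps minus-strand hits from producing intervals whose start exceeds their end.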
🧬 Use case
This tool was originally developed for comparative genomics and transmission studies of Mycobacterium tuberculosis complex (MTBC) isolates, including M. africanum. Because each mask is built from the isolate's own genome rather than from a single set of reference coordinates, inter-lineage diversity is respected during masking.
🔧 Installation
From GitHub (development version):
pip install git+https://github.com/EtienneNtumba/mtbmasker.git
🚀 Usage
mtbmasker mask input_list.tsv --query-fasta data/genes_to_mask.fasta
Arguments:
- input_list.tsv: A tab-separated file with one isolate ID per line (without the .fasta extension). Each ID must correspond to a file named <ID>.fasta in the working directory.
- --query-fasta: FASTA file of problematic/repetitive genes to be masked (default: data/genes_to_mask.fasta).
📁 Example
input_list.tsv:
ARR1960.LR.Asm
QC-9.LR.Asm
N1177.LR.Asm
Each listed isolate must have a corresponding FASTA file in the current directory (ARR1960.LR.Asm.fasta, QC-9.LR.Asm.fasta, and N1177.LR.Asm.fasta for the list above).
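Before running the tool, it may help to confirm that every listed isolate has its FASTA file in place. The following is a minimal sketch, not part of mtbmasker; `read_isolate_ids` and `missing_fastas` are hypothetical helper names, and the sketch assumes the isolate ID is the first tab-separated field on each non-empty line.

```python
from pathlib import Path


def read_isolate_ids(list_path):
    """Return isolate IDs from the input list, skipping blank lines."""
    ids = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if line:
            ids.append(line.split("\t")[0])
    return ids


def missing_fastas(isolate_ids, workdir="."):
    """Return the IDs that have no <ID>.fasta file in workdir."""
    return [
        isolate_id
        for isolate_id in isolate_ids
        if not (Path(workdir) / f"{isolate_id}.fasta").is_file()
    ]


if __name__ == "__main__" and Path("input_list.tsv").is_file():
    missing = missing_fastas(read_isolate_ids("input_list.tsv"))
    if missing:
        print("Missing FASTA files for:", ", ".join(missing))
    else:
        print("All isolate FASTA files found.")
```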
🔬 Requirements
- Python ≥ 3.8
- BLAST+
- BEDTools
- Typer
Both BLAST+ and BEDTools must be installed and available on your $PATH, for example through an activated conda environment.
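A quick way to check that the external tools are reachable is to look them up on the PATH from Python. This is a sketch under the assumption that the relevant executables are `blastn` (from BLAST+) and `bedtools`; the exact set of executables mtbmasker calls is not stated here, and `find_missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names; adjust if your installation differs.
REQUIRED_TOOLS = ("blastn", "bedtools")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tool names that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```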
📄 Output
For each isolate, the following file is generated:
<isolate>_conservitive_AF19-like_masking_file.bed
This BED file contains sorted and merged coordinates of masked regions.
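To illustrate how such a mask might be consumed downstream, the sketch below parses BED lines, totals the masked bases, and tests whether a 1-based position (as used in VCF files) falls inside a masked region. The helper names and the example coordinates are hypothetical, and the total assumes the regions are already merged, as described above.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def masked_bases(regions):
    """Total number of masked bases, assuming non-overlapping regions."""
    return sum(end - start for _, start, end in regions)


def is_masked(regions, chrom, pos_1based):
    """Return True if a 1-based position lies inside any masked region."""
    pos = pos_1based - 1  # convert to the 0-based coordinate used by BED
    return any(c == chrom and s <= pos < e for c, s, e in regions)


# Made-up example regions on a hypothetical contig.
example = ["contig_1\t100\t250", "contig_1\t500\t520"]
regions = parse_bed(example)
print(masked_bases(regions))
print(is_masked(regions, "contig_1", 101))
```

A linear scan is adequate for a handful of queries; for filtering many variant positions, an interval index or BEDTools itself would be a better fit.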
📚 Citation
If you use this tool in your research, please cite:
Ntumba, E., Whiley, D., et al. (2025). [Manuscript Title], Journal Name, DOI: xxx
📂 License
This tool is licensed under the MIT License.
👥 Authors
- Etienne Ntumba Kabongo — Université de Montréal / McGill University
- Dan Whiley — Nottingham University