Skip to main content

Generate isolate-specific genome masks for Mycobacterium tuberculosis

Project description

mtbmasker

👥 Authors

  • Etienne Ntumba Kabongo — Université de Montréal / McGill University
  • Dan Whiley — Nottingham University

mtbmasker is a Python command-line tool designed to generate isolate-specific conservative genome masks for Mycobacterium tuberculosis (MTB) genomes. This is particularly useful for downstream variant calling and phylogenomic analyses by masking problematic genomic regions (e.g., PE/PPE genes, IS elements, and other repetitive loci).


✨ Features

  • Generates genome masks per isolate using BLASTn alignment against predefined repetitive genes.
  • Supports custom isolate genome files and gene query sets.
  • Automatically formats coordinates to BED, sorts, and merges overlapping masked regions.
  • Outputs high-quality, isolate-specific .bed files for genome masking.

🧬 Use case

This tool was originally developed for comparative genomics and transmission studies of Mycobacterium tuberculosis complex (MTBC) isolates, including M. africanum. It ensures that inter-lineage diversity is respected during masking.


🔧 Installation

From GitHub (development version):

pip install git+https://github.com/EtienneNtumba/mtbmasker.git

🚀 Usage

Basic Command

mtbmasker mask input_list.tsv --query-fasta data/genes_to_mask.fasta

Help Output

Usage: mtbmasker mask [OPTIONS] INPUT_LIST

  🔬 Generate isolate-specific conservative genome masks (.bed files) for
  each isolate using BLASTn alignments and BEDTools.

Arguments:
  INPUT_LIST    Path to TSV file with isolate IDs (without .fasta extension) [required]

Options:
  --query-fasta TEXT     Path to fasta file containing genes to be masked [default: data/genes_to_mask.fasta]
  --blastn-path TEXT     Optional: Path to custom blastn binary
  --makeblastdb-path TEXT Optional: Path to makeblastdb binary
  --bedtools-path TEXT   Optional: Path to bedtools binary
  --threads INTEGER      Number of threads to use [default: 4]
  --output-dir TEXT      Directory to save output BED files [default: current directory]
  --help                 Show this message and exit.

Command-Line Options Explained

Required Arguments:

  • INPUT_LIST — Path to a tab-separated file containing one isolate ID per line (without .fasta extension). Each ID must correspond to a <ID>.fasta file present in the working directory.

Optional Flags:

  • --query-fasta — Path to FASTA file containing problematic/repetitive genes to be masked. Default: data/genes_to_mask.fasta
  • --blastn-path — Custom path to blastn executable if not in system PATH
  • --makeblastdb-path — Custom path to makeblastdb executable if not in system PATH
  • --bedtools-path — Custom path to bedtools executable if not in system PATH
  • --threads — Number of CPU threads to use for parallel processing. Default: 4
  • --output-dir — Directory where output BED files will be saved. Default: current working directory

📁 Example

input_list.tsv:

ARR1960.LR.Asm
QC-9.LR.Asm
N1177.LR.Asm

Each listed isolate must have a corresponding ARR1960.LR.Asm.fasta, etc., in the current directory.


🔬 Requirements

  • Python ≥ 3.8
  • BLAST+
  • BEDTools
  • Typer

Both BLAST and BEDTools must be installed and available in your $PATH or via a conda environment.


📄 Output

For each isolate, the following file is generated:

<isolate>_conservitive_AF19-like_masking_file.bed

This BED file contains sorted and merged coordinates of masked regions.


📚 Citation

If you use this tool in your research, please cite:

Ntumba, E., Whiley, D., et al. (2025). [Manuscript Title], Journal Name, DOI: xxx


📂 License

This tool is licensed under the MIT License.


👥 Authors

  • Etienne Ntumba Kabongo — Université de Montréal / McGill University
  • Dan Whiley — Nottingham University

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mtbmasker-0.1.1.tar.gz (4.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mtbmasker-0.1.1-py3-none-any.whl (5.5 kB view details)

Uploaded Python 3

File details

Details for the file mtbmasker-0.1.1.tar.gz.

File metadata

  • Download URL: mtbmasker-0.1.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for mtbmasker-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2ac9eb783ca3b91343d6a95af456484f23a4ecdae41e6ea25c8f3033e6b70129
MD5 ab73bf43d15ee7c92b21ed70bcab5e71
BLAKE2b-256 359e1e6c05d54d19ae39e6db21d03608570d4e36327597d6e32be4babdbdfae5

See more details on using hashes here.

File details

Details for the file mtbmasker-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mtbmasker-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for mtbmasker-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 58230b7e489381c9b66e7617e5fb4f205feec34ee4043ea5a025db4cf281c427
MD5 4ef49640afd26f5e3bad338f6fd6b36b
BLAKE2b-256 97eb76de6e65520e425883ffb1a5894f47379795bec6e1cdbfe20a74b0e6d987

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page