Generate isolate-specific genome masks for Mycobacterium tuberculosis

mtbmasker

👥 Authors

  • Etienne Ntumba Kabongo — Université de Montréal / McGill University
  • Dan Whiley — Nottingham University

mtbmasker is a Python command-line tool that generates isolate-specific, conservative genome masks for Mycobacterium tuberculosis (MTB) genomes. Masking problematic genomic regions (e.g., PE/PPE genes, IS elements, and other repetitive loci) is particularly useful before downstream variant calling and phylogenomic analyses.


✨ Features

  • Generates genome masks per isolate using BLASTn alignment against predefined repetitive genes.
  • Supports custom isolate genome files and gene query sets.
  • Automatically formats coordinates to BED, sorts, and merges overlapping masked regions.
  • Outputs high-quality, isolate-specific .bed files for genome masking.
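
The coordinate handling described above can be illustrated with a minimal sketch. This is not mtbmasker's own code (the tool delegates sorting and merging to BEDTools); the function names blast_hit_to_bed and merge_intervals are hypothetical. The sketch assumes BLAST tabular subject coordinates (1-based, inclusive, with start greater than end for minus-strand hits) and converts them to BED (0-based, half-open) before merging overlapping or adjacent intervals.

```python
from typing import Iterable, List, Tuple

# (contig, start, end) in BED convention: 0-based, half-open
Interval = Tuple[str, int, int]


def blast_hit_to_bed(contig: str, sstart: int, send: int) -> Interval:
    """Convert 1-based inclusive BLAST subject coordinates to a BED interval.

    BLAST reports sstart > send for minus-strand hits, so order them first.
    Only the start is shifted by one because BED ends are exclusive.
    """
    lo, hi = min(sstart, send), max(sstart, send)
    return (contig, lo - 1, hi)


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort by contig and start, then merge overlapping or adjacent intervals."""
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical hits on one contig, the second on the minus strand
    hits = [
        blast_hit_to_bed("contig1", 100, 200),
        blast_hit_to_bed("contig1", 250, 150),
        blast_hit_to_bed("contig1", 400, 500),
    ]
    for contig, start, end in merge_intervals(hits):
        print(f"{contig}\t{start}\t{end}")
```

The first two hits overlap and collapse into one region, while the third stays separate.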

🧬 Use case

This tool was originally developed for comparative genomics and transmission studies of Mycobacterium tuberculosis complex (MTBC) isolates, including M. africanum. Because each mask is built per isolate from BLASTn alignments rather than taken from a single fixed coordinate set, inter-lineage diversity is respected during masking.


🔧 Installation

From PyPI (stable version):

pip install mtbmasker

From GitHub (development version):

pip install git+https://github.com/EtienneNtumba/mtbmasker.git

🚀 Usage

Basic Command

mtbmasker mask input_list.tsv --query-fasta data/genes_to_mask.fasta

Help Output

Usage: mtbmasker mask [OPTIONS] INPUT_LIST

  🔬 Generate isolate-specific conservative genome masks (.bed files) for
  each isolate using BLASTn alignments and BEDTools.

Arguments:
  INPUT_LIST    Path to TSV file with isolate IDs (without .fasta extension) [required]

Options:
  --query-fasta TEXT        Path to fasta file containing genes to be masked [default: data/genes_to_mask.fasta]
  --blastn-path TEXT        Optional: Path to custom blastn binary
  --makeblastdb-path TEXT   Optional: Path to makeblastdb binary
  --bedtools-path TEXT      Optional: Path to bedtools binary
  --threads INTEGER         Number of threads to use [default: 4]
  --output-dir TEXT         Directory to save output BED files [default: current directory]
  --help                    Show this message and exit.

Command-Line Options Explained

Required Arguments:

  • INPUT_LIST — Path to a tab-separated file containing one isolate ID per line (without .fasta extension). Each ID must correspond to a <ID>.fasta file present in the working directory.

Optional Flags:

  • --query-fasta — Path to FASTA file containing problematic/repetitive genes to be masked. Default: data/genes_to_mask.fasta
  • --blastn-path — Custom path to blastn executable if not in system PATH
  • --makeblastdb-path — Custom path to makeblastdb executable if not in system PATH
  • --bedtools-path — Custom path to bedtools executable if not in system PATH
  • --threads — Number of CPU threads to use for parallel processing. Default: 4
  • --output-dir — Directory where output BED files will be saved. Default: current working directory
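
For scripted runs, the command can be assembled and launched from Python. The sketch below uses only the flags listed above; the helper names build_mask_command and run_mask are hypothetical and not part of mtbmasker, and the example assumes the mtbmasker executable is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_mask_command(
    input_list: str,
    query_fasta: Optional[str] = None,
    threads: Optional[int] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `mtbmasker mask`, adding only the flags given."""
    cmd = ["mtbmasker", "mask", input_list]
    if query_fasta is not None:
        cmd += ["--query-fasta", query_fasta]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    return cmd


def run_mask(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_mask_command(
        "input_list.tsv",
        query_fasta="data/genes_to_mask.fasta",
        threads=8,
        output_dir="masks",  # hypothetical output directory
    )
    print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues when isolate or directory names contain spaces.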

📁 Example

input_list.tsv:

ARR1960.LR.Asm
QC-9.LR.Asm
N1177.LR.Asm

Each listed isolate must have a corresponding FASTA file in the current directory (ARR1960.LR.Asm.fasta, QC-9.LR.Asm.fasta, and N1177.LR.Asm.fasta for the list above).
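
A quick pre-flight check can catch missing genome files before a run starts. The following is a minimal sketch, not part of mtbmasker: parse_isolate_ids and missing_fastas are hypothetical helpers, and taking the first tab-separated field of each line as the isolate ID is an assumption about the list format.

```python
from pathlib import Path
from typing import List


def parse_isolate_ids(text: str) -> List[str]:
    """Return isolate IDs from the list text, skipping blank lines.

    Assumption: the ID is the first tab-separated field on each line.
    """
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            ids.append(line.split("\t")[0])
    return ids


def missing_fastas(ids: List[str], directory: Path) -> List[str]:
    """List the expected <ID>.fasta file names that are absent from `directory`."""
    return [f"{i}.fasta" for i in ids if not (directory / f"{i}.fasta").is_file()]


if __name__ == "__main__":
    list_path = Path("input_list.tsv")
    if list_path.is_file():
        isolate_ids = parse_isolate_ids(list_path.read_text())
        for name in missing_fastas(isolate_ids, Path.cwd()):
            print(f"missing: {name}")
```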


🔬 Requirements

  • Python ≥ 3.8
  • BLAST+
  • BEDTools
  • Typer

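BLAST+ and BEDTools must both be installed and available on your $PATH (for example, through an activated conda environment). If they are installed elsewhere, pass their locations with --blastn-path, --makeblastdb-path, and --bedtools-path.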
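
Whether the external binaries are discoverable can be checked with the standard library. This is a small sketch rather than something mtbmasker provides; locate_tools is a hypothetical helper, and the executable names are taken from the option names documented above.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_tools(
    names: Iterable[str] = ("blastn", "makeblastdb", "bedtools"),
) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in locate_tools().items():
        status = path if path is not None else "NOT FOUND"
        print(f"{tool}: {status}")
```

Any tool reported as not found needs either a PATH fix or the matching path option on the command line.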


📄 Output

For each isolate, the following file is generated:

<isolate>_conservitive_AF19-like_masking_file.bed

This BED file contains sorted and merged coordinates of masked regions.
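
As an illustration of how such a mask might be consumed downstream, the sketch below parses BED text and replaces masked positions in a sequence with N. It is not part of mtbmasker; parse_bed and mask_sequence are hypothetical helpers, and the sketch assumes standard tab-separated BED with contig, start, and end in the first three columns (0-based, half-open).

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (contig, start, end), 0-based half-open


def parse_bed(text: str) -> List[Region]:
    """Read the first three columns of each BED record, skipping header lines."""
    records: List[Region] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split("\t")[:3]
        records.append((contig, int(start), int(end)))
    return records


def mask_sequence(seq: str, regions: Iterable[Region], contig: str) -> str:
    """Return `seq` with every region on `contig` replaced by N characters."""
    chars = list(seq)
    for name, start, end in regions:
        if name != contig:
            continue
        # Clip to the sequence length so the result keeps its original size
        lo, hi = min(start, len(chars)), min(end, len(chars))
        chars[lo:hi] = "N" * (hi - lo)
    return "".join(chars)


if __name__ == "__main__":
    bed_text = "contig1\t2\t5\ncontig1\t8\t10\n"  # toy example, not real output
    regions = parse_bed(bed_text)
    print(mask_sequence("ACGTACGTAC", regions, "contig1"))
    print(sum(end - start for _, start, end in regions), "bases masked")
```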


📂 License

This tool is licensed under the MIT License.

