
A simple scanner package for NMD

NMD variant effect prediction

NMD-Scanner is a Python-based variant effect annotation tool that predicts the likelihood of transcript degradation through nonsense-mediated decay (NMD). It reconstructs reference and alternative coding sequences (and, in some cases, transcript sequences), identifies premature termination codons (PTCs), and evaluates canonical and non-canonical NMD escape rules. It handles single-nucleotide variants, multi-base substitutions, short and long deletions and duplications, and frameshift variants.

Features

  • Reconstructs reference and alternative CDS, reference transcript sequence and (in some cases) the alternative transcript sequences with metadata
  • Detects start-loss and stop-loss variants and premature termination codons (PTCs), reporting each PTC's exact position in the CDS and the exon in which it lies
  • Computes NMD-related features:
    • Total, upstream and downstream exon count
    • Distance of PTC to original stop codon
    • Distance of PTC to start codon
    • Transcript length
    • 3' and 5' UTR lengths
  • Evaluates five canonical NMD escape rules:
    • Last exon rule
    • 50 nt penultimate rule
    • Long exon rule
    • Start-proximal rule
    • Single-exon rule
  • Outputs all annotations as a structured DataFrame (CSV)
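
The PTC detection and escape rules listed above can be illustrated with a minimal sketch. This is not NMD-Scanner's implementation: the function names, the returned keys, and the thresholds (150 nt for start-proximal, 407 nt for long exons, 50 nt for the penultimate exon) are assumptions based on values commonly used in the NMD literature, and the package's actual cut-offs may differ. For simplicity the sketch works purely in CDS coordinates and ignores UTRs.

```python
from typing import Dict, List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based CDS position of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for pos in range(0, len(cds) - 2, 3):
        if cds[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


def escape_rules(
    ptc_pos: int,
    exon_lengths: List[int],
    start_proximal_nt: int = 150,  # assumed threshold, not taken from the package
    long_exon_nt: int = 407,       # assumed threshold, not taken from the package
    penultimate_nt: int = 50,      # assumed threshold, not taken from the package
) -> Dict[str, bool]:
    """Evaluate five escape rules for a PTC at `ptc_pos` (0-based, CDS coordinates).

    `exon_lengths` holds the coding length of each exon in transcript order.
    """
    # Find the exon that contains the PTC by walking cumulative exon lengths.
    exon_start = 0
    for idx, length in enumerate(exon_lengths):
        if ptc_pos < exon_start + length:
            break
        exon_start += length
    else:
        raise ValueError("ptc_pos lies outside the coding sequence")

    n_exons = len(exon_lengths)
    exon_end = exon_start + length  # downstream junction of the PTC's exon

    rules = {
        "single_exon": n_exons == 1,
        "last_exon": idx == n_exons - 1,
        # PTC in the penultimate exon, close to the last exon-exon junction
        "penultimate_50nt": (
            n_exons > 1
            and idx == n_exons - 2
            and exon_end - ptc_pos <= penultimate_nt
        ),
        "long_exon": length > long_exon_nt,
        "start_proximal": ptc_pos < start_proximal_nt,
    }
    rules["nmd_escape"] = any(rules.values())
    return rules


ptc = first_stop_codon("ATGAAATAGCCC")          # stop codon TAG starts at position 6
rules = escape_rules(250, [200, 100, 300])      # PTC 50 nt before the last junction
print(ptc, rules)
```

In this toy model a variant is predicted to escape NMD if any single rule fires; whether the package combines the rules in the same way is not stated here.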

Installation

git clone https://github.com/gagneurlab/NMD-Scanner.git
cd NMD-Scanner
pip install .

Usage

Option 1: Annotating a VCF on the command line

# if running the script directly
python -m nmd_scanner.cli --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/

# option: fix exon numbering (recommended for hg19)
python -m nmd_scanner.cli --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/ --reassign_exons

Arguments:

  • --vcf: Path to input VCF (SNVs / Indels supported; frameshifts handled)
  • --gtf: Path to gene annotation (GTF)
  • --fasta: Path to reference genome FASTA
  • --output: Path to an existing directory (or a file path whose parent exists)
  • --reassign_exons: (flag) Recompute exon numbers (useful for hg19)

Output:

  • A CSV named <vcf_basename>_final_nmd_results.csv saved to --output, containing:
    • reconstructed reference / alternative CDS and transcript sequences (+ metadata)
    • PTC detection and start / stop-loss flags
    • NMD escape rules
    • extra features (UTR lengths, exon counts, distances, etc.)

Option 2: Import as a Python module

Instead of running the entire pipeline, you can import NMD-Scanner in Python and call only specific components. This is useful if you want to:

  • only reconstruct transcript / CDS sequences
  • only compute NMD escape rules
  • integrate NMD-Scanner into a larger workflow
  • build custom features

To reconstruct reference and alternative coding and transcript sequences, and to obtain PTC detection and start / stop-loss information:

import pandas as pd
import pyranges as pr
from pyfaidx import Fasta

import nmd_scanner

vcf = nmd_scanner.read_vcf("input.vcf")
gtf_pr = nmd_scanner.read_gtf("annotation.gtf")
fasta = Fasta("reference.fa")

# Optional: fix exon numbering (recommended for hg19)
gtf_pr = nmd_scanner.compute_exon_numbers(gtf_pr)

gtf_df = gtf_pr.df
cds_df = gtf_df[gtf_df["Feature"] == "CDS"]
exons_df = gtf_df[gtf_df["Feature"] == "exon"].copy()
exons_df["exon_length"] = exons_df["End"] - exons_df["Start"]

results = nmd_scanner.extract_ptc(cds_df, vcf, fasta, exons_df, output="tmp/")
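
The exon length in the snippet above is computed as End - Start with no +1. That is correct if the coordinates are 0-based and half-open, which is the convention PyRanges uses; that nmd_scanner.read_gtf follows the same convention is an assumption here. A small toy example with made-up coordinates and a hypothetical transcript ID shows the arithmetic:

```python
import pandas as pd

# Toy exon table with 0-based, half-open coordinates (made-up values).
exons = pd.DataFrame(
    {
        "transcript_id": ["tx1", "tx1", "tx1"],
        "Start": [100, 300, 500],
        "End": [150, 420, 530],
    }
)

# Half-open intervals: length is End - Start, without adding 1.
exons["exon_length"] = exons["End"] - exons["Start"]

# Cumulative length per transcript, handy for locating a position within exons.
exons["cum_length"] = exons.groupby("transcript_id")["exon_length"].cumsum()

print(exons)
```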

Add the NMD escape rules (last exon rule, 50 nt penultimate rule, long exon rule, start-proximal rule, single-exon rule, and the overall NMD escape call) to the results computed above:

nmd_results = results.apply(nmd_scanner.evaluate_nmd_escape_rules, axis=1, result_type='expand')
results = pd.concat([results, nmd_results], axis=1)

Add extra NMD-related features (UTR lengths, exon counts, PTC-related features) to the results computed above:

extra_features = results.apply(nmd_scanner.add_nmd_features, axis=1, result_type='expand')
results = pd.concat([results, extra_features], axis=1)
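
Both steps above use the same pandas pattern: a row-wise function is applied with result_type='expand', and the resulting columns are concatenated onto the original table. The sketch below shows that pattern with a hypothetical function, toy_features, and made-up data; it can serve as a template for building custom features and does not reproduce any NMD-Scanner function.

```python
import pandas as pd


def toy_features(row: pd.Series) -> pd.Series:
    """Hypothetical per-row feature function returning named values."""
    cds_length = len(row["cds"])
    return pd.Series({"cds_length": cds_length, "n_codons": cds_length // 3})


results = pd.DataFrame({"variant": ["v1", "v2"], "cds": ["ATGTAA", "ATGCTAA"]})

# Each returned Series becomes one row of a new DataFrame with named columns.
features = results.apply(toy_features, axis=1, result_type="expand")

# Column-wise concatenation aligns on the shared row index.
results = pd.concat([results, features], axis=1)

print(results)
```

Because the concatenation aligns on the row index, the feature table should be computed from the same DataFrame it is joined to, without resetting or filtering the index in between.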

License

All source code in this repository is licensed under the MIT License.

Citation

Schröder, C.H. (2025). Enhanced Aberrant Gene Expression Prediction across Human Tissues. Master's Thesis, Technical University of Munich / Ludwig-Maximilians-Universität München.
