A simple scanner package for NMD

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

Hoeze

These details have not been verified by PyPI

Project description

NMD variant effect prediction

The NMD-Scanner is a Python-based variant effect annotation tool that predicts the likelihood of transcript degradation through nonsense-mediated decay (NMD). It reconstructs reference and alternative coding sequences as well as transcript sequences in some cases, identifies premature termination codons (PTCs), and evaluates canonical and non-canonical NMD escape rules. It can handle single-nucleotide variants, multiple base substitutions, long and short deletions and duplications as well as frameshift variants.
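The core idea of PTC detection can be illustrated with a minimal, self-contained sketch. This is not the package's implementation: `apply_variant`, `first_stop` and `find_ptc` are hypothetical helpers, positions are assumed to be 0-based offsets into a plus-strand CDS string, and only the three standard stop codons are considered.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Replace `ref` at 0-based CDS offset `pos` with `alt` (hypothetical helper)."""
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS at this offset")
    return cds[:pos] + alt + cds[pos + len(ref):]


def first_stop(cds: str) -> Optional[int]:
    """Return the 0-based offset of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def find_ptc(ref_cds: str, alt_cds: str) -> Optional[int]:
    """Return the offset of a stop codon in `alt_cds` that lies upstream of the
    reference stop codon, or None if the variant introduces no such stop."""
    ref_stop = first_stop(ref_cds)
    alt_stop = first_stop(alt_cds)
    if alt_stop is not None and (ref_stop is None or alt_stop < ref_stop):
        return alt_stop
    return None


# Nonsense SNV: CAG -> TAG in the second codon.
snv_ref = "ATGCAGTGGAAATAA"
snv_alt = apply_variant(snv_ref, 3, "C", "T")
print(find_ptc(snv_ref, snv_alt))  # 3

# Frameshift: deleting two bases moves an out-of-frame TGA into frame.
del_ref = "ATGCCTGAGGAATAA"
del_alt = apply_variant(del_ref, 3, "CC", "")
print(find_ptc(del_ref, del_alt))  # 3
```

A real annotation additionally has to map genomic coordinates to CDS offsets and handle minus-strand transcripts, which this sketch leaves out.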

Features

Reconstructs the reference and alternative CDS, the reference transcript sequence and, in some cases, the alternative transcript sequence, each with metadata
Detects start-loss, stop-loss and premature termination codons (PTCs), reporting the exact position in the CDS and the exon that contains it
Computes different NMD-related features:
- Total, upstream and downstream exon count
- Distance of PTC to original stop codon
- Distance of PTC to start codon
- Transcript length
- 3' and 5' UTR lengths
Evaluates five canonical NMD escape rules:
- Last exon rule
- 50nt penultimate rule
- Long exon rule
- Start-proximal rule
- Single-exon rule
Outputs all annotations as a structured table (CSV or Parquet)
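As a rough illustration of how the five rules listed above could be evaluated, here is a sketch under stated assumptions. It is not the package's `evaluate_nmd_escape_rules`: the function name, the dictionary keys, the coordinate convention (0-based offsets in spliced transcript coordinates) and the thresholds for the long exon rule (407 nt) and the start-proximal rule (150 nt) are assumptions taken from commonly cited values and are exposed as parameters. Only the 50 nt threshold is named in this README.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, Sequence


def sketch_escape_rules(
    exon_lengths: Sequence[int],
    ptc_pos: int,
    start_pos: int = 0,
    *,
    penultimate_nt: int = 50,      # named in the "50nt penultimate rule"
    long_exon_nt: int = 407,       # assumption, not taken from this README
    start_proximal_nt: int = 150,  # assumption, not taken from this README
) -> Dict[str, bool]:
    """Evaluate escape rules for a PTC at 0-based transcript offset `ptc_pos`.

    `exon_lengths` lists exon lengths in transcription order and `start_pos`
    is the transcript offset of the start codon.
    """
    if not exon_lengths:
        raise ValueError("at least one exon is required")
    ends = list(accumulate(exon_lengths))  # exclusive end offset of each exon
    if not 0 <= ptc_pos < ends[-1]:
        raise ValueError("PTC lies outside the transcript")

    n_exons = len(exon_lengths)
    exon_idx = bisect_right(ends, ptc_pos)  # index of the exon holding the PTC

    rules = {
        "single_exon": n_exons == 1,
        "last_exon": n_exons > 1 and exon_idx == n_exons - 1,
        # PTC in the penultimate exon, close to the last exon-exon junction
        "penultimate_50nt": (
            n_exons > 1
            and exon_idx == n_exons - 2
            and ends[exon_idx] - ptc_pos <= penultimate_nt
        ),
        "long_exon": exon_lengths[exon_idx] > long_exon_nt,
        "start_proximal": ptc_pos - start_pos < start_proximal_nt,
    }
    # A transcript is predicted to escape NMD if any rule applies.
    rules["nmd_escape"] = any(rules.values())
    return rules
```

For example, `sketch_escape_rules([200, 300, 120], ptc_pos=470)` flags only the penultimate rule, because the PTC lies 30 nt upstream of the last junction. Whether distances are measured from the first or the last base of the stop codon is a convention that this sketch does not settle.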

Installation

git clone https://github.com/gagneurlab/NMD-Scanner.git
cd NMD-Scanner
pip install .

Usage

Option 1: Annotating a VCF on the command line

After pip install . the nmd-scanner command is available:

nmd-scanner --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/input.csv

# write Parquet instead of CSV (requires pyarrow)
nmd-scanner --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/input.parquet

# optional: recompute exon numbering (recommended for hg19)
nmd-scanner --vcf input.vcf --gtf annotation.gtf --fasta reference.fa --output results/input.csv --reassign_exons

The equivalent python -m nmd_scanner.cli ... invocation also works without installing the console script.

Arguments:

--vcf: Path to input VCF (SNVs / Indels supported; frameshifts handled)
--gtf: Path to gene annotation (GTF)
--fasta: Path to reference genome FASTA
--output: Path to the output file. Extension selects the format: .csv for CSV, .parquet or .pq for Parquet. The parent directory must already exist; the file is overwritten if present.
--reassign_exons: (flag) Recompute exon numbers (useful for hg19)
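The extension-based format selection described for --output can be pictured with the following sketch. It is an illustration only, not the CLI's code: `resolve_output_format` is a hypothetical helper, and matching the extension case-insensitively and rejecting unknown extensions are assumptions.

```python
from pathlib import Path


def resolve_output_format(output: str) -> str:
    """Return "csv" or "parquet" for an output path (hypothetical helper)."""
    path = Path(output)
    # The README states that the parent directory must already exist.
    if not path.parent.is_dir():
        raise FileNotFoundError(f"parent directory does not exist: {path.parent}")
    suffix = path.suffix.lower()  # case-insensitive matching is an assumption
    if suffix == ".csv":
        return "csv"
    if suffix in {".parquet", ".pq"}:
        return "parquet"
    # How the real CLI treats other extensions is not documented here.
    raise ValueError(f"unsupported output extension: {suffix!r}")
```

Checking the parent directory before the annotation runs avoids losing a long computation to a mistyped path.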

Output:

The file specified by --output, containing:
- reconstructed reference / alternative CDS and transcript sequences (+ metadata)
- PTC detection and start / stop-loss flags
- NMD escape rules
- extra features such as UTR lengths, exon counts, distances, etc.

Option 2: Import as a Python module

Instead of running the entire pipeline, you can import NMD-Scanner in Python and call only specific components. This is useful if you want to:

only reconstruct transcript / CDS sequences
only compute NMD escape rules
integrate NMD-Scanner into a larger workflow
build custom features

To reconstruct the reference and alternative coding and transcript sequences, detect PTCs and obtain start / stop-loss information:

import pandas as pd
import pyranges as pr
from pyfaidx import Fasta

import nmd_scanner

vcf = nmd_scanner.read_vcf("input.vcf")
gtf_pr = nmd_scanner.read_gtf("annotation.gtf")
fasta = Fasta("reference.fa")

# Optional: fix exon numbering (recommended for hg19)
gtf_pr = nmd_scanner.compute_exon_numbers(gtf_pr)

gtf_df = gtf_pr.df
cds_df = gtf_df[gtf_df["Feature"] == "CDS"]
exons_df = gtf_df[gtf_df["Feature"] == "exon"].copy()
exons_df["exon_length"] = exons_df["End"] - exons_df["Start"]

results = nmd_scanner.extract_ptc(cds_df, vcf, fasta, exons_df)
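The optional exon renumbering step can be understood through the sketch below. It is an assumption about what such a step plausibly does, not the implementation of `nmd_scanner.compute_exon_numbers`: exons are numbered per transcript in transcription order, that is by ascending start on the plus strand and by descending start on the minus strand. The helper name `number_exons` and the use of plain dictionaries instead of a PyRanges object are illustrative choices, and `transcript_id` and `exon_number` are the usual GTF attribute names rather than names confirmed by this README.

```python
from collections import defaultdict
from typing import Dict, List


def number_exons(exons: List[Dict]) -> List[Dict]:
    """Assign `exon_number` per transcript in transcription order.

    Each exon is a dict with "transcript_id", "Strand" and "Start" keys.
    """
    by_transcript = defaultdict(list)
    for exon in exons:
        by_transcript[exon["transcript_id"]].append(exon)

    numbered = []
    for tx_exons in by_transcript.values():
        # On the minus strand the first exon has the largest genomic start.
        reverse = tx_exons[0]["Strand"] == "-"
        ordered = sorted(tx_exons, key=lambda e: e["Start"], reverse=reverse)
        for number, exon in enumerate(ordered, start=1):
            numbered.append({**exon, "exon_number": number})
    return numbered
```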

Add the NMD escape rules (last exon rule, 50 nt penultimate rule, long exon rule, start-proximal rule, single-exon rule, and the overall NMD escape call) to the results computed above:

nmd_results = results.apply(nmd_scanner.evaluate_nmd_escape_rules, axis=1, result_type='expand')
results = pd.concat([results, nmd_results], axis=1)

Add extra NMD-related features (UTR lengths, exon counts, PTC-related features) to the results computed above:

extra_features = results.apply(nmd_scanner.add_nmd_features, axis=1, result_type='expand')
results = pd.concat([results, extra_features], axis=1)

License

All source code in this repository is licensed under the MIT License.

Citation

Schröder, C.H. (2025). Enhanced Aberrant Gene Expression Prediction across Human Tissues. Master's Thesis, Technical University of Munich / Ludwig-Maximilians-Universität München.

Release history

0.2.0 (this version): May 14, 2026
0.1.1: May 13, 2026
0.0.0: May 13, 2026

Download files

Source Distribution

nmd_scanner-0.2.0.tar.gz (26.5 MB), uploaded May 14, 2026, Source

Built Distribution

nmd_scanner-0.2.0-py3-none-any.whl (22.1 kB), uploaded May 14, 2026, Python 3

File details

Details for the file nmd_scanner-0.2.0.tar.gz.

File metadata

Download URL: nmd_scanner-0.2.0.tar.gz
Upload date: May 14, 2026
Size: 26.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nmd_scanner-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`0684c6d37b3506ac33285b8227f2be92609fa7c45647686a20a863636999fc2d`
MD5	`ad1531829667b93aee286a9b3170f0d5`
BLAKE2b-256	`726e606a1cf499ce14c7c8626183799de19234e35569ddb10c0cadc85a2d7aa3`

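A downloaded file can be checked against the SHA256 digest listed above with the Python standard library. The sketch below is a generic example, not part of NMD-Scanner; `sha256_of` and `matches_digest` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution this would be called as `matches_digest("nmd_scanner-0.2.0.tar.gz", "0684c6d37b3506ac33285b8227f2be92609fa7c45647686a20a863636999fc2d")`, assuming the archive sits in the current directory.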

Provenance

The following attestation bundles were made for nmd_scanner-0.2.0.tar.gz:

Publisher: publish.yml on gagneurlab/NMD-Scanner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nmd_scanner-0.2.0.tar.gz
- Subject digest: 0684c6d37b3506ac33285b8227f2be92609fa7c45647686a20a863636999fc2d
- Sigstore transparency entry: 1538051999
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: gagneurlab/NMD-Scanner@0af255742eae42e3ed4e240541bc1eed228bfbfc
- Branch / Tag: refs/heads/main
- Owner: https://github.com/gagneurlab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0af255742eae42e3ed4e240541bc1eed228bfbfc
- Trigger Event: workflow_dispatch

File details

Details for the file nmd_scanner-0.2.0-py3-none-any.whl.

File metadata

Download URL: nmd_scanner-0.2.0-py3-none-any.whl
Upload date: May 14, 2026
Size: 22.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nmd_scanner-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5b78c33959f313b854fef247521278122da9f067ec6a29b58625316cda9d777b`
MD5	`0dc6503e9b139ea181aeeef906e36a1a`
BLAKE2b-256	`5f2e68730c6be21a80d15e0e6c47d5938d464f8ce669e21e68e1171a5002188c`


Provenance

The following attestation bundles were made for nmd_scanner-0.2.0-py3-none-any.whl:

Publisher: publish.yml on gagneurlab/NMD-Scanner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nmd_scanner-0.2.0-py3-none-any.whl
- Subject digest: 5b78c33959f313b854fef247521278122da9f067ec6a29b58625316cda9d777b
- Sigstore transparency entry: 1538052169
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: gagneurlab/NMD-Scanner@0af255742eae42e3ed4e240541bc1eed228bfbfc
- Branch / Tag: refs/heads/main
- Owner: https://github.com/gagneurlab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0af255742eae42e3ed4e240541bc1eed228bfbfc
- Trigger Event: workflow_dispatch
