Tools for graph-based and string-based mapping and remapping of genomic ↔ transcript ↔ amino acid sequences.

deeplotyper

Tools for mapping and remapping genomic ↔ transcript sequences.

Deeplotyper is a Python toolkit for genomic and transcriptomic sequence analysis that focuses on mapping coordinates between genomes and transcripts, applying variant haplotypes (sets of SNVs/indels) to reference sequences, and extracting open reading frames (ORFs) as either linear sequences or graph representations. It is designed as an academically rigorous, transparent alternative to traditional variant effect prediction tools. Deeplotyper’s core modules enable fine-grained control and interpretation of complex genetic variants without reliance on large external databases or opaque heuristics.

Installation

pip install deeplotyper

Requires:

Python ≥ 3.8
Biopython
pysam

Quickstart

from deeplotyper import (
    SequenceCoordinateMapper,
    HaplotypeRemapper,
    HaplotypeGroups,
    find_orfs, get_longest_orf,
    make_aligner, apply_alignment_gaps,
    build_linear_coords, build_raw_genome_coords, build_raw_transcript_coords,
    BaseCoordinateMapping, CodonCoordinateMapping,
    SequenceMappingResult, TranscriptMappingResult,
    HaplotypeEvent, NewTranscriptSequences, RawBase
)

# 1. Map a transcript to the genome
mapper = SequenceCoordinateMapper()
results = mapper.map_transcripts(
    genome_metadata={"seq_region_accession": "chr1", "start": 100, "strand": 1},
    full_genomic_sequence="ATGGGGTTTCCC...",
    exon_definitions_by_transcript={
        "tx1": [
            {"exon_number": 1, "start": 100, "end": 102, "sequence": "ATG"},
            # …
        ]
    },
    transcript_sequences={"tx1": "ATGCCC"},
    exon_orders={"tx1": [1]},
    min_block_length=5
)

# 2. Apply SNV/indel haplotypes
hap_map = {
    (
        HaplotypeEvent(pos0=2, ref_allele="A", alt_seq="G"),
    ): ()
}
remapper = HaplotypeRemapper("ATGAAA...", results)
mutated = remapper.apply_haplotypes(hap_map)

# 3. Group samples by haplotype from a VCF
groups = HaplotypeGroups.from_vcf(
    "variants.vcf.gz",
    ref_seq="ATGAAA...",
    contig="1",
    start=0
)
distinct = groups.materialize()

Sequence Coordinate Mapping (SequenceCoordinateMapper)

One foundational feature of Deeplotyper is coordinate mapping between genomic DNA and transcript (cDNA/mRNA) coordinates. The SequenceCoordinateMapper class constructs an internal mapping between a reference sequence (e.g. a genomic region) and one or more transcript definitions (exons/introns structure). This allows conversion of coordinates in both directions (genome → transcript and transcript → genome).

For example, given a gene’s reference DNA sequence and exon coordinates for multiple transcripts (splice variants), the mapper can:

Translate a genomic position to a position within a transcript (cDNA coordinate).
Identify which exon or intron a mutation falls into.
Account for strand orientation and splicing (including reverse-complement mappings).

By building a precise base-level map of exonic regions, SequenceCoordinateMapper provides the groundwork for consistent variant placement across transcripts and enables downstream analyses like coding sequence extraction.

Implementation detail: Internally, the mapper may produce a linear coordinate index for each transcript relative to the reference. For instance, if Transcript A has exons 1–100 and 201–300 on the reference genome, a coordinate like genomic 250 can be mapped to position 150 of Transcript A’s cDNA.
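
The exon arithmetic in this example can be sketched in plain Python. This is a minimal illustration, assuming 1-based inclusive exon coordinates on the forward strand; the function name genomic_to_cdna is hypothetical and is not part of the deeplotyper API.

```python
from typing import List, Optional, Tuple


def genomic_to_cdna(pos: int, exons: List[Tuple[int, int]]) -> Optional[int]:
    """Map a 1-based genomic position to a 1-based cDNA position.

    `exons` holds (start, end) pairs, 1-based inclusive, listed in
    transcript order on the forward strand. Returns None when the
    position is intronic or outside the transcript.
    """
    offset = 0  # transcript bases contributed by the exons already passed
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start + 1)
        offset += end - start + 1
    return None


transcript_a = [(1, 100), (201, 300)]
print(genomic_to_cdna(250, transcript_a))  # 150
print(genomic_to_cdna(150, transcript_a))  # None (intronic)
```

Position 250 is the 50th base of the second exon, and the first exon contributes 100 bases, which gives cDNA position 150.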

Haplotype Remapping (HaplotypeRemapper)

Deeplotyper supports applying a set of genetic variants, collectively forming a haplotype, onto reference sequences or transcripts. The HaplotypeRemapper class takes a SequenceCoordinateMapper and a haplotype map (a collection of variants such as SNVs, insertions, deletions, or complex multi-nucleotide changes) and remaps the reference sequence to produce the altered (haplotype) sequence. The remapper:

Ensures variants are applied in the correct positions across multi-exon transcripts.
Handles insertions and deletions (indels), adjusting downstream coordinates.
Supports complex events like multi-base substitutions or combinations of proximal variants.
Can model gene fusions or structural rearrangements by mapping coordinates from two reference sequences into one combined transcript.

The output is typically a new sequence (e.g. the mutated cDNA), along with diffs or lists of changed positions for full transparency.
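
How indels shift downstream coordinates can be illustrated with a small standalone sketch. It assumes 0-based positions and non-overlapping variants, and it applies them from the highest position down so that no edit disturbs a coordinate that is still pending. The name apply_variants and the (position, ref, alt) tuple layout are hypothetical and do not reflect the HaplotypeRemapper implementation.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str]  # (0-based position, ref allele, alt allele)


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Apply non-overlapping variants to `reference` and return the result."""
    seq = reference
    # Highest position first: an edit never shifts the coordinates of
    # the variants that remain to be applied.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


reference = "ATGAAACCCGGG"
haplotype = [
    (2, "G", "T"),    # SNV
    (6, "", "TT"),    # insertion before index 6
    (9, "GGG", ""),   # deletion
]
print(apply_variants(reference, haplotype))  # ATTAAATTCCC
```

Checking the reference allele before each edit catches variants that were defined against a different sequence or coordinate system.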

ORF Extraction (find_orfs and get_longest_orf)

To assess coding impacts, Deeplotyper can extract open reading frames (ORFs) from sequences:

find_orfs scans a nucleotide sequence to identify all ORFs bounded by start and stop codons in the correct reading frame.
get_longest_orf retrieves the longest ORF from a given sequence.

These functions help reveal variant-induced effects such as novel start codons, truncated proteins, or frameshifts. Graph representations of ORFs (nodes = exons/segments, edges = splice connections) are also supported for visualizing complex haplotypes.
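
The idea behind ORF scanning can be sketched without the library. The version below is an assumption-laden toy: it looks only at the forward strand, treats ATG as the sole start codon, and reports 0-based half-open spans. The names scan_orfs and longest_orf are hypothetical stand-ins, not the signatures of find_orfs or get_longest_orf.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_orfs(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) spans of ATG...stop ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first ATG seen since the last stop codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3))  # span includes the stop codon
                start = None
    return sorted(orfs)


def longest_orf(seq: str) -> str:
    """Return the longest ORF found by scan_orfs, or '' if there is none."""
    spans = scan_orfs(seq)
    if not spans:
        return ""
    start, end = max(spans, key=lambda s: s[1] - s[0])
    return seq[start:end]


reference = "ATGAAACCCTAA"
mutated = "ATGTAACCCTAA"  # A>T at index 3 creates a premature stop codon
print(longest_orf(reference))  # ATGAAACCCTAA
print(longest_orf(mutated))    # ATGTAA
```

Comparing the longest ORF before and after applying a haplotype makes a truncation like this one directly visible.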

Sequence Alignment (make_aligner and apply_alignment_gaps)

For visualizing indels, Deeplotyper provides utilities for pairwise sequence alignment:

make_aligner returns a configured Biopython PairwiseAligner (global or local modes).
apply_alignment_gaps projects alignment gaps onto coordinate mappings or sequence strings, inserting dashes (-) to show indels.

Example alignment output:

Ref: ATGCCCACGT...
Alt: ATG--CACGT...

This aids in interpreting frameshifts or in-frame indels and their effects on codon numbering.
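
Projecting gaps onto coordinates amounts to walking the two aligned rows column by column. The sketch below assumes the alignment is already available as two equal-length gapped strings; project_gaps is a hypothetical helper and does not reproduce the signature of apply_alignment_gaps.

```python
from typing import List, Optional, Tuple


def project_gaps(ref_aln: str, alt_aln: str) -> List[Tuple[Optional[int], Optional[int]]]:
    """Pair 0-based ref and alt positions for every alignment column.

    A gap ('-') in a row yields None for that row in that column.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned rows must have the same number of columns")
    pairs = []
    ref_pos = alt_pos = 0
    for r, a in zip(ref_aln, alt_aln):
        ref_idx = alt_idx = None
        if r != "-":
            ref_idx, ref_pos = ref_pos, ref_pos + 1
        if a != "-":
            alt_idx, alt_pos = alt_pos, alt_pos + 1
        pairs.append((ref_idx, alt_idx))
    return pairs


# Toy alignment with a two-base deletion in the alternate sequence.
pairs = project_gaps("ATGCCCACGT", "ATG--CACGT")
deleted = [r for r, a in pairs if a is None]
print(deleted)   # [3, 4]
print(pairs[5])  # (5, 3)
```

Every reference base after the deletion maps to an alternate position two lower, which is the shift that moves downstream codons out of frame.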

Linear Coordinate Construction (build_linear_coords)

The build_linear_coords utility flattens a spliced transcript into a continuous cDNA or protein coordinate space and maps it back to genomic coordinates. It is useful for:

Creating lookup tables (e.g. transcript→genome).
Plotting gene models.
Adjusting coordinates after indels in haplotype transcripts.
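
A transcript→genome lookup table of the kind listed above can be sketched as follows. The example assumes 1-based inclusive exon coordinates and a strand value of 1 or -1; linear_coords is a hypothetical function, and the actual arguments and return type of build_linear_coords may differ.

```python
from typing import Dict, List, Tuple


def linear_coords(exons: List[Tuple[int, int]], strand: int = 1) -> Dict[int, int]:
    """Build a transcript-position -> genomic-position lookup table.

    `exons` are (start, end) pairs in genomic coordinates, 1-based
    inclusive. Transcript positions in the result are 1-based.
    """
    genomic = []
    for start, end in sorted(exons):
        genomic.extend(range(start, end + 1))
    if strand == -1:
        # A reverse-strand transcript runs from high to low genomic positions.
        genomic.reverse()
    return {tx_pos: g_pos for tx_pos, g_pos in enumerate(genomic, start=1)}


forward = linear_coords([(100, 102), (110, 112)])
print(forward)  # {1: 100, 2: 101, 3: 102, 4: 110, 5: 111, 6: 112}

reverse = linear_coords([(100, 102), (110, 112)], strand=-1)
print(reverse[1], reverse[6])  # 112 100
```

Inverting the dictionary gives the genome→transcript direction for exonic positions.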

Example Use Case

from deeplotyper import SequenceCoordinateMapper, HaplotypeRemapper, find_orfs, get_longest_orf

# 1. Reference sequence (toy example)
gene_name = "GENE1"
chrom = "chr1"
strand = "+"

reference_seq = (
    "ATGGTcacct...TTAG"
)

# 2. Exon definitions for two transcripts
transcript1_exons = [(1, 300), (401, 600)]
transcript2_exons = [(1, 300), (501, 700)]

transcripts = {
    "Transcript1": {"exons": transcript1_exons, "strand": "+", "cds_start": 1, "cds_end": 600},
    "Transcript2": {"exons": transcript2_exons, "strand": "+", "cds_start": 1, "cds_end": 700}
}

mapper = SequenceCoordinateMapper(reference_seq, transcripts)

# 3. Define a haplotype (list of variant dicts)
haplotype = [
    {"pos": 50,  "ref": "G",   "alt": "T"},
    {"pos": 310, "ref": "",    "alt": "ACG"},
    {"pos": 450, "ref": "AGCT","alt": ""},
    {"pos": 480, "ref": "A",   "alt": "TT"},
]

# 4. Apply the haplotype and retrieve the mutated transcript sequences
remapper = HaplotypeRemapper(mapper, haplotype)

mut_seq_t1 = remapper.get_sequence("Transcript1")
mut_seq_t2 = remapper.get_sequence("Transcript2")

print(f"Transcript1 (mutated) length: {len(mut_seq_t1)}")
print(mut_seq_t1[40:60])

# 5. ORF extraction in mutated Transcript1
orfs = find_orfs(mut_seq_t1, assume_start_codon=True)
longest_orf = get_longest_orf(mut_seq_t1)
print(f"Number of ORFs: {len(orfs)}")
print(f"Longest ORF length: {len(longest_orf)}")

Addressing Limitations of VEP and Haplosaurus

Traditional VEP/Haplosaurus workflows have known limitations:

Complex variant support: VEP and Haplosaurus do not natively handle gene fusions, multi-exon deletions, or intronic/splice-site changes. Deeplotyper applies any user-specified set of variants.
Database dependency: VEP and Haplosaurus require multi-GB Ensembl caches and compiled APIs. Deeplotyper is pure-Python and works on user-provided sequences and coordinates.
Edge cases: VEP and Haplosaurus can fail on short transcripts or produce opaque “high impact” labels. Deeplotyper’s transparent implementation traces frameshifts and disrupted sequences.
Opacity: VEP uses black-box predictors (SIFT/PolyPhen). Deeplotyper exposes explicit sequence changes, enabling direct inspection of altered codons or ORFs.

License

MIT

Contributing

We welcome contributions! Feel free to add requests in the issues section or directly contribute with a pull request.

Citations

Release history

2025.10.2a0 (pre-release, this version): May 16, 2025
2025.10.1a0 (pre-release): May 15, 2025

Download files

Source Distribution

deeplotyper-2025.10.2a0.tar.gz (40.8 kB)

Uploaded May 16, 2025 Source

Built Distribution

deeplotyper-2025.10.2a0-py3-none-any.whl (29.0 kB)

Uploaded May 16, 2025 Python 3

File details

Details for the file deeplotyper-2025.10.2a0.tar.gz.

File metadata

Download URL: deeplotyper-2025.10.2a0.tar.gz
Upload date: May 16, 2025
Size: 40.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for deeplotyper-2025.10.2a0.tar.gz
Algorithm	Hash digest
SHA256	`0ef4b2ed5794d822ba68fed52b118187dada195db102d84168da486c99ea6045`
MD5	`b3cbc80af5451bfe287662b1f6068f0b`
BLAKE2b-256	`f5c75de1238d87a9431aeb5049ced66d054d192d660df64889a80ded1b6022e8`

File details

Details for the file deeplotyper-2025.10.2a0-py3-none-any.whl.

File metadata

Download URL: deeplotyper-2025.10.2a0-py3-none-any.whl
Upload date: May 16, 2025
Size: 29.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for deeplotyper-2025.10.2a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0d0a964d3a305a5778537b526ea39afc6aaa19ed3c78b31ceae943d512b8c787`
MD5	`64afa1e4845acb65fdbf56e621fa6078`
BLAKE2b-256	`8c52d56d75c0f5b9dfcd77dc9af4d191cc429d13cd2bdc96d2ab47108b943172`
