Germline-informed reverse translation of antibody amino acid sequences to nucleotide sequences

abverse

abverse is a companion package to abstar. It takes antibody amino acid sequences (common output from mass spectrometry, proteomics, or databases) and produces nucleotide sequences that are maximally faithful to the inferred germline, so that downstream abstar annotation (V/J assignment, mutation counts, CDR/FWR regions) reflects real somatic hypermutation rather than arbitrary codon choices.

Why abverse?

abstar requires nucleotide input. Researchers with amino acid (AA) sequences have two options:

1. Naive reverse translation: pick any codon per amino acid, run abstar, and get inflated mutation counts and unreliable CDR boundaries, because every codon choice that differs from germline looks like a mutation.
2. abverse: a single-pass algorithm produces germline-faithful nucleotide (NT) sequences that feed directly into abstar.run().

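The abverse approach is provably optimal in a specific sense: it minimizes the number of nucleotide differences from the assigned germline across all aligned codon positions. For each codon position aligned to a germline gene, it picks the synonymous codon with the minimum Hamming distance to the germline codon (ties broken by human codon frequency). Because codons don't overlap and Hamming distance is additive, the global minimum equals the sum of per-position minima, so choosing the best codon at each position independently is sufficient. The entire lookup table is pre-computed at import time, which makes each position an O(1) lookup at runtime.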
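The per-position choice can be illustrated with a minimal sketch. It is not the package's implementation: `build_lookup`, `hamming`, and the `codon_freq` argument are hypothetical names, and because no human codon usage table is bundled here, ties fall back to alphabetical order unless frequencies are supplied. Keying the table by (amino acid, germline codon) gives 20 × 64 = 1280 entries, which would be consistent with the 1280-entry table mentioned under Package structure if it is keyed the same way.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_lookup(codon_freq=None):
    """Map (amino acid, germline codon) -> closest synonymous codon.

    codon_freq is an optional {codon: frequency} dict used only to break
    ties (hypothetical parameter; remaining ties resolve alphabetically).
    """
    codon_freq = codon_freq or {}
    synonyms = {}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":  # stop codons are never valid outputs
            synonyms.setdefault(aa, []).append(codon)

    table = {}
    for aa, codons in synonyms.items():
        for germline in CODON_TO_AA:  # all 64 possible germline codons
            table[(aa, germline)] = min(
                codons,
                key=lambda c: (hamming(c, germline), -codon_freq.get(c, 0.0), c),
            )
    return table


LOOKUP = build_lookup()
print(len(LOOKUP))            # 1280
print(LOOKUP[("M", "GTG")])   # ATG (Met has a single codon, one mismatch)
print(LOOKUP[("S", "AGC")])   # AGC (germline codon already encodes Ser)
```

Since each entry is chosen independently, the sum of the per-position Hamming distances is the minimum achievable for the whole aligned region.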

Installation

pip install abverse

Requirements: Python ≥ 3.10, abutils ≥ 0.5.1, abstar (for germline databases), polars ≥ 0.20, MMseqs2 (bundled via abutils).

Quick start

import abutils
import abverse
import abstar

# Load your AA sequences (FASTA file, list of strings, or list of abutils.Sequence)
aa_seqs = abutils.io.read_fasta("antibodies_aa.fasta")

# Reverse-translate to germline-faithful NT sequences
nt_seqs = abverse.reverse_translate(aa_seqs)

# Feed directly into abstar — results will have meaningful mutation counts
results = abstar.run(nt_seqs)

The returned nt_seqs is a list[abutils.Sequence]. Each sequence carries three annotations:

Annotation	Description
`v_call`	Assigned V germline gene
`j_call`	Assigned J germline gene
`reconstruction_method`	`germline_vj`, `germline_v_only`, `germline_j_only`, or `codon_frequency`

API

`abverse.reverse_translate(sequences, ...)`

abverse.reverse_translate(
    sequences,              # FASTA path | list[str] | list[abutils.Sequence]
    species="human",        # germline species
    receptor="bcr",         # receptor type
    n_processes=None,       # worker processes (default: cpu_count)
    threads=None,           # MMseqs2 threads
    chunksize=500,          # sequences per worker batch
    force_rebuild_db=False, # force re-build of germline AA databases
    output_fasta=None,      # optional path to write NT FASTA
    verbose=False,          # print progress
) -> list[abutils.Sequence]

`abverse.build_germline_aa_db(species, receptor, force_rebuild)`

Pre-builds (or validates the cache of) the germline amino acid databases used internally. Call this once on first install to populate ~/.abverse/germline_dbs/. Subsequent calls reuse the cache unless the source germline files change (SHA-256 invalidation).

How it works

Algorithm

1. MMseqs2 protein–protein search (all AA sequences vs. V germline AA DB)
   → best V assignment per sequence

2. Extract post-V region (aa_seq[v_qend+1:]) per sequence
   → MMseqs2 protein–protein search vs. J germline AA DB
   → best J assignment per sequence

3. Parallel reconstruction (ProcessPoolExecutor):
   • 5' overhang (before V alignment)  → most frequent human codon
   • V region                           → argmin_c[Hamming(c, germline_codon)] per position
   • CDR3 (V end → J start)            → most frequent human codon
   • J region                           → argmin_c[Hamming(c, germline_codon)] per position
   • 3' overhang (after J alignment)   → most frequent human codon

4. Validate: assert translate(output_nt) == input_aa for every sequence
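Steps 3 and 4 can be sketched for a single sequence as follows. This is an illustration under simplifying assumptions, not abverse's code: alignments are treated as ungapped, germline codons are passed in as plain lists, and the fallback for unaligned positions is simply the alphabetically first synonymous codon, standing in for the most frequent human codon. The names `reconstruct`, `closest_codon`, and `fallback` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(nt):
    """Translate an in-frame nucleotide string with the standard code."""
    return "".join(CODON_TO_AA[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def closest_codon(aa, germline_codon):
    """Synonymous codon for aa with the fewest mismatches to the germline codon."""
    return min(
        SYNONYMS[aa],
        key=lambda c: (sum(x != y for x, y in zip(c, germline_codon)), c),
    )


def reconstruct(aa_seq, v_start, v_codons, j_start, j_codons, fallback=None):
    """Rebuild NT for one AA sequence from ungapped V and J alignments."""
    if fallback is None:
        # Placeholder for a codon-frequency table (assumption, see lead-in).
        fallback = {aa: min(codons) for aa, codons in SYNONYMS.items()}

    germline = [None] * len(aa_seq)
    # J is written first and V second, so V wins wherever the two overlap.
    for start, codons in ((j_start, j_codons), (v_start, v_codons)):
        for offset, codon in enumerate(codons):
            if 0 <= start + offset < len(aa_seq):
                germline[start + offset] = codon

    out = []
    for aa, g in zip(aa_seq, germline):
        if g is not None and len(g) == 3:
            out.append(closest_codon(aa, g))  # V or J region
        else:
            out.append(fallback[aa])  # overhangs, CDR3, truncated codons

    nt = "".join(out)
    assert translate(nt) == aa_seq, "round-trip translation failed"
    return nt


aa_seq = "AEMQLYWGQG"
nt = reconstruct(
    aa_seq,
    v_start=1, v_codons=["GAG", "GTG", "CAG", "CTG"],
    j_start=6, j_codons=["TGG", "GGC", "CAG", "GGA"],
)
print(nt)  # GCAGAGATGCAGCTGTACTGGGGCCAGGGA
```

In this toy input the only difference from germline inside the aligned regions is the Met at position 2 (germline GTG, output ATG), a single nucleotide change that reflects a real amino acid substitution.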

Germline database cache

On first use, abverse translates abstar's nucleotide V/J germlines to amino acid FASTA files, builds MMseqs2 protein databases, and caches everything under ~/.abverse/germline_dbs/. The cache is automatically invalidated and rebuilt if abstar's germline files change (checked via SHA-256).
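A hash-based invalidation check of this kind can be sketched with the standard library. The manifest layout, file names, and function names below are assumptions for illustration; the source does not describe how abverse stores its digests.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def digest_files(paths):
    """SHA-256 over the names and contents of the source files, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        h.update(path.name.encode("utf-8"))
        h.update(path.read_bytes())
    return h.hexdigest()


def cache_is_valid(source_files, manifest_path):
    """True if a manifest exists and its digest matches the current sources."""
    manifest_path = Path(manifest_path)
    if not manifest_path.exists():
        return False
    stored = json.loads(manifest_path.read_text())
    return stored.get("sha256") == digest_files(source_files)


def write_manifest(source_files, manifest_path):
    """Record the digest of the sources after a successful build."""
    digest = digest_files(source_files)
    Path(manifest_path).write_text(json.dumps({"sha256": digest}))


# Demonstration with a throwaway germline file (illustrative content only).
with tempfile.TemporaryDirectory() as tmp:
    germline = Path(tmp) / "v_germlines.fasta"
    manifest = Path(tmp) / "manifest.json"

    germline.write_text(">IGHV_example\nCAGGTGCAGCTG\n")
    valid_before_build = cache_is_valid([germline], manifest)   # no manifest yet

    write_manifest([germline], manifest)
    valid_after_build = cache_is_valid([germline], manifest)    # digests match

    germline.write_text(">IGHV_example\nCAGGTGCAGCTC\n")         # source changed
    valid_after_change = cache_is_valid([germline], manifest)

print(valid_before_build, valid_after_build, valid_after_change)  # False True False
```

Hashing file contents rather than relying on modification times means the cache survives reinstalls that leave the germline files byte-identical.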

Frame detection for J genes uses the conserved WG.G (IGH) / FG.G (IGK/IGL) motif; a stop-free-frame fallback covers unusual alleles.
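Motif-based frame detection with a stop-free fallback could look like the sketch below. The function names and the combined `[WF]G.G` pattern are assumptions, and the test sequence is constructed for illustration so that one frame encodes YFDYWGQGTLVTVSS; it is not taken from abverse's germline database.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(nt, frame):
    """Translate nt starting at offset `frame`, ignoring a trailing partial codon."""
    return "".join(
        CODON_TO_AA[nt[i:i + 3]] for i in range(frame, len(nt) - 2, 3)
    )


def detect_j_frame(nt, motif=r"[WF]G.G"):
    """Return the reading frame (0, 1, or 2) of a J gene, or None.

    First choice: the frame whose translation contains the conserved motif.
    Fallback: the first frame without a stop codon.
    """
    nt = nt.upper()
    for frame in range(3):
        if re.search(motif, translate_frame(nt, frame)):
            return frame
    for frame in range(3):
        if "*" not in translate_frame(nt, frame):
            return frame
    return None


# Two leading bases, fifteen in-frame codons, one trailing base.
J_EXAMPLE = "AC" + "".join([
    "TAC", "TTT", "GAC", "TAC", "TGG", "GGC", "CAG", "GGA",
    "ACC", "CTG", "GTC", "ACC", "GTC", "TCC", "TCA",
]) + "G"

print(detect_j_frame(J_EXAMPLE))          # 2
print(translate_frame(J_EXAMPLE, 2))      # YFDYWGQGTLVTVSS
print(detect_j_frame("TAAGCTGCT"))        # 1 (no motif; frame 0 has a stop)
```

Passing `motif="WG.G"` or `motif="FG.G"` restricts the search to the heavy-chain or light-chain pattern respectively.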

Performance

Benchmarked on a single CPU core with 10,000 BCR AA sequences:

Metric	Value
Throughput	~775 sequences/second/core
abstar calls in critical path	0
translate(output) == input guarantee	100% (validated per sequence)

No iterative abstar calls occur during reverse_translate; the algorithm is a single-pass pipeline.

Integration test results

Tested on 100 real human BCR sequences with known germline assignments:

Metric	Result	Threshold
V-gene family agreement	≥ 90%	90%
J-gene family agreement	≥ 80%	80%
Exact V-call match	75%	informational
Exact J-call match	91%	informational

The exact V-call rate of 75% reflects the fundamental ambiguity of assigning a specific allele from amino acid sequence alone: multiple alleles can share the same AA sequence, so the search cannot distinguish them. Gene-family agreement, the metric that matters for mutation analysis, meets its threshold for both V and J.
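The difference between the two kinds of agreement can be made concrete with a small sketch. How abverse's test suite derives a gene family from a call is not stated, so the regular expression below is an assumption, and the calls in the example are illustrative inputs rather than results from the integration test.

```python
import re

# Assumed convention: locus letters plus the leading family number,
# e.g. "IGHV1-69*01" -> "IGHV1", "IGKV1D-39*01" -> "IGKV1".
FAMILY_RE = re.compile(r"^(IG[HKL][VDJ]\d+)")


def gene_family(call):
    """Return the family prefix of a germline call, or None if unparseable."""
    match = FAMILY_RE.match(call)
    return match.group(1) if match else None


def agreement(expected, observed):
    """Return (family agreement, exact-call agreement) as fractions."""
    pairs = list(zip(expected, observed))
    exact = sum(e == o for e, o in pairs)
    family = sum(
        gene_family(e) is not None and gene_family(e) == gene_family(o)
        for e, o in pairs
    )
    return family / len(pairs), exact / len(pairs)


expected = ["IGHV1-69*01", "IGHV3-30*18", "IGKV1-39*01", "IGHV4-34*01"]
observed = ["IGHV1-69*02", "IGHV3-30*18", "IGKV1D-39*01", "IGHV3-23*01"]

family_rate, exact_rate = agreement(expected, observed)
print(family_rate, exact_rate)  # 0.75 0.25
```

The first and third pairs differ only at the allele or duplicated-gene level, so they count toward family agreement but not toward exact matches.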

Edge cases

Situation	Handling
No V assignment	Human codon frequency for all positions; `reconstruction_method='codon_frequency'`
No J assignment	Germline lookup for V region; human codon frequency elsewhere
5′ / 3′ overhangs	Human codon frequency
Germline codon truncated at gene edge	Human codon frequency
Non-standard AA (X, B, Z)	`NNN`
Stop codon in input AA	`ValueError` with position and sequence ID
V/J alignment overlap	V takes priority; J starts after V end
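The two input-related rows (non-standard residues and stop codons) can be sketched as a pre-check. The function name, the 0-based position convention, and the choice to treat every non-standard letter like X, B, and Z are assumptions; only the `NNN` output and the `ValueError` carrying position and sequence ID come from the table above.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def check_residues(aa_seq, seq_id):
    """Return the positions that should be emitted as NNN.

    Raises ValueError, naming the 0-based position and the sequence ID,
    if the input contains a stop codon symbol.
    """
    nnn_positions = []
    for pos, aa in enumerate(aa_seq.upper()):
        if aa == "*":
            raise ValueError(
                f"stop codon at position {pos} in sequence {seq_id!r}"
            )
        if aa not in STANDARD_AA:
            nnn_positions.append(pos)  # X, B, Z and other ambiguity codes
    return nnn_positions


print(check_residues("QVXLB", "seq1"))  # [2, 4]

try:
    check_residues("QV*L", "seq2")
except ValueError as err:
    print(err)  # stop codon at position 2 in sequence 'seq2'
```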

Development

git clone https://github.com/bnemoz/abverse.git
cd abverse
pip install -e . --no-build-isolation
pip install pytest

# Run all tests (unit + integration + scaling benchmark)
python3 -m pytest abverse/tests/ -v

The test suite (59 tests) covers the codon lookup table, germline database building, per-sequence reconstruction with all edge cases, the end-to-end pipeline, integration with real BCR sequences, and a 10k-sequence throughput benchmark.

Package structure

abverse/
├── pyproject.toml
└── abverse/
    ├── __init__.py          # public API: reverse_translate, build_germline_aa_db
    ├── _codons.py           # 1280-entry optimal codon lookup table
    ├── _germline_db.py      # germline translation, MMseqs2 DB build, cache
    ├── _search.py           # V + J protein–protein search wrappers (Polars)
    ├── _reconstruct.py      # per-sequence NT reconstruction (pure, picklable)
    ├── _pipeline.py         # orchestration and parallel dispatch
    └── tests/
        ├── test_codons.py
        ├── test_germline_db.py
        ├── test_reconstruct.py
        ├── test_pipeline.py
        ├── test_integration.py
        └── test_scaling.py

License

MIT
