Flexible and fast antibody sequence annotation in Rust with Python bindings

Immunum is a high-performance antibody and TCR sequence numbering tool for Rust, Python, Polars and JS/TS.

Try it in your browser: interactive demo.

Overview

immunum is a library for numbering antibody and T-cell receptor (TCR) variable domain sequences. It uses Needleman-Wunsch semi-global alignment against position-specific scoring matrices built from consensus sequences, with BLOSUM62-based substitution scores.
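To make the alignment step concrete, here is a minimal sketch of semi-global Needleman-Wunsch scoring against a position-specific scoring matrix. It is not immunum's implementation: build_pssm and semi_global_score are hypothetical names, the flat match/mismatch scores stand in for BLOSUM62-derived values, a single linear gap penalty is assumed, and treating unmatched consensus positions at either end as free is also an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(consensus, match=4, mismatch=-2):
    """Toy PSSM: one {residue: score} row per consensus position.

    A real matrix would use BLOSUM62-derived scores; flat match/mismatch
    values are an assumption made to keep the sketch short.
    """
    return [
        {aa: (match if aa == ref else mismatch) for aa in AMINO_ACIDS}
        for ref in consensus
    ]


def semi_global_score(query, pssm, gap=-3, unknown=-2):
    """Best score aligning the whole query against the PSSM.

    Every query residue must be matched or inserted (full query
    consumption); consensus positions skipped before or after the
    query cost nothing.
    """
    n, m = len(pssm), len(query)
    # dp[i][j]: best score using the first i consensus positions
    # and the first j query residues.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap  # query residues before the consensus
    for i in range(1, n + 1):
        dp[i][0] = 0  # free leading consensus gaps
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + pssm[i - 1].get(query[j - 1], unknown)
            skip_consensus = dp[i - 1][j] + gap  # deletion in the query
            insert_query = dp[i][j - 1] + gap    # insertion in the query
            dp[i][j] = max(diag, skip_consensus, insert_query)
    # Free trailing consensus gaps: take the best cell in the last column.
    return max(dp[i][m] for i in range(n + 1))


pssm = build_pssm("ACDEFGHIK")
print(semi_global_score("DEFG", pssm))        # fragment of the consensus
print(semi_global_score("ACDEWFGHIK", pssm))  # one inserted residue
```

With these toy scores, a query that is a fragment of the consensus pays nothing for the consensus positions it does not cover, while an extra residue inside the query costs one gap penalty.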

Available as:

  • Rust crate — core library and CLI
  • Python package — with a Polars plugin for vectorized batch processing
  • npm package — for Node.js and browsers

Supported chains

Antibody      TCR
IGH (heavy)   TRA (alpha)
IGK (kappa)   TRB (beta)
IGL (lambda)  TRD (delta)
              TRG (gamma)

Chain codes: H (IGH), K (IGK), L (IGL), A (TRA), B (TRB), D (TRD), G (TRG).

Chain type is automatically detected by aligning against all loaded chains and selecting the best match.
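The "score against every loaded chain, keep the best" idea can be sketched as follows. This is an illustration only: detect_chain and TOY_REFERENCES are hypothetical, the reference strings are just the leading residues of the example sequences in this README rather than real consensus data, and difflib's similarity ratio stands in for the alignment score.

```python
from difflib import SequenceMatcher

# Toy stand-ins for per-chain consensus sequences (assumption: real
# references are full-length consensus profiles, not short prefixes).
TOY_REFERENCES = {
    "H": "QVQLVQSGAEVKRPGSSVTVSCKAS",
    "K": "DIQMTQSPSSLSASVGDRVTITC",
}


def detect_chain(sequence, references=TOY_REFERENCES):
    """Score the sequence against every reference and return the best chain."""
    scores = {}
    for chain, reference in references.items():
        window = sequence[: len(reference)]
        matcher = SequenceMatcher(None, reference, window, autojunk=False)
        scores[chain] = matcher.ratio()  # similarity in [0, 1]
    best = max(scores, key=scores.get)
    return best, scores[best]


chain, score = detect_chain("DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPG")
print(chain, score)
```

Restricting the loaded chains (for example to H, K and L) shrinks the set of candidates that compete for the best score.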

Numbering schemes

  • IMGT — all 7 chain types
  • Kabat — antibody chains (IGH, IGK, IGL)

Python

Installation

pip install immunum

Numbering

from immunum import Annotator

annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")

sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

result = annotator.number(sequence)
print(result.chain)       # H
print(result.confidence)  # 0.78
print(result.numbering)   # {"1": "Q", "2": "V", "3": "Q", ...}

Segmentation

segment splits the sequence into FR/CDR regions:

from immunum import Annotator

annotator = Annotator(chains=["H", "K", "L"], scheme="imgt")

sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS"

result = annotator.segment(sequence)
assert result.fr1 == 'QVQLVQSGAEVKRPGSSVTVSCKAS'
assert result.cdr1 == 'GGSFSTYA'
assert result.fr2 == 'LSWVRQAPGRGLEWMGG'
assert result.cdr2 == 'VIPLLTIT'
assert result.fr3 == 'NYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYC'
assert result.cdr3 == 'AREGTTGKPIGAFAH'
assert result.fr4 == 'WGQGTLVTVSS'
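The segments above line up with the standard IMGT position ranges (FR1 1-26, CDR1 27-38, FR2 39-55, CDR2 56-65, FR3 66-104, CDR3 105-117, FR4 118-128). As a rough sketch of how a position-to-residue mapping could be grouped by those ranges: regions_from_numbering is a hypothetical helper, and the key format (a position number with an optional insertion letter, such as "111A") and the toy mapping are assumptions, not confirmed output of the library.

```python
import re

# Standard IMGT region boundaries (inclusive position ranges).
IMGT_REGIONS = [
    ("fr1", 1, 26), ("cdr1", 27, 38), ("fr2", 39, 55), ("cdr2", 56, 65),
    ("fr3", 66, 104), ("cdr3", 105, 117), ("fr4", 118, 128),
]


def regions_from_numbering(numbering):
    """Group residues by IMGT region, keeping the mapping's own order."""
    regions = {name: "" for name, _, _ in IMGT_REGIONS}
    for position, residue in numbering.items():
        # Drop any insertion letter: "111A" belongs to position 111.
        number = int(re.match(r"\d+", position).group())
        for name, start, end in IMGT_REGIONS:
            if start <= number <= end:
                regions[name] += residue
                break
    return regions


# Toy mapping around the FR3/CDR3/FR4 boundaries (hypothetical data).
toy = {"104": "C", "105": "A", "106": "R", "111": "G", "111A": "T",
       "112A": "K", "112": "P", "117": "H", "118": "W"}
print(regions_from_numbering(toy)["cdr3"])
```

In practice segment already does this work; the sketch only shows where the boundaries come from.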

Polars plugin

For batch processing, immunum.polars registers elementwise Polars expressions:

import polars as pl
import immunum.polars as imp

df = pl.DataFrame({"sequence": [
    "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS",
    "DIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIK",
]})

# Add a struct column with chain, scheme, confidence, numbering
result = df.with_columns(
    imp.number(pl.col("sequence"), chains=["H", "K", "L"], scheme="imgt").alias("numbered")
)

# Add a struct column with FR/CDR segments
result = df.with_columns(
    imp.segment(pl.col("sequence"), chains=["H", "K", "L"], scheme="imgt").alias("segmented")
)

The number expression returns a struct with fields chain, scheme, confidence, and numbering (a struct of position→residue). The segment expression returns a struct with fields fr1, cdr1, fr2, cdr2, fr3, cdr3, fr4, prefix, postfix.

JavaScript / npm

Installation

npm install immunum

Usage

const { Annotator } = require("immunum");

const annotator = new Annotator(["H", "K", "L"], "imgt");

const sequence =
  "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";

const result = annotator.number(sequence);
console.log(result.chain);      // "H"
console.log(result.confidence); // 0.97
console.log(result.numbering);  // { "1": "Q", "2": "V", ... }

const segments = annotator.segment(sequence);
console.log(segments.cdr3); // "AREGTTGKPIGAFAH"

annotator.free(); // or use `using annotator = new Annotator(...)` with explicit resource management

Rust

Installation

Add to Cargo.toml:

[dependencies]
immunum = "0.9"

Usage

use immunum::{Annotator, Chain, Scheme};

let annotator = Annotator::new(
    &[Chain::IGH, Chain::IGK, Chain::IGL],
    Scheme::IMGT,
    None, // uses default min_confidence of 0.5
).unwrap();

let sequence = "QVQLVQSGAEVKRPGSSVTVSCKASGGSFSTYALSWVRQAPGRGLEWMGGVIPLLTITNYAPRFQGRITITADRSTSTAYLELNSLRPEDTAVYYCAREGTTGKPIGAFAHWGQGTLVTVSS";

let result = annotator.number(sequence).unwrap();
println!("Chain: {}", result.chain);        // IGH
println!("Confidence: {:.2}", result.confidence);
for (aa, pos) in sequence.chars().zip(result.positions.iter()) {
    println!("{} -> {}", aa, pos);
}

let segments = annotator.segment(sequence).unwrap();
println!("CDR3: {}", segments.cdr3);

CLI

immunum number [OPTIONS] [INPUT] [OUTPUT]

Options

Flag          Default  Description
-s, --scheme  imgt     Numbering scheme: imgt (i), kabat (k)
-c, --chain   ig       Chain filter: h,k,l,a,b,g,d or groups: ig, tcr, all. Accepts any form (h, heavy, igh), case-insensitive.
-f, --format  tsv      Output format: tsv, json, jsonl

Input

Accepts a raw sequence, a FASTA file, or stdin (auto-detected):

immunum number EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS
immunum number sequences.fasta
cat sequences.fasta | immunum number
immunum number - < sequences.fasta
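One plausible way to tell FASTA text from raw sequences is to look at the first non-blank character. The sketch below is a guess at that idea, not the CLI's actual logic: read_records is a hypothetical function, the seq_N names for raw input are made up, and it works on text only (it does not decide whether an argument is a file path or a sequence).

```python
def read_records(text):
    """Return (name, sequence) pairs from FASTA text or raw sequence lines."""
    records = []
    if text.lstrip().startswith(">"):
        # FASTA: a ">" header line followed by one or more sequence lines.
        name, chunks = None, []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
        if name is not None:
            records.append((name, "".join(chunks)))
    else:
        # Raw input: one sequence per non-blank line, with generated names.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        for index, sequence in enumerate(lines, start=1):
            records.append((f"seq_{index}", sequence))
    return records


print(read_records(">a desc\nEVQL\nVESG\n>b\nDIQM\n"))
print(read_records("EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS\n"))
```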

Output

Writes to stdout by default, or to a file if a second positional argument is given:

immunum number sequences.fasta results.tsv
immunum number -f json sequences.fasta results.json

Examples

# Kabat scheme, JSON output
immunum number -s kabat -f json EVQLVESGGGLVKPGGSLKLSCAASGFTFSSYAMS

# All chains (antibody + TCR), JSONL output
immunum number -c all -f jsonl sequences.fasta

# TCR sequences only, save to file
immunum number -c tcr tcr_sequences.fasta output.tsv

# Extract sequences from a TSV column and pipe in (see fixtures/ig.tsv)
tail -n +2 fixtures/ig.tsv | cut -f2 | immunum number
awk -F'\t' 'NR==1{for(i=1;i<=NF;i++) if($i=="sequence") c=i} NR>1{print $c}' fixtures/ig.tsv | immunum number

# Filter TSV output to CDR3 positions (105-117 in IMGT)
immunum number sequences.fasta | awk -F'\t' '$4 >= 105 && $4 <= 117'

# Filter to heavy chain results only
immunum number -c all sequences.fasta | awk -F'\t' 'NR==1 || $2=="H"'

# Extract sequence IDs and numbering with jq
immunum number -f json sequences.fasta | jq '[.[] | {id: .sequence_id, numbering}]'

Development

We use task to orchestrate builds and checks across Cargo and Python. You can install it with:

uv tool install go-task-bin

Then run task or task --list-all to get the full list of available tasks.

Every task except the benchmark-* tasks uses the dev profile by default; pass PROFILE=release to a task to override it.

task also caches results by default; run task my-task -f to ignore the cache.

Building local environment

# build a dev environment
task build-local

# build a dev environment with --release flag
task build-local PROFILE=release

Testing

task test-rust    # test only rust code
task test-python  # test only python code
task test         # test all code

Linting

task format  # formats python and rust code
task lint    # runs linting for python and rust

Benchmarking

The repository contains several benchmarks. For the full list, run task | grep benchmark:

$ task | grep benchmark
* benchmark-accuracy:           Accuracy benchmark across all fixtures (1k sequences, 7 rounds each)
* benchmark-cli:                Benchmark correctness of the CLI tool
* benchmark-comparison:         Speed + correctness benchmark: immunum vs antpack vs anarci (1k IGH sequences)
* benchmark-scaling:            Scaling benchmark: sizes 100..10M (10x steps), 1 round, H/imgt. Pass CLI_ARGS to filter tools, e.g. -- --tools immunum
* benchmark-speed:              Speed benchmark across dataset sizes (100 to 1M sequences, 7 rounds, H/imgt)
* benchmark-speed-polars:       Speed benchmark for immunum polars across all chain/scheme fixtures

Project structure

src/
├── main.rs          # CLI binary (immunum number ...)
├── lib.rs           # Public API
├── annotator.rs     # Sequence annotation and chain detection
├── alignment.rs     # Needleman-Wunsch semi-global alignment
├── io.rs            # Input parsing (FASTA, raw) and output formatting (TSV, JSON, JSONL)
├── numbering.rs     # Numbering module entry point
├── numbering/
│   ├── imgt.rs      # IMGT numbering rules
│   └── kabat.rs     # Kabat numbering rules
├── scoring.rs       # PSSM and scoring matrices
├── types.rs         # Core domain types (Chain, Scheme, Position)
├── validation.rs    # Validation utilities
├── error.rs         # Error types
└── bin/
    ├── benchmark.rs       # Validation metrics report
    ├── debug_validation.rs # Alignment mismatch visualization
    └── speed_benchmark.rs  # Performance benchmarks
resources/
└── consensus/       # Consensus sequence CSVs (compiled into scoring matrices)
fixtures/
├── validation/      # ANARCI-numbered reference datasets
├── ig.fasta         # Example antibody sequences
└── ig.tsv           # Example TSV input
scripts/             # Python tooling for generating consensus data
immunum/
├── _internal.pyi    # python stub file for pyo3
├── polars.py        # polars extension module
└── python.py        # python module

Design decisions

  • Semi-global alignment forces full query consumption, preventing long CDR3 regions from being treated as trailing gaps.
  • Anchor positions at highly conserved FR residues receive 3× gap penalties to stabilize alignment.
  • FR regions use alignment-based numbering; CDR regions use scheme-specific insertion rules.
  • Scoring matrices are generated at compile time from consensus data via build.rs.
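The anchor rule can be pictured as a per-position gap penalty table. The sketch below is only an illustration: gap_penalties is a hypothetical helper, the base penalty of -3.0 is made up, and the anchor set uses the well-known conserved IMGT positions 23 (Cys), 41 (Trp) and 104 (Cys) as an example, which may differ from the anchors immunum actually uses.

```python
def gap_penalties(length, anchors, base=-3.0, multiplier=3.0):
    """Per-position gap penalties, scaled up at anchor positions.

    Positions are 1-based; index 0 of the result is position 1.
    """
    return [
        base * multiplier if position in anchors else base
        for position in range(1, length + 1)
    ]


# Example anchors: conserved IMGT positions (an assumption for illustration).
penalties = gap_penalties(128, anchors={23, 41, 104})
print(penalties[21:24])  # positions 22, 23, 24
```

An aligner would charge penalties[i - 1] instead of a flat gap cost when it skips consensus position i, so dropping a conserved residue becomes three times as expensive as dropping a neighbouring one.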
