Pure-Python port of the R package immunarch — bulk and single-cell immune-repertoire (TCR/BCR AIRR-seq) analytics.

py-immunarch

Pure-Python port of immunarch, the R/CRAN package for bulk and single-cell immune-repertoire (TCR/BCR AIRR-seq) analytics by Vadim I. Nazarov and the ImmunoMind team.

pyimmunarch is a standalone, dependency-light re-implementation of immunarch's repertoire-analysis core: data import, exploratory statistics, clonality, diversity estimators, repertoire overlap, gene usage, public clonotypes, clonotype tracking, k-mer analysis, filtering and the standard repertoire plots. It does not require R or rpy2.

PyPI / import name: pyimmunarch
License: Apache 2.0 (same as upstream immunarch)
Upstream: immunarch 0.10.3
Numerical parity: matches immunarch to floating-point precision (max rel diff ~1e-13)

Install

pip install pyimmunarch
# or, from a checkout:
pip install -e .

Dependencies: numpy, scipy, pandas, matplotlib, scikit-learn.

Data model

pyimmunarch follows immunarch's data model: an immune dataset is a list of per-sample repertoire DataFrames plus a sample-metadata table. The container class is ImmunData (mirroring R's immdata list, with .data and .meta). Repertoire columns use the immunarch standard (Clones, Proportion, CDR3.nt, CDR3.aa, V.name, D.name, J.name, ...).
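
As a rough illustration of this layout (not pyimmunarch's actual ImmunData implementation), the sketch below builds two toy repertoires with pandas and pairs them with a metadata table. The ToyImmunData class, the make_repertoire helper, the sample names and all values are hypothetical; only the column names come from the description above. Keying the repertoires by sample name in a dict is also an assumption made for readability.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ToyImmunData:
    """Hypothetical stand-in for the container described above."""
    data: dict          # sample name -> repertoire DataFrame
    meta: pd.DataFrame  # one row per sample


def make_repertoire(rows):
    """Build a repertoire table and derive Proportion from Clones."""
    df = pd.DataFrame(rows, columns=["Clones", "CDR3.aa", "V.name", "J.name"])
    df["Proportion"] = df["Clones"] / df["Clones"].sum()
    # Largest clonotypes first.
    return df.sort_values("Clones", ascending=False).reset_index(drop=True)


toy = ToyImmunData(
    data={
        "S1": make_repertoire([
            (6, "CASSLGQAYEQYF", "TRBV7-9", "TRBJ2-7"),
            (3, "CASSPGTGGYEQYF", "TRBV5-1", "TRBJ2-7"),
            (1, "CASSQETQYF", "TRBV4-1", "TRBJ2-5"),
        ]),
        "S2": make_repertoire([
            (4, "CASSLGQAYEQYF", "TRBV7-9", "TRBJ2-7"),
            (4, "CASSIRSSYEQYF", "TRBV19", "TRBJ2-7"),
        ]),
    },
    meta=pd.DataFrame({"Sample": ["S1", "S2"], "Status": ["MS", "C"]}),
)

print(toy.data["S1"][["Clones", "Proportion", "CDR3.aa"]])
```

Each repertoire is an ordinary DataFrame, so per-sample questions reduce to pandas operations, while the metadata table carries the grouping variables used for filtering.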

Quick start

import pyimmunarch as pim

# 1. load repertoire files (AIRR / immunarch / MiXCR / VDJtools / 10x,
#    auto-detected) or the bundled example dataset
imm = pim.repLoad("my_repertoires/")        # directory with metadata.txt
imm = pim.load_example_immdata()            # bundled example TCR cohort

# 2. exploratory statistics
pim.repExplore(imm, method="volume")        # unique clonotypes per sample
pim.repExplore(imm, method="len", col="aa") # CDR3 length distribution

# 3. clonal-space analysis
pim.repClonality(imm, method="homeo")       # clonal space homeostasis
pim.repClonality(imm, method="top")         # top-N clonal proportion

# 4. diversity estimators
pim.repDiversity(imm, method="chao1")       # Chao1 richness
pim.repDiversity(imm, method="hill")        # Hill numbers
pim.repDiversity(imm, method="div", q=5)    # true diversity
pim.repDiversity(imm, method="gini")        # Gini coefficient
pim.repDiversity(imm, method="raref")       # rarefaction curve

# 5. repertoire overlap
ov = pim.repOverlap(imm, method="jaccard")  # public/overlap/jaccard/...
pim.repOverlapAnalysis(ov, method="mds+hclust")

# 6. gene usage
gu = pim.geneUsage(imm, gene="hs.trbv", norm=True)
pim.geneUsageAnalysis(gu, method="js")

# 7. public clonotypes & tracking
pr = pim.pubRep(imm, col="aa+v")
tc = pim.trackClonotypes(imm, which=(1, 15), col="aa")

# 8. k-mers, filtering, plotting
km  = pim.getKmers(imm[0], 3)
sub = pim.repFilter(imm, "by.meta", {"Status": pim.include("MS")})
pim.vis_overlap_heatmap(ov)
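
For orientation, the textbook formulas behind a few of the diversity methods in step 4 can be sketched in plain Python. This is an illustration on a toy vector of clone counts, not pyimmunarch's code: the function names are hypothetical, and the Chao1 variant shown (classic form, with a bias-corrected fallback when there are no doubletons) may differ in detail from what immunarch reports.

```python
import math
from collections import Counter


def chao1(counts):
    """Classic Chao1 richness: S_obs + f1^2 / (2 * f2)."""
    freq = Counter(counts)
    s_obs = len(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected fallback when no doubletons are observed.
    return s_obs + f1 * (f1 - 1) / 2


def hill_number(counts, q):
    """Hill number (true diversity) of order q."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1 is the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1 / (1 - q))


def gini_simpson(counts):
    """Gini-Simpson index: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


clones = [10, 5, 2, 2, 1, 1, 1]  # toy clone counts for one sample
print(chao1(clones))             # 9.25
print(hill_number(clones, 0))    # order 0 is the observed richness
print(hill_number(clones, 2))    # order 2 is the inverse Simpson index
print(gini_simpson(clones))
```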

Ported API

immunarch family: pyimmunarch functions
I/O: repLoad, load_example_immdata, ImmunData, IMMCOL
Exploratory: repExplore (volume / count / len / clones)
Clonality: repClonality (clonal.prop / homeo / top / rare)
Diversity: repDiversity (chao1, hill, div, gini.simp, inv.simp, gini, d50, dxx, raref)
Overlap: repOverlap (public / overlap / jaccard / tversky / cosine / morisita), repOverlapAnalysis
Gene usage: geneUsage, geneUsageAnalysis
Public clonotypes: pubRep, public_matrix, pubRepStatistics, pubRepFilter, pubRepApply
Dynamics: trackClonotypes
K-mers: getKmers, split_to_kmers, kmer_profile, spectratype
Filtering: repFilter, include, exclude, lessthan, morethan, interval
Information theory: entropy, kl_div, js_div, cross_entropy
CDR3 analysis: cdr3_aa_profile
Preprocessing: coding, noncoding, inframes, outofframes, top, bunch_translate
Plotting: vis_diversity, vis_overlap_heatmap, vis_gene_usage, vis_clonal_space, vis_tracking, vis_spectratype, vis_explore
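
To make the overlap methods listed above concrete, here is a minimal sketch of three of them on toy sets of CDR3 amino-acid sequences, using the usual set-based definitions. The function names and sequences are hypothetical, and pyimmunarch's repOverlap may handle column selection, duplicates and normalisation differently.

```python
def public_count(a, b):
    """Number of clonotypes shared by both repertoires."""
    return len(set(a) & set(b))


def overlap_coefficient(a, b):
    """Shared clonotypes divided by the size of the smaller repertoire."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))


def jaccard_index(a, b):
    """Shared clonotypes divided by the size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


s1 = ["CASSLGQAYEQYF", "CASSPGTGGYEQYF", "CASSQETQYF"]
s2 = ["CASSLGQAYEQYF", "CASSIRSSYEQYF"]

print(public_count(s1, s2))         # 1
print(overlap_coefficient(s1, s2))  # 0.5
print(jaccard_index(s1, s2))        # 0.25
```

All three measures are symmetric, so a pairwise matrix over samples only needs the upper triangle computed.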

R parity

pyimmunarch is validated against immunarch 0.10.3 on immunarch's own bundled immdata example dataset (a 12-sample TCR cohort). The diversity, overlap and gene-usage numbers are computed from deterministic closed-form formulas, so the two implementations agree to floating-point precision (maximum relative difference ~1e-13). The test suite (tests/test_r_parity.py) re-runs immunarch via Rscript and asserts a relative difference below 1e-6 for every function family; it is skipped when R or immunarch is unavailable.
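
The tolerance check described above can be pictured as a maximum-relative-difference comparison. The sketch below uses a hypothetical helper, not the contents of tests/test_r_parity.py, and the two value lists are made-up stand-ins for R and Python outputs.

```python
def max_rel_diff(reference, candidate):
    """Largest |candidate - reference| / |reference| over paired values."""
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    worst = 0.0
    for r, c in zip(reference, candidate):
        # Fall back to an absolute difference when the reference is zero.
        scale = abs(r) if r != 0 else 1.0
        worst = max(worst, abs(c - r) / scale)
    return worst


# Made-up numbers: the second value is perturbed by about 1e-13 (relative).
r_values = [9.25, 0.71900826, 3.55882353]
py_values = [9.25, 0.71900826 * (1 + 1e-13), 3.55882353]

diff = max_rel_diff(r_values, py_values)
print(diff < 1e-6)  # True: well inside the 1e-6 tolerance
```

A difference near 1e-13 is rounding noise for double-precision arithmetic, which is why a 1e-6 assertion leaves a wide safety margin.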

On the example dataset the Python pipeline runs ~60x faster than the R pipeline; the R timing includes Rscript startup.

python examples/benchmark.py --runs 3
pytest tests/ -q

See examples/compare_R_vs_Python.ipynb for a full R-vs-Python comparison (timing, accuracy table, scatter plots, diversity bar and overlap heatmap).

License

Apache License 2.0 — the same license as the upstream immunarch package. See LICENSE.
