py-immunarch

Pure-Python port of the R/CRAN package immunarch (by Vadim I. Nazarov and the ImmunoMind team) for bulk and single-cell immune-repertoire (TCR/BCR AIRR-seq) analytics.

pyimmunarch is a standalone, dependency-light re-implementation of immunarch's repertoire-analysis core: data import, exploratory statistics, clonality, diversity estimators, repertoire overlap, gene usage, public clonotypes, clonotype tracking, k-mer analysis, filtering and the standard repertoire plots. It does not require R or rpy2.

PyPI / import name: pyimmunarch
License: Apache 2.0 (same as upstream immunarch)
Upstream: immunarch 0.10.3
Numerical parity: agrees with immunarch to floating-point round-off (max relative difference ~1e-13)

Install

pip install pyimmunarch          # from PyPI
# or, from a checkout:
pip install -e .

Dependencies: numpy, scipy, pandas, matplotlib, scikit-learn.

Data model

pyimmunarch follows immunarch's data model: an immune dataset is a list of per-sample repertoire DataFrames plus a sample-metadata table. The container class is ImmunData (mirroring R's immdata list, with .data and .meta). Repertoire columns use the immunarch standard (Clones, Proportion, CDR3.nt, CDR3.aa, V.name, D.name, J.name, ...).
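
The layout described above can be pictured with plain pandas objects. This is only an illustrative sketch: ImmunDataSketch, make_repertoire, the sample names and the clonotype rows are hypothetical, and keeping the samples in a dict keyed by sample name is an assumption; the real ImmunData class may store and expose its data differently.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ImmunDataSketch:
    """Hypothetical stand-in for ImmunData (NOT the real pyimmunarch class)."""
    data: dict   # sample name -> per-sample repertoire DataFrame
    meta: pd.DataFrame   # one row of metadata per sample


def make_repertoire(rows):
    """Build one repertoire table and derive Proportion from Clones."""
    df = pd.DataFrame(rows, columns=["Clones", "CDR3.aa", "V.name", "J.name"])
    df["Proportion"] = df["Clones"] / df["Clones"].sum()
    return df


# Toy data, made up for illustration only.
imm_sketch = ImmunDataSketch(
    data={
        "S1": make_repertoire([(6, "CASSLGTDTQYF", "TRBV7-9", "TRBJ2-3"),
                               (4, "CASSPGQGYEQYF", "TRBV5-1", "TRBJ2-7")]),
        "S2": make_repertoire([(5, "CASSLGTDTQYF", "TRBV7-9", "TRBJ2-3")]),
    },
    meta=pd.DataFrame({"Sample": ["S1", "S2"], "Status": ["MS", "C"]}),
)

print(imm_sketch.data["S1"])
print(imm_sketch.meta)
```

Keeping the metadata in its own table, joined to the repertoires by sample name, is what lets a filter select samples by a metadata column such as Status without touching the clonotype tables.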

Quick start

import pyimmunarch as pim

# 1. load repertoire files (AIRR / immunarch / MiXCR / VDJtools / 10x,
#    auto-detected) or the bundled example dataset
imm = pim.repLoad("my_repertoires/")        # directory with metadata.txt
imm = pim.load_example_immdata()            # bundled example TCR cohort

# 2. exploratory statistics
pim.repExplore(imm, method="volume")        # unique clonotypes per sample
pim.repExplore(imm, method="len", col="aa") # CDR3 length distribution

# 3. clonal-space analysis
pim.repClonality(imm, method="homeo")       # clonal space homeostasis
pim.repClonality(imm, method="top")         # top-N clonal proportion

# 4. diversity estimators
pim.repDiversity(imm, method="chao1")       # Chao1 richness
pim.repDiversity(imm, method="hill")        # Hill numbers
pim.repDiversity(imm, method="div", q=5)    # true diversity
pim.repDiversity(imm, method="gini")        # Gini coefficient
pim.repDiversity(imm, method="raref")       # rarefaction curve

# 5. repertoire overlap
ov = pim.repOverlap(imm, method="jaccard")  # public/overlap/jaccard/...
pim.repOverlapAnalysis(ov, method="mds+hclust")

# 6. gene usage
gu = pim.geneUsage(imm, gene="hs.trbv", norm=True)
pim.geneUsageAnalysis(gu, method="js")

# 7. public clonotypes & tracking
pr = pim.pubRep(imm, col="aa+v")
tc = pim.trackClonotypes(imm, which=(1, 15), col="aa")

# 8. k-mers, filtering, plotting
km  = pim.getKmers(imm[0], 3)
sub = pim.repFilter(imm, "by.meta", {"Status": pim.include("MS")})
pim.vis_overlap_heatmap(ov)
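
To give a feel for what the diversity estimators in step 4 compute, here is a minimal NumPy sketch of two textbook formulas: the classic Chao1 richness estimate and the Hill number (true diversity) of order q. The function names chao1 and hill_number are illustrative, and this is not pyimmunarch's implementation, which may differ in bias correction, confidence intervals and output format.

```python
import numpy as np


def chao1(counts):
    """Classic Chao1: S_obs + f1^2 / (2 * f2).

    f1 and f2 are the numbers of clonotypes seen exactly once and twice.
    Falls back to the bias-corrected form f1 * (f1 - 1) / 2 when f2 == 0.
    """
    counts = np.asarray(counts)
    counts = counts[counts > 0]
    s_obs = counts.size
    f1 = int(np.sum(counts == 1))
    f2 = int(np.sum(counts == 2))
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def hill_number(counts, q):
    """Hill number of order q: (sum_i p_i^q)^(1 / (1 - q)).

    The q -> 1 limit is exp(Shannon entropy).
    """
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


clone_counts = [5, 3, 2, 2, 1, 1, 1]   # toy clone sizes, made up
print(chao1(clone_counts))             # 7 observed + 3^2 / (2 * 2) = 9.25
print(hill_number(clone_counts, q=0))  # order 0 is the observed richness: 7.0
print(hill_number(clone_counts, q=2))  # inverse Simpson concentration
```

Higher orders of q weight abundant clonotypes more heavily, so comparing a few orders shows whether a repertoire is dominated by a handful of expanded clones.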

Ported API

immunarch family: pyimmunarch functions
I/O: repLoad, load_example_immdata, ImmunData, IMMCOL
Exploratory: repExplore (volume / count / len / clones)
Clonality: repClonality (clonal.prop / homeo / top / rare)
Diversity: repDiversity (chao1, hill, div, gini.simp, inv.simp, gini, d50, dxx, raref)
Overlap: repOverlap (public / overlap / jaccard / tversky / cosine / morisita), repOverlapAnalysis
Gene usage: geneUsage, geneUsageAnalysis
Public clonotypes: pubRep, public_matrix, pubRepStatistics, pubRepFilter, pubRepApply
Dynamics: trackClonotypes
K-mers: getKmers, split_to_kmers, kmer_profile, spectratype
Filtering: repFilter, include, exclude, lessthan, morethan, interval
Information theory: entropy, kl_div, js_div, cross_entropy
CDR3 analysis: cdr3_aa_profile
Preprocessing: coding, noncoding, inframes, outofframes, top, bunch_translate
Plotting: vis_diversity, vis_overlap_heatmap, vis_gene_usage, vis_clonal_space, vis_tracking, vis_spectratype, vis_explore
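
As an illustration of the overlap family, the Jaccard index between two repertoires is the number of shared clonotypes divided by the number of distinct clonotypes in either. The sketch below computes a pairwise matrix over toy CDR3 sets; jaccard, overlap_matrix and the sequences are hypothetical, and the output layout (including the diagonal) is an assumption that may not match what repOverlap returns.

```python
from itertools import combinations

import numpy as np


def jaccard(a, b):
    """|A intersect B| / |A union B| for two collections of clonotype keys."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_matrix(samples):
    """Symmetric sample-by-sample Jaccard matrix.

    The diagonal is set to 1.0 (a non-empty sample fully overlaps itself).
    """
    names = list(samples)
    mat = np.eye(len(names))
    for (i, x), (j, y) in combinations(enumerate(names), 2):
        mat[i, j] = mat[j, i] = jaccard(samples[x], samples[y])
    return names, mat


# Toy CDR3 amino-acid sets, made up for illustration.
toy = {
    "S1": ["CASSLGTDTQYF", "CASSPGQGYEQYF", "CASSIRSSYEQYF"],
    "S2": ["CASSPGQGYEQYF", "CASSIRSSYEQYF", "CASSQETQYF"],
    "S3": ["CASSFGGNTEAFF"],
}
names, mat = overlap_matrix(toy)
print(names)
print(mat)   # S1 vs S2 share 2 of 4 distinct clonotypes: 0.5
```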

R parity

pyimmunarch is validated against immunarch 0.10.3 on immunarch's own bundled immdata example dataset (a 12-sample TCR cohort). The diversity, overlap and gene-usage numbers come from deterministic closed-form formulas, so the two implementations agree to floating-point round-off (maximum relative difference ~1e-13). The test suite (tests/test_r_parity.py) re-runs immunarch via Rscript and asserts a relative difference below 1e-6 for every function family; it is skipped when R or immunarch is unavailable.
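
The parity criterion can be pictured as an elementwise relative-difference check. The sketch below is an assumption about the general shape of such a check, not the contents of tests/test_r_parity.py; max_rel_diff and the numbers are hypothetical.

```python
import numpy as np


def max_rel_diff(py_values, r_values, tiny=1e-300):
    """Largest |py - r| / |r| over all elements.

    `tiny` guards against division by zero when a reference value is 0.
    """
    py = np.asarray(py_values, dtype=float)
    r = np.asarray(r_values, dtype=float)
    denom = np.maximum(np.abs(r), tiny)
    return float(np.max(np.abs(py - r) / denom))


# Made-up reference values and a copy perturbed at round-off scale.
r_ref = np.array([9.25, 4.0, 0.5])
py_out = r_ref * (1.0 + 1e-13)

diff = max_rel_diff(py_out, r_ref)
print(diff)
assert diff < 1e-6   # the tolerance quoted for the parity tests
```

A tolerance of 1e-6 leaves ample headroom over round-off differences, which typically come from summation order or library-level math routines rather than from any difference in the formulas.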

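On the example dataset the Python pipeline runs ~60x faster than the R pipeline. The R timing includes Rscript startup, so part of the gap is interpreter start-up cost rather than analysis time. To re-run the benchmark and the test suite: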

python examples/benchmark.py --runs 3
pytest tests/ -q

See examples/compare_R_vs_Python.ipynb for a full R-vs-Python comparison (timing, accuracy table, scatter plots, diversity bar and overlap heatmap).

License

Apache License 2.0, the same license as the upstream immunarch package. See LICENSE.
