py-immunarch
Pure-Python port of the R/CRAN package immunarch — bulk and single-cell immune-repertoire (TCR/BCR AIRR-seq) analytics, by Vadim I. Nazarov and the ImmunoMind team.
pyimmunarch is a standalone, dependency-light re-implementation of
immunarch's repertoire-analysis core: data import, exploratory
statistics, clonality, diversity estimators, repertoire overlap, gene
usage, public clonotypes, clonotype tracking, k-mer analysis, filtering
and the standard repertoire plots. It does not require R or rpy2.
| Item | Value |
|---|---|
| PyPI / import name | pyimmunarch |
| License | Apache 2.0 (same as upstream immunarch) |
| Upstream | immunarch 0.10.3 |
| Numerical parity | matches immunarch to floating-point precision (max rel diff ~1e-13) |
Install
pip install pyimmunarch # once published
# or, from a checkout:
pip install -e .
Dependencies: numpy, scipy, pandas, matplotlib, scikit-learn.
Data model
pyimmunarch follows immunarch's data model: an immune dataset is a list
of per-sample repertoire DataFrames plus a sample-metadata table. The
container class is ImmunData (mirroring R's immdata list, with
.data and .meta). Repertoire columns use the immunarch standard
(Clones, Proportion, CDR3.nt, CDR3.aa, V.name, D.name,
J.name, ...).
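To make the data model concrete, here is a minimal sketch that builds the same shape by hand with plain pandas. It does not use pyimmunarch itself: the helper `make_repertoire`, the sample names, the metadata columns and every sequence value are made up for illustration, and keying the samples by name in a dict (rather than a plain list) is an assumption modelled on R's named list.

```python
import pandas as pd

# Column names follow the immunarch standard quoted above; all values are toy data.
COLUMNS = ["Clones", "CDR3.nt", "CDR3.aa", "V.name", "D.name", "J.name"]


def make_repertoire(rows):
    """Build one per-sample clonotype table and derive Proportion from Clones.

    Hypothetical helper, not part of pyimmunarch.
    """
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["Proportion"] = df["Clones"] / df["Clones"].sum()
    # Largest clonotype first.
    return df.sort_values("Clones", ascending=False).reset_index(drop=True)


sample_a = make_repertoire([
    (30, "TGTGCCAGCAGC", "CASS", "TRBV5-1", "TRBD1", "TRBJ2-7"),
    (15, "TGTGCCAGCAGCTTG", "CASSL", "TRBV7-2", "TRBD2", "TRBJ1-1"),
    (5, "TGTGCCACC", "CAT", "TRBV5-1", "TRBD1", "TRBJ2-1"),
])
sample_b = make_repertoire([
    (8, "TGTGCCAGCAGC", "CASS", "TRBV5-1", "TRBD1", "TRBJ2-7"),
    (2, "TGTGCCACC", "CAT", "TRBV20-1", "TRBD2", "TRBJ2-1"),
])

# The dataset: per-sample repertoires plus one metadata row per sample.
data = {"A": sample_a, "B": sample_b}
meta = pd.DataFrame({"Sample": ["A", "B"], "Status": ["MS", "C"]})

print(sample_a[["Clones", "Proportion", "CDR3.aa", "V.name"]])
```

The metadata table is what ties samples to experimental groups, so its `Sample` column should list exactly the keys of `data`.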
Quick start
import pyimmunarch as pim
# 1. load repertoire files (AIRR / immunarch / MiXCR / VDJtools / 10x,
# auto-detected) or the bundled example dataset
imm = pim.repLoad("my_repertoires/") # directory with metadata.txt
imm = pim.load_example_immdata() # bundled example TCR cohort
# 2. exploratory statistics
pim.repExplore(imm, method="volume") # unique clonotypes per sample
pim.repExplore(imm, method="len", col="aa") # CDR3 length distribution
# 3. clonal-space analysis
pim.repClonality(imm, method="homeo") # clonal space homeostasis
pim.repClonality(imm, method="top") # top-N clonal proportion
# 4. diversity estimators
pim.repDiversity(imm, method="chao1") # Chao1 richness
pim.repDiversity(imm, method="hill") # Hill numbers
pim.repDiversity(imm, method="div", q=5) # true diversity
pim.repDiversity(imm, method="gini") # Gini coefficient
pim.repDiversity(imm, method="raref") # rarefaction curve
# 5. repertoire overlap
ov = pim.repOverlap(imm, method="jaccard") # public/overlap/jaccard/...
pim.repOverlapAnalysis(ov, method="mds+hclust")
# 6. gene usage
gu = pim.geneUsage(imm, gene="hs.trbv", norm=True)
pim.geneUsageAnalysis(gu, method="js")
# 7. public clonotypes & tracking
pr = pim.pubRep(imm, col="aa+v")
tc = pim.trackClonotypes(imm, which=(1, 15), col="aa")
# 8. k-mers, filtering, plotting
km = pim.getKmers(imm[0], 3)
sub = pim.repFilter(imm, "by.meta", {"Status": pim.include("MS")})
pim.vis_overlap_heatmap(ov)
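For readers unfamiliar with the estimators in step 4, the following is a rough NumPy sketch of two of them, computed from a vector of clone counts. It uses the textbook definitions of Chao1 and Hill numbers; whether pyimmunarch or immunarch use exactly these variants (for example, a bias-corrected Chao1 or confidence intervals) is not confirmed here, and the function names `chao1` and `hill_number` are hypothetical.

```python
import numpy as np


def chao1(clones):
    """Classic Chao1 richness estimate from per-clonotype counts."""
    clones = np.asarray(clones)
    s_obs = int(np.count_nonzero(clones))  # observed clonotypes
    f1 = int(np.sum(clones == 1))          # singletons
    f2 = int(np.sum(clones == 2))          # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    # Fallback when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


def hill_number(clones, q):
    """Hill number (true diversity) of order q."""
    clones = np.asarray(clones, dtype=float)
    p = clones[clones > 0] / clones.sum()
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


counts = [10, 5, 2, 2, 1, 1, 1]
print(chao1(counts))
print(hill_number(counts, 0), hill_number(counts, 1), hill_number(counts, 2))
```

Order 0 of the Hill number is simply the observed richness, and higher orders weight abundant clonotypes more heavily, which is why a range of orders is usually reported together.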
Ported API
| immunarch family | pyimmunarch |
|---|---|
| I/O | repLoad, load_example_immdata, ImmunData, IMMCOL |
| Exploratory | repExplore (volume / count / len / clones) |
| Clonality | repClonality (clonal.prop / homeo / top / rare) |
| Diversity | repDiversity (chao1, hill, div, gini.simp, inv.simp, gini, d50, dxx, raref) |
| Overlap | repOverlap (public / overlap / jaccard / tversky / cosine / morisita), repOverlapAnalysis |
| Gene usage | geneUsage, geneUsageAnalysis |
| Public clonotypes | pubRep, public_matrix, pubRepStatistics, pubRepFilter, pubRepApply |
| Dynamics | trackClonotypes |
| K-mers | getKmers, split_to_kmers, kmer_profile, spectratype |
| Filtering | repFilter, include, exclude, lessthan, morethan, interval |
| Information theory | entropy, kl_div, js_div, cross_entropy |
| CDR3 analysis | cdr3_aa_profile |
| Preprocessing | coding, noncoding, inframes, outofframes, top, bunch_translate |
| Plotting | vis_diversity, vis_overlap_heatmap, vis_gene_usage, vis_clonal_space, vis_tracking, vis_spectratype, vis_explore |
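Two of the pairwise comparisons listed in the table, the Jaccard overlap and the Jensen-Shannon divergence, are simple enough to sketch directly. The code below is an illustration using standard definitions, not the package's implementation; the function names, the log base 2 and the toy inputs are assumptions.

```python
import numpy as np


def jaccard(clonotypes_a, clonotypes_b):
    """Jaccard index between two collections of clonotype keys (e.g. CDR3.aa)."""
    a, b = set(clonotypes_a), set(clonotypes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two non-negative vectors.

    Inputs are normalised to sum to 1, so raw gene-usage counts work too.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0  # terms with x == 0 contribute nothing
        return np.sum(x[mask] * np.log(x[mask] / y[mask])) / np.log(base)

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Toy clonotype sets and toy V-gene usage counts (made up).
print(jaccard(["CASS", "CASSL", "CAT"], ["CASS", "CAT", "CAWS"]))
print(js_divergence([30, 15, 5], [8, 0, 2]))
```

With base 2 the divergence is bounded between 0 (identical distributions) and 1 (disjoint support), which makes it convenient as a distance-like input for clustering.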
R parity
pyimmunarch is validated against immunarch 0.10.3 on immunarch's own
bundled immdata example dataset (a 12-sample TCR cohort). The diversity,
overlap and gene-usage statistics are computed from deterministic
closed-form formulas, so the two implementations agree to floating-point
precision (maximum relative difference ~1e-13). The test suite
(tests/test_r_parity.py) re-runs immunarch via Rscript and asserts a
relative difference below 1e-6 for every function family; it skips
gracefully when R or immunarch is unavailable.
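The kind of check described above can be sketched as follows. This is not the content of tests/test_r_parity.py: the helper `max_rel_diff`, the `eps` guard and the example numbers are assumptions, and only the 1e-6 tolerance and the dependence on Rscript come from the text.

```python
import shutil

import numpy as np

REL_TOL = 1e-6  # tolerance quoted in the text


def max_rel_diff(py_values, r_values, eps=1e-300):
    """Largest element-wise relative difference, taking the R values as reference."""
    py = np.asarray(py_values, dtype=float)
    r = np.asarray(r_values, dtype=float)
    denom = np.maximum(np.abs(r), eps)  # guard against division by zero
    return float(np.max(np.abs(py - r) / denom))


# A parity test would be skipped when Rscript is not on PATH.
r_available = shutil.which("Rscript") is not None

# Made-up numbers standing in for one statistic computed by both pipelines.
python_result = [100.0, 200.00001]
r_result = [100.0, 200.0]
diff = max_rel_diff(python_result, r_result)
print(f"Rscript available: {r_available}, max rel diff: {diff:.3e}")
assert diff < REL_TOL
```

A relative rather than absolute tolerance keeps the check meaningful across statistics whose magnitudes differ by orders of magnitude, such as Chao1 estimates and overlap indices.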
On the example dataset the Python pipeline runs about 60x faster than the
R pipeline; the R timing includes Rscript startup. To reproduce the
benchmark and run the tests:
python examples/benchmark.py --runs 3
pytest tests/ -q
See examples/compare_R_vs_Python.ipynb for a full R-vs-Python
comparison (timing, accuracy table, scatter plots, diversity bar and
overlap heatmap).
License
Apache License 2.0 — the same license as the upstream immunarch package.
See LICENSE.