pyalakazam
Pure-Python port of the R/CRAN package alakazam — the core package of the Immcantation framework for adaptive immune receptor repertoire (AIRR-seq) analysis (Gupta et al., Bioinformatics 2015).
pyalakazam re-implements the full computational API of alakazam in pure
Python (numpy / scipy / pandas / matplotlib) — no rpy2, no R, no external
PHYLIP binary. It operates on AIRR-format pandas.DataFrames with the
standard columns sequence_id, v_call, d_call, j_call, junction,
junction_aa, clone_id, … and aims for numerical parity with alakazam 1.4.3.
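As a rough illustration of the expected input, the sketch below builds a tiny AIRR-style table with plain pandas and checks that the standard columns listed above are present. The sequences, gene calls and clone identifiers are invented for the example, and the `required` list is only the subset of columns named in this README, not a complete schema.

```python
import pandas as pd

# Hypothetical two-sequence repertoire; all values are invented for illustration.
db = pd.DataFrame(
    {
        "sequence_id": ["seq1", "seq2"],
        "v_call": ["IGHV3-23*01", "IGHV1-69*01,IGHV1-69D*01"],
        "d_call": ["IGHD3-10*01", "IGHD2-2*01"],
        "j_call": ["IGHJ4*02", "IGHJ6*02"],
        "junction": ["TGTGCGAAAGATTACTGG", "TGTGCGAGAGGGTACTGG"],
        "junction_aa": ["CAKDYW", "CARGYW"],
        "clone_id": ["1", "2"],
    }
)

# Columns named in this README; a real AIRR file usually carries many more.
required = ["sequence_id", "v_call", "d_call", "j_call",
            "junction", "junction_aa", "clone_id"]
missing = [col for col in required if col not in db.columns]
print(missing)
```

In practice such a table would more likely come from an AIRR Rearrangement TSV read with `pd.read_csv(path, sep="\t")`, where `path` is whatever file you have at hand.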
Installation
```bash
pip install pyalakazam
```

Or from source:

```bash
git clone https://github.com/omicverse/py-alakazam
cd py-alakazam && pip install .
```
Coverage
| Area | Functions |
|---|---|
| Gene annotation | getSegment, getAllele, getGene, getFamily, getLocus, getChain, countGenes, sortGenes |
| Diversity / abundance | alphaDiversity, rarefyDiversity, testDiversity, calcDiversity, calcCoverage, estimateAbundance, countClones |
| Amino-acid properties | aminoAcidProperties, gravy, bulk, polar, aliphatic, charge, isValidAASeq, countPatterns |
| Sequence utilities | translateDNA, maskSeqGaps, maskSeqEnds, padSeqEnds, extractVRegion, collapseDuplicates |
| Sequence distances | seqDist, seqEqual, pairwiseDist, pairwiseEqual, nonsquareDist, getDNAMatrix, getAAMatrix |
| Lineage | makeChangeoClone, buildPhylipLineage, graphToPhylo, phyloToGraph |
| Topology | getPathLengths, getMRCA, tableEdges, summarizeSubtrees, permuteLabels, testEdges, testMRCA |
| Plotting | plotGeneUsage, plotDiversityCurve, plotAbundanceCurve, plotLineageTree |
| Core / data | translateStrings, checkColumns, stoufferMeta, load_example_db, load_single_db, load_example_trees |
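To make the gene-annotation row concrete, here is a simplified sketch of what reducing an IMGT-style call to allele, gene and family level means. It is not pyalakazam's implementation: the helper names `first_allele`, `to_gene` and `to_family` are hypothetical, and the string splitting ignores edge cases such as duplicated ("D") genes and orphon names.

```python
from collections import Counter

def first_allele(call: str) -> str:
    """First entry of a comma-separated call, e.g. 'IGHV1-69*01'."""
    return call.split(",")[0].strip()

def to_gene(allele: str) -> str:
    """Drop the allele suffix: 'IGHV1-69*01' -> 'IGHV1-69'."""
    return allele.split("*")[0]

def to_family(allele: str) -> str:
    """Drop the gene suffix as well: 'IGHV1-69*01' -> 'IGHV1'."""
    return to_gene(allele).split("-")[0]

# Invented calls, including one ambiguous (multi-allele) assignment.
calls = ["IGHV1-69*01,IGHV1-69D*01", "IGHV3-23*01", "IGHV3-30*18", "IGHJ4*02"]

# Family-level usage as relative frequencies.
families = Counter(to_family(first_allele(c)) for c in calls)
freq = {fam: n / len(calls) for fam, n in families.items()}
print(freq)
```

Counting at family level after taking the first call is only one possible way to resolve ambiguous assignments; the library functions listed in the table may expose other choices.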
Quick-start
```python
import pyalakazam as ak

# bundled example B-cell repertoire (alakazam ExampleDb)
db = ak.load_example_db()

# gene-usage frequencies
genes = ak.countGenes(db, gene="v_call", groups="sample_id", mode="family")

# clonal diversity (Hill numbers) with bootstrap CIs
div = ak.alphaDiversity(db, group="sample_id", min_q=0, max_q=4, nboot=200)
ak.plotDiversityCurve(div, legend_title="Sample")

# CDR3 physicochemical descriptors
aa = ak.aminoAcidProperties(db[["sequence_id", "junction"]], seq="junction")

# Ig lineage tree (maximum parsimony, no external PHYLIP)
clone = ak.makeChangeoClone(db[db.clone_id == 3138],
                            text_fields=["sample_id", "c_call"],
                            num_fields=["duplicate_count"])
tree = ak.buildPhylipLineage(clone)
ak.plotLineageTree(tree, label_field="c_call")
```
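The quick-start mentions Hill numbers without defining them. As a minimal sketch, the Hill number of order q for clone frequencies p_i is (sum of p_i^q)^(1/(1-q)), with the q = 1 case taken as the exponential of the Shannon entropy. The function `hill_number` below is a hypothetical helper that applies the plain formula to raw counts; it does not include the resampling or confidence intervals that `alphaDiversity` reports.

```python
import numpy as np

def hill_number(counts, q):
    """Hill diversity of order q from a vector of clone sizes."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # relative clone frequencies
    if np.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

even = [10, 10, 10, 10]    # four equally sized clones (invented)
skewed = [97, 1, 1, 1]     # one dominant clone (invented)

for q in (0, 1, 2):
    print(q, hill_number(even, q), hill_number(skewed, q))
```

For a perfectly even repertoire every order gives the number of clones, while for a skewed one the value falls as q grows, because higher orders weight abundant clones more heavily.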
Numerical parity with R alakazam 1.4.3
tests/test_r_parity.py runs alakazam on its bundled ExampleDb /
ExampleTrees datasets and compares against pyalakazam:
- Deterministic functions: gene parsing, countGenes, countClones, aminoAcidProperties, translateDNA, sequence distances, calcCoverage, and the topology analyses (getPathLengths, summarizeSubtrees, tableEdges) agree bit-exactly or to a relative difference below 1e-6.
- Bootstrap functions: estimateAbundance and alphaDiversity use multinomial resampling. R's and NumPy's RNGs differ, so the bootstrap realisations are not bit-identical; the point estimates agree to Pearson r > 0.999 and the confidence intervals to within a few percent.
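The two kinds of comparison can be sketched with NumPy alone. The helpers `max_rel_diff` and `bootstrap_abundance` are hypothetical and are not taken from tests/test_r_parity.py; two differently seeded NumPy generators stand in for the R and NumPy RNGs, and the clone sizes are invented.

```python
import numpy as np

def max_rel_diff(a, b):
    """Largest element-wise relative difference, for deterministic outputs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.maximum(np.abs(b), np.finfo(float).tiny)  # avoid division by zero
    return float(np.max(np.abs(a - b) / denom))

def bootstrap_abundance(counts, nboot, seed):
    """Multinomial resamples of a clone-size vector, as relative abundances."""
    counts = np.asarray(counts)
    n = int(counts.sum())
    rng = np.random.default_rng(seed)
    return rng.multinomial(n, counts / n, size=nboot) / n

counts = [50, 30, 15, 5]  # invented clone sizes
boot_a = bootstrap_abundance(counts, nboot=200, seed=1)
boot_b = bootstrap_abundance(counts, nboot=200, seed=2)

# Individual realisations differ between the two random streams ...
same_draws = np.array_equal(boot_a, boot_b)
# ... but the bootstrap means (point estimates) are strongly correlated.
r = np.corrcoef(boot_a.mean(axis=0), boot_b.mean(axis=0))[0, 1]
print(same_draws, r)
```

This is why a tolerance on the relative difference suits the deterministic functions, whereas the bootstrap outputs are better compared through summary statistics such as a correlation of the point estimates.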
The lineage caveat (no PHYLIP)
alakazam's buildPhylipLineage shells out to the external PHYLIP dnapars
binary. To stay self-contained, pyalakazam re-implements the maximum-parsimony
search in pure Python — greedy stepwise addition, NNI + SPR hill-climbing, and
a vectorised Fitch / Sankoff small-parsimony reconstruction. The germline is
rooted as the outgroup and zero-weight inferred parents are collapsed exactly
as alakazam does.
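For readers unfamiliar with the small-parsimony step, the following is a plain, non-vectorised sketch of Fitch's algorithm, scoring a fixed rooted binary tree one alignment column at a time. It is not pyalakazam's code: the nested-tuple tree encoding, the function names and the four toy sequences are assumptions made for the example, and the tree search (stepwise addition, NNI, SPR) is left out.

```python
def fitch_site(tree, leaf_states):
    """Return (candidate states, minimum changes) for one alignment column.

    tree is either a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, leaf_states)
    right_set, right_cost = fitch_site(right, leaf_states)
    common = left_set & right_set
    if common:
        # Children can share a state: no extra change at this node.
        return common, left_cost + right_cost
    # Otherwise keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_length(tree, seqs):
    """Sum the per-column Fitch costs over an alignment of equal-length strings."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )

seqs = {"a": "AAG", "b": "AAG", "c": "ACG", "d": "ACT"}  # invented toy alignment
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))

print(parsimony_length(tree_1, seqs))  # 2
print(parsimony_length(tree_2, seqs))  # 3
```

A heuristic search of the kind described above would propose rearranged topologies and keep those with a lower score, here preferring `tree_1` over `tree_2`.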
For the 49 trees in ExampleTrees, pyalakazam recovers an identical-length
tree for 31; for the remaining 18 the topology and total tree length are very
close. Two effects explain the small differences: (1) PHYLIP allows ambiguous
characters (N) in inferred internal nodes, which can lower the recomputed
seqDist edge-weight sum below the strict-parsimony length, and (2) for very
large clones the heuristic search may settle in a near- rather than
globally-optimal topology. The trees produced are always valid maximum-
parsimony reconstructions.
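Effect (1) can be illustrated with a toy distance that treats N as matching any base. This is a sketch under that assumption, not the seqDist implementation: `hamming_with_n` is a hypothetical helper and the sequences are invented.

```python
def hamming_with_n(a, b):
    """Count mismatches between equal-length sequences, ignoring positions with N."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))

child_1, child_2 = "ACGT", "ACGA"   # two observed sequences (invented)

strict_parent = "ACGT"     # fully resolved inferred parent
ambiguous_parent = "ACGN"  # parent with an ambiguous last position

strict_sum = hamming_with_n(strict_parent, child_1) + hamming_with_n(strict_parent, child_2)
ambiguous_sum = hamming_with_n(ambiguous_parent, child_1) + hamming_with_n(ambiguous_parent, child_2)
print(strict_sum, ambiguous_sum)  # 1 0
```

Any resolved parent needs at least one substitution to explain both children, but the ambiguous parent yields an edge-weight sum of zero, which is how a recomputed tree length can drop below the strict-parsimony length.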
License
AGPL-3, preserving the license of the original alakazam package. The original
alakazam is developed by the Kleinstein Lab (Yale University) as part of the
Immcantation framework. See LICENSE.
Citation
If you use this port, please cite the original alakazam publication:
Gupta NT, Vander Heiden JA, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 2015.