pyalakazam

Pure-Python port of the R/CRAN package alakazam — the core package of the Immcantation framework for adaptive immune receptor repertoire (AIRR-seq) analysis (Gupta et al., Bioinformatics 2015).

pyalakazam re-implements the full computational API of alakazam in pure Python (numpy / scipy / pandas / matplotlib) — no rpy2, no R, no external PHYLIP binary. It operates on AIRR-format pandas.DataFrames with the standard columns sequence_id, v_call, d_call, j_call, junction, junction_aa, clone_id, … and aims for numerical parity with alakazam 1.4.3.
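For orientation, the sketch below builds that input layout with pandas alone. The values are made up. The column check and the clone-size tally are illustrative stand-ins written for this example, not pyalakazam's checkColumns or countClones, and the output column names are assumptions.

```python
import pandas as pd

# A toy AIRR-format table; every value is invented for illustration.
db = pd.DataFrame({
    "sequence_id": ["seq1", "seq2", "seq3", "seq4"],
    "v_call": ["IGHV1-69*01", "IGHV1-69*01", "IGHV3-23*01", "IGHV4-34*02"],
    "d_call": ["IGHD3-10*01", "IGHD3-10*01", "IGHD2-2*01", None],
    "j_call": ["IGHJ4*02", "IGHJ4*02", "IGHJ6*02", "IGHJ4*02"],
    "junction": ["TGTGCGAGAGATTGG", "TGTGCGAGAGATTGG",
                 "TGTGCGAAAGGGTAC", "TGTGCGAGATTTTGG"],
    "junction_aa": ["CARDW", "CARDW", "CAKGY", "CARFW"],
    "clone_id": [1, 1, 2, 3],
})

# Check that the standard columns are present (hypothetical stand-in check).
REQUIRED = ["sequence_id", "v_call", "d_call", "j_call",
            "junction", "junction_aa", "clone_id"]
missing = [c for c in REQUIRED if c not in db.columns]
if missing:
    raise ValueError(f"missing AIRR columns: {missing}")

# Per-clone sequence counts and fractions, largest clone first
# (column names here are illustrative).
clone_sizes = (db.groupby("clone_id").size()
                 .rename("seq_count")
                 .sort_values(ascending=False)
                 .to_frame())
clone_sizes["seq_freq"] = clone_sizes["seq_count"] / clone_sizes["seq_count"].sum()
print(clone_sizes)
```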

Installation

pip install pyalakazam

Or from source:

git clone https://github.com/omicverse/py-alakazam
cd py-alakazam && pip install .

Coverage

Area | Functions
--- | ---
Gene annotation | getSegment, getAllele, getGene, getFamily, getLocus, getChain, countGenes, sortGenes
Diversity / abundance | alphaDiversity, rarefyDiversity, testDiversity, calcDiversity, calcCoverage, estimateAbundance, countClones
Amino-acid properties | aminoAcidProperties, gravy, bulk, polar, aliphatic, charge, isValidAASeq, countPatterns
Sequence utilities | translateDNA, maskSeqGaps, maskSeqEnds, padSeqEnds, extractVRegion, collapseDuplicates
Sequence distances | seqDist, seqEqual, pairwiseDist, pairwiseEqual, nonsquareDist, getDNAMatrix, getAAMatrix
Lineage | makeChangeoClone, buildPhylipLineage, graphToPhylo, phyloToGraph
Topology | getPathLengths, getMRCA, tableEdges, summarizeSubtrees, permuteLabels, testEdges, testMRCA
Plotting | plotGeneUsage, plotDiversityCurve, plotAbundanceCurve, plotLineageTree
Core / data | translateStrings, checkColumns, stoufferMeta, load_example_db, load_single_db, load_example_trees
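Several of the gene-annotation functions come down to extracting IMGT-style names from a call string such as "Homo sapiens IGHV1-69*01,IGHV1-69*06". The sketch below illustrates the idea with hypothetical helpers get_allele, get_gene and get_family built on simplified regular expressions. It is not pyalakazam's implementation, and the real patterns may treat edge cases differently.

```python
import re

# Simplified IMGT-style patterns (an assumption for illustration only).
_PREFIX = r"(?:IG[HKL]|TR[ABDG])[VDJ]"
FAMILY = re.compile(_PREFIX + r"[A-Z0-9.]+")                 # e.g. IGHV1
GENE = re.compile(_PREFIX + r"[A-Z0-9.]+[-/\w]*")            # e.g. IGHV1-69
ALLELE = re.compile(_PREFIX + r"[A-Z0-9.]+[-/\w]*\*[.\w]+")  # e.g. IGHV1-69*01


def _segment(call, pattern, first=True):
    """Return the first match, or all unique matches in first-seen order."""
    hits = pattern.findall(call)
    if first:
        return hits[0] if hits else None
    return list(dict.fromkeys(hits))  # de-duplicate while keeping order


def get_allele(call, first=True):  # hypothetical helper
    return _segment(call, ALLELE, first)


def get_gene(call, first=True):  # hypothetical helper
    return _segment(call, GENE, first)


def get_family(call, first=True):  # hypothetical helper
    return _segment(call, FAMILY, first)


call = "Homo sapiens IGHV1-69*01,IGHV1-69*06"
print(get_allele(call))               # IGHV1-69*01
print(get_gene(call))                 # IGHV1-69
print(get_family(call))               # IGHV1
print(get_allele(call, first=False))  # ['IGHV1-69*01', 'IGHV1-69*06']
```

Each level simply truncates the allele name further: the family drops everything after the first hyphen, and the gene drops the `*` allele suffix.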

Quick-start

import pyalakazam as ak

# bundled example B-cell repertoire (alakazam ExampleDb)
db = ak.load_example_db()

# gene-usage frequencies
genes = ak.countGenes(db, gene="v_call", groups="sample_id", mode="family")

# clonal diversity (Hill numbers) with bootstrap CIs
div = ak.alphaDiversity(db, group="sample_id", min_q=0, max_q=4, nboot=200)
ak.plotDiversityCurve(div, legend_title="Sample")

# CDR3 physicochemical descriptors
aa = ak.aminoAcidProperties(db[["sequence_id", "junction"]], seq="junction")

# Ig lineage tree (maximum parsimony, no external PHYLIP)
clone = ak.makeChangeoClone(db[db.clone_id == 3138],
                            text_fields=["sample_id", "c_call"],
                            num_fields=["duplicate_count"])
tree = ak.buildPhylipLineage(clone)
ak.plotLineageTree(tree, label_field="c_call")
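The diversity curve drawn above is built from Hill numbers, qD = (sum_i p_i^q)^(1/(1-q)), where p_i is the frequency of clone i and the q = 1 case is defined as the exponential of the Shannon entropy. The sketch below computes only the point formula on raw clone frequencies. hill_number is a hypothetical helper, and it omits the bootstrap resampling (and any other correction) that alphaDiversity applies.

```python
import numpy as np


def hill_number(counts, q):
    """Hill diversity of order q from clone counts (observed frequencies only)."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    if np.isclose(q, 1.0):
        # Limit as q -> 1: exponential of the Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


clone_counts = [50, 30, 10, 5, 5]  # toy clone sizes
qs = np.arange(0, 4.5, 0.5)        # q from 0 to 4, as in the quick-start
curve = {float(q): hill_number(clone_counts, q) for q in qs}
for q, d in curve.items():
    print(f"q={q:.1f}  D={d:.3f}")
```

At q = 0 the value is the clone richness (5 here), and the curve never increases with q; larger q gives more weight to the dominant clones.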

Numerical parity with R alakazam 1.4.3

tests/test_r_parity.py runs alakazam on its bundled ExampleDb / ExampleTrees datasets and compares the results with pyalakazam's output:

  • Deterministic functions (gene parsing, countGenes, countClones, aminoAcidProperties, translateDNA, sequence distances, calcCoverage, and the topology analyses getPathLengths, summarizeSubtrees and tableEdges) agree bit-exactly or to within a relative difference of 1e-6.
  • Bootstrap functions: estimateAbundance and alphaDiversity use multinomial resampling. Because R's and NumPy's random number generators differ, the bootstrap realisations are not bit-identical; the point estimates correlate at Pearson r > 0.999, and the confidence intervals agree to within a few percent.
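To see why bootstrap output cannot match bit-for-bit across RNGs while the estimates still agree, here is a minimal multinomial bootstrap of one Hill number with NumPy. bootstrap_hill is a hypothetical helper, not pyalakazam's code; it omits any abundance correction or depth normalisation the package may apply. Two different seeds play the role of two different RNGs.

```python
import numpy as np


def hill_number(p, q):
    """Hill diversity of order q from a frequency vector."""
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def bootstrap_hill(counts, q, nboot=200, seed=0, ci=0.95):
    """Multinomial bootstrap of a Hill number; returns (mean, lower, upper)."""
    counts = np.asarray(counts)
    n = int(counts.sum())
    p_hat = counts / n
    rng = np.random.default_rng(seed)
    # Each row is one resampled repertoire with the same total size n.
    draws = rng.multinomial(n, p_hat, size=nboot)
    stats = np.array([hill_number(d / n, q) for d in draws])
    alpha = (1.0 - ci) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(stats.mean()), float(lo), float(hi)


counts = [50, 30, 10, 5, 5]
a = bootstrap_hill(counts, q=2, seed=1)  # stand-in for one RNG stream
b = bootstrap_hill(counts, q=2, seed=2)  # stand-in for another
print(a)
print(b)
```

The two tuples differ in their exact digits, but their means and intervals sit close together. This mirrors the trade-off described above: identical realisations are out of reach, while the summary statistics converge.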

The lineage caveat (no PHYLIP)

alakazam's buildPhylipLineage shells out to the external PHYLIP dnapars binary. To stay self-contained, pyalakazam re-implements the maximum-parsimony search in pure Python: greedy stepwise addition, NNI + SPR hill-climbing, and a vectorised Fitch / Sankoff small-parsimony reconstruction. The germline sequence serves as the outgroup that roots the tree, and zero-weight inferred parents are collapsed exactly as alakazam does.
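The small-parsimony step scores a fixed topology. Below is a minimal sketch of the Fitch bottom-up pass only (unit costs, no Sankoff weights, no top-down assignment of internal sequences), vectorised over alignment columns with per-nucleotide bit masks. The nested-tuple tree format and the fitch helper are assumptions made for this illustration, not pyalakazam's internals; treating "-" like N is also an assumption.

```python
import numpy as np

# One bit per nucleotide; 'N' (and, by assumption, '-') is the full set.
CODE = {"A": 1, "C": 2, "G": 4, "T": 8, "N": 15, "-": 15}


def encode(seq):
    return np.array([CODE[c] for c in seq.upper()], dtype=np.uint8)


def fitch(tree):
    """Fitch bottom-up pass on a rooted binary tree given as nested 2-tuples
    of equal-length leaf sequences. Returns (state sets at this node, cost)."""
    if isinstance(tree, str):
        return encode(tree), 0
    left, right = tree
    ls, lc = fitch(left)
    rs, rc = fitch(right)
    inter = ls & rs
    disjoint = inter == 0
    # Disjoint child sets: take the union and count one change per such site.
    states = np.where(disjoint, ls | rs, inter)
    return states, lc + rc + int(disjoint.sum())


tree = (("ACGT", "ACGA"), ("TCGA", "TCTA"))
root_states, length = fitch(tree)
print(length)  # 3
```

Because every column is handled at once by the bitwise `&` and `|`, the cost of a candidate topology is a handful of array operations per internal node, which is what makes repeated NNI / SPR rescoring affordable in pure Python.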

For the 49 trees in ExampleTrees, pyalakazam recovers a tree of the same total length as alakazam/PHYLIP for 31 of them; for the remaining 18, the topology and total tree length are very close. Two effects explain the small differences: (1) PHYLIP allows ambiguous characters (N) in inferred internal nodes, which can lower the recomputed seqDist edge-weight sum below the strict-parsimony length, and (2) for very large clones the heuristic search may settle on a near-optimal rather than a globally optimal topology. Every tree produced is still a valid parsimony reconstruction, even when it is not the global optimum.
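Effect (1) can be reproduced on a three-leaf toy tree. The sketch uses a hypothetical seq_dist that skips any position where either sequence has an N (an assumption standing in for seqDist's scoring, not its actual code) and sums it over parent-child edges.

```python
def seq_dist(a, b, ambiguous="N"):
    """Hamming distance that ignores positions where either sequence has an
    ambiguous character (by assumption, only N here)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x != y and x not in ambiguous and y not in ambiguous
    )


def tree_length(edges, seqs):
    """Sum of seq_dist over (parent, child) edges; seqs maps node -> sequence."""
    return sum(seq_dist(seqs[p], seqs[c]) for p, c in edges)


edges = [("germline", "inferred1"), ("inferred1", "seq1"), ("inferred1", "seq2")]
strict = {"germline": "ACGT", "inferred1": "ACGT", "seq1": "ACGT", "seq2": "ACGA"}
ambig = dict(strict, inferred1="ACGN")  # inferred parent carries an N at site 4

print(tree_length(edges, strict))  # 1
print(tree_length(edges, ambig))   # 0
```

With a concrete base at the inferred node, one edge must pay for the T/A difference; with N there, every edge scores zero, so the recomputed edge-weight sum drops below the strict-parsimony length of 1.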

License

AGPL-3, preserving the license of the original alakazam package. The original alakazam is developed by the Kleinstein Lab (Yale University) as part of the Immcantation framework. See LICENSE.

Citation

If you use this port, please cite the original alakazam publication:

Gupta NT, Vander Heiden JA, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 2015.
