
Pure-Python port of the R/CRAN package scoper — spectral, hierarchical and identity clustering for B-cell clonal partitioning (Immcantation).


py-scoper

Pure-Python port of the R/CRAN package scoper (Spectral Clustering for clOne Partitioning), used to identify B-cell clones from Adaptive Immune Receptor Repertoire sequencing (AIRR-Seq) data.

scoper is part of the Immcantation framework, developed by the Kleinstein Lab at Yale University. py-scoper re-implements its three clonal-partitioning methods faithfully in numpy / scipy / pandas / matplotlib — no rpy2, no R runtime required.

import pyscoper as sc

db = sc.load_example_db()                      # scoper's ExampleDb (2000 BCRs)

ident = sc.identicalClones(db)                 # clones by identical junction
hier  = sc.hierarchicalClones(db, threshold=0.15)
spect = sc.spectralClones(db, method="novj")   # scoper's signature method

print(spect["clone_id"].nunique(), "clones")

Each function returns the input AIRR data frame with an integer clone_id column.
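
Assuming the returned data frame is a pandas DataFrame (py-scoper is built on pandas), downstream summaries of clone_id need nothing beyond pandas. The minimal sketch below uses a made-up toy table with only sequence_id and clone_id; real output carries all the AIRR columns of the input.

```python
import pandas as pd

# Made-up stand-in for a clustered AIRR table; only the columns used here.
db = pd.DataFrame({
    "sequence_id": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "clone_id": [1, 1, 2, 3, 3, 3],
})

# Sequences per clone; value_counts sorts by count, largest first.
clone_sizes = db["clone_id"].value_counts()

# Fraction of sequences that belong to clones with more than one member.
in_expanded = db["clone_id"].map(clone_sizes).gt(1).mean()

print(clone_sizes.to_dict())
print(round(in_expanded, 3))
```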

Installation

pip install pyscoper

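The optional extras pyalakazam and pyshazam supply amino-acid translation, germline-consensus and locus-inference helpers; install them with pip install "pyscoper[full]" (the quotes stop shells such as zsh from expanding the brackets). py-scoper ships internal fallbacks for these helpers, so the extras are not strictly required.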
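
To see which extras are importable in the current environment, importlib.util.find_spec is enough. This helper is a sketch and not part of py-scoper; its name is hypothetical, and it assumes the extras' import names are pyalakazam and pyshazam, matching their pip names.

```python
import importlib.util


def available_extras(names=("pyalakazam", "pyshazam")):
    """Hypothetical helper: map each module name to True if it can be imported.

    Assumes each extra's import name is the same as its pip distribution name.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(available_extras())
```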

Methods

| Function | Description |
|---|---|
| identicalClones | Clones share the same V gene, J gene and junction length, and have an identical junction sequence. |
| hierarchicalClones | Junctions are clustered by (length-normalized) nucleotide or amino-acid Hamming distance within each V/J/length group, and the dendrogram is cut at threshold. Linkages: single, average, complete. |
| spectralClones | Adaptive-threshold spectral clustering, scoper's signature algorithm. method="novj" uses junction distance only; method="vj" additionally incorporates V-region somatic-hypermutation likelihoods. |
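
The hierarchicalClones idea can be illustrated with scipy alone. This is a sketch under stated assumptions, not py-scoper's implementation: the helper names are hypothetical, it assumes AIRR-style columns (v_call, j_call, junction_length, junction) with V/J calls already reduced to genes, and it uses nucleotide Hamming distance divided by junction length. Within each V/J/length group it builds a linkage tree and cuts it at threshold with fcluster(criterion="distance"), offsetting labels so clone IDs stay unique across groups.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def normalized_hamming(seqs):
    """Pairwise Hamming distance divided by length, for equal-length sequences."""
    chars = np.array([list(s) for s in seqs])
    # Broadcast to (n, n, L) mismatch flags, then average over positions.
    return (chars[:, None, :] != chars[None, :, :]).mean(axis=2)


def hierarchical_clones_sketch(db, threshold=0.15, method="single"):
    """Hypothetical helper: cluster junctions within each V/J/length group."""
    clone_ids = pd.Series(0, index=db.index, dtype=int)
    next_id = 1
    for _, group in db.groupby(["v_call", "j_call", "junction_length"]):
        if len(group) == 1:  # linkage needs at least two observations
            clone_ids.loc[group.index] = next_id
            next_id += 1
            continue
        dist = normalized_hamming(group["junction"].tolist())
        tree = linkage(squareform(dist, checks=False), method=method)
        # Flat clusters whose cophenetic distance does not exceed the threshold.
        labels = fcluster(tree, t=threshold, criterion="distance")
        clone_ids.loc[group.index] = labels + next_id - 1
        next_id += int(labels.max())
    return db.assign(clone_id=clone_ids)


# Made-up toy input: two IGHV1-2 pairs and one IGHV3-23 sequence.
toy = pd.DataFrame({
    "v_call": ["IGHV1-2"] * 4 + ["IGHV3-23"],
    "j_call": ["IGHJ4"] * 5,
    "junction_length": [12] * 5,
    "junction": ["TGTGCGAGAGAT", "TGTGCGAGAGAC", "TGTGCAAAAAAA",
                 "TGTGCAAAAAAA", "TGTGCGAGAGAT"],
})
print(hierarchical_clones_sketch(toy, threshold=0.15)["clone_id"].tolist())
```

Passing method="average" or method="complete" switches the linkage; the amino-acid distance option is not covered by this sketch.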

With summarize_clones=True a ScoperClones object is returned, carrying the per-group summary table, the inter- vs intra-clone distance table, the effective clonal-distance threshold, and a plotCloneSummary plot.

res = sc.spectralClones(db, method="novj", summarize_clones=True)
res.summary()              # per V/J/length group statistics
res.eff_threshold          # effective distance threshold
sc.plotCloneSummary(res)   # clonal-distance distribution plot
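
The inter- vs intra-clone idea can be illustrated on a precomputed distance matrix. The helper below is hypothetical and does not reproduce scoper's exact table: for each sequence it reports the distance to its nearest neighbour in the same clone and in any other clone. In a clean partition the intra values sit well below the inter values, so a threshold between the two distributions separates them.

```python
import numpy as np


def nearest_intra_inter(dist, labels):
    """Hypothetical helper: nearest within-clone and between-clone distance per sequence.

    dist: square (n, n) distance matrix; labels: length-n clone assignments.
    Returns two arrays; NaN where no such neighbour exists (e.g. a singleton clone).
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # a sequence is not its own neighbour
    other = labels[:, None] != labels[None, :]
    intra = np.where(same, dist, np.inf).min(axis=1)
    inter = np.where(other, dist, np.inf).min(axis=1)
    intra[np.isinf(intra)] = np.nan
    inter[np.isinf(inter)] = np.nan
    return intra, inter


d = [[0.0, 0.1, 0.5],
     [0.1, 0.0, 0.4],
     [0.5, 0.4, 0.0]]
intra, inter = nearest_intra_inter(d, [1, 1, 2])
print(intra, inter)
```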

R parity

py-scoper is validated against scoper 1.5.0 on its bundled ExampleDb (2000 heavy-chain BCR sequences). Agreement is measured by the Adjusted Rand Index (ARI) between the R and Python clonal partitions:

| Method | ARI vs scoper 1.5.0 | Notes |
|---|---|---|
| identicalClones | 1.000 | deterministic; exact partition |
| hierarchicalClones | 1.000 | deterministic; exact partition |
| spectralClones (novj) | ~0.98 | spectral k-means is stochastic |
| spectralClones (vj) | ~0.99 | spectral k-means is stochastic |
| effective threshold | exact (0.22) | |
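
The Adjusted Rand Index ignores how clusters are labelled, which is why R's and Python's clone_id values can differ and still score 1.000. A minimal numpy sketch of the Hubert and Arabie ARI, computed from the contingency table of the two partitions (the function name is illustrative):

```python
import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two flat partitions."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.size < 2 or a.size != b.size:
        raise ValueError("need two equal-length labelings of at least 2 items")
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Contingency table: number of items in cluster i of A and cluster j of B.
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)

    def pairs(counts):
        return (counts * (counts - 1) // 2).sum()

    sum_ij = pairs(table)
    sum_a = pairs(table.sum(axis=1))
    sum_b = pairs(table.sum(axis=0))
    n_pairs = a.size * (a.size - 1) // 2
    expected = sum_a * sum_b / n_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions are all-singletons or both a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


print(adjusted_rand_index(["a", "a", "b"], [7, 7, 3]))  # 1.0: same grouping, different labels
```

If scikit-learn is installed, sklearn.metrics.adjusted_rand_score computes the same quantity.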

The spectral method uses k-means with random initialisation, so an exact partition is not expected; the parity tests require ARI > 0.95 and the total clone count to be within a few percent of scoper's.
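
To show where the randomness enters, here is a generic normalized spectral clustering sketch in numpy (row-normalized eigenvectors, in the style of Ng, Jordan and Weiss). It is a simplification and not scoper's algorithm: it uses one global Gaussian bandwidth sigma instead of scoper's adaptive kernel and threshold, picks the number of clusters with the eigengap heuristic, and assumes every sequence has non-zero affinity to at least one other. The final k-means step starts from randomly chosen centres; fixing seed makes a run reproducible, while different seeds may yield different partitions on harder data. All names are hypothetical.

```python
import numpy as np


def _kmeans(x, k, rng, n_init=10, n_iter=100):
    """Plain Lloyd's k-means with random initial centres; keeps the best of n_init runs."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute centres; an empty cluster keeps its previous centre.
            new_centers = np.array([
                x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = d2[np.arange(len(x)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def spectral_clusters_sketch(dist, sigma=0.1, max_k=10, seed=None):
    """Normalized spectral clustering of a precomputed (n, n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = len(dist)
    # Gaussian affinity with a single global bandwidth; no self-loops.
    affinity = np.exp(-dist**2 / (2 * sigma**2))
    np.fill_diagonal(affinity, 0.0)
    # Symmetric normalized Laplacian: L = I - D^-1/2 A D^-1/2.
    inv_sqrt_deg = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(n) - inv_sqrt_deg[:, None] * affinity * inv_sqrt_deg[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # Eigengap heuristic: k sits where consecutive eigenvalues jump the most.
    gaps = np.diff(eigvals[: min(max_k, n - 1) + 1])
    k = int(np.argmax(gaps)) + 1
    # Row-normalize the first k eigenvectors, then cluster the rows with k-means.
    embed = eigvecs[:, :k]
    embed = embed / np.linalg.norm(embed, axis=1, keepdims=True)
    return _kmeans(embed, k, np.random.default_rng(seed))


# Made-up 1-D example: two tight groups of three points, far apart.
positions = np.array([0.0, 0.01, 0.02, 1.0, 1.01, 1.02])
dist = np.abs(positions[:, None] - positions[None, :])
labels = spectral_clusters_sketch(dist, sigma=0.1, seed=0)
print(labels)
```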

Run the parity suite from a source checkout (requires an R installation with scoper):

pytest tests/ -q

Public API

identicalClones, hierarchicalClones, spectralClones, defineClonesScoper, ScoperClones, plotCloneSummary, calculate_inter_vs_intra, effective_threshold, fast_dist, pairwise_dist_dna, pairwise_dist_aa, pairwise_mut_matrix, count_invalid_bases, group_genes, pass_to_clustering, krnl_mtx_generator, laplacian_mtx, make_affinity, likelihoods, infer, find_gap_smooth, range_a_to_b, gaussian_kde_r, bw_nrd0, load_example_db.
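
Judging by its name, bw_nrd0 presumably mirrors R's stats::bw.nrd0 (Silverman's rule-of-thumb bandwidth) for the R-compatible density helper gaussian_kde_r; that correspondence is an assumption. For reference, a numpy sketch of R's rule, including its fallbacks when the spread is zero (the function name is hypothetical):

```python
import numpy as np


def bw_nrd0_sketch(x):
    """Silverman's rule-of-thumb bandwidth, following R's stats::bw.nrd0."""
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        raise ValueError("need at least 2 data points")
    hi = np.std(x, ddof=1)                          # R's sd() divides by n - 1
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # linear interpolation = R's quantile type 7
    lo = min(hi, iqr / 1.34)
    # R's fallbacks when that is zero: sd, then |x[1]| (first element), then 1.
    if not lo:
        lo = hi or abs(x[0]) or 1.0
    return 0.9 * lo * x.size ** -0.2


print(bw_nrd0_sketch([1, 2, 3, 4, 5]))
```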

License

AGPL-3, matching the original scoper package. See LICENSE. Credit to the Immcantation framework and the Kleinstein Lab (Yale University).

References

  • Nouri N, Kleinstein SH (2018). A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics, 34(13), i341–i349.
  • Nouri N, Kleinstein SH (2020). Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data. PLOS Computational Biology, 16(6), e1007977.
  • Gupta NT, et al. (2017). Hierarchical clustering can identify B cell clones with high confidence in Ig repertoire sequencing data. The Journal of Immunology, 198(6), 2489–2499.



Download files


Source Distribution

pyscoper-0.1.0.tar.gz (235.3 kB)

Uploaded Source

Built Distribution


pyscoper-0.1.0-py3-none-any.whl (166.4 kB)

Uploaded Python 3

File details

Details for the file pyscoper-0.1.0.tar.gz.

File metadata

  • Download URL: pyscoper-0.1.0.tar.gz
  • Upload date:
  • Size: 235.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyscoper-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b434b01666c33467262343ecbf55743637592648940b7f7a34ab951add4019c |
| MD5 | 8d4e867d45b0e609feb227918f70332d |
| BLAKE2b-256 | 3e8e9106580a8a2e9c8704b23358b60dec7eefef6d9ea0f1b3b7cb6773113a8d |


Provenance

The following attestation bundles were made for pyscoper-0.1.0.tar.gz:

Publisher: publish.yml on omicverse/py-scoper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyscoper-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyscoper-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 166.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyscoper-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 984eefbf2f5895defcbbba07d9b6e94c73245c1947641f3e2d0f83e47ac00f8f |
| MD5 | b5234d8a9fb1791af153b76f7d2e7ca0 |
| BLAKE2b-256 | c8b26a3632db706c882549d1c8108de98164da60f5819ce6b10f47f16d0922da |


Provenance

The following attestation bundles were made for pyscoper-0.1.0-py3-none-any.whl:

Publisher: publish.yml on omicverse/py-scoper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
