Pure-Python port of the R/CRAN package scoper — spectral, hierarchical and identity clustering for B-cell clonal partitioning (Immcantation).
py-scoper
Pure-Python port of the R/CRAN package scoper
(Spectral Clustering for clOne Partitioning), which identifies B-cell clones
from Adaptive Immune Receptor Repertoire sequencing (AIRR-Seq) data.
scoper is part of the Immcantation
framework, developed by the Kleinstein Lab at Yale University. py-scoper
re-implements its three clonal-partitioning methods faithfully using
numpy, scipy, pandas and matplotlib; neither rpy2 nor an R runtime is required.
```python
import pyscoper as sc

db = sc.load_example_db()                         # scoper's ExampleDb (2000 BCRs)
ident = sc.identicalClones(db)                    # clones by identical junction
hier = sc.hierarchicalClones(db, threshold=0.15)
spect = sc.spectralClones(db, method="novj")      # scoper's signature method
print(spect["clone_id"].nunique(), "clones")
```
Each function returns the input AIRR data frame with an integer clone_id
column.
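Because the result is an ordinary data frame, downstream summaries only need the clone_id column. As a minimal sketch, using a plain Python list of made-up ids in place of that column, clone sizes could be tallied like this:

```python
from collections import Counter

# Hypothetical clone_id values, standing in for the column of a returned data frame.
clone_ids = [1, 1, 2, 3, 3, 3, 4]

# Number of sequences assigned to each clone.
clone_sizes = Counter(clone_ids)

n_clones = len(clone_sizes)
n_singletons = sum(1 for size in clone_sizes.values() if size == 1)
largest_id, largest_size = clone_sizes.most_common(1)[0]

print(n_clones, "clones,", n_singletons, "singletons")
print("largest clone:", largest_id, "with", largest_size, "sequences")
```

On a real result the same tally is available through pandas, for example `spect["clone_id"].value_counts()`.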
Installation
```bash
pip install pyscoper
```
Optional extras pyalakazam / pyshazam supply amino-acid translation,
germline-consensus and locus-inference helpers (pip install pyscoper[full]).
py-scoper ships internal fallbacks, so they are not strictly required.
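The optional-dependency pattern described above can be illustrated with a small sketch. It is not py-scoper's actual code: the import name `pyalakazam` and the helper `translate_fallback` are assumptions made for illustration, and the fallback shown is a plain standard-genetic-code translation.

```python
import importlib.util

# Assumption: the optional extra is importable under the name "pyalakazam".
HAVE_ALAKAZAM = importlib.util.find_spec("pyalakazam") is not None

# Standard genetic code, with codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate_fallback(nt_seq):
    """Translate a nucleotide string (hypothetical fallback helper).

    Codons containing ambiguous bases become 'X'; a trailing partial
    codon is dropped.
    """
    seq = nt_seq.upper()
    full_length = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, full_length, 3))
    return "".join(_CODON_TABLE.get(codon, "X") for codon in codons)


print("optional extra found" if HAVE_ALAKAZAM else "using built-in fallback")
print(translate_fallback("TGTGCGAGA"))
```

Checking for the extra with `importlib.util.find_spec` avoids importing it eagerly, so the base install stays usable when the extras are absent.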
Methods
| Function | Description |
|---|---|
| identicalClones | Clones share the same V gene, J gene, junction length and an identical junction sequence. |
| hierarchicalClones | Junctions clustered by (length-normalized) nucleotide/AA Hamming distance within each V/J/length group; the dendrogram is cut at threshold. Linkages: single, average, complete. |
| spectralClones | Adaptive-threshold spectral clustering, scoper's signature algorithm. method="novj" uses junction distance only; method="vj" additionally incorporates V-region somatic-hypermutation likelihoods. |
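To make the hierarchical description concrete, here is a minimal pure-Python sketch of length-normalized Hamming distance with a single-linkage cut inside one V/J/length group. It is an illustration, not py-scoper's implementation: the function names are hypothetical, and treating pairs at exactly the threshold as linked is an assumption.

```python
from itertools import combinations


def normalized_hamming(a, b):
    """Fraction of mismatched positions between two equal-length junctions."""
    if len(a) != len(b):
        raise ValueError("junctions must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clones(junctions, threshold):
    """Label junctions from ONE V/J/length group by single-linkage clustering.

    Cutting a single-linkage dendrogram at `threshold` yields the connected
    components of the graph that joins pairs at distance <= threshold, so a
    union-find structure is enough for this sketch.
    """
    parent = list(range(len(junctions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(junctions)), 2):
        if normalized_hamming(junctions[i], junctions[j]) <= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    labels, ids = [], {}
    for i in range(len(junctions)):
        labels.append(ids.setdefault(find(i), len(ids) + 1))
    return labels


# Made-up junctions of equal length, as if taken from one V/J/length group.
group = ["TGTGCGAGAGAT", "TGTGCGAGAGAC", "TGTGCGAAAGAC", "TGTACCTTTGGG"]
print(single_linkage_clones(group, threshold=0.15))
print(single_linkage_clones(group, threshold=0.0))
```

With the threshold set to 0.0 the sketch only links identical junctions, which corresponds to the identicalClones criterion within a group. Average and complete linkage need a full dendrogram and are not covered by this shortcut.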
With summarize_clones=True a ScoperClones object is returned, carrying the
per-group summary table, the inter- vs intra-clone distance table, the
effective clonal-distance threshold, and a plotCloneSummary plot.
```python
res = sc.spectralClones(db, method="novj", summarize_clones=True)
res.summary()              # per V/J/length group statistics
res.eff_threshold          # effective distance threshold
sc.plotCloneSummary(res)   # clonal-distance distribution plot
```
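The inter- vs intra-clone distance table can be pictured with a small sketch. It assumes the table records, per clone, the largest distance between two members of the clone and the smallest distance to any sequence in another clone of the same group; the function names are hypothetical, and the way py-scoper derives the effective threshold from these values is not reproduced here.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Length-normalized Hamming distance between equal-length junctions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inter_vs_intra(junctions, clone_ids):
    """Per clone: maximum intra-clone and minimum inter-clone distance.

    A value stays None when it is undefined: a singleton clone has no
    intra-clone distance, and a group with one clone has no inter-clone one.
    """
    stats = {cid: {"max_intra": None, "min_inter": None} for cid in clone_ids}
    for i, j in combinations(range(len(junctions)), 2):
        d = hamming_fraction(junctions[i], junctions[j])
        ci, cj = clone_ids[i], clone_ids[j]
        if ci == cj:
            cur = stats[ci]["max_intra"]
            stats[ci]["max_intra"] = d if cur is None else max(cur, d)
        else:
            for cid in (ci, cj):
                cur = stats[cid]["min_inter"]
                stats[cid]["min_inter"] = d if cur is None else min(cur, d)
    return stats


# Made-up junctions from one V/J/length group with hypothetical clone labels.
junctions = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "TTTTTTTT"]
clone_ids = [1, 1, 1, 2]
table = inter_vs_intra(junctions, clone_ids)
for cid, row in table.items():
    print(cid, row)
```

A partition looks well separated when every clone's maximum intra-clone distance sits below its minimum inter-clone distance, which is the gap a distance threshold is meant to fall into.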
R parity
py-scoper is validated against scoper 1.5.0 on its bundled ExampleDb
(2000 heavy-chain BCR sequences). Agreement is measured by the
Adjusted Rand Index (ARI) between the R and Python clonal partitions:
| Method | ARI vs scoper 1.5.0 | Notes |
|---|---|---|
| identicalClones | 1.000 | deterministic; exact partition |
| hierarchicalClones | 1.000 | deterministic; exact partition |
| spectralClones (novj) | ~0.98 | spectral k-means is stochastic |
| spectralClones (vj) | ~0.99 | spectral k-means is stochastic |
| effective threshold | exact (0.22) | |
The spectral method uses k-means with random initialisation, so an exact partition is not expected; the parity tests require ARI > 0.95 and the total clone count to be within a few percent of scoper's.
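For readers unfamiliar with the metric, the Adjusted Rand Index can be computed from pair counts alone. The sketch below is a standalone pure-Python version written for illustration (the function name is hypothetical and this is not the parity suite's code); it compares two label vectors regardless of how the clone ids are numbered.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pairs placed together in both partitions, in A only, and in B only.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / total_pairs  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together - expected) / (max_index - expected)


# Same partition under different clone numbering: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [5, 5, 9, 9]))
# One clone split differently: agreement drops below 1.
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```

Because the index only looks at which pairs of sequences are grouped together, arbitrary differences in clone numbering between R and Python do not affect it. scikit-learn offers the same measure as `sklearn.metrics.adjusted_rand_score`.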
Run the parity suite (needs an R install with scoper):
```bash
pytest tests/ -q
```
Public API
identicalClones, hierarchicalClones, spectralClones,
defineClonesScoper, ScoperClones, plotCloneSummary,
calculate_inter_vs_intra, effective_threshold, fast_dist,
pairwise_dist_dna, pairwise_dist_aa, pairwise_mut_matrix,
count_invalid_bases, group_genes, pass_to_clustering,
krnl_mtx_generator, laplacian_mtx, make_affinity, likelihoods,
infer, find_gap_smooth, range_a_to_b, gaussian_kde_r, bw_nrd0,
load_example_db.
License
AGPL-3, matching the original scoper package. See LICENSE. Credit to the
Immcantation framework and the Kleinstein Lab (Yale University).
References
- Nouri N, Kleinstein SH (2018). A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics, 34(13), i341–i349.
- Nouri N, Kleinstein SH (2020). Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data. PLOS Computational Biology, 16(6), e1007977.
- Gupta NT, et al. (2017). Hierarchical clustering can identify B cell clones with high confidence in Ig repertoire sequencing data. The Journal of Immunology, 198(6), 2489–2499.