ImmuneWatch ClusTCR
A Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity
clusTCR uses a two-step clustering approach that combines the speed of the Faiss similarity-search and clustering library with the accuracy of the Markov Clustering Algorithm (MCL).
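The two-step idea (a fast coarse partition, then MCL within each partition) can be illustrated with a small pure-Python sketch. This is not clusTCR's implementation: the Faiss step is replaced by a trivial bucketing of sequences by length, and the MCL graph is assumed, as a simplification, to connect sequences at Hamming distance 1. The function names (`coarse_partition`, `mcl`, `two_step_cluster`) and the `inflation` default are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def coarse_partition(seqs: list[str]) -> dict[int, list[str]]:
    """Step 1 stand-in: bucket sequences by length.

    Hypothetical simplification: clusTCR uses Faiss for this step. Length
    bucketing is used here only because Hamming distance needs equal lengths.
    """
    buckets: dict[int, list[str]] = defaultdict(list)
    for s in seqs:
        buckets[len(s)].append(s)
    return dict(buckets)


def _normalize_columns(m: list[list[float]]) -> list[list[float]]:
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-9) -> list[list[int]]:
    """Step 2: Markov Clustering (expansion power 2) on a symmetric adjacency matrix."""
    n = len(adj)
    # Self-loops are customary in MCL; they damp oscillation between nodes.
    m = _normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                          # expansion
        inflated = [[v ** inflation for v in row] for row in expanded]    # inflation
        new = _normalize_columns(inflated)
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Assign each node (column) to the attractor row holding most of its mass;
    # ties go to the lowest row index because max() returns the first maximum.
    groups: dict[int, list[int]] = defaultdict(list)
    for j in range(n):
        attractor = max(range(n), key=lambda i: m[i][j])
        groups[attractor].append(j)
    return list(groups.values())


def two_step_cluster(seqs: list[str], inflation: float = 2.0) -> list[list[str]]:
    """Coarse partition first, then run MCL independently inside each partition."""
    clusters: list[list[str]] = []
    for bucket in coarse_partition(seqs).values():
        n = len(bucket)
        adj = [
            [1.0 if i != j and hamming(bucket[i], bucket[j]) == 1 else 0.0 for j in range(n)]
            for i in range(n)
        ]
        for members in mcl(adj, inflation=inflation):
            clusters.append(sorted(bucket[j] for j in members))
    return clusters
```

The coarse step keeps each MCL run small, which is where a two-step design gains its speed; the dense-matrix MCL shown here is cubic in bucket size and only suitable for toy inputs.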
On a standard machine*, clusTCR can cluster 1 million CDR3 sequences in under 5 minutes.
*Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz, using 8 CPUs
Compared to other state-of-the-art clustering algorithms (GLIPH2, iSMART and tcrdist), clusTCR shows comparable clustering quality, but provides a steep increase in speed and scalability.
Documentation & Install
All of our documentation, installation info and examples can be found in the clusTCR documentation. To get you started, here's how to install clusTCR:
$ pip install immunewatch-clustcr
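After installing, a quick sanity check can confirm that the distribution is present and importable. The sketch below uses only the standard library; the import name `clustcr` is an assumption inferred from the `--cov=clustcr` option in the testing commands, and the helper name `clustcr_install_info` is hypothetical.

```python
import importlib.metadata
import importlib.util


def clustcr_install_info(dist_name: str = "immunewatch-clustcr", module_name: str = "clustcr"):
    """Return (installed_version, importable); version is None if the distribution is absent."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    # find_spec returns None (rather than raising) for a missing top-level module.
    importable = importlib.util.find_spec(module_name) is not None
    return version, importable


if __name__ == "__main__":
    version, importable = clustcr_install_info()
    print(f"immunewatch-clustcr version: {version}; 'clustcr' importable: {importable}")
```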
Development Guide
Environment
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"
Testing
# Run all tests
pytest
# Run with coverage report
pytest --cov=clustcr --cov-report=html
Build Distribution
# Install build tools
uv pip install build twine
# Build source and wheel distributions
python -m build
# Check the built distributions
twine check dist/*
# Upload to TestPyPI
twine upload --repository testpypi dist/*
# Upload to PyPI
twine upload dist/*
Cite
Please cite as:
Sebastiaan Valkiers, Max Van Houcke, Kris Laukens, Pieter Meysman, ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity, Bioinformatics, 2021, btab446, https://doi.org/10.1093/bioinformatics/btab446
Bibtex:
@article{valkiers2021clustcr,
author = {Valkiers, Sebastiaan and Van Houcke, Max and Laukens, Kris and Meysman, Pieter},
title = "{ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity}",
journal = {Bioinformatics},
year = {2021},
month = {06},
issn = {1367-4803},
doi = {10.1093/bioinformatics/btab446},
url = {https://doi.org/10.1093/bioinformatics/btab446},
note = {btab446},
eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btab446/38660282/btab446.pdf},
}
Download files
Source Distribution: immunewatch_clustcr-1.0.0.tar.gz
Built Distribution: immunewatch_clustcr-1.0.0-py3-none-any.whl
File details
Details for the file immunewatch_clustcr-1.0.0.tar.gz.
File metadata
- Download URL: immunewatch_clustcr-1.0.0.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b9e6b052c26e43fae888972d088ba54fb8d74b074d394b3c9044fd1e4116a49 |
| MD5 | 724c6eff1c10e3a06e518d4e11e4ba1b |
| BLAKE2b-256 | d9cb7fd180fd11be1d440629faf48fbcda6a4680f5e0fc15a7031b9566183a65 |
File details
Details for the file immunewatch_clustcr-1.0.0-py3-none-any.whl.
File metadata
- Download URL: immunewatch_clustcr-1.0.0-py3-none-any.whl
- Upload date:
- Size: 2.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce09a5624336d76676513670db4c0550fa9471a9035be2c35140ff9088f13811 |
| MD5 | ebbef0c2eed5ef12c90d0777bd2c083a |
| BLAKE2b-256 | 96802ad29c87f73afb1e06850543dc5b3062d3ae88ca283859cef94ab021762d |
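A downloaded file can be checked against the published digests with Python's standard `hashlib`. This is a minimal sketch; the helper names `file_digests` and `verify_sha256` are hypothetical, and the `__main__` block assumes the wheel sits in the current directory. BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size: int = 1 << 20) -> dict:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path, expected_hex: str) -> bool:
    """True if the file's SHA256 digest matches the published hex digest."""
    return file_digests(path)["SHA256"] == expected_hex.strip().lower()


if __name__ == "__main__":
    wheel = Path("immunewatch_clustcr-1.0.0-py3-none-any.whl")  # assumed local path
    expected = "ce09a5624336d76676513670db4c0550fa9471a9035be2c35140ff9088f13811"
    if wheel.exists():
        print("SHA256 OK" if verify_sha256(wheel, expected) else "SHA256 MISMATCH")
```

pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file) performs the same SHA256 comparison automatically at install time.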