Fast tcrdist routines between TCR sequences.
tcrdist_rs
This is a Rust implementation of tcrdist. Previous versions have been written in Python/Numba and C++. This package also includes Python bindings to the triple_accel Rust crate for fast, SIMD-accelerated Hamming and Levenshtein distances. Each distance comes with the following functions for computing distances among sequences, taking tcrdist as an example:
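tcrdist(seq1, seq2, ...)
tcrdist_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
tcrdist_one_to_many(seq, seqs, ...) (distances from seq to each sequence in seqs)
tcrdist_many_to_many(seqs1, seqs2, ...) (distances between every sequence in seqs1 and every sequence in seqs2)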
where ... denotes the remaining parameters; see the docstrings (e.g., help(tcrdist)) for details.
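To make the four call shapes concrete, here is a pure-Python sketch that mimics them with a toy distance. It is not the tcrdist_rs API: toy_distance, matrix_upper_triangle, one_to_many, many_to_many and to_square are hypothetical names. We assume the matrix function returns the upper triangle as a flat, row-major list without the diagonal (the same layout as SciPy's condensed distance vectors) and that the many-to-many function returns a flat, row-major list; check the docstrings for the actual return types.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Distance = Callable[[str, str], int]


def toy_distance(s1: str, s2: str) -> int:
    """Hypothetical stand-in for tcrdist/hamming/levenshtein:
    mismatches over the shared prefix plus the length difference."""
    return sum(a != b for a, b in zip(s1, s2)) + abs(len(s1) - len(s2))


def matrix_upper_triangle(seqs: Sequence[str], dist: Distance = toy_distance) -> List[int]:
    """Pairs (0,1), (0,2), ..., (0,n-1), (1,2), ...: n*(n-1)/2 values."""
    return [dist(a, b) for a, b in combinations(seqs, 2)]


def one_to_many(seq: str, seqs: Sequence[str], dist: Distance = toy_distance) -> List[int]:
    """Distance from seq to each element of seqs."""
    return [dist(seq, s) for s in seqs]


def many_to_many(seqs1: Sequence[str], seqs2: Sequence[str],
                 dist: Distance = toy_distance) -> List[int]:
    """Every seqs1 x seqs2 pair, flattened row by row: len(seqs1)*len(seqs2) values."""
    return [dist(a, b) for a in seqs1 for b in seqs2]


def to_square(condensed: Sequence[int], n: int) -> List[List[int]]:
    """Expand a row-major upper triangle into a symmetric n x n matrix."""
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


seqs = ["CASS", "CATS", "CASSF"]
print(matrix_upper_triangle(seqs))                          # [1, 1, 2]
print(to_square(matrix_upper_triangle(seqs), len(seqs)))    # [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
```

Storing only the upper triangle halves memory for a symmetric distance, which matters because the number of pairs grows quadratically with the number of sequences.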
The other distances currently supported:
hamming
levenshtein
levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
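For checking results on small inputs, the following plain-Python reference versions may help; they are not the SIMD routines from triple_accel. The names bounded_levenshtein and levenshtein_exp_sketch are hypothetical and illustrate the general idea behind an exponential-search Levenshtein distance: compute a banded distance with a threshold k and double k until the distance fits. We assume, but have not confirmed, that triple_accel's levenshtein_exp follows this scheme. The hamming sketch also assumes equal-length inputs and raises otherwise.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; this sketch requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (unit-cost insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def bounded_levenshtein(a: str, b: str, k: int) -> Optional[int]:
    """Edit distance if it is <= k, else None. Only cells with |i - j| <= k
    are computed; all other cells are treated as k + 1 ("too far")."""
    if abs(len(a) - len(b)) > k:
        return None
    too_far = k + 1
    prev = [j if j <= k else too_far for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [too_far] * (len(b) + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(len(b), i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, too_far)
        prev = cur
    d = prev[len(b)]
    return d if d <= k else None


def levenshtein_exp_sketch(a: str, b: str) -> int:
    """Exponential search over the threshold: k = 1, 2, 4, ... until it fits.
    Cheap for similar sequences, because the band stays narrow."""
    k = 1
    while True:
        d = bounded_levenshtein(a, b, k)
        if d is not None:
            return d
        k *= 2


print(hamming("CASSPTG", "CASSETG"))                # 1
print(levenshtein("kitten", "sitting"))             # 3
print(levenshtein_exp_sketch("kitten", "sitting"))  # 3
```

The exponential search pays off when most compared sequences are close, as with CDR3s from related clonotypes: the banded pass touches only about (2k + 1) cells per row instead of the full row.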
Development environment setup
Install Rust if necessary.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install the maturin library (this assumes you are working in an environment that has Python):
pip install maturin
To compile tcrdist_rs and install it into the active Python environment, run
maturin develop --release --profile release
Example
The parameter values below are the same settings used by Python tcrdist. Notably, the keyword parallel enables the use of all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation outweighs the overhead of parallelization; for small inputs, such as the five TCRs below, leave it off.
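One way to act on this advice is to count the distance evaluations a call will perform before deciding. The counts follow from the function shapes (n(n-1)/2 for the upper triangle of n sequences, n1 x n2 for many-to-many), but the cutoff below is an arbitrary placeholder and the helper names are hypothetical; benchmark on your own hardware to choose a real threshold.

```python
# Hypothetical cutoff; benchmark on your own data and hardware to pick a real value.
MIN_PAIRS_FOR_PARALLEL = 10_000


def n_pairs_matrix(n: int) -> int:
    """Distances in the upper triangle of an n x n matrix (diagonal excluded)."""
    return n * (n - 1) // 2


def n_pairs_many_to_many(n1: int, n2: int) -> int:
    """Distances between every sequence of one list and every sequence of another."""
    return n1 * n2


def should_parallelize(n_pairs: int, threshold: int = MIN_PAIRS_FOR_PARALLEL) -> bool:
    """Enable parallelism only when there is enough work to amortize its overhead."""
    return n_pairs >= threshold


print(should_parallelize(n_pairs_matrix(5)))      # False: only 10 distances
print(should_parallelize(n_pairs_matrix(1_000)))  # True: 499,500 distances
```

The result could then be passed as the parallel argument, e.g. parallel = should_parallelize(n_pairs_matrix(len(tcrs))).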
import tcrdist_rs
tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]
phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4
ntrim = 3
ctrim = 2
fixed_gappos = False
parallel = False
# Upper right triangle of the pairwise tcrdist matrix among the five TCRs
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
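For reference, the weights, gap penalty and trimming parameters above correspond to the tcrdist definition of Dash et al. (2017) as used in tcrdist3. We assume, based on the matching parameter names, that tcrdist_allele follows the same scheme; what follows sketches that published definition, not the crate's exact implementation. The V allele supplies the germline CDR1, CDR2 and CDR2.5 (pMHC-facing) loops, and the distance is a weighted sum of per-loop mismatch distances over aligned positions p:

```latex
\begin{aligned}
d &= w_{\mathrm{pMHC}}\, d_{\mathrm{CDR2.5}} + w_{1}\, d_{\mathrm{CDR1}} + w_{2}\, d_{\mathrm{CDR2}} + w_{3}\, d_{\mathrm{CDR3}},
\qquad d_{\mathrm{loop}} = \sum_{p} \delta(a_p, b_p), \\[4pt]
\delta(a, b) &=
\begin{cases}
0 & \text{if } a = b, \\
g & \text{if exactly one of } a, b \text{ is a gap}, \\
\min\bigl(4,\; 4 - \mathrm{BLOSUM62}(a, b)\bigr) & \text{otherwise.}
\end{cases}
\end{aligned}
```

Here w_pMHC, w_1, w_2, w_3 and g correspond to phmc_weight, cdr1_weight, cdr2_weight, cdr3_weight and gap_penalty, and the CDR3 sum skips the first ntrim and last ctrim positions. Because off-diagonal BLOSUM62 scores range from -4 to 3, a single mismatch costs between 1 and 4 before weighting, so with cdr3_weight = 3 one CDR3 substitution adds between 3 and 12.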
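tcrdist can also be computed at the level of V genes rather than V alleles (note that the V names below omit the allele suffix, e.g. TRBV2 instead of TRBV2*01).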
import tcrdist_rs
tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]
ntrim = 3
ctrim = 2
distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)
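Because the two examples differ only in whether the V names carry an allele suffix (TRBV2*01 versus TRBV2), allele-level input can be converted to gene-level input by dropping everything from the * onward. A minimal sketch with hypothetical helper names, assuming V names follow the GENE*ALLELE convention shown above:

```python
from typing import List


def strip_allele(v_name: str) -> str:
    """'TRBV6-2*01' -> 'TRBV6-2'; names without '*' are returned unchanged."""
    return v_name.split("*", 1)[0]


def alleles_to_genes(tcrs: List[List[str]]) -> List[List[str]]:
    """Map [[cdr3, v_allele], ...] to [[cdr3, v_gene], ...]."""
    return [[cdr3, strip_allele(v)] for cdr3, v in tcrs]


allele_tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
               ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
               ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]
gene_tcrs = alleles_to_genes(allele_tcrs)
print(gene_tcrs[1])  # ['CASSTLDRVYNSPLHF', 'TRBV6-2']
```

gene_tcrs is then identical to the tcrs list of the gene-level example and can be passed to tcrdist_rs.tcrdist_gene_matrix.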
Outlook
- Implement monomorphization since most functions are similar?
- Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
- Vectorize/simdify computations for amino acid lookup?
- Improve lookup table performance (hash map vs. match vs. array)
- Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
- Neighbor-finding functionality
- Sphinx documentation
References
- Dash et al. (2017). "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
- Mayer-Blackwell et al. (2021). "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605