Fast tcrdist routines between TCR sequences.
tcrdist_rs
This is a Rust implementation of TCRdist. Previous versions were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel package for fast, SIMD-accelerated Hamming and Levenshtein distances.
Each distance comes with the same family of functions. We take tcrdist_gene as an example.
Distances can be computed with
tcrdist_gene(seq1, seq2, ...)
tcrdist_gene_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
tcrdist_gene_one_to_many(seq, seqs, ...)
tcrdist_gene_many_to_many(seqs1, seqs2, ...)
tcrdist_gene_pairwise(seqs1, seqs2, ...)
where ... stands for other parameters; see the docstrings (e.g., help(tcrdist_gene)).
The above functions yield the distances as a list.
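To make the difference between the variants concrete, here is a minimal pure-Python sketch of what each one is described as computing. It does not call tcrdist_rs: the helper names (hamming, matrix, one_to_many, many_to_many, pairwise) are hypothetical stand-ins, Hamming distance is used in place of TCRdist, and the flattened row-by-row output order and the element-wise meaning of "pairwise" are assumptions rather than documented behavior.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def matrix(seqs, dist=hamming):
    # Upper right triangle only: each unordered pair (i < j) appears once.
    return [dist(a, b) for a, b in combinations(seqs, 2)]


def one_to_many(seq, seqs, dist=hamming):
    # One query sequence against every sequence in a list.
    return [dist(seq, s) for s in seqs]


def many_to_many(seqs1, seqs2, dist=hamming):
    # All len(seqs1) * len(seqs2) combinations, flattened row by row (assumed order).
    return [dist(a, b) for a in seqs1 for b in seqs2]


def pairwise(seqs1, seqs2, dist=hamming):
    # Element-wise (assumed): seqs1[k] is compared with seqs2[k] only.
    return [dist(a, b) for a, b in zip(seqs1, seqs2)]


seqs = ["CASSF", "CASTF", "CAWTF"]
print(matrix(seqs))                      # [1, 2, 1]
print(one_to_many("CASSF", seqs))        # [0, 1, 2]
print(many_to_many(seqs[:2], seqs[1:]))  # [1, 2, 0, 1]
print(pairwise(seqs[:2], seqs[1:]))      # [1, 1]
```

For n sequences the matrix variant yields n(n-1)/2 values, which is why it is cheaper than calling the many-to-many variant on the same list twice.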
Neighbors can be found with
tcrdist_gene_neighbor(seqs1, seqs2, threshold, ...)
tcrdist_gene_neighbor_matrix(seqs1, seqs2, threshold, ...)
tcrdist_gene_neighbor_one_to_many(seqs1, seqs2, threshold, ...)
tcrdist_gene_neighbor_many_to_many(seqs1, seqs2, threshold, ...)
tcrdist_gene_neighbor_pairwise(seqs1, seqs2, threshold, ...)
Whereas tcrdist_gene_neighbor returns a bool, all the other functions return a list of lists, with each inner list containing the indices of a pair of neighbors and the distance between them.
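The neighbor functions can be pictured as a distance computation followed by a threshold filter. The sketch below is pure Python with hypothetical helper names and Hamming distance as a stand-in; whether the threshold is inclusive and the exact [i, j, distance] layout of each hit are assumptions, so check the docstrings before relying on them.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def neighbor(seq1, seq2, threshold):
    # Single pair: just a yes/no answer (threshold assumed inclusive).
    return hamming(seq1, seq2) <= threshold


def neighbor_matrix(seqs, threshold):
    # Keep only the pairs (i < j) within the threshold, as [i, j, distance].
    hits = []
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        d = hamming(a, b)
        if d <= threshold:
            hits.append([i, j, d])
    return hits


seqs = ["CASSF", "CASTF", "CAWTF"]
print(neighbor("CASSF", "CASTF", 1))  # True
print(neighbor_matrix(seqs, 1))       # [[0, 1, 1], [1, 2, 1]]
```

Returning only the hits keeps the output small when the threshold is tight, since most pairs in a large repertoire are not neighbors.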
All the distances currently supported:
hamming
levenshtein
levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
tcrdist
tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
Presently, the non-TCRdist distances also provide the following functions:
hamming_bin_many_to_many(seqs1, seqs2, parallel)
levenshtein_bin_many_to_many(seqs1, seqs2, parallel)
levenshtein_exp_bin_many_to_many(seqs1, seqs2, parallel)
which compute all pairwise distances between seqs1 and seqs2 and bin them.
These are useful for characterizing distributions of coincidence across large datasets.
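As a rough illustration of the binning idea, the sketch below counts how many sequence pairs fall at each distance instead of storing every distance. It is pure Python with a hypothetical helper name and Hamming distance; the assumption that bin k holds the number of pairs at distance exactly k, and the length of the returned list, are illustrative choices and may differ from what the package returns.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def bin_many_to_many(seqs1, seqs2):
    # The largest possible Hamming distance is the sequence length,
    # so length + 1 bins cover every outcome (assumed layout).
    length = len(seqs1[0])
    counts = [0] * (length + 1)
    for a in seqs1:
        for b in seqs2:
            counts[hamming(a, b)] += 1
    return counts


seqs1 = ["CASSF", "CASTF"]
seqs2 = ["CASTF", "CAWTF"]
print(bin_many_to_many(seqs1, seqs2))  # [1, 2, 1, 0, 0, 0]
```

The memory use depends on the number of bins rather than on len(seqs1) * len(seqs2), which is what makes this form practical for large datasets.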
Installation
Given an environment with Python >= 3.7, this package can be installed with
pip install tcrdist_rs
Development environment setup
Install Rust if necessary.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install maturin, assuming you are in an environment which has Python >= 3.7.
pip install maturin
To compile tcrdist, run
maturin develop --release --profile release
in the cloned repository.
Example
We use the same settings as Python TCRdist. Notably, the keyword parallel enables the use of all CPU cores when computing many distances.
However, enable it only when there are enough sequences that the computation itself costs more than the overhead of parallelizing.
import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V allele] pair.
tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4
ntrim = 3
ctrim = 2
fixed_gappos = False
parallel = False

# Upper right triangle of the allele-level distance matrix, as a list.
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
TCRdist can also be computed at the level of genes.
import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V gene] pair (no allele suffix).
tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, parallel=False)
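Because the matrix functions return only the upper right triangle as a flat list, downstream code often needs a full square matrix. The helper below is a minimal sketch of that conversion in pure Python. The name to_square is hypothetical, the input values are placeholders rather than real TCRdist output, and the row-by-row ordering (the same convention as SciPy's condensed distance matrices) is an assumption about the returned list.

```python
def to_square(flat, n):
    """Expand a flat upper triangle (row by row, i < j) into an n x n matrix."""
    expected = n * (n - 1) // 2
    if len(flat) != expected:
        raise ValueError(f"expected {expected} distances for n={n}, got {len(flat)}")
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Distances are symmetric, so mirror each value across the diagonal.
            square[i][j] = square[j][i] = flat[k]
            k += 1
    return square


# Placeholder values for three sequences, not real TCRdist output.
flat = [1, 2, 1]
for row in to_square(flat, 3):
    print(row)
# [0, 1, 2]
# [1, 0, 1]
# [2, 1, 0]
```

With the five TCRs from the example above, the flat list would hold 10 distances and n would be 5.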
Outlook
- Vectorize/simdify computations for amino acid lookup?
- Improve lookup table performance (hash map vs. match vs. array)
- Precompute V allele lookup tables from custom databases as opposed to hard code (or memoize V allele dists as they are used in the input sequences)
References
- Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
- Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605