Fast tcrdist routines between TCR sequences.

tcrdist_rs

This is a Rust implementation of tcrdist. Previous versions have been written in Python/Numba and C++. This package also includes Python bindings to the triple_accel Rust crate for fast SIMD Hamming and Levenshtein distances. Every distance provides the same family of functions for computing distances among sequences. Taking tcrdist as an example:

  • tcrdist(seq1, seq2, ...)
  • tcrdist_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
  • tcrdist_one_to_many(seq, seqs, ...)
  • tcrdist_many_to_many(seqs1, seqs2, ...)

where ... stands for the remaining parameters; see the docstrings (e.g., help(tcrdist)).
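The four function shapes listed above can be illustrated with a pure-Python sketch built on Hamming distance. The names ending in _ref are hypothetical and are not part of tcrdist_rs; the flattened, row-by-row output layout is an assumption, so check the docstrings for what the library actually returns.

```python
from itertools import combinations


def hamming_ref(s1, s2):
    """Count mismatched positions between two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(s1, s2))


def hamming_matrix_ref(seqs):
    # Upper-right triangle without the diagonal, flattened row by row
    # (assumed layout, for illustration only).
    return [hamming_ref(a, b) for a, b in combinations(seqs, 2)]


def hamming_one_to_many_ref(seq, seqs):
    # One query sequence against every sequence in a list.
    return [hamming_ref(seq, other) for other in seqs]


def hamming_many_to_many_ref(seqs1, seqs2):
    # Every sequence in seqs1 against every sequence in seqs2,
    # flattened row by row (assumed layout).
    return [hamming_ref(a, b) for a in seqs1 for b in seqs2]


seqs = ["CASS", "CASR", "CTSR"]
print(hamming_matrix_ref(seqs))               # [1, 2, 1]
print(hamming_one_to_many_ref("CASS", seqs))  # [0, 1, 2]
```

The matrix variant visits each unordered pair once, which is why it needs n * (n - 1) / 2 distance evaluations for n sequences.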

The other distances currently supported:

  • hamming
  • levenshtein
  • levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
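For reference, the quantity that the levenshtein functions compute can be written as a textbook dynamic program. This is a minimal sketch with the hypothetical name levenshtein_ref; it is not how triple_accel computes the distance (that crate uses SIMD), and it says nothing about the exponential-search variant.

```python
def levenshtein_ref(s1, s2):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    # prev[j] holds the distance between the first i-1 characters of s1
    # and the first j characters of s2.
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i]
        for j, b in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s1
                curr[j - 1] + 1,         # insert b into s1
                prev[j - 1] + (a != b),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(levenshtein_ref("kitten", "sitting"))  # 3
print(levenshtein_ref("CASSF", "CASF"))      # 1
```

Unlike Hamming distance, this definition accepts sequences of different lengths, which matters for CDR3 sequences.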

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin, assuming you are in an environment that has Python.

pip install maturin

To compile tcrdist_rs, run

maturin develop --release --profile release

Example

We use the same settings as the Python implementation of tcrdist. Notably, the keyword parallel enables the use of all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation itself costs more than the overhead of parallelizing.

import tcrdist_rs

tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4
ntrim = 3
ctrim = 2
fixed_gappos = False
parallel = False
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
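If a full square matrix is needed, the upper triangle can be expanded as sketched below. The helper to_square is hypothetical, and the sketch assumes the matrix function returns a flat list of the n * (n - 1) / 2 upper-triangle distances in row-by-row order without the diagonal; confirm the layout with help(tcrdist_rs.tcrdist_allele_matrix) before relying on it.

```python
def to_square(condensed, n):
    """Expand a flat upper triangle (no diagonal) into an n x n symmetric matrix."""
    expected = n * (n - 1) // 2
    if len(condensed) != expected:
        raise ValueError(f"expected {expected} distances, got {len(condensed)}")
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each value across the diagonal.
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


print(to_square([1, 2, 3], 3))  # [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
```

With the five TCRs above, the call would be to_square(distances, len(tcrs)), again under the stated layout assumption.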

tcrdist can also be computed at the level of genes.

import tcrdist_rs


tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

ntrim = 3
ctrim = 2

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)
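The two examples differ only in whether the V segment carries an allele suffix such as *01. A small sketch of deriving gene-level input from allele-level input follows; the helper allele_to_gene is hypothetical and not part of tcrdist_rs, and it assumes allele names follow the GENE*NN pattern used above.

```python
def allele_to_gene(v_allele):
    """Drop the allele suffix, e.g. 'TRBV6-2*01' becomes 'TRBV6-2'."""
    return v_allele.split("*", 1)[0]


tcrs_allele = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01']]

# Keep the CDR3 sequence and replace the allele name with its gene name.
tcrs_gene = [[cdr3, allele_to_gene(v)] for cdr3, v in tcrs_allele]
print(tcrs_gene)
# [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2']]
```

A name that has no suffix passes through unchanged, so the helper can also be applied to input that is already at the gene level.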

Outlook

  • Implement monomorphization since most functions are similar?
  • Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
  • Find neighbor functionality
  • Sphinx documentation

References

  • Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605
