Fast tcrdist routines between TCR sequences.

tcrdist_rs

This is a Rust implementation of tcrdist. Earlier implementations were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel package for fast, SIMD-accelerated Hamming and Levenshtein distances. Each distance comes with the following functions for computing distances among sequences, shown here for tcrdist:

  • tcrdist(seq1, seq2, ...)
  • tcrdist_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
  • tcrdist_one_to_many(seq, seqs, ...)
  • tcrdist_many_to_many(seqs1, seqs2, ...)

where ... stands for additional parameters; see the docstrings (e.g., help(tcrdist_rs.tcrdist)).
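
To make the four call shapes concrete, here is a minimal pure-Python sketch that uses a plain Hamming distance. The ref_* names are hypothetical and are not part of tcrdist_rs. The flat list return values and their row-major ordering are assumptions made for illustration, so check the docstrings for the actual shapes.

```python
from itertools import combinations


def ref_hamming(s1, s2):
    """Count mismatching positions between two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def ref_hamming_matrix(seqs):
    """Upper right triangle (i < j), flattened row by row (assumed layout)."""
    return [ref_hamming(a, b) for a, b in combinations(seqs, 2)]


def ref_hamming_one_to_many(seq, seqs):
    """Distances from one query sequence to every sequence in seqs."""
    return [ref_hamming(seq, other) for other in seqs]


def ref_hamming_many_to_many(seqs1, seqs2):
    """All seqs1 x seqs2 distances, flattened row by row (assumed layout)."""
    return [ref_hamming(a, b) for a in seqs1 for b in seqs2]


seqs = ["CASSF", "CASTF", "CAWTF"]
print(ref_hamming_matrix(seqs))                       # 3 sequences -> 3 pairs
print(ref_hamming_one_to_many("CASSF", seqs))         # one entry per target
print(ref_hamming_many_to_many(seqs[:2], seqs[1:]))   # 2 x 2 entries
```

The matrix variant visits n(n-1)/2 pairs for n sequences, which is why it is cheaper than a many-to-many call of the list against itself.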

The other distances currently supported are:

  • hamming
  • levenshtein
  • levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
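
As an illustration of the exponential-search idea mentioned for levenshtein_exp, the sketch below computes a banded Levenshtein distance under a threshold k and doubles k until the distance fits. It is a plain-Python approximation of the technique, not the triple_accel implementation (which is SIMD-accelerated), and the function names are hypothetical.

```python
def levenshtein_bounded(a, b, k):
    """Return the Levenshtein distance if it is <= k, otherwise None.

    Only cells with |i - j| <= k are filled in; every value is capped at k + 1.
    """
    if abs(len(a) - len(b)) > k:
        return None
    inf = k + 1  # stands for "more than k edits"
    prev = [j if j <= k else inf for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(len(b), i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, inf)
        prev = cur
    d = prev[len(b)]
    return d if d <= k else None


def levenshtein_exp_sketch(a, b):
    """Double the threshold k until the bounded search succeeds."""
    k = 1
    while True:
        d = levenshtein_bounded(a, b, k)
        if d is not None:
            return d
        k *= 2


print(levenshtein_exp_sketch("CASSESGGQVDTQYF", "CASSPTGPTDTQYF"))
```

When two sequences are close, the search stops at a small k and only a narrow band of the table is filled, which is where the potential speedup comes from.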

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin in an environment that has Python.

pip install maturin

To compile tcrdist_rs, run:

maturin develop --release --profile release

Example

We use the same settings as Python tcrdist. The keyword parallel enables the use of all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation itself outweighs the overhead of parallelizing.

import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V allele] pair.
tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

# Per-region weights of the distance.
phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4
# Number of CDR3 residues trimmed from the N- and C-terminal ends.
ntrim = 3
ctrim = 2
fixed_gappos = False
# Set to True to use all CPU cores; worthwhile only for many sequences.
parallel = False
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
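
Because the matrix functions compute only the upper right triangle, downstream code often needs to expand the result into a full symmetric matrix. The sketch below assumes the result is a flat sequence of n(n-1)/2 entries ordered row by row for pairs i < j. That layout is an assumption rather than something stated above, and upper_triangle_to_square is a hypothetical helper, not part of tcrdist_rs.

```python
def upper_triangle_to_square(flat, n):
    """Expand a flat upper triangle (i < j, row by row) into an n x n matrix."""
    expected = n * (n - 1) // 2
    if len(flat) != expected:
        raise ValueError(f"expected {expected} entries for n={n}, got {len(flat)}")
    square = [[0] * n for _ in range(n)]  # the diagonal stays at zero
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = flat[k]
            k += 1
    return square


# Placeholder distances for three sequences: pairs (0,1), (0,2), (1,2).
for row in upper_triangle_to_square([1, 2, 3], 3):
    print(row)
```

With the example above, the call would be upper_triangle_to_square(distances, len(tcrs)), provided the assumed layout holds.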

tcrdist can also be computed at the level of genes.

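import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V gene] pair (no allele suffix).
tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

# Number of CDR3 residues trimmed from the N- and C-terminal ends.
ntrim = 3
ctrim = 2

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)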
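
Whether parallel pays off depends on the input size and the machine, so one option is simply to time both settings on representative data. The sketch below is a generic timing helper built on the standard timeit module. The names time_call and stand_in_matrix are hypothetical, and stand_in_matrix is only a placeholder that ignores its parallel flag so that the snippet runs without tcrdist_rs installed.

```python
import timeit


def time_call(fn, *args, repeat=3, **kwargs):
    """Best-of-`repeat` wall-clock time in seconds for a single call of fn."""
    return min(timeit.repeat(lambda: fn(*args, **kwargs), number=1, repeat=repeat))


def stand_in_matrix(seqs, parallel=False):
    """Hypothetical placeholder with a `parallel` keyword; the flag is ignored."""
    return [sum(a != b for a, b in zip(s, t))
            for i, s in enumerate(seqs) for t in seqs[i + 1:]]


seqs = ["CASSF", "CASTF", "CAWTF"] * 50
for flag in (False, True):
    seconds = time_call(stand_in_matrix, seqs, parallel=flag)
    print(f"parallel={flag}: {seconds:.6f} s")
```

With tcrdist_rs installed, the stand-in could be swapped for a real call such as time_call(tcrdist_rs.tcrdist_gene_matrix, tcrs, ntrim, ctrim, parallel=flag), repeated over a few input sizes to find where the parallel version starts to win.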

Outlook

  • Implement monomorphization since most functions are similar?
  • Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
  • Find neighbor functionality
  • Sphinx documentation

References

  • Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605
