Fast tcrdist routines between TCR sequences.

tcrdist_rs

This is a Rust implementation of tcrdist. Previous versions were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel crate for fast SIMD Hamming and Levenshtein distances. Each distance comes with the following functions for computing distances among sequences, shown here for tcrdist:

  • tcrdist(seq1, seq2, ...)
  • tcrdist_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
  • tcrdist_one_to_many(seq, seqs, ...)
  • tcrdist_many_to_many(seqs1, seqs2, ...)

where ... stands for additional parameters; see the docstrings (e.g., help(tcrdist_rs.tcrdist)) for details.
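To make the four call shapes concrete, here is a minimal pure-Python sketch that uses Hamming distance as a stand-in. The ref_* names are hypothetical and are not part of tcrdist_rs, and the flat, row-by-row output layout is an assumption made for illustration; check the docstrings for what the package actually returns.

```python
from itertools import combinations


def ref_hamming(seq1, seq2):
    """Count mismatched positions between two equal-length strings."""
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(seq1, seq2))


def ref_hamming_matrix(seqs):
    """Upper triangle (i < j) of the distance matrix, flattened row by row.

    The flat layout is an assumption of this sketch.
    """
    return [ref_hamming(a, b) for a, b in combinations(seqs, 2)]


def ref_hamming_one_to_many(seq, seqs):
    """Distances from one sequence to every sequence in seqs."""
    return [ref_hamming(seq, other) for other in seqs]


def ref_hamming_many_to_many(seqs1, seqs2):
    """Distances for every pair (a, b) with a in seqs1 and b in seqs2."""
    return [ref_hamming(a, b) for a in seqs1 for b in seqs2]


seqs = ["CASSF", "CASSY", "CATSY"]
print(ref_hamming_matrix(seqs))                # pairs (0,1), (0,2), (1,2)
print(ref_hamming_one_to_many("CASSF", seqs))
```

A slow reference like this can be handy for spot-checking the output of the compiled functions on small inputs.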

The other distances currently supported:

  • hamming
  • levenshtein
  • levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
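The idea behind an exponential search for Levenshtein distance can be sketched in plain Python: compute a thresholded distance that gives up once the answer must exceed k, and double k until it succeeds. This is only an illustration of the general technique under that assumption; it is not the SIMD code in triple_accel, and both function names below are hypothetical.

```python
def bounded_levenshtein(a, b, k):
    """Return the Levenshtein distance if it is <= k, otherwise None.

    Only cells with |i - j| <= k are filled in, since any cell outside
    that band already implies more than k edits.
    """
    m, n = len(a), len(b)
    if abs(m - n) > k:
        return None
    inf = k + 1  # stands for "more than k edits"
    prev = [j if j <= k else inf for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [inf] * (n + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(n, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost,  # match or substitution
                         inf)
        prev = cur
    return prev[n] if prev[n] <= k else None


def levenshtein_exp_sketch(a, b):
    """Double the threshold k until the bounded search succeeds."""
    k = max(1, abs(len(a) - len(b)))
    while True:
        dist = bounded_levenshtein(a, b, k)
        if dist is not None:
            return dist
        k *= 2


print(levenshtein_exp_sketch("kitten", "sitting"))
```

When two sequences are close, the first small thresholds already succeed and most of the full table is never touched, which is why this variant can be faster on similar inputs.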

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin, assuming you are in an environment that has Python.

pip install maturin

To compile tcrdist_rs, run

maturin develop --release --profile release

Example

We use the same settings as Python tcrdist. Notably, the keyword parallel enables the use of all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation itself outweighs the overhead of parallelizing.

import tcrdist_rs

tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

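phmc_weight = 1       # weight of the pMHC-facing loop (CDR2.5) term
cdr1_weight = 1       # weight of the CDR1 term
cdr2_weight = 1       # weight of the CDR2 term
cdr3_weight = 3       # weight of the CDR3 term
gap_penalty = 4       # penalty applied per gap in the CDR3 comparison
ntrim = 3             # residues trimmed from the CDR3 N-terminus
ctrim = 2             # residues trimmed from the CDR3 C-terminus
fixed_gappos = False  # False: search for the best gap position
parallel = False      # True: use all CPU cores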
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
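Because the *_matrix functions compute only the upper right triangle, a small helper can expand the result into a full symmetric matrix when one is needed. The sketch below assumes the result is a flat sequence ordered row by row, i.e. pairs (0,1), (0,2), ..., (n-2,n-1); that ordering is an assumption, and triangle_to_square is a hypothetical helper, not part of tcrdist_rs. The input values are placeholders rather than real tcrdist output.

```python
def triangle_to_square(flat, n):
    """Expand a flattened upper triangle into an n x n symmetric matrix.

    Assumes flat lists the pairs (i, j) with i < j row by row.
    """
    expected = n * (n - 1) // 2
    if len(flat) != expected:
        raise ValueError(f"expected {expected} distances, got {len(flat)}")
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = flat[k]
            k += 1
    return square


# Placeholder distances for three sequences, not real tcrdist values.
placeholder = [1, 2, 3]
for row in triangle_to_square(placeholder, 3):
    print(row)
```

With the example above, this would be called as triangle_to_square(distances, len(tcrs)), provided the ordering assumption holds.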

tcrdist can also be computed at the level of genes.

import tcrdist_rs


tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

ntrim = 3
ctrim = 2

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)
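The gene-level input differs from the allele-level input only in the V annotation, which drops the *01-style allele suffix. If your data carries allele names, a helper along these lines could derive the gene-level pairs; allele_to_gene is a hypothetical name and is not provided by tcrdist_rs.

```python
def allele_to_gene(v_allele):
    """Drop the allele suffix: 'TRBV6-2*01' becomes 'TRBV6-2'."""
    return v_allele.split("*", 1)[0]


allele_tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
               ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
               ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

# Keep each CDR3 as-is and reduce the V allele to its gene name.
gene_tcrs = [[cdr3, allele_to_gene(v)] for cdr3, v in allele_tcrs]
print(gene_tcrs)
```

Names that already lack an allele suffix pass through unchanged.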

Outlook

  • Implement monomorphization since most functions are similar?
  • Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
  • Neighbor-finding functionality
  • Sphinx documentation
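As a rough illustration of what neighbor finding could look like, the sketch below collects, for each sequence, the indices of all other sequences within a given radius. It is a brute-force pure-Python outline, not a planned API: find_neighbors, toy_distance, and the radius value are all hypothetical, and the toy distance merely stands in for any of the supported distances.

```python
def toy_distance(s, t):
    """Stand-in distance: mismatches over the shared prefix plus length gap."""
    return sum(a != b for a, b in zip(s, t)) + abs(len(s) - len(t))


def find_neighbors(seqs, dist, radius):
    """Map each index to the indices of sequences within radius of it."""
    neighbors = {i: [] for i in range(len(seqs))}
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # Each unordered pair is evaluated once and recorded both ways.
            if dist(seqs[i], seqs[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


seqs = ["CASSF", "CASSY", "CATSY", "CAWWWWF"]
print(find_neighbors(seqs, toy_distance, radius=1))
```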

References

  • Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605
