Fast tcrdist routines between TCR sequences.

tcrdist_rs

This is a Rust implementation of tcrdist. Previous versions were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel package for fast, SIMD-accelerated Hamming and Levenshtein distances. Each distance comes as a family of four functions for comparing sequences. Taking tcrdist as an example:

  • tcrdist(seq1, seq2, ...)
  • tcrdist_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
  • tcrdist_one_to_many(seq, seqs, ...)
  • tcrdist_many_to_many(seqs1, seqs2, ...)

where ... stands for additional parameters; see the docstrings (e.g., help(tcrdist)).
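The naming pattern above can be illustrated with a pure-Python reference for Hamming distance. This is only a sketch of the shape of each variant, not the package's implementation: the `ref_` function names are hypothetical, and the flat, row-by-row ordering of the results is an assumption that should be checked against the docstrings.

```python
from itertools import combinations


def ref_hamming(seq1, seq2):
    """Number of mismatched positions between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(a != b for a, b in zip(seq1, seq2))


def ref_hamming_matrix(seqs):
    """Upper triangle (i < j) as a flat list, row by row (assumed layout)."""
    return [ref_hamming(a, b) for a, b in combinations(seqs, 2)]


def ref_hamming_one_to_many(seq, seqs):
    """Distance from one sequence to each sequence in a list."""
    return [ref_hamming(seq, other) for other in seqs]


def ref_hamming_many_to_many(seqs1, seqs2):
    """All pairs between two lists, flattened row by row (assumed layout)."""
    return [ref_hamming(a, b) for a in seqs1 for b in seqs2]


seqs = ["CASSF", "CASRF", "CGSRF"]
print(ref_hamming_matrix(seqs))                      # [1, 2, 1]
print(ref_hamming_one_to_many("CASSF", seqs))        # [0, 1, 2]
print(ref_hamming_many_to_many(seqs[:2], seqs[1:]))  # [1, 2, 0, 1]
```

For n sequences the matrix variant yields n(n-1)/2 values, since the diagonal and the lower triangle carry no extra information for a symmetric distance.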

The other distances currently supported:

  • hamming
  • levenshtein
  • levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
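To give a feel for what "exponential search" means for levenshtein_exp, here is a minimal pure-Python sketch of the general idea: compute a banded edit distance that only succeeds when the answer is at most k, and double k until it succeeds. The function names are hypothetical, and this is not the SIMD code that triple_accel runs; it only illustrates why the approach can be faster when sequences are similar (small bands are tried first).

```python
def levenshtein_bounded(a, b, k):
    """Return the edit distance if it is <= k, otherwise None.

    Only cells with |i - j| <= k are filled; anything outside the band
    is treated as k + 1, which is enough to decide "more than k".
    """
    if abs(len(a) - len(b)) > k:
        return None
    inf = k + 1
    prev = [min(j, inf) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(len(b), i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, inf)
        prev = cur
    dist = prev[len(b)]
    return dist if dist <= k else None


def levenshtein_exp_sketch(a, b):
    """Double the band width k until the bounded search succeeds."""
    k = 1
    while True:
        dist = levenshtein_bounded(a, b, k)
        if dist is not None:
            return dist
        k *= 2


print(levenshtein_exp_sketch("kitten", "sitting"))  # 3
```

The loop always terminates because the edit distance never exceeds the length of the longer sequence.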

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin in an environment that has Python.

pip install maturin

To compile tcrdist, run

maturin develop --release --profile release

Example

We use the same settings as Python tcrdist. The keyword parallel distributes the work across all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation itself outweighs the overhead of parallelizing.

import tcrdist_rs

tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

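# Weights for each region's contribution to the total distance
phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4       # penalty applied per gap in the CDR3 comparison
ntrim = 3             # CDR3 residues trimmed from the N-terminal end
ctrim = 2             # CDR3 residues trimmed from the C-terminal end
fixed_gappos = False  # whether the gap position is fixed
parallel = False      # True uses all CPU cores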
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
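Because the matrix functions compute only the upper right triangle, it can be convenient to expand the result into a full symmetric matrix. The sketch below assumes the result is a flat sequence of n(n-1)/2 values ordered row by row, which should be verified against the docstring; `upper_triangle_to_square` is a hypothetical helper, and the values used are placeholders rather than real tcrdist output.

```python
def upper_triangle_to_square(flat, n):
    """Expand n*(n-1)/2 upper-triangle values into a symmetric n x n matrix."""
    expected = n * (n - 1) // 2
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each value across the diagonal; the diagonal stays 0.
            square[i][j] = square[j][i] = flat[k]
            k += 1
    return square


# Placeholder values standing in for the 10 distances among 5 TCRs.
placeholder = list(range(10))
square = upper_triangle_to_square(placeholder, 5)
for row in square:
    print(row)
```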

tcrdist can also be computed at the gene level, with V genes given without an allele suffix.

import tcrdist_rs

tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

ntrim = 3
ctrim = 2

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)
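If the input is annotated with alleles, the gene-level input can be derived by dropping the allele suffix. The following is a small sketch; `allele_to_gene` is a hypothetical helper, and it assumes allele names follow the gene*allele pattern used in the examples above.

```python
def allele_to_gene(v_allele):
    """Drop the allele suffix, e.g. 'TRBV6-2*01' becomes 'TRBV6-2'."""
    return v_allele.split("*", 1)[0]


allele_tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
               ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
               ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

# Same CDR3 sequences, with each V allele reduced to its gene name.
gene_tcrs = [[cdr3, allele_to_gene(v)] for cdr3, v in allele_tcrs]
print(gene_tcrs[1])  # ['CASSTLDRVYNSPLHF', 'TRBV6-2']
```

Names that carry no allele suffix pass through unchanged, so the helper is safe to apply to mixed input.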

Outlook

  • Implement monomorphization since most functions are similar?
  • Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
  • Neighbor-finding functionality
  • Sphinx documentation

References

  • Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605
