Fast tcrdist routines between TCR sequences.

tcrdist_rs

This is a Rust implementation of tcrdist. Previous versions were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel package for fast, SIMD-accelerated Hamming and Levenshtein distances. Each distance comes with the following family of functions for computing distances among sequences. We take tcrdist as an example:

  • tcrdist(seq1, seq2, ...)
  • tcrdist_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
  • tcrdist_one_to_many(seq, seqs, ...)
  • tcrdist_many_to_many(seqs1, seqs2, ...)

where ... stands for additional parameters; see the docstrings (e.g., help(tcrdist)).
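To make the four call shapes concrete, here is a minimal pure-Python sketch that uses a toy Hamming distance. The `*_sketch` names are hypothetical and are not part of tcrdist_rs. The flat-list return layout is an assumption made for illustration; check the docstrings for what the package actually returns.

```python
from itertools import combinations


def hamming_sketch(seq1, seq2):
    """Count mismatching positions between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(c1 != c2 for c1, c2 in zip(seq1, seq2))


def hamming_matrix_sketch(seqs):
    """Upper right triangle only: one distance per unordered pair (i < j)."""
    return [hamming_sketch(a, b) for a, b in combinations(seqs, 2)]


def hamming_one_to_many_sketch(seq, seqs):
    """One query sequence against every sequence in a list."""
    return [hamming_sketch(seq, s) for s in seqs]


def hamming_many_to_many_sketch(seqs1, seqs2):
    """Every sequence in seqs1 against every sequence in seqs2."""
    return [hamming_sketch(a, b) for a in seqs1 for b in seqs2]


seqs = ["CASSF", "CASRF", "CTSRF"]
pairwise = hamming_matrix_sketch(seqs)                      # 3 pairs for 3 sequences
one_to_many = hamming_one_to_many_sketch("CASSF", seqs)     # 3 distances
many_to_many = hamming_many_to_many_sketch(seqs[:1], seqs)  # 1 x 3 distances
```

For n sequences the matrix variant yields n(n-1)/2 values, while the many-to-many variant yields len(seqs1) * len(seqs2) values.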

The other distances currently supported are:

  • hamming
  • levenshtein
  • levenshtein_exp (a possibly faster Levenshtein distance that uses an exponential search)
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
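The idea behind an exponential search for Levenshtein distance can be sketched in pure Python: compute the distance under a threshold k by filling only the cells within k of the diagonal, and double k until the answer fits. This is an illustration of the general technique, not the implementation used by triple_accel or tcrdist_rs; `bounded_levenshtein` and `levenshtein_exp_sketch` are hypothetical names.

```python
def bounded_levenshtein(a, b, k):
    """Return the edit distance between a and b if it is <= k, else None.

    Only cells with |i - j| <= k are filled, so the work per call is
    roughly O(k * len(a)) instead of O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = k + 1  # any value above k means "threshold exceeded"
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            cur[0] = i
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # substitution or match
                         prev[j] + 1,         # deletion from a
                         cur[j - 1] + 1,      # insertion into a
                         inf)
        prev = cur
    return prev[m] if prev[m] <= k else None


def levenshtein_exp_sketch(a, b):
    """Double the threshold until the bounded computation succeeds."""
    k = 1
    while True:
        d = bounded_levenshtein(a, b, k)
        if d is not None:
            return d
        k *= 2
```

When the two sequences are similar, the search stops at a small k and touches only a narrow band of the table, which is where the potential speedup comes from.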

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin, assuming you are in an environment that has Python.

pip install maturin

To compile tcrdist_rs, run

maturin develop --release -F py_binds,pyo3 --profile release

Example

We use the same settings as Python tcrdist. Notably, the keyword parallel enables the use of all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation itself outweighs the overhead of parallelizing.
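One way to act on this advice is to estimate the amount of work before choosing the flag. The sketch below counts pairwise comparisons and compares the count with a cutoff. The helper names and the default `min_pairs` value are hypothetical placeholders; a sensible cutoff depends on the machine and the distance used, and should come from benchmarking.

```python
def n_pairs(n_seqs):
    """Number of unordered pairs in an all-against-all comparison."""
    return n_seqs * (n_seqs - 1) // 2


def should_parallelize(n_seqs, min_pairs=100_000):
    """Heuristic: parallelize only once the pair count reaches min_pairs.

    min_pairs is an illustrative placeholder, not a measured value.
    """
    return n_pairs(n_seqs) >= min_pairs


small = should_parallelize(5)      # 10 pairs: not worth the overhead
large = should_parallelize(1_000)  # 499,500 pairs: likely worth it
```

The result can then be passed as the parallel argument of the matrix functions.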

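import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V allele] pair.
tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3       # CDR3 counts three times as much as the other regions
gap_penalty = 4
ntrim = 3             # CDR3 residues trimmed from the N-terminal end
ctrim = 2             # CDR3 residues trimmed from the C-terminal end
fixed_gappos = False
parallel = False      # five sequences are too few to benefit from parallelism
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)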
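Because the matrix functions compute only the upper right triangle, it can be convenient to expand the result into a full symmetric matrix. The sketch below assumes the distances arrive as a flat list in row-major order over pairs (i, j) with i < j and with the diagonal omitted. That layout is an assumption and should be checked against the docstrings; `condensed_to_square` is a hypothetical helper, not part of tcrdist_rs.

```python
def condensed_to_square(condensed, n):
    """Expand n*(n-1)/2 upper-triangle distances into an n x n matrix."""
    expected = n * (n - 1) // 2
    if len(condensed) != expected:
        raise ValueError(f"expected {expected} distances, got {len(condensed)}")
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Distances are symmetric, so mirror each value across the diagonal.
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


full = condensed_to_square([1, 2, 3], 3)
# full == [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
```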

tcrdist can also be computed at the level of genes.

import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V gene] pair (no allele suffix).
tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

ntrim = 3
ctrim = 2

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)
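The two examples differ only in how the V segment is named: the allele-level input carries a suffix such as *01, and the gene-level input does not. If your data is annotated at the allele level, a small helper can derive the gene-level input. The function names below are hypothetical and not part of tcrdist_rs.

```python
def allele_to_gene(v_allele):
    """Drop the allele suffix, e.g. 'TRBV6-2*01' -> 'TRBV6-2'."""
    return v_allele.split("*", 1)[0]


def to_gene_level(tcrs):
    """Convert [CDR3, V allele] pairs into [CDR3, V gene] pairs."""
    return [[cdr3, allele_to_gene(v)] for cdr3, v in tcrs]


allele_tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01']]
gene_tcrs = to_gene_level(allele_tcrs)
# gene_tcrs == [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2']]
```

Names that already lack a suffix pass through unchanged.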

Outlook

  • Implement monomorphization since most functions are similar?
  • Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
  • Neighbor-finding functionality
  • Sphinx documentation

References

  • Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605
