
Fast tcrdist routines for comparing TCR sequences.


tcrdist_rs

This is a Rust implementation of tcrdist. Previous implementations were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel crate for fast, SIMD-accelerated Hamming and Levenshtein distances. Each distance is exposed through the same set of functions for comparing sequences. Taking tcrdist as an example:

  • tcrdist(seq1, seq2, ...)
  • tcrdist_matrix(seqs, ...) (computes the upper-right triangle of the pairwise distance matrix among sequences)
  • tcrdist_one_to_many(seq, seqs, ...)
  • tcrdist_many_to_many(seqs1, seqs2, ...)

where ... stands for additional parameters. See the docstrings for details (e.g., help(tcrdist_rs.tcrdist)).
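The four call patterns differ only in which pairs of sequences get compared. The plain-Python sketch below illustrates them using a Hamming distance. The function names are hypothetical stand-ins, not the tcrdist_rs API. Returning the matrix and many-to-many results as flat lists in row-major order is an assumption made for illustration, so check the docstrings for the actual return types.

```python
from typing import Callable, List, Sequence

# Any function mapping two sequences to an integer distance.
Dist = Callable[[str, str], int]


def hamming(s1: str, s2: str) -> int:
    # Plain-Python Hamming distance; only defined for equal-length sequences.
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(a != b for a, b in zip(s1, s2))


def dist_matrix(seqs: Sequence[str], dist: Dist) -> List[int]:
    # Upper-right triangle (pairs i < j), flattened row by row (assumed layout).
    n = len(seqs)
    return [dist(seqs[i], seqs[j]) for i in range(n) for j in range(i + 1, n)]


def dist_one_to_many(seq: str, seqs: Sequence[str], dist: Dist) -> List[int]:
    # One query sequence against every sequence in seqs.
    return [dist(seq, s) for s in seqs]


def dist_many_to_many(seqs1: Sequence[str], seqs2: Sequence[str], dist: Dist) -> List[int]:
    # Every sequence in seqs1 against every sequence in seqs2, row by row (assumed layout).
    return [dist(a, b) for a in seqs1 for b in seqs2]


seqs = ["CASSF", "CASTF", "CATTF"]
print(dist_matrix(seqs, hamming))                                 # [1, 2, 1]
print(dist_one_to_many("CASSF", seqs, hamming))                   # [0, 1, 2]
print(dist_many_to_many(["CASSF"], ["CASTF", "CATTF"], hamming))  # [1, 2]
```

For n sequences, the matrix variant computes n(n-1)/2 distances instead of n², because the distance is symmetric and the diagonal is zero.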

Other distances currently supported:

  • hamming
  • levenshtein
  • levenshtein_exp (Levenshtein distance computed with an exponential search; possibly faster than levenshtein)
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
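Our reading of the exponential search behind levenshtein_exp is based on triple_accel's description, not on the tcrdist_rs source. The idea is to run a cheaper banded dynamic program that only fills cells with |i - j| <= k, and to double k until the result fits within the band. The sketch below is plain Python, not the SIMD triple_accel code, and its function names are hypothetical. It relies on the fact that any alignment of cost d never leaves the band |i - j| <= d. Because of that, a banded result that is at most k is exact.

```python
def banded_levenshtein(a: str, b: str, k: int) -> int:
    # Edit distance restricted to DP cells with |i - j| <= k. Exact whenever
    # the true distance is <= k; otherwise the returned value is > k.
    n, m = len(a), len(b)
    outside = k + 1  # any value > k marks a cell outside the band
    prev = [j if j <= k else outside for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [outside] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match / substitution
                         prev[j] + 1,         # deletion
                         cur[j - 1] + 1)      # insertion
        prev = cur
    return prev[m]


def levenshtein_exp_sketch(a: str, b: str) -> int:
    # The length difference is a lower bound on the distance, so start there
    # and double the band width until the banded result fits inside it.
    k = max(1, abs(len(a) - len(b)))
    while True:
        d = banded_levenshtein(a, b, k)
        if d <= k:
            return d
        k *= 2


print(levenshtein_exp_sketch("kitten", "sitting"))      # 3
print(levenshtein_exp_sketch("CASSLGQYF", "CASSGQYF"))  # 1
```

The search terminates because once k reaches the length of the longer sequence, the band covers the whole table. Similar sequences finish with a narrow band, which is where this approach can save work.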

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin (this assumes you are in an environment that has Python):

pip install maturin

To compile tcrdist_rs and install it into the current Python environment, run

maturin develop --release --profile release

Example

The example below uses the same parameter settings as Python tcrdist. The parallel keyword enables all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation outweighs the overhead of parallelization. For small inputs, the parallel version can be slower.
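A rough guide to when parallel might pay off: a *_matrix call on n sequences computes n(n-1)/2 distances, so the work grows quadratically with the input. The helpers below are a plain-Python sketch. The names and the min_pairs cut-off are hypothetical, and the reliable way to decide is to time both settings on your own data and hardware.

```python
import time
from typing import Any, Callable


def n_pairs(n: int) -> int:
    # Number of distances in the upper-right triangle of an n x n matrix.
    return n * (n - 1) // 2


def should_parallelize(n_seqs: int, min_pairs: int = 100_000) -> bool:
    # Hypothetical cut-off: benchmark on your own hardware to choose min_pairs.
    return n_pairs(n_seqs) >= min_pairs


def best_time(fn: Callable[..., Any], *args: Any, repeats: int = 3, **kwargs: Any) -> float:
    # Best wall-clock time in seconds over several calls of fn(*args, **kwargs).
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


print(n_pairs(5), n_pairs(10_000))  # 10 49995000
# With tcrdist_rs installed, one could compare the two settings, e.g.:
#   best_time(tcrdist_rs.tcrdist_gene_matrix, tcrs, 3, 2, parallel=False)
#   best_time(tcrdist_rs.tcrdist_gene_matrix, tcrs, 3, 2, parallel=True)
```

The commented calls reuse the tcrdist_gene_matrix signature from the gene-level example in this README.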

import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V allele] pair.
tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

# Region weights, gap penalty, and CDR3 trimming, as in Python tcrdist.
phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4
ntrim = 3
ctrim = 2
fixed_gappos = False
parallel = False  # only worthwhile for large inputs
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
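The sketch below is a heavily simplified, plain-Python gapped CDR3 comparison in the spirit of tcrdist. It is meant to make the roles of ntrim, ctrim, gap_penalty, and the CDR3 weight more concrete. The ntrim N-terminal and ctrim C-terminal residues are ignored. When lengths differ, one block of gaps is placed in the shorter sequence at whichever position minimizes the substitution cost.

Several parts are assumptions:

  • The flat mismatch cost of 4 stands in for the BLOSUM62-derived substitution costs that tcrdist uses.
  • Trying every gap position is only loosely analogous to fixed_gappos=False.
  • Applying the weight to substitutions only is a guess.

This is not the tcrdist_rs implementation and will not reproduce its numbers.

```python
def mismatch_cost(a: str, b: str) -> int:
    # Hypothetical placeholder: tcrdist derives substitution costs from
    # BLOSUM62, whereas here every mismatch simply costs 4.
    return 0 if a == b else 4


def cdr3_distance_sketch(s1: str, s2: str, weight: int = 3, gap_penalty: int = 4,
                         ntrim: int = 3, ctrim: int = 2) -> int:
    # Make s1 the longer sequence; the gap block goes into the shorter one (s2).
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    long_len, short_len = len(s1), len(s2)
    n_gaps = long_len - short_len

    best = None
    # Split s2 at gappos: residues before gappos align from the N-terminus,
    # the rest align from the C-terminus, and the n_gaps unmatched residues
    # of s1 sit between the two parts. Trimmed ends are never compared.
    for gappos in range(ntrim, short_len - ctrim + 1):
        cost = sum(mismatch_cost(s1[i], s2[i]) for i in range(ntrim, gappos))
        cost += sum(mismatch_cost(s1[long_len - 1 - j], s2[short_len - 1 - j])
                    for j in range(ctrim, short_len - gappos))
        if best is None or cost < best:
            best = cost
    if best is None:
        best = 0  # sequence too short: nothing left to compare after trimming

    # Assumption: the weight scales substitutions only; each gap costs gap_penalty.
    return weight * best + gap_penalty * n_gaps


print(cdr3_distance_sketch("CASSLGQYF", "CASSPGQYF"))  # 12: one mismatch, weighted by 3
print(cdr3_distance_sketch("CASSLGQYF", "CASSGQYF"))   # 4: one gap, no mismatches
```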

tcrdist can also be computed at the gene level, using V genes instead of V alleles.

import tcrdist_rs

# Each TCR is a [CDR3 amino acid sequence, V gene] pair.
tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

ntrim = 3
ctrim = 2

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, ntrim, ctrim, parallel=False)
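If your data carry V alleles but you want gene-level distances, you can strip the allele suffix first. This plain-Python sketch assumes the "gene*allele" naming used in the examples above (e.g. TRBV6-2*01). The helper name is hypothetical and is not part of tcrdist_rs.

```python
from typing import List


def allele_to_gene(allele: str) -> str:
    # "TRBV6-2*01" -> "TRBV6-2"; names without "*" are returned unchanged.
    return allele.split("*", 1)[0]


allele_tcrs: List[List[str]] = [
    ['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
    ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
    ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01'],
]

# Same CDR3s, with V alleles mapped to V genes.
gene_tcrs = [[cdr3, allele_to_gene(v)] for cdr3, v in allele_tcrs]
print(gene_tcrs[1])  # ['CASSTLDRVYNSPLHF', 'TRBV6-2']
```

The resulting gene_tcrs list matches the input of the gene-level example and could be passed to tcrdist_gene_matrix in the same way.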

Outlook

  • Implement monomorphization since most functions are similar?
  • Improve parallelization of one-to-many computations. (Give option to choose number of threads?)
  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases instead of hard-coding them (or memoize V allele distances as they are encountered in the input sequences)
  • Functionality for finding neighbors
  • Sphinx documentation

References

  • Dash et al. (2017). "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547, 89–93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021). "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10, e68605. https://doi.org/10.7554/eLife.68605


Download files


Source Distribution

  • tcrdist_rs-0.1.1.tar.gz (63.3 kB)

Built Distributions (CPython 3.8+)

  • tcrdist_rs-0.1.1-cp38-abi3-win_amd64.whl (450.2 kB): Windows x86-64
  • tcrdist_rs-0.1.1-cp38-abi3-win32.whl (411.1 kB): Windows x86
  • tcrdist_rs-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): manylinux, glibc 2.17+, x86-64
  • tcrdist_rs-0.1.1-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB): manylinux, glibc 2.17+, i686
  • tcrdist_rs-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.3 MB): manylinux, glibc 2.17+, ARM64
  • tcrdist_rs-0.1.1-cp38-abi3-macosx_11_0_arm64.whl (519.0 kB): macOS 11.0+, ARM64
  • tcrdist_rs-0.1.1-cp38-abi3-macosx_10_12_x86_64.whl (547.3 kB): macOS 10.12+, x86-64

