Fast tcrdist routines between TCR sequences.

tcrdist_rs

This is a Rust implementation of TCRdist. Previous versions were written in Python/Numba and C++. This package also includes Python bindings to the triple_accel package for fast, SIMD-accelerated Hamming and Levenshtein distances.

Each distance comes with the same family of functions. We take tcrdist_gene as an example.

Distances can be computed with

  • tcrdist_gene(seq1, seq2, ...)
  • tcrdist_gene_matrix(seqs, ...) (computes the upper right triangle of the distance matrix among sequences)
  • tcrdist_gene_one_to_many(seq, seqs, ...)
  • tcrdist_gene_many_to_many(seqs1, seqs2, ...)
  • tcrdist_gene_pairwise(seqs1, seqs2, ...)

where ... stands for additional parameters; see the docstrings (e.g., help(tcrdist_gene)). The multi-sequence functions above return the distances as a list.
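The call shapes listed above can be illustrated with a pure-Python stand-in. This is only a sketch: the helper names below are hypothetical, a plain Hamming distance replaces the compiled routines, and the flat, row-by-row ordering of the returned lists is an assumption rather than something the package documentation here confirms.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def matrix(seqs):
    # Upper right triangle only: every unordered pair (i < j) once.
    return [hamming(a, b) for a, b in combinations(seqs, 2)]


def one_to_many(seq, seqs):
    # One query sequence against every sequence in seqs.
    return [hamming(seq, s) for s in seqs]


def many_to_many(seqs1, seqs2):
    # Every sequence in seqs1 against every sequence in seqs2
    # (assumed row-major: all of seqs2 for seqs1[0], then seqs1[1], ...).
    return [hamming(a, b) for a in seqs1 for b in seqs2]


def pairwise(seqs1, seqs2):
    # Element-wise: seqs1[i] against seqs2[i] only.
    return [hamming(a, b) for a, b in zip(seqs1, seqs2)]


seqs = ["CASS", "CASR", "CATR"]
print(matrix(seqs))
print(one_to_many("CASS", seqs))
print(many_to_many(seqs[:2], seqs[1:]))
print(pairwise(seqs[:2], seqs[1:]))
```

For n input sequences the matrix form holds n(n-1)/2 values, whereas many_to_many holds len(seqs1) * len(seqs2) values.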

Neighbors can be found with

  • tcrdist_gene_neighbor(seq1, seq2, threshold, ...)
  • tcrdist_gene_neighbor_matrix(seqs, threshold, ...)
  • tcrdist_gene_neighbor_one_to_many(seq, seqs, threshold, ...)
  • tcrdist_gene_neighbor_many_to_many(seqs1, seqs2, threshold, ...)
  • tcrdist_gene_neighbor_pairwise(seqs1, seqs2, threshold, ...)

tcrdist_gene_neighbor returns a bool. All the other neighbor functions return a list of lists, where each inner list contains the indices of a neighboring pair and the distance between them.
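A rough pure-Python sketch of the neighbor semantics follows. The helper names are hypothetical, Hamming distance stands in for TCRdist, and two details are assumptions: that the threshold is inclusive, and that each inner list is laid out as [index1, index2, distance].

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbor(seq1, seq2, threshold):
    # Single pair: True if the pair is within the threshold (assumed inclusive).
    return hamming(seq1, seq2) <= threshold


def neighbor_matrix(seqs, threshold):
    # All unordered pairs within one list; keep only those within the threshold.
    hits = []
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        d = hamming(a, b)
        if d <= threshold:
            hits.append([i, j, d])
    return hits


def neighbor_many_to_many(seqs1, seqs2, threshold):
    # All pairs across two lists; indices refer to seqs1 and seqs2 respectively.
    hits = []
    for i, a in enumerate(seqs1):
        for j, b in enumerate(seqs2):
            d = hamming(a, b)
            if d <= threshold:
                hits.append([i, j, d])
    return hits


seqs = ["CASS", "CASR", "CATR"]
print(neighbor("CASS", "CATR", 1))
print(neighbor_matrix(seqs, 1))
```

Returning only the pairs under the threshold keeps the output sparse, which matters when most pairs in a large repertoire are far apart.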

All the distances currently supported:

  • hamming
  • levenshtein
  • levenshtein_exp (possibly faster Levenshtein distance which uses an exponential search)
  • tcrdist
  • tcrdist_allele (tcrdist computed using amino acid CDR3-V allele pairs)
  • tcrdist_gene (tcrdist computed using amino acid CDR3-V gene pairs)
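To give an idea of what an exponential search means for levenshtein_exp, here is a minimal pure-Python sketch. It is not the triple_accel implementation: the function names are hypothetical, and the banded dynamic program is just one simple way to compute a thresholded edit distance. The idea is to try a small threshold k, compute the distance only inside a band of width k around the diagonal, and double k until the distance fits.

```python
def bounded_levenshtein(a, b, k):
    """Return the edit distance if it is <= k, otherwise None (banded DP)."""
    if abs(len(a) - len(b)) > k:
        return None
    inf = k + 1  # any value above k means "too far"
    prev = [j if j <= k else inf for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if i <= k:
            cur[0] = i
        # Only cells with |i - j| <= k can lie on a path with <= k edits.
        for j in range(max(1, i - k), min(len(b), i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, inf)
        prev = cur
    d = prev[len(b)]
    return d if d <= k else None


def levenshtein_exp_sketch(a, b):
    """Double the threshold until the bounded search succeeds."""
    k = 1
    while True:
        d = bounded_levenshtein(a, b, k)
        if d is not None:
            return d
        k *= 2


print(levenshtein_exp_sketch("CASSLGQAYEQYF", "CASSLGAYEQYF"))
```

When the two sequences are similar, the search stops at a small k and only a narrow band of the table is ever filled, which is why this variant is described as possibly faster.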

Presently, the non-TCRdist distances also provide the following functions:

  • hamming_bin_many_to_many(seqs1, seqs2, parallel)
  • levenshtein_bin_many_to_many(seqs1, seqs2, parallel)
  • levenshtein_exp_bin_many_to_many(seqs1, seqs2, parallel)

which compute all pairwise distances between seqs1 and seqs2 and bin them. These are useful for characterizing distributions of coincidence across large datasets.
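The binning idea can be sketched in pure Python as follows. The function name is hypothetical, and the output layout (a list in which position d holds the number of pairs at distance d) is an assumption about what "bin them" means here.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def hamming_bin_many_to_many_sketch(seqs1, seqs2):
    # Hamming distance can never exceed the sequence length,
    # so length + 1 bins cover every possible value.
    max_len = max(len(s) for s in list(seqs1) + list(seqs2))
    bins = [0] * (max_len + 1)
    for a in seqs1:
        for b in seqs2:
            bins[hamming(a, b)] += 1
    return bins


print(hamming_bin_many_to_many_sketch(["CASS", "CATR"], ["CASS", "CASR"]))
```

Storing counts instead of individual distances keeps memory use proportional to the number of bins rather than to the number of pairs.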

Installation

Given an environment with Python >= 3.7, this package can be installed with

pip install tcrdist_rs

Development environment setup

Install Rust if necessary.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install maturin, assuming you are in an environment with Python >= 3.7.

pip install maturin

To compile tcrdist_rs, run

maturin develop --release --profile release

in the cloned repository.

Example

We use the same settings as Python TCRdist. Notably, the keyword parallel enables the use of all CPU cores when computing many distances. Enable it only when there are enough sequences that the computation itself outweighs the overhead of parallelization.

import tcrdist_rs

tcrs = [['CASRTGTVYEQYF', 'TRBV2*01'], ['CASSTLDRVYNSPLHF', 'TRBV6-2*01'],
        ['CASSESGGQVDTQYF', 'TRBV6-4*01'], ['CASSPTGPTDTQYF', 'TRBV18*01'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5*01']]

phmc_weight = 1
cdr1_weight = 1
cdr2_weight = 1
cdr3_weight = 3
gap_penalty = 4
ntrim = 3
ctrim = 2
fixed_gappos = False
parallel = False
distances = tcrdist_rs.tcrdist_allele_matrix(tcrs, phmc_weight, cdr1_weight,
                                             cdr2_weight, cdr3_weight,
                                             gap_penalty, ntrim, ctrim, fixed_gappos,
                                             parallel)
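Since the matrix functions return only the upper right triangle as a flat list, it can be handy to expand that list into a full square matrix. The helper below is a hypothetical sketch, not part of tcrdist_rs, and it assumes the triangle is stored row by row, i.e. (0,1), (0,2), ..., (1,2), and so on. The numbers in the usage line are placeholders, not real TCRdist output.

```python
def condensed_to_square(distances, n):
    """Expand a flat upper-triangle list into a symmetric n x n matrix."""
    expected = n * (n - 1) // 2
    if len(distances) != expected:
        raise ValueError(f"expected {expected} distances for n={n}")
    square = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each value across the diagonal; the diagonal stays 0.
            square[i][j] = square[j][i] = distances[k]
            k += 1
    return square


# Placeholder values for three sequences (not real TCRdist results).
for row in condensed_to_square([10, 20, 30], 3):
    print(row)
```

With the five TCRs from the example, the flat list would contain 5 * 4 / 2 = 10 distances.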

TCRdist can also be computed at the level of genes.

import tcrdist_rs

tcrs = [['CASRTGTVYEQYF', 'TRBV2'], ['CASSTLDRVYNSPLHF', 'TRBV6-2'],
        ['CASSESGGQVDTQYF', 'TRBV6-4'], ['CASSPTGPTDTQYF', 'TRBV18'],
        ['CASSYPIEGGRAFTGELFF', 'TRBV6-5']]

distances = tcrdist_rs.tcrdist_gene_matrix(tcrs, parallel=False)

Outlook

  • Vectorize/simdify computations for amino acid lookup?
  • Improve lookup table performance (hash map vs. match vs. array)
  • Precompute V allele lookup tables from custom databases as opposed to hard code (or memoize V allele dists as they are used in the input sequences)

References

  • Dash et al. (2017) "Quantifiable predictive features define epitope-specific T cell receptor repertoires." Nature 547:89-93. https://doi.org/10.1038/nature22383
  • Mayer-Blackwell et al. (2021) "TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs." eLife 10:e68605. https://doi.org/10.7554/eLife.68605
