A Python package written in C++ for computing pairwise distances between sequences.
setriq: pairwise sequence distances
A Python package written in C++ for computing pairwise distances between (immunoglobulin) sequences.
Install
This package is available on PyPI:
pip install setriq
Quickstart
setriq follows the torch philosophy of callable objects. Each Metric subclass is callable once initialised: it takes a list of objects (usually str) and returns a list of float values.
import setriq

metric = setriq.CdrDist()

sequences = [
    'CASSLKPNTEAFF',
    'CASSAHIANYGYTF',
    'CASRGATETQYF',
]

# flat list with one distance per unordered pair of sequences
distances = metric(sequences)
The returned list is flat and contains N * (N - 1) / 2 elements, where N is the number of input sequences, i.e. the lower (or upper) triangle of the distance matrix. To get the square form of the matrix, use scipy.spatial.distance.squareform on the returned distances.
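To make the relationship between the flat list and the square matrix concrete, here is a minimal pure-Python sketch of the expansion. The helper name to_square is hypothetical and not part of setriq or SciPy, the input values are placeholders rather than real CdrDist output, and the pair order (0,1), (0,2), (1,2), ... is an assumption that mirrors SciPy's condensed-matrix convention.

```python
from typing import List, Sequence


def to_square(condensed: Sequence[float], n: int) -> List[List[float]]:
    """Expand a flat list of n * (n - 1) / 2 distances into an n x n matrix."""
    expected = n * (n - 1) // 2
    if len(condensed) != expected:
        raise ValueError(
            f"expected {expected} distances for n={n}, got {len(condensed)}"
        )

    # the diagonal stays at zero: each sequence is at distance 0 from itself
    square = [[0.0] * n for _ in range(n)]

    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # assumed pair order: (0,1), (0,2), ..., (1,2), ...
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


# placeholder values for three sequences, not real setriq output
flat = [0.5, 0.25, 0.75]
matrix = to_square(flat, 3)
```

In practice scipy.spatial.distance.squareform performs this expansion and returns a NumPy array, so the sketch is only meant to show what the flat layout encodes.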
About
As the header suggests, setriq is a no-frills Python package for fast computation of pairwise sequence distances, with a focus on immunoglobulins. It is a declarative framework and borrows many concepts from the popular torch library. It has been optimized for parallel compute on CPU architectures.
It can only perform pairwise, all-v-all distance computations. This decision was made to maximize consistency and cohesion.
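The callable-object pattern and the all-v-all behaviour can be illustrated with a small pure-Python sketch. ToyMetric is a hypothetical stand-in, not part of setriq: it uses a plain Levenshtein edit distance rather than the package's C++ implementations, and the substitution_cost parameter is an assumption made purely for illustration.

```python
from itertools import combinations
from typing import List, Sequence


class ToyMetric:
    """Hypothetical stand-in for a Metric subclass; not part of setriq."""

    def __init__(self, substitution_cost: float = 1.0) -> None:
        # configuration is fixed at initialisation, as with torch-style modules
        self.substitution_cost = substitution_cost

    def _pair(self, a: str, b: str) -> float:
        # Levenshtein edit distance computed with two rolling rows
        previous = [float(j) for j in range(len(b) + 1)]
        for i, char_a in enumerate(a, start=1):
            current = [float(i)]
            for j, char_b in enumerate(b, start=1):
                substitution = 0.0 if char_a == char_b else self.substitution_cost
                current.append(
                    min(
                        previous[j] + 1.0,               # deletion
                        current[j - 1] + 1.0,            # insertion
                        previous[j - 1] + substitution,  # match or substitution
                    )
                )
            previous = current
        return previous[-1]

    def __call__(self, sequences: Sequence[str]) -> List[float]:
        # all-v-all: one distance per unordered pair, returned as a flat list
        return [self._pair(a, b) for a, b in combinations(sequences, 2)]


toy = ToyMetric()
toy_distances = toy(['CASS', 'CASR', 'CAS'])
```

The object is configured once and then called like a function on any list of sequences, which is the usage shape the Quickstart shows for setriq.CdrDist.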
Requirements
Python 3.7 or above is required, as well as a C++ compiler with OpenMP support. The package has been tested on Linux and macOS. To get the required OpenMP resources, run:
On Linux:
sudo apt install libomp-dev && sudo apt show libomp-dev
On macOS:
brew install libomp llvm
References
- Dash, P., Fiore-Gartland, A.J., Hertz, T., Wang, G.C., Sharma, S., Souquette, A., Crawford, J.C., Clemens, E.B., Nguyen, T.H., Kedzierska, K. and La Gruta, N.L., 2017. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547(7661), pp.89-93. (https://doi.org/10.1038/nature22383)
- Levenshtein, V.I., 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), pp.707-710.
- python-Levenshtein (https://github.com/ztane/python-Levenshtein)
- Thakkar, N. and Bailey-Kellogg, C., 2019. Balancing sensitivity and specificity in distinguishing TCR groups by CDR sequence similarity. BMC Bioinformatics, 20(1), pp.1-14. (https://doi.org/10.1186/s12859-019-2864-8)
Source Distribution
File details
Details for the file setriq-1.4.2.tar.gz.
File metadata
- Download URL: setriq-1.4.2.tar.gz
- Upload date:
- Size: 82.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e7a3141dcc11f35066c650adc4fccee2e35362ed0250653a34617af53c66d4d9
MD5 | e6fb206f661bbda5474f0ee92c7ba2ea
BLAKE2b-256 | babc13261bd105a7c798c01153c05f0f4b38965f715c16d9c522258d3aa17c0b