Python package written in C++ for pairwise distance computation for sequences.
setriq: pairwise sequence distances
A Python package written in C++ for computing pairwise distances between (immunoglobulin) sequences.
Install
This package is available on PyPI:
pip install setriq
Quickstart
setriq inherits from the torch philosophy of callable objects. Each Metric subclass is callable once
initialised: it takes a list of objects (usually str) and returns a list of float values.
import setriq
metric = setriq.CdrDist()
sequences = [
'CASSLKPNTEAFF',
'CASSAHIANYGYTF',
'CASRGATETQYF'
]
distances = metric(sequences)
The returned list is flat and contains N * (N - 1) / 2 elements, i.e. the lower (or upper) triangle of the
distance matrix. To get the square form of the matrix, use scipy.spatial.distance.squareform on the returned
distances.
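The layout of the flat list can be sketched in plain Python. The helper below, to_square, is a hypothetical name and not part of setriq. It assumes the distances are ordered pair by pair as (0, 1), (0, 2), ..., (1, 2), ..., which is the condensed layout that scipy.spatial.distance.squareform accepts. The input values are placeholders, not real setriq output.

```python
import math


def to_square(condensed):
    """Expand a flat list of N * (N - 1) / 2 distances into an N x N symmetric matrix.

    Assumption: entries are ordered as the row-major upper triangle,
    i.e. pairs (0, 1), (0, 2), ..., (1, 2), ...
    """
    m = len(condensed)
    # Solve m = n * (n - 1) / 2 for n, then check that the length is valid.
    n = int(round((1 + math.sqrt(1 + 8 * m)) / 2))
    if n * (n - 1) // 2 != m:
        raise ValueError("length is not of the form N * (N - 1) / 2")

    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each value across the diagonal; the diagonal stays 0.
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


# Placeholder values standing in for the output of metric(sequences).
flat = [1.0, 2.0, 3.0]
square = to_square(flat)
for row in square:
    print(row)
```

In practice scipy.spatial.distance.squareform does this conversion directly; the sketch only makes the index mapping explicit.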
About
As the header suggests, setriq is a no-frills Python package for fast computation of pairwise sequence
distances, with a focus on immunoglobulins. It is a declarative framework and borrows many concepts from the
popular torch library. It has been optimized for parallel compute on CPU architectures.
Available distance functions:
- CDRdist
- Levenshtein
- TCRdist
- Hamming
- Jaro
- Jaro-Winkler
- Longest Common Substring
- Optimal String Alignment
These distance functions are available either through the object-based API (as seen above), which provides
the CPU-based parallelism, or through the functional API in setriq.single_dispatch. Unlike the object-based
API, the functional API performs a single comparison between two sequences per call, i.e. it exposes the C++
distance functions without the parallelism wrapper. This can be useful for integrating setriq with other
tools such as PySpark. For example:
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from setriq import single_dispatch as sd
spark = SparkSession \
.builder \
.appName("setriq-spark") \
.getOrCreate()
df = spark.createDataFrame([('CASSLKPNTEAFF',), ('CASSAHIANYGYTF',), ('CASRGATETQYF',)], ['sequence'])
df = df.withColumnRenamed('sequence', 'a').crossJoin(df.withColumnRenamed('sequence', 'b'))
lev_udf = udf(sd.levenshtein, returnType=DoubleType()) # single dispatch levenshtein distance
df = df.withColumn('distance', lev_udf('a', 'b'))
df.show()
Note that functions in setriq.single_dispatch always return a single float value.
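The relationship between the two APIs can be illustrated without setriq installed. The sketch below is a pure-Python stand-in, not setriq's C++ implementation: levenshtein and pairwise are hypothetical helpers written for this example. The first mimics a single-comparison function that returns one float; the second applies it to every unordered pair to produce a flat list of N * (N - 1) / 2 values, serially rather than in parallel.

```python
from itertools import combinations


def levenshtein(a, b):
    """Stand-in single-pair edit distance returning a float (illustrative only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return float(prev[-1])


def pairwise(sequences, fn):
    """Apply a single-pair function to every unordered pair, in order."""
    return [fn(a, b) for a, b in combinations(sequences, 2)]


sequences = ['CASSLKPNTEAFF', 'CASSAHIANYGYTF', 'CASRGATETQYF']
distances = pairwise(sequences, levenshtein)
print(len(distances), distances)
```

With setriq available, sd.levenshtein from the PySpark example above could be passed as fn in place of the stand-in, assuming it accepts two strings as it does in that example.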
Requirements
Python 3.7 or above is required, as well as a C++ compiler with OpenMP support. The package has been
tested on Linux and macOS. To get the required OpenMP resources, run:
On Linux:
sudo apt install libomp-dev && sudo apt show libomp-dev
On macOS:
brew install libomp llvm
References
- Dash, P., Fiore-Gartland, A.J., Hertz, T., Wang, G.C., Sharma, S., Souquette, A., Crawford, J.C., Clemens, E.B., Nguyen, T.H., Kedzierska, K. and La Gruta, N.L., 2017. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547(7661), pp.89-93. (https://doi.org/10.1038/nature22383)
- Jaro, M.A., 1989. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 84(406), pp.414-420.
- Levenshtein, V.I., 1966, February. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady (Vol. 10, No. 8, pp. 707-710).
- python-Levenshtein (https://github.com/ztane/python-Levenshtein)
- Thakkar, N. and Bailey-Kellogg, C., 2019. Balancing sensitivity and specificity in distinguishing TCR groups by CDR sequence similarity. BMC bioinformatics, 20(1), pp.1-14. (https://doi.org/10.1186/s12859-019-2864-8)
- Van der Loo, M.P., 2014. The stringdist package for approximate string matching. R J., 6(1), p.111.
- Winkler, W.E., 1990. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage.