Python package written in C++ for pairwise distance computation for sequences.

setriq: pairwise sequence distances

A Python package written in C++ for computing pairwise distances between (immunoglobulin) sequences.

Documentation

Install

This package is available on PyPI:

pip install setriq

Quickstart

setriq follows the torch philosophy of callable objects. An instance of any Metric subclass is callable: it takes a list of objects (usually str) and returns a list of float values.

import setriq
metric = setriq.CdrDist()

sequences = [
    'CASSLKPNTEAFF',
    'CASSAHIANYGYTF',
    'CASRGATETQYF'
]
distances = metric(sequences)

The returned list is flat and contains N * (N - 1) / 2 elements, i.e. the lower (or upper) triangle of the distance matrix. To get the square form of the matrix, use scipy.spatial.distance.squareform on the returned distances.
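To make the flat layout concrete, here is a minimal pure-Python sketch of the expansion that scipy.spatial.distance.squareform performs. It assumes the flat list is ordered by pairs (0, 1), (0, 2), ..., (1, 2), ..., which is the condensed ordering scipy expects; the helper name to_square and the distance values are hypothetical placeholders, not real setriq output.

```python
from typing import List, Sequence


def to_square(condensed: Sequence[float], n: int) -> List[List[float]]:
    """Expand a flat list of N * (N - 1) / 2 distances into an N x N matrix."""
    if len(condensed) != n * (n - 1) // 2:
        raise ValueError("expected n * (n - 1) / 2 values")
    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed ordering: row-major walk over the upper triangle.
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


# Placeholder distances for three sequences (made-up values).
flat = [1.0, 2.0, 3.0]
matrix = to_square(flat, 3)
for row in matrix:
    print(row)
```

In practice squareform is the better choice, since it returns a NumPy array and validates its input.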

About

As the header suggests, setriq is a no-frills Python package for fast computation of pairwise sequence distances, with a focus on immunoglobulins. It is a declarative framework that borrows many concepts from the popular torch library, and it has been optimized for parallel computation on CPU architectures.

Available distance functions:

  • CDRdist
  • Levenshtein
  • TCRdist
  • Hamming
  • Jaro
  • Jaro-Winkler
  • Longest Common Substring
  • Optimal String Alignment

These distance functions are available either through the object-based API (as seen above), which provides the CPU-based parallelism, or the functional API in setriq.single_dispatch. Unlike the object-based API, the functional API does a single comparison between two sequences for every call, i.e. it exposes the C++ distance functions without the parallelism wrapper. This can be useful for integration of setriq with other tools such as PySpark. For example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

from setriq import single_dispatch as sd

spark = SparkSession \
   .builder \
   .appName("setriq-spark") \
   .getOrCreate()

df = spark.createDataFrame([('CASSLKPNTEAFF',), ('CASSAHIANYGYTF',), ('CASRGATETQYF',)], ['sequence'])
df = df.withColumnRenamed('sequence', 'a').crossJoin(df.withColumnRenamed('sequence', 'b'))

lev_udf = udf(sd.levenshtein, returnType=DoubleType())  # single dispatch levenshtein distance
df = df.withColumn('distance', lev_udf('a', 'b'))
df.show()

Note that functions in setriq.single_dispatch always return a single float value.
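The relationship between the two APIs can be sketched in plain Python: a function that compares one pair at a time, applied to every unordered pair, yields the same flat list of N * (N - 1) / 2 values that the object-based API returns. The levenshtein function below is a textbook unit-cost implementation used as a stand-in, not setriq's C++ code, and setriq's own default costs may differ; pairwise is a hypothetical helper name.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> float:
    """Unit-cost edit distance between two strings (illustrative stand-in)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return float(previous[-1])


def pairwise(sequences: Sequence[str],
             distance: Callable[[str, str], float]) -> List[float]:
    """Apply a single-pair distance to every unordered pair, sequentially."""
    return [distance(a, b) for a, b in combinations(sequences, 2)]


distances = pairwise(['abc', 'abd', 'xbd'], levenshtein)
print(distances)  # [1.0, 2.0, 1.0]
```

This loop runs sequentially; the object-based API exists precisely to parallelise this step, so the functional API is most useful when another tool already distributes the work.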

Requirements

A Python version of 3.7 or above is required, as well as a C++ compiler equipped with OpenMP. The package has been tested on Linux and macOS. To get the required OpenMP resources, run:

On Linux:

sudo apt install libomp-dev && sudo apt show libomp-dev

On macOS:

brew install libomp llvm

References

  1. Dash, P., Fiore-Gartland, A.J., Hertz, T., Wang, G.C., Sharma, S., Souquette, A., Crawford, J.C., Clemens, E.B., Nguyen, T.H., Kedzierska, K. and La Gruta, N.L., 2017. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547(7661), pp.89-93. (https://doi.org/10.1038/nature22383)
  2. Jaro, M.A., 1989. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 84(406), pp.414-420.
  3. Levenshtein, V.I., 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), pp.707-710.
  4. python-Levenshtein (https://github.com/ztane/python-Levenshtein)
  5. Thakkar, N. and Bailey-Kellogg, C., 2019. Balancing sensitivity and specificity in distinguishing TCR groups by CDR sequence similarity. BMC Bioinformatics, 20(1), pp.1-14. (https://doi.org/10.1186/s12859-019-2864-8)
  6. Van der Loo, M.P., 2014. The stringdist package for approximate string matching. The R Journal, 6(1), p.111.
  7. Winkler, W.E., 1990. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage.
