Python package written in C++ for pairwise distance computation for sequences.

setriq: pairwise sequence distances

License: MIT

A Python package written in C++ for computing pairwise distances between (immunoglobulin) sequences.

Install

This package is available on PyPI:

pip install setriq

Quickstart

setriq follows the torch convention of callable objects. Once initialised, an instance of any Metric subclass is callable: it takes a list of objects (usually str) and returns a list of float values.

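import setriq

# initialise the metric once; the resulting object is callable
metric = setriq.CdrDist()

sequences = [
    'CASSLKPNTEAFF',
    'CASSAHIANYGYTF',
    'CASRGATETQYF'
]
# compute all pairwise distances; the result is a flat list of floats
distances = metric(sequences)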

The returned list is flat and contains N * (N - 1) / 2 elements, where N is the number of input sequences, i.e. the lower (or upper) triangle of the distance matrix without the diagonal. To get the square form of the matrix, pass the returned distances to scipy.spatial.distance.squareform.
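To make the flat layout concrete, here is a minimal pure-Python sketch of the mapping that squareform performs. It assumes the list follows the condensed ordering that scipy expects (row-major over the upper triangle); the helper names condensed_index and to_square are hypothetical and not part of setriq, and the input values are placeholders rather than real setriq output.

```python
def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed list for n items."""
    if i == j:
        raise ValueError("the diagonal is not stored in the condensed form")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so order the pair
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def to_square(condensed, n):
    """Expand a condensed list into a symmetric n x n list of lists."""
    if len(condensed) != n * (n - 1) // 2:
        raise ValueError("expected n * (n - 1) / 2 values")
    square = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = condensed[condensed_index(i, j, n)]
            square[i][j] = square[j][i] = d
    return square


# Placeholder distances for three sequences, NOT real setriq output.
placeholder = [1.0, 2.0, 3.0]
square = to_square(placeholder, 3)
```

In practice squareform is the simpler choice; the sketch is mainly useful for looking up a single pair in the flat list without building the full matrix.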

About

As the header suggests, setriq is a no-frills Python package for fast computation of pairwise sequence distances, with a focus on immunoglobulins. It is a declarative framework and borrows many concepts from the popular torch library. It has been optimized for parallel computation on CPUs.

Available distance functions:

  • CDRdist
  • Levenshtein
  • TCRdist
  • Hamming
  • Jaro
  • Jaro-Winkler
  • Longest Common Substring
  • Optimal String Alignment

These distance functions are available either through the object-based API (as seen above), which provides the CPU-based parallelism, or the functional API in setriq.single_dispatch. Unlike the object-based API, the functional API does a single comparison between two sequences for every call, i.e. it exposes the C++ distance functions without the parallelism wrapper. This can be useful for integration of setriq with other tools such as PySpark. For example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

from setriq import single_dispatch as sd

spark = SparkSession \
   .builder \
   .appName("setriq-spark") \
   .getOrCreate()

df = spark.createDataFrame([('CASSLKPNTEAFF',), ('CASSAHIANYGYTF',), ('CASRGATETQYF',)], ['sequence'])
df = df.withColumnRenamed('sequence', 'a').crossJoin(df.withColumnRenamed('sequence', 'b'))

lev_udf = udf(sd.levenshtein, returnType=DoubleType())  # single dispatch levenshtein distance
df = df.withColumn('distance', lev_udf('a', 'b'))
df.show()

Note that every function in setriq.single_dispatch returns a single float value.
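The relationship between the two APIs can be sketched in plain Python: a two-argument distance function, called once per pair, reproduces the flat pairwise output sequentially. The levenshtein function below is a stand-in written for this illustration (unit edit costs assumed), not setriq's implementation, and pairwise is a hypothetical helper; with setriq installed, a function from setriq.single_dispatch could presumably be passed in its place.

```python
from itertools import combinations


def levenshtein(a, b):
    """Plain-Python stand-in for a two-sequence distance (unit edit costs)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return float(previous[-1])


def pairwise(fn, sequences):
    """Apply a two-argument distance to every unordered pair, one call each."""
    return [fn(a, b) for a, b in combinations(sequences, 2)]


sequences = ['CASSLKPNTEAFF', 'CASSAHIANYGYTF', 'CASRGATETQYF']
distances = pairwise(levenshtein, sequences)  # 3 values for 3 sequences
```

This loop is sequential, which is the trade-off the functional API makes: it gives up the built-in parallelism so that an external scheduler such as Spark can distribute the individual calls.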

Requirements

Python 3.7 or above is required, as well as a C++ compiler with OpenMP support. The package has been tested on Linux and macOS. To get the required OpenMP resources, run:

On Linux (Debian/Ubuntu, using apt):

sudo apt install libomp-dev && sudo apt show libomp-dev

On macOS:

brew install libomp llvm

References

  1. Dash, P., Fiore-Gartland, A.J., Hertz, T., Wang, G.C., Sharma, S., Souquette, A., Crawford, J.C., Clemens, E.B., Nguyen, T.H., Kedzierska, K. and La Gruta, N.L., 2017. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547(7661), pp.89-93. (https://doi.org/10.1038/nature22383)
  2. Jaro, M.A., 1989. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 84(406), pp.414-420.
  3. Levenshtein, V.I., 1966, February. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady (Vol. 10, No. 8, pp. 707-710).
  4. python-Levenshtein (https://github.com/ztane/python-Levenshtein)
  5. Thakkar, N. and Bailey-Kellogg, C., 2019. Balancing sensitivity and specificity in distinguishing TCR groups by CDR sequence similarity. BMC bioinformatics, 20(1), pp.1-14. (https://doi.org/10.1186/s12859-019-2864-8)
  6. Van der Loo, M.P., 2014. The stringdist package for approximate string matching. R J., 6(1), p.111.
  7. Winkler, W.E., 1990. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage.
