Skip to main content

Python package written in C++ for pairwise distance computation for sequences.

Project description

setriq: pairwise sequence distances

CircleCI codecov License: MIT

logo

A Python package written in C++ for computing pairwise distances between (immunoglobulin) sequences.

Install

This package is available on PyPI

pip install setriq

Quickstart

setriq inherits from the torch philosophy of callable objects. Each Metric subclass is a callable upon initialisation, taking a list of objects (usually str) and returning a list of float values.

import setriq
metric = setriq.CdrDist()

sequences = [
    'CASSLKPNTEAFF',
    'CASSAHIANYGYTF',
    'CASRGATETQYF'
]
distances = metric(sequences)

The returned list is flat and contains N * (N - 1) / 2 elements, i.e. the lower (or upper) triangle of the distance matrix. To get the square form of the matrix, use scipy.spatial.distance.squareform on the returned distances.

About

As the header suggests, setriq is a no-frills Python package for fast computation of pairwise sequence distances, with a focus on immunoglobulins. It is a declarative framework and borrows many concepts from the popular torch library. It has been optimized for parallel compute on CPU architectures.

It can only perform pairwise, all-v-all distance computations. This decision was made to maximize consistency and cohesion.

Requirements

A Python version of 3.7 or above is required, as well as a C++ compiler equipped with OpenMP. The package has been tested on Linux and macOS. To get the required OpenMP resources, run:

On Linux:

sudo apt install libomp-dev && sudo apt show libomp-dev

On macOS:

brew install libomp llvm

References

  1. Dash, P., Fiore-Gartland, A.J., Hertz, T., Wang, G.C., Sharma, S., Souquette, A., Crawford, J.C., Clemens, E.B., Nguyen, T.H., Kedzierska, K. and La Gruta, N.L., 2017. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature, 547(7661), pp.89-93. (https://doi.org/10.1038/nature22383)
  2. Levenshtein, V.I., 1966, February. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady (Vol. 10, No. 8, pp. 707-710).
  3. python-Levenshtein (https://github.com/ztane/python-Levenshtein)
  4. Thakkar, N. and Bailey-Kellogg, C., 2019. Balancing sensitivity and specificity in distinguishing TCR groups by CDR sequence similarity. BMC bioinformatics, 20(1), pp.1-14. (https://doi.org/10.1186/s12859-019-2864-8)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

setriq-1.4.2.tar.gz (82.8 kB view details)

Uploaded Source

File details

Details for the file setriq-1.4.2.tar.gz.

File metadata

  • Download URL: setriq-1.4.2.tar.gz
  • Upload date:
  • Size: 82.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for setriq-1.4.2.tar.gz
Algorithm Hash digest
SHA256 e7a3141dcc11f35066c650adc4fccee2e35362ed0250653a34617af53c66d4d9
MD5 e6fb206f661bbda5474f0ee92c7ba2ea
BLAKE2b-256 babc13261bd105a7c798c01153c05f0f4b38965f715c16d9c522258d3aa17c0b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page