Feature extraction library for sequences and structures

lXtractor

Introduction

lXtractor is a toolbox for feature extraction from macromolecular sequences and structures. It is tailored towards creating shareable local data collections anchored to a reference sequence-based object: a single sequence, an MSA, or an HMM. Currently, it does not define any unique algorithms, aiming instead at simplicity and transparency. It provides a (hopefully) convenient interface that simplifies mundane tasks such as fetching data, extracting domains, mapping sequences, and computing sequence and structure variables. Anchoring sequences and structures to a single reference object makes them easier to interpret in downstream applications, such as fitting interpretable ML models.

Installation

lXtractor requires Python >= 3.10 on a Unix system and can be installed via pip:

pip install lXtractor

We encourage users to first create a virtual environment via conda or mamba.
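
Before installing, it may help to confirm that the active interpreter meets the stated requirements. The snippet below is a minimal sketch using only the standard library. The function name `environment_problems` is hypothetical and not part of lXtractor, and treating `os.name == "posix"` as "a Unix system" is an assumption.

```python
import os
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the installation notes


def environment_problems(version_info=sys.version_info, os_name=os.name):
    """Return a list of detected problems; an empty list means none were found.

    Hypothetical helper for illustration only, not part of lXtractor.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.10 is required")
    # Assumption: a "posix" os.name is used as a proxy for a Unix system.
    if os_name != "posix":
        problems.append("a Unix system is required")
    return problems


for problem in environment_problems():
    print(problem)
```

Running this inside the freshly created conda or mamba environment prints nothing when both checks pass.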

Usage

lXtractor is designed to be flexible. Its usage is defined by the initial hypothesis or the reference object that one wants to extrapolate to existing sequences or structures. Below, we provide a very abstract description of what this package is intended for.

In creating data collections, one could define the following steps:

  1. Assemble the data.
  2. Map reference object to assembled entries' sequences.
  3. Filter hits.
  4. Define and calculate variables -- sequence or structure descriptors.
  5. Save the data for later usage or modifications.
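
The five steps above can be sketched end to end with toy data. This is a deliberately simplified illustration in plain Python and does not use lXtractor's API: the sequences are made up, an exact substring search stands in for alignment or HMM-based mapping, and the output file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

# Step 1: assemble the data (made-up sequences standing in for fetched entries).
entries = {"seqA": "MKVLAGHWTT", "seqB": "GGAGHWPL", "seqC": "MSTNPKPQ"}

# Step 2: map the reference object to each entry's sequence.
# Assumption: exact substring search is a toy stand-in for alignment/HMM mapping.
reference = "AGHW"
hits = {name: seq.find(reference) for name, seq in entries.items()}

# Step 3: filter hits, keeping only entries where the reference was found.
mapped = {name: start for name, start in hits.items() if start >= 0}

# Step 4: define and calculate variables (here, hit position and sequence length).
table = [
    {"id": name, "start": start, "length": len(entries[name])}
    for name, start in mapped.items()
]

# Step 5: save the data for later usage or modifications.
out_path = Path(tempfile.mkdtemp()) / "collection.json"
out_path.write_text(json.dumps(table, indent=2))
```

In a real collection, each step would be replaced by the corresponding lXtractor object or routine described below.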

lXtractor defines objects and routines helpful throughout this process. Namely, PDB, SIFTS, AlphaFold, and fetch_uniprot() can aid in the first step. Then, Alignment and PyHMMer can facilitate step 2. At the end of step 2, one will get a collection of Chain*-type objects. For sequence-only collections, these will be ChainSequence objects. For structure-only data, these will be ChainStructure containers, each embedding a ChainSequence and a GenericStructure object. Finally, mapping a canonical sequence to a group of associated structures will result in Chain objects.
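
The containment relationships between the Chain*-type objects can be pictured with a few toy dataclasses. This is only a conceptual sketch: `ToySequence`, `ToyStructure`, and `ToyChain` are hypothetical stand-ins, and their fields are assumptions that do not reflect the attributes of the real lXtractor classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToySequence:
    """Stand-in for a sequence-only object."""
    name: str
    residues: str


@dataclass
class ToyStructure:
    """Stand-in for a structure container that embeds its own sequence."""
    name: str
    seq: ToySequence
    coords: list[tuple[float, float, float]]


@dataclass
class ToyChain:
    """Stand-in for a canonical sequence linked to a group of structures."""
    seq: ToySequence
    structures: list[ToyStructure] = field(default_factory=list)


canonical = ToySequence("canonical", "MKVLAGHW")
fragment = ToyStructure(
    name="fragment_1",
    seq=ToySequence("fragment_1_seq", "AGHW"),
    coords=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)],
)
chain = ToyChain(seq=canonical, structures=[fragment])
```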

ChainList wraps Chain*-type objects into a list-like collection with useful operations for quickly filtering and bulk-modifying them. Thus, filtering typically comes down to using the ChainList.filter() method, which accepts a Callable[[Chain*], bool] and returns a filtered ChainList. One can save and load the collected objects using ChainIO and proceed with feature extraction.
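
The predicate-based filtering pattern described above can be illustrated with a small stand-in collection. `ToyRecord` and `ToyChainList` are hypothetical names written for this sketch. They mimic only the idea of a list-like container whose `filter()` takes a callable returning `bool` and returns a container of the same type, and they are not lXtractor's implementation.

```python
from typing import Callable, NamedTuple


class ToyRecord(NamedTuple):
    """Stand-in for a Chain*-type object."""
    name: str
    residues: str


class ToyChainList(list):
    """List-like collection with a predicate-based filter (illustration only)."""

    def filter(self, pred: Callable[[ToyRecord], bool]) -> "ToyChainList":
        # Return a new collection of the same type; the original is not modified.
        return ToyChainList(item for item in self if pred(item))


records = ToyChainList(
    [
        ToyRecord("seqA", "MKVLAGHWTT"),
        ToyRecord("seqB", "GGAG"),
        ToyRecord("seqC", "MSTNPKPQ"),
    ]
)

# Keep only records with at least eight residues.
long_enough = records.filter(lambda r: len(r.residues) >= 8)
```

Because the result is again a `ToyChainList`, filters can be chained, for example `records.filter(f1).filter(f2)`.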

lXtractor defines various sequence and structure variables. Variable-related operations are handled by GenericCalculator and Manager classes. The former defines the calculation strategy and how the calculations are parallelized, while the latter handles the calculations and aggregates the results into a pandas DataFrame.
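
The division of labor between a calculator and a manager can be sketched as follows. `ToyCalculator` and `ToyManager` are hypothetical classes written for this illustration and do not reproduce the signatures of GenericCalculator or Manager. The sketch assumes thread-based parallelism and collects results as a list of row dictionaries, a shape that `pandas.DataFrame` accepts directly.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Variable = Callable[[str], float]  # a variable maps a sequence to a number


class ToyCalculator:
    """Decides how calculations run: serially or in a thread pool."""

    def __init__(self, num_workers: int = 1):
        self.num_workers = num_workers

    def run(self, tasks: list[tuple[Variable, str]]) -> list[float]:
        if self.num_workers <= 1:
            return [fn(seq) for fn, seq in tasks]
        with ThreadPoolExecutor(self.num_workers) as pool:
            # pool.map preserves the order of the submitted tasks.
            return list(pool.map(lambda task: task[0](task[1]), tasks))


class ToyManager:
    """Submits calculations and aggregates results into table rows."""

    def __init__(self, calculator: ToyCalculator):
        self.calculator = calculator

    def calculate(
        self, sequences: dict[str, str], variables: dict[str, Variable]
    ) -> list[dict]:
        tasks = [(fn, seq) for seq in sequences.values() for fn in variables.values()]
        values = iter(self.calculator.run(tasks))
        # Consume results in the same (sequence, variable) order as the tasks.
        return [
            {"id": seq_id, **{name: next(values) for name in variables}}
            for seq_id in sequences
        ]


variables = {
    "length": lambda s: float(len(s)),
    "gly_fraction": lambda s: s.count("G") / len(s),
}
sequences = {"seqA": "GGAG", "seqB": "MKVL"}
rows = ToyManager(ToyCalculator(num_workers=2)).calculate(sequences, variables)
```

Keeping the execution strategy in a separate object means the same manager code works unchanged whether calculations run serially or in parallel.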

As a result, one is left with a collection of Chain*-type objects and a table with calculated variables. In addition, one can store the calculated variables within the objects themselves, although we currently do not encourage this practice.

lXtractor is in the experimental stage and under active development. Thus, objects' interfaces may change.

For the time being, one can check the examples of

  1. finding sequence determinants of tyrosine and serine-threonine kinases and
  2. a protocol to build a complete structural collection of protein kinase domains.

More examples are to come, so stay tuned. If you know a good use case for lXtractor, feel free to raise an issue or reach out to ivan.reveguk@gmail.com.
