Feature extraction library for sequences and structures
Project description
lXtractor
Introduction
lXtractor is a toolbox devoted to feature extraction from macromolecular
sequences and structures.
It's tailored towards creating shareable local data collections anchored to
a reference sequence-based object: a single sequence, an MSA, or an HMM.
Currently, it doesn't define any unique algorithms, aiming at simplicity and
transparency.
It simply provides a (hopefully) convenient interface simplifying mundane tasks,
such as fetching the data, extracting domains, mapping sequences, and computing
sequential and structural variables.
Anchoring sequences and structures to a single reference object makes them
easier to interpret in downstream applications, such as fitting interpretable
ML models.
Installation
lXtractor requires Python >= 3.10 on a Unix system and can be installed
via pip:
pip install lXtractor
We encourage users to first create a virtual environment via conda or mamba.
Usage
lXtractor is designed to be flexible and its usage is defined by the initial
hypothesis or a reference object that one wants to extrapolate towards the
existing sequences or structures.
Below, we'll provide a very abstract description of what this package is
intended for.
In creating data collections, one could define the following steps:
- Assemble the data.
- Map the reference object to the sequences of the assembled entries.
- Filter hits.
- Define and calculate variables -- sequence or structure descriptors.
- Save the data for later usage or modifications.
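The steps above can be sketched in plain Python. This is only an illustration of the workflow's shape and does not use lXtractor: the `Entry` class and `map_reference()` helper are hypothetical, the sequences are made up, and exact substring search stands in for a real alignment or HMM search.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Entry:
    """Hypothetical stand-in for a sequence record; not an lXtractor class."""
    name: str
    seq: str
    hit_start: int = -1  # start of the reference hit, -1 if there is none


def map_reference(entries, reference):
    # Toy mapping: exact substring search instead of an alignment or an HMM.
    for entry in entries:
        entry.hit_start = entry.seq.find(reference)
    return entries


# 1. Assemble the data (hard-coded here rather than fetched).
entries = [
    Entry("a", "MKVLHRDLKPEN"),
    Entry("b", "MGSSAAA"),
    Entry("c", "HRDLAARN"),
]
# 2. Map the reference object to each entry's sequence.
entries = map_reference(entries, "HRD")
# 3. Filter hits.
hits = [e for e in entries if e.hit_start >= 0]
# 4. Define and calculate variables (here, just the sequence length).
table = [
    {"name": e.name, "hit_start": e.hit_start, "length": len(e.seq)}
    for e in hits
]
# 5. Save the data for later usage.
serialized = json.dumps([asdict(e) for e in hits])
```

In a real collection, each of these toy steps would be replaced by the corresponding lXtractor objects described below.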
lXtractor defines objects and routines helpful throughout this process.
Namely, PDB, SIFTS, AlphaFold, and fetch_uniprot() can aid in the first step.
Then, Alignment and PyHMMer can facilitate step 2.
At the end of step 2, one gets a collection of Chain*-type objects.
If working with sequence-only collections, these are going to be
ChainSequence objects.
For structure-only data, these are going to be ChainStructure containers,
embedding ChainSequence and GenericStructure objects.
Finally, mapping a canonical sequence to a group of associated structures
results in Chain objects.
ChainList wraps Chain*-type objects into a list-like collection with
operations for quickly filtering and bulk-modifying them.
Thus, filtering typically comes down to calling the ChainList.filter() method,
which accepts a Callable[[Chain*], bool] and returns a filtered ChainList.
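The predicate-based filtering pattern can be illustrated with a minimal stand-in. `ToyChainList` below is a hypothetical class written for this example, not the library's ChainList, and plain strings take the place of Chain*-type objects.

```python
from typing import Callable


class ToyChainList(list):
    """Hypothetical list-like container that only mimics the filter() idea."""

    def filter(self, pred: Callable[[str], bool]) -> "ToyChainList":
        # Keep the items for which the predicate returns True and
        # return them wrapped in the same container type.
        return ToyChainList(x for x in self if pred(x))


chains = ToyChainList(["MKVLHRDLKPEN", "MGSS", "HRDLAARN"])
# Keep only sequences with at least 8 residues.
long_chains = chains.filter(lambda s: len(s) >= 8)
```

Because the result has the same container type as the input, filters of this kind can be chained one after another.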
One can save/load the collected objects using ChainIO and proceed
with the feature extraction.
lXtractor defines various sequence and structure variables.
Variable-related operations are handled by GenericCalculator and
Manager classes. The former defines the calculation strategy and how
the calculations are parallelized, while the latter handles the calculations
and aggregates the results into a pandas DataFrame.
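The division of labor between a calculator and a manager can be sketched as follows. `ToyCalculator` and `ToyManager` are hypothetical names for this illustration and do not reproduce the interfaces of GenericCalculator or Manager; a thread pool is assumed as the parallelization strategy, and the result is a list of row dictionaries rather than a DataFrame.

```python
from concurrent.futures import ThreadPoolExecutor


def seq_length(seq):
    return len(seq)


def glycine_fraction(seq):
    return seq.count("G") / len(seq)


class ToyCalculator:
    """Hypothetical: decides how a function is applied to many inputs."""

    def __init__(self, workers=2):
        self.workers = workers

    def map(self, fn, items):
        # Parallelization strategy: a thread pool; results keep input order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))


class ToyManager:
    """Hypothetical: runs the calculations and aggregates them into rows."""

    def __init__(self, calculator):
        self.calculator = calculator

    def calculate(self, named_seqs, variables):
        names = list(named_seqs)
        rows = [{"name": n} for n in names]
        for var_name, fn in variables.items():
            values = self.calculator.map(fn, [named_seqs[n] for n in names])
            for row, value in zip(rows, values):
                row[var_name] = value
        return rows


manager = ToyManager(ToyCalculator())
rows = manager.calculate(
    {"a": "MGSG", "b": "HRDL"},
    {"length": seq_length, "gly_frac": glycine_fraction},
)
```

If pandas is installed, `pandas.DataFrame(rows)` turns such a list of rows into a table with one row per object and one column per variable.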
As a result, one is left with a collection of Chain*-type objects and a
table with calculated variables. In addition, one can store the calculated
variables within the objects themselves, although we currently do not encourage
this practice.
lXtractor is in the experimental stage and under active development.
Thus, objects' interfaces may change.
For the time being, one can check the examples of
- finding sequence determinants of tyrosine and serine-threonine kinases and
- a protocol to build a complete structural collection of protein kinase domains.
More examples are to come in the future, so stay tuned. If you know a good use case for lXtractor, feel free to raise an issue or reach out to ivan.reveguk@gmail.com.
Download files
Source Distribution
Built Distribution
File details
Details for the file lxtractor-0.1.7.tar.gz.
File metadata
- Download URL: lxtractor-0.1.7.tar.gz
- Upload date:
- Size: 192.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 660bbd62874c0102ef9a5505a5ac23ead4eee4c9fa81ddc1f1ab43b5bac39812 |
| MD5 | febc4943f0a12aa930369dd6c3113df3 |
| BLAKE2b-256 | 5680a6c9c631d2bf0937ad26ec26579a18bbeb206113585134ccf6e2530d8b02 |
File details
Details for the file lxtractor-0.1.7-py3-none-any.whl.
File metadata
- Download URL: lxtractor-0.1.7-py3-none-any.whl
- Upload date:
- Size: 220.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 666989833344858fb839838db5eb9d3cf30487cb001ff96e1100ae953c2ab1e7 |
| MD5 | 4e7431fbaabb57bc56a1e6ebb762a23f |
| BLAKE2b-256 | f519dd2ef82d6f6da436cd41ab92f1a97cfda80c026e330f1cbfcda63ee2f3be |