Peptide classifier for ChEBI / PubChem
ChemLog is a framework for rule-based ontology extension. This repository implements a classification of peptides on the ChEBI and PubChem datasets.
How are peptides classified?
Four classification methods are implemented:
- Using Monadic Second-Order Logic (MSOL) formulas with the MSOL model finder MONA
- Turning an MSOL model finding problem into a QBF satisfiability problem and solving that with CAQE or DepQBF, using the Bloqqer preprocessor.
- Turning an MSOL model finding problem partially into First-Order Logic (FOL) and solving that with a custom FOL model checker. Since not all MSOL axioms are translatable, the non-translatable parts are calculated algorithmically.
- Using an algorithmic implementation
If you are just interested in the results, we recommend choosing the algorithmic implementation, as it is the fastest and can handle complex molecules.
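To give a feel for what checking a first-order formula against a molecule involves, here is a minimal sketch. It is not ChemLog's model checker and does not use ChemLog's axioms: the toy molecule, the `bond_order` helper, and the simplified "amide bond" formula are all assumptions made for illustration. The molecule is treated as a finite structure (atoms as the domain, element labels as unary predicates, bonds as a binary relation), and an existential formula is checked by enumerating variable assignments.

```python
from itertools import product

# Toy finite structure (illustrative only, hydrogens omitted):
# atoms are the domain, element symbols act as unary predicates,
# and bonds with their order form a binary relation.
atoms = {0: "N", 1: "C", 2: "C", 3: "O", 4: "N", 5: "C"}
bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1, (4, 5): 1}


def bond_order(a, b):
    """Return the bond order between atoms a and b, or 0 if they are not bonded."""
    return bonds.get((a, b)) or bonds.get((b, a)) or 0


def has_amide_bond(atoms, bond_order):
    """Check the formula: exists c, o, n . C(c) & O(o) & N(n) & double(c, o) & single(c, n)."""
    return any(
        atoms[c] == "C" and atoms[o] == "O" and atoms[n] == "N"
        and bond_order(c, o) == 2 and bond_order(c, n) == 1
        for c, o, n in product(atoms, repeat=3)  # enumerate all variable assignments
    )


print(has_amide_bond(atoms, bond_order))
```

Brute-force enumeration like this grows quickly with the number of quantified variables and atoms, which is one intuition for why a purpose-built algorithmic implementation can be faster on complex molecules.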
The classification covers the following aspects:
- Number of amino acids (up to 10, except for the algorithmic method, which covers arbitrary sizes)
- Charge category (either salt, anion, cation, zwitterion or neutral)
- Proteinogenic amino acids present
- Emericellamides and 2,5-diketopiperazines
ChemLog will also return the ChEBI classes that match this classification. Currently supported are:
| ChEBI ID | name |
|---|---|
| 16670 | peptide |
| 60194 | peptide cation |
| 60334 | peptide anion |
| 60466 | peptide zwitterion |
| 25676 | oligopeptide |
| 46761 | dipeptide |
| 47923 | tripeptide |
| 48030 | tetrapeptide |
| 48545 | pentapeptide |
| 15841 | polypeptide |
| 90799 | dipeptide zwitterion |
| 155837 | tripeptide zwitterion |
| 64372 | emericellamide |
| 65061 | 2,5-diketopiperazines |
| 24866 | salt |
| 25696 | organic anion |
| 25697 | organic cation |
| 27369 | zwitterion |
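When post-processing results, it can help to have the table above available as a lookup. The following is a small sketch, not part of ChemLog: the names `CHEBI_CLASSES`, `SIZE_CLASSES`, `label` and `size_class` are hypothetical. Only the size classes whose amino acid count follows directly from their name (di- to pentapeptide) are mapped; the cutoffs for oligopeptide and polypeptide are not stated here, so they are left out.

```python
# ChEBI IDs and names copied from the table above.
CHEBI_CLASSES = {
    16670: "peptide", 60194: "peptide cation", 60334: "peptide anion",
    60466: "peptide zwitterion", 25676: "oligopeptide", 46761: "dipeptide",
    47923: "tripeptide", 48030: "tetrapeptide", 48545: "pentapeptide",
    15841: "polypeptide", 90799: "dipeptide zwitterion",
    155837: "tripeptide zwitterion", 64372: "emericellamide",
    65061: "2,5-diketopiperazines", 24866: "salt", 25696: "organic anion",
    25697: "organic cation", 27369: "zwitterion",
}

# Size-specific classes whose amino acid count is given by the name itself.
SIZE_CLASSES = {2: 46761, 3: 47923, 4: 48030, 5: 48545}


def label(chebi_id):
    """Format a ChEBI ID from the table as 'CHEBI:<id> (<name>)'."""
    return f"CHEBI:{chebi_id} ({CHEBI_CLASSES[chebi_id]})"


def size_class(n_amino_acids):
    """Return the size-specific ChEBI ID for 2 to 5 amino acids, else None."""
    return SIZE_CLASSES.get(n_amino_acids)


print(label(size_class(3)))
```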
All implementations are based on the same natural language definitions and have been developed jointly. All methods are therefore expected to yield the same result. If you observe a discrepancy between methods, please open an issue.
If you face problems using ChemLog or have other questions, feel free to open an issue as well.
Installation
Download the source code from this repository.
Install with
pip install .
If you want to use the MONA reasoner, you have to install it separately (the classifier expects the mona command to be available).
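Before selecting the MONA-based method, you may want to confirm that the `mona` command is actually reachable. A minimal sketch using only the Python standard library (the helper name `is_on_path` is hypothetical and not part of ChemLog):

```python
import shutil


def is_on_path(command="mona"):
    """Return True if `command` resolves to an executable on the current PATH."""
    return shutil.which(command) is not None


if not is_on_path():
    print("'mona' was not found on PATH; install MONA before using the MONA reasoner.")
```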
Run the classification
ChemLog provides a command line interface for the classification. Each run writes its results in JSON format, alongside a log and a config file. Currently, classification of ChEBI and PubChem data is supported. Download and preprocessing of the data are handled automatically. For instance, the following command classifies the 1,000 smallest peptides in ChEBI with the algorithmic method:
python -m chemlog classify-chebi --chebi-version 239 --strategy algo --only-peptides --n-molecules 1000
For more details on the available command line options run
python -m chemlog --help
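If you want to launch runs from a script, the command shown above can be assembled in Python. This is a sketch rather than a ChemLog API: the function `build_classify_chebi_command` is hypothetical, and it only uses the subcommand and flags that appear in the example above. Strategy names other than `algo` are not listed here, so check `python -m chemlog --help` for the accepted values.

```python
import shlex
import sys


def build_classify_chebi_command(chebi_version, strategy, n_molecules=None,
                                 only_peptides=False):
    """Build the argument list for `python -m chemlog classify-chebi ...`."""
    cmd = [
        sys.executable, "-m", "chemlog", "classify-chebi",
        "--chebi-version", str(chebi_version),
        "--strategy", strategy,
    ]
    if only_peptides:
        cmd.append("--only-peptides")
    if n_molecules is not None:
        cmd += ["--n-molecules", str(n_molecules)]
    return cmd


cmd = build_classify_chebi_command(239, "algo", n_molecules=1000, only_peptides=True)
print(shlex.join(cmd))

# To actually start the run (requires chemlog to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `sys.executable` keeps the run in the same Python environment that ChemLog was installed into.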