Peptide classifier for ChEBI / PubChem
Project description
ChemLog is a framework for rule-based ontology extension. This repository implements a classification of peptides on the ChEBI and PubChem datasets.
Three classification methods are implemented:
- Using Monadic Second-Order Logic (MSOL) formulas and the MSOL reasoner MONA
- Using First-Order Logic (FOL) formulas and a custom FOL model checker
- Using an algorithmic implementation
The classification covers the following aspects:
- Number of amino acids (in MSOL / FOL: up to 10)
- Charge category (either salt, anion, cation, zwitterion or neutral)
- Proteinogenic amino acids present
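ChemLog defines the charge categories in its natural-language rules, which are not reproduced here. As a rough illustration only, the sketch below assigns one of the five categories from atomic formal charges. The input format (one list of formal charges per disconnected fragment), the function name charge_category and the decision rules are assumptions. This applies in particular to treating charged multi-fragment structures as salts. None of this is ChemLog's actual definition.

```python
from enum import Enum
from typing import List


class ChargeCategory(Enum):
    SALT = "salt"
    ANION = "anion"
    CATION = "cation"
    ZWITTERION = "zwitterion"
    NEUTRAL = "neutral"


def charge_category(fragment_charges: List[List[int]]) -> ChargeCategory:
    """Hypothetical heuristic: classify a molecule from atomic formal charges.

    fragment_charges holds one list of formal charges per disconnected
    fragment (an assumed input format, not ChemLog's).
    """
    if not fragment_charges:
        raise ValueError("expected at least one fragment")
    # Assumption: several fragments, at least one of them charged, form a salt.
    if len(fragment_charges) > 1 and any(c != 0 for f in fragment_charges for c in f):
        return ChargeCategory.SALT
    charges = [c for fragment in fragment_charges for c in fragment]
    net = sum(charges)
    if net < 0:
        return ChargeCategory.ANION
    if net > 0:
        return ChargeCategory.CATION
    # Net charge zero: opposite charges on different atoms make a zwitterion.
    if any(c > 0 for c in charges) and any(c < 0 for c in charges):
        return ChargeCategory.ZWITTERION
    return ChargeCategory.NEUTRAL
```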
If the corresponding flag (-c, --return-chebi-classes) is set, ChemLog also returns the ChEBI classes that match the classification results. The following classes are currently supported:
| ChEBI ID | name |
|---|---|
| 16670 | peptide |
| 60194 | peptide cation |
| 60334 | peptide anion |
| 60466 | peptide zwitterion |
| 25676 | oligopeptide |
| 46761 | dipeptide |
| 47923 | tripeptide |
| 48030 | tetrapeptide |
| 48545 | pentapeptide |
| 15841 | polypeptide |
| 90799 | dipeptide zwitterion |
| 155837 | tripeptide zwitterion |
| 64372 | emericellamide |
| 65061 | 2,5-diketopiperazines |
| 24866 | salt |
| 25696 | organic anion |
| 25697 | organic cation |
| 27369 | zwitterion |
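The following sketch illustrates how a classification result could map onto the ChEBI classes in the table by combining the amino acid count and the charge category. The ChEBI IDs come from the table. The function chebi_classes_for, the rules for combining the IDs and the cutoff of 10 amino acids between oligopeptides and polypeptides are assumptions. Structure-based classes such as emericellamide and 2,5-diketopiperazines require substructure checks and are not covered.

```python
from typing import Dict, List, Set

# ChEBI IDs from the table above; the decision rules below are assumptions.
PEPTIDE = 16670
BY_SIZE: Dict[int, int] = {2: 46761, 3: 47923, 4: 48030, 5: 48545}  # di- to pentapeptide
OLIGOPEPTIDE, POLYPEPTIDE = 25676, 15841
BY_CHARGE: Dict[str, List[int]] = {
    "cation": [60194, 25697],      # peptide cation, organic cation
    "anion": [60334, 25696],       # peptide anion, organic anion
    "zwitterion": [60466, 27369],  # peptide zwitterion, zwitterion
    "salt": [24866],               # salt
}
ZWITTERION_BY_SIZE: Dict[int, int] = {2: 90799, 3: 155837}  # di-/tripeptide zwitterion

POLYPEPTIDE_MIN = 10  # hypothetical cutoff, not taken from ChemLog


def chebi_classes_for(n_amino_acids: int, charge: str) -> Set[int]:
    """Hypothetical mapping from (amino acid count, charge category) to ChEBI IDs."""
    if n_amino_acids < 2:  # a peptide needs at least two amino acids
        return set()
    classes = {PEPTIDE}
    if n_amino_acids in BY_SIZE:
        classes.add(BY_SIZE[n_amino_acids])
    classes.add(POLYPEPTIDE if n_amino_acids >= POLYPEPTIDE_MIN else OLIGOPEPTIDE)
    classes.update(BY_CHARGE.get(charge, []))  # "neutral" adds nothing
    if charge == "zwitterion" and n_amino_acids in ZWITTERION_BY_SIZE:
        classes.add(ZWITTERION_BY_SIZE[n_amino_acids])
    return classes
```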
All implementations are based on the same natural-language definitions and were developed jointly, so all methods are expected to yield the same results. If you observe differing results, please open an issue. If you are only interested in the results, we recommend the algorithmic implementation, since it is the fastest.
If you face problems using ChemLog or have other questions, feel free to open an issue.
Installation
Download the source code from this repository.
Install with
pip install .
To use the MONA reasoner, install it separately. The classifier expects the mona command to be available.
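A quick check that MONA is reachable before running the MSOL classification can save a failed run. The minimal preflight sketch below uses only the standard library. mona_available is a hypothetical helper and is not part of ChemLog.

```python
import shutil


def mona_available(command: str = "mona") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if mona_available():
        print("mona found at", shutil.which("mona"))
    else:
        print("mona not found; install MONA before running classify-msol")
```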
Run the classification
ChemLog provides a command-line interface for the classification. Each run stores its results in JSON format, alongside a log file and a config file.
Command:
python -m chemlog classify
Apply the algorithmic implementation to ChEBI data.
Options:
-v, --chebi-version INTEGER ChEBI version [required]
-m, --molecules TEXT List of ChEBI IDs to classify. Default: all ChEBI classes.
-c, --return-chebi-classes Return ChEBI classes
-n, --run-name TEXT Results will be stored at results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode Logs at debug level
-o, --additional-output Returns intermediate steps in output, useful for explainability and verification
-3, --only-3star Only consider 3-star molecules
--help Show this message and exit.
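Each run writes into a timestamped directory following the pattern documented for --run-name. The sketch below reproduces that naming with datetime.strftime and loads results.json from the most recent run of a given name. The helper names are hypothetical. The file name results.json is taken from the verify command, and the sketch assumes nothing about the JSON content beyond it being valid JSON.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any


def run_dir(run_name: str, started: datetime, root: str = "results") -> Path:
    """Build a run directory following results/%y%m%d_%H%M_{run_name}/."""
    return Path(root) / f"{started.strftime('%y%m%d_%H%M')}_{run_name}"


def load_latest_results(run_name: str, root: str = "results") -> Any:
    """Load results.json from the most recent run with the given name."""
    # The timestamp prefix is zero-padded, so sorting the directory names
    # lexicographically also sorts the runs chronologically.
    candidates = sorted(p for p in Path(root).glob(f"*_{run_name}") if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"no run named {run_name!r} in {root}/")
    with open(candidates[-1] / "results.json", encoding="utf-8") as f:
        return json.load(f)
```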
Command:
python -m chemlog classify-pubchem
Apply the algorithmic implementation to PubChem data.
Options:
-f, --from-batch INTEGER Start at this PubChem batch (each batch consists of 500,000 ids)
-t, --to-batch INTEGER End at this PubChem batch (exclusive)
-c, --return-chebi-classes Return assigned ChEBI classes
-m, --molecules TEXT List of PubChem IDs to classify. Default: all PubChem entries.
--help Show this message and exit.
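The batch options divide the PubChem ID space into blocks of 500,000 IDs. Assume that batch b covers IDs from b * 500,000 up to, but excluding, (b + 1) * 500,000. The exact offset ChemLog uses is not documented here. Under that assumption, the IDs processed by a run can be sketched as follows (all function names are hypothetical):

```python
BATCH_SIZE = 500_000  # "each batch consists of 500,000 ids"


def batch_id_range(batch: int) -> range:
    # Assumption: batch b covers IDs [b * BATCH_SIZE, (b + 1) * BATCH_SIZE).
    return range(batch * BATCH_SIZE, (batch + 1) * BATCH_SIZE)


def ids_for_run(from_batch: int, to_batch: int) -> range:
    # --to-batch is exclusive, so batch to_batch itself is not processed.
    return range(from_batch * BATCH_SIZE, to_batch * BATCH_SIZE)


def batch_of(pubchem_id: int) -> int:
    # Batch that would contain a given ID under the same assumption.
    return pubchem_id // BATCH_SIZE
```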
Command:
python -m chemlog classify-fol
Apply the FOL implementation to ChEBI data.
Options:
-v, --chebi-version INTEGER ChEBI version [required]
-m, --molecules TEXT List of ChEBI IDs to classify. Default: all ChEBI classes.
-c, --return-chebi-classes Return ChEBI classes
-n, --run-name TEXT Results will be stored at results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode Logs at debug level
-o, --additional-output Returns intermediate steps in output, useful for explainability and verification
-3, --only-3star Only consider 3-star molecules
--help Show this message and exit.
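ChemLog's FOL model checker is custom, and its internals are not described here. The generic sketch below illustrates what model checking means in this setting: evaluating a first-order formula against a finite structure. In the example, the structure is a molecule whose atoms form the domain. The formula encoding, the function holds and the predicate names N, C, O and bond are illustrative assumptions, not ChemLog's vocabulary.

```python
from typing import Any, Dict, Optional, Set, Tuple

# A formula is a nested tuple:
#   ("pred", name, var, ...)            atomic formula
#   ("not", f), ("and", f, g, ...), ("or", f, g, ...)
#   ("exists", var, f), ("forall", var, f)
Formula = Tuple[Any, ...]
Relations = Dict[str, Set[Tuple[Any, ...]]]


def holds(formula: Formula, domain: Set[Any], relations: Relations,
          env: Optional[Dict[str, Any]] = None) -> bool:
    """Check whether a first-order formula is true in a finite structure."""
    env = {} if env is None else env
    op = formula[0]
    if op == "pred":
        name, *variables = formula[1:]
        return tuple(env[v] for v in variables) in relations.get(name, set())
    if op == "not":
        return not holds(formula[1], domain, relations, env)
    if op == "and":
        return all(holds(f, domain, relations, env) for f in formula[1:])
    if op == "or":
        return any(holds(f, domain, relations, env) for f in formula[1:])
    if op in ("exists", "forall"):
        var, body = formula[1], formula[2]
        # Quantifiers range over the finite domain (here: the atoms).
        values = (holds(body, domain, relations, {**env, var: d}) for d in domain)
        return any(values) if op == "exists" else all(values)
    raise ValueError(f"unknown operator: {op!r}")


# Toy structure: the heavy atoms of glycine, N-C-C(=O)-O.
atoms = {0, 1, 2, 3, 4}
rel: Relations = {
    "N": {(0,)}, "C": {(1,), (2,)}, "O": {(3,), (4,)},
    "bond": {(0, 1), (1, 2), (2, 3), (2, 4)},
}
rel["bond"] |= {(b, a) for a, b in rel["bond"]}  # bonds are undirected

n_bonded_to_c = ("exists", "x", ("exists", "y", ("and",
    ("pred", "N", "x"), ("pred", "C", "y"), ("pred", "bond", "x", "y"))))
print(holds(n_bonded_to_c, atoms, rel))  # True
```

The quantifiers iterate over all atoms. Naive evaluation therefore grows with the number of atoms raised to the quantifier nesting depth.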
Command:
python -m chemlog classify-msol
Apply the MSOL implementation to ChEBI data.
Options:
-v, --chebi-version INTEGER ChEBI version [required]
-m, --molecules TEXT List of ChEBI IDs to classify. Default: all ChEBI classes.
-n, --run-name TEXT Results will be stored at results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode Logs at debug level
-p, --only-peptides Only consider peptide molecules
--help Show this message and exit.
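MSOL extends FOL with quantification over sets of elements. Connectivity is a classic example: FOL cannot express it, but MSOL can. In MSOL, two atoms lie in different fragments if some set of atoms contains one of them, excludes the other and is closed under bonds. A property of this kind could be relevant when recognising salts. The sketch below checks that formula by enumerating all subsets. This is exponential and meant for illustration only. MONA itself decides MSOL formulas with automata-based techniques, and this sketch does not show how ChemLog encodes molecules for MONA. The function names are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Iterable, Iterator, Set, Tuple


def subsets(items: Iterable[Hashable]) -> Iterator[Set[Hashable]]:
    """Enumerate every subset, i.e. everything a set quantifier ranges over."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def disconnected(x: Hashable, y: Hashable, atoms: Set[Hashable],
                 bonds: Set[Tuple[Hashable, Hashable]]) -> bool:
    """Brute-force the MSOL formula
    exists X. x in X and y not in X and forall (a, b) in bonds: a in X <-> b in X
    """
    for X in subsets(atoms):
        closed = all((a in X) == (b in X) for a, b in bonds)
        if x in X and y not in X and closed:
            return True
    return False


# Toy salt: an acetate fragment (atoms 0-3) plus a separate sodium ion (atom 4).
atoms = {0, 1, 2, 3, 4}
bonds = {(0, 1), (1, 2), (1, 3)}
print(disconnected(0, 4, atoms, bonds))  # True: different fragments
print(disconnected(0, 3, atoms, bonds))  # False: same fragment
```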
Command:
python -m chemlog verify
Given a results file (results.json in the directory passed via --results-dir), run the FOL classification for the same classes. This is typically used to check whether the algorithmic and FOL classifications agree for certain classes.
Options:
-v, --chebi-version INTEGER ChEBI version [required]
-r, --results-dir TEXT Directory where results.json to analyse is located [required]
-d, --debug-mode Returns additional states
-m, --molecules TEXT List of ChEBI IDs to verify. Default: all ChEBI classes.
-3, --only-3star Only consider 3-star molecules
--help Show this message and exit.
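This section does not specify how verify reports mismatches. Suppose both result sets are available as dictionaries that map a molecule ID to a dictionary of classification fields. This is a hypothetical layout, and field names such as size and charge are placeholders. Disagreements between the algorithmic and the FOL results could then be listed as follows:

```python
from typing import Any, Dict, Iterable, Tuple

Results = Dict[str, Dict[str, Any]]


def disagreements(first: Results, second: Results,
                  fields: Iterable[str]) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """For each molecule ID present in both result sets, list the differing fields."""
    fields = list(fields)
    diffs: Dict[str, Dict[str, Tuple[Any, Any]]] = {}
    for mol_id in sorted(first.keys() & second.keys()):
        a, b = first[mol_id], second[mol_id]
        changed = {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}
        if changed:
            diffs[mol_id] = changed
    return diffs
```

The function ignores IDs that appear in only one of the result sets. A real comparison would probably report those separately.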