Peptide classifier for ChEBI / PubChem

ChemLog is a framework for rule-based ontology extension. This repository implements a classification of peptides on the ChEBI and PubChem datasets.

Three classification methods are implemented:

  1. Using Monadic Second-Order Logic (MSOL) formulas and the MSOL reasoner MONA
  2. Using First-Order Logic (FOL) formulas and a custom FOL model checker
  3. Using an algorithmic implementation

The classification covers the following aspects:

  1. Number of amino acids (in MSOL / FOL: up to 10)
  2. Charge category (either salt, anion, cation, zwitterion or neutral)
  3. Proteinogenic amino acids present
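
To make the three aspects concrete, here is a minimal sketch of a container that holds them together. The class name `PeptideClassification`, its field names, and the `size_name` helper are hypothetical and chosen for illustration only; they are not ChemLog's output schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The five charge categories listed above.
CHARGE_CATEGORIES = ("salt", "anion", "cation", "zwitterion", "neutral")

# Size names for exact amino acid counts (illustrative subset).
SIZE_NAMES = {2: "dipeptide", 3: "tripeptide", 4: "tetrapeptide", 5: "pentapeptide"}


@dataclass
class PeptideClassification:
    """Hypothetical record for the three classification aspects."""

    n_amino_acids: int
    charge_category: str
    proteinogenic_amino_acids: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values that fall outside the aspects described above.
        if self.n_amino_acids < 0:
            raise ValueError("n_amino_acids must be non-negative")
        if self.charge_category not in CHARGE_CATEGORIES:
            raise ValueError(f"unknown charge category: {self.charge_category!r}")

    @property
    def size_name(self) -> Optional[str]:
        # Returns None when no exact-count name is defined in SIZE_NAMES.
        return SIZE_NAMES.get(self.n_amino_acids)


example = PeptideClassification(2, "zwitterion", ["glycine", "alanine"])
print(example.size_name)
```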

If the --return-chebi-classes flag is set, ChemLog will also return the ChEBI classes that match this classification. The following classes are currently supported:

ChEBI ID   Name
16670      peptide
60194      peptide cation
60334      peptide anion
60466      peptide zwitterion
25676      oligopeptide
46761      dipeptide
47923      tripeptide
48030      tetrapeptide
48545      pentapeptide
15841      polypeptide
90799      dipeptide zwitterion
155837     tripeptide zwitterion
64372      emericellamide
65061      2,5-diketopiperazines
24866      salt
25696      organic anion
25697      organic cation
27369      zwitterion
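
When post-processing results, it can help to have the supported classes available as a lookup table. The sketch below copies the IDs and names from the list above; the names `SUPPORTED_CHEBI_CLASSES` and `describe` are hypothetical helpers, not part of ChemLog.

```python
# Supported ChEBI classes, copied from the list above.
SUPPORTED_CHEBI_CLASSES = {
    16670: "peptide",
    60194: "peptide cation",
    60334: "peptide anion",
    60466: "peptide zwitterion",
    25676: "oligopeptide",
    46761: "dipeptide",
    47923: "tripeptide",
    48030: "tetrapeptide",
    48545: "pentapeptide",
    15841: "polypeptide",
    90799: "dipeptide zwitterion",
    155837: "tripeptide zwitterion",
    64372: "emericellamide",
    65061: "2,5-diketopiperazines",
    24866: "salt",
    25696: "organic anion",
    25697: "organic cation",
    27369: "zwitterion",
}


def describe(chebi_id: int) -> str:
    """Format a supported class as 'CHEBI:<id> (<name>)'."""
    if chebi_id not in SUPPORTED_CHEBI_CLASSES:
        raise KeyError(f"ChEBI ID {chebi_id} is not in the supported list")
    return f"CHEBI:{chebi_id} ({SUPPORTED_CHEBI_CLASSES[chebi_id]})"


print(describe(46761))
```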

All implementations are based on the same natural language definitions and have been developed jointly. Therefore, all methods are expected to yield the same results. If you observe a discrepancy, please open an issue. If you are only interested in the results, we recommend the algorithmic implementation, as it is the fastest.

If you face problems using ChemLog or have other questions, feel free to open an issue.

Installation

Download the source code from this repository.

Install with

pip install .

If you want to use the MONA reasoner, you have to install it separately (the classifier expects the mona command to be available).
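
Before starting an MSOL run, you may want to confirm that the mona command is on your PATH. The following is a small sketch using only the Python standard library; the helper name `command_available` is hypothetical and not part of ChemLog.

```python
import shutil


def command_available(command: str = "mona") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(command) is not None


if command_available():
    print("mona found, the MSOL classification can be used")
else:
    print("mona not found, install it before running classify-msol")
```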

Run the classification

ChemLog provides a command line interface for the classification. Each run writes its results in JSON format, alongside a log file and a config file.

Command:

python -m chemlog classify

Apply the algorithmic implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-c, --return-chebi-classes   Return ChEBI classes
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-o, --additional-output      Returns intermediate steps in output, useful
                             for explainability and verification
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.
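
If you want to start a run from a Python script, one option is to assemble the command line shown above and pass it to subprocess. This is a sketch only: the helper `build_classify_command` is hypothetical, it uses just a subset of the options listed above, and the ChEBI version 239 is a placeholder value.

```python
import sys
from typing import List, Optional


def build_classify_command(
    chebi_version: int,
    run_name: Optional[str] = None,
    return_chebi_classes: bool = False,
    only_3star: bool = False,
) -> List[str]:
    """Assemble the argument list for `python -m chemlog classify`."""
    cmd = [sys.executable, "-m", "chemlog", "classify",
           "--chebi-version", str(chebi_version)]
    if return_chebi_classes:
        cmd.append("--return-chebi-classes")
    if run_name:
        cmd += ["--run-name", run_name]
    if only_3star:
        cmd.append("--only-3star")
    return cmd


# 239 is a placeholder; use the ChEBI version you actually need.
cmd = build_classify_command(239, run_name="demo", return_chebi_classes=True)
print(" ".join(cmd))

# To execute the run (requires ChemLog to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```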

Command:

python -m chemlog classify-pubchem

Apply the algorithmic implementation to PubChem data.

Options:

-f, --from-batch INTEGER    Start at this PubChem batch (each batch consists of 500,000 ids)
-t, --to-batch INTEGER      End at this PubChem batch (exclusive)
-c, --return-chebi-classes  Return assigned ChEBI classes
-m, --molecules TEXT        List of PubChem IDs to classify. Default: all
                            PubChem entries.
--help                      Show this message and exit.
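
The batch options describe a half-open range: --from-batch is included and --to-batch is excluded. The arithmetic can be sketched as follows. The helper names are hypothetical, and how ChemLog maps batch numbers to concrete PubChem IDs is not covered here.

```python
BATCH_SIZE = 500_000  # IDs per PubChem batch, as stated in the option help


def batches_to_process(from_batch: int, to_batch: int) -> range:
    """Batch numbers visited for a given --from-batch / --to-batch pair."""
    if to_batch < from_batch:
        raise ValueError("to_batch must not be smaller than from_batch")
    # range() is half-open, matching the exclusive --to-batch.
    return range(from_batch, to_batch)


def max_ids_covered(from_batch: int, to_batch: int) -> int:
    """Upper bound on the number of PubChem IDs in the selected batches."""
    return len(batches_to_process(from_batch, to_batch)) * BATCH_SIZE


print(list(batches_to_process(2, 5)))
print(max_ids_covered(2, 5))
```

With --from-batch 2 and --to-batch 5, batches 2, 3 and 4 are processed, covering up to 1,500,000 IDs.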

Command:

python -m chemlog classify-fol

Apply the FOL implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-c, --return-chebi-classes   Return ChEBI classes
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-o, --additional-output      Returns intermediate steps in output, useful
                             for explainability and verification
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.
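
The results directory pattern results/%y%m%d_%H%M_{run_name}/ combines a timestamp with the run name. Assuming the placeholders follow Python's strftime conventions (two-digit year, month, day, then hour and minute), the directory for a given run can be predicted as sketched below. The helper `run_directory` is hypothetical, and whether ChemLog uses local time is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_directory(run_name: str, now: Optional[datetime] = None) -> Path:
    """Build results/%y%m%d_%H%M_{run_name} for the given time."""
    # Assumption: the timestamp is taken from the local clock.
    now = now or datetime.now()
    return Path("results") / f"{now.strftime('%y%m%d_%H%M')}_{run_name}"


# A run named "demo" started on 14 March 2025 at 09:05:
print(run_directory("demo", datetime(2025, 3, 14, 9, 5)).as_posix())
# prints results/250314_0905_demo
```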

Command:

python -m chemlog classify-msol

Apply the MSOL implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-p, --only-peptides          Only consider peptide molecules
--help                       Show this message and exit.

Command:

python -m chemlog verify

Given a results file, run the FOL classification for the same classes. This is typically used to check whether the algorithmic and FOL classifications match for certain classes.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-r, --results-dir TEXT       Directory where results.json to analyse is
                             located  [required]
-d, --debug-mode             Returns additional states
-m, --molecules TEXT         List of ChEBI IDs to verify. Default: all ChEBI
                             classes.
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.
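
The idea behind such a check can be sketched independently of ChemLog: compare two sets of classifications and report every molecule on which they disagree. The schema of results.json is not described here, so the sketch assumes a flat mapping from molecule ID to classification. The function name and the toy values are invented for illustration.

```python
from typing import Any, Dict, Tuple


def find_mismatches(
    first: Dict[str, Any], second: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return molecules whose classification differs between two runs."""
    mismatches = {}
    for mol_id in sorted(set(first) | set(second)):
        left, right = first.get(mol_id), second.get(mol_id)
        # A molecule missing from one run shows up as None on that side.
        if left != right:
            mismatches[mol_id] = (left, right)
    return mismatches


# Toy data, not real ChemLog output.
algorithmic = {"mol_a": "dipeptide", "mol_b": "tripeptide"}
fol = {"mol_a": "dipeptide", "mol_b": "tetrapeptide", "mol_c": "dipeptide"}
print(find_mismatches(algorithmic, fol))
```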
