Peptide classifier for ChEBI / PubChem

ChemLog is a framework for rule-based ontology extension. This repository implements a classification of peptides on the ChEBI and PubChem datasets.

Three classification methods are implemented:

  1. Using Monadic Second-Order Logic (MSOL) formulas and the MSOL reasoner MONA
  2. Using First-Order Logic (FOL) formulas and a custom FOL model checker
  3. Using an algorithmic implementation

The classification covers the following aspects:

  1. Number of amino acids (in MSOL / FOL: up to 10)
  2. Charge category (either salt, anion, cation, zwitterion or neutral)
  3. Proteinogenic amino acids present

If the corresponding flag is set, ChemLog will also return the ChEBI classes that match this classification. Currently supported are:

ChEBI ID   Name
16670      peptide
60194      peptide cation
60334      peptide anion
60466      peptide zwitterion
25676      oligopeptide
46761      dipeptide
47923      tripeptide
48030      tetrapeptide
48545      pentapeptide
15841      polypeptide
90799      dipeptide zwitterion
155837     tripeptide zwitterion
64372      emericellamide
65061      2,5-diketopiperazines
24866      salt
25696      organic anion
25697      organic cation
27369      zwitterion
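
To illustrate how the amino acid count and the charge category could combine into ChEBI classes, here is a minimal sketch. It is not ChemLog's actual code: the function name, the threshold of 10 amino acids for polypeptides, and the rule that smaller peptides count as oligopeptides are assumptions made for this example. Only the IDs are taken from the table above, and salts are deliberately left out.

```python
# IDs copied from the table of supported ChEBI classes.
PEPTIDE = 16670
OLIGOPEPTIDE = 25676
POLYPEPTIDE = 15841
SIZE_CLASSES = {2: 46761, 3: 47923, 4: 48030, 5: 48545}
CHARGE_CLASSES = {"cation": 60194, "anion": 60334, "zwitterion": 60466}
SIZE_ZWITTERION_CLASSES = {2: 90799, 3: 155837}

# Assumption for this sketch: 10 or more amino acids means "polypeptide".
POLYPEPTIDE_MIN = 10


def candidate_chebi_classes(n_amino_acids: int, charge: str) -> list[int]:
    """Hypothetical helper: list the ChEBI IDs that plausibly apply."""
    if n_amino_acids < 2:
        return []  # fewer than two amino acids: not treated as a peptide here
    classes = [PEPTIDE]
    if n_amino_acids >= POLYPEPTIDE_MIN:
        classes.append(POLYPEPTIDE)
    else:
        classes.append(OLIGOPEPTIDE)
    if n_amino_acids in SIZE_CLASSES:
        classes.append(SIZE_CLASSES[n_amino_acids])
    if charge in CHARGE_CLASSES:
        classes.append(CHARGE_CLASSES[charge])
    if charge == "zwitterion" and n_amino_acids in SIZE_ZWITTERION_CLASSES:
        classes.append(SIZE_ZWITTERION_CLASSES[n_amino_acids])
    return classes


if __name__ == "__main__":
    print(candidate_chebi_classes(2, "zwitterion"))
```

For the classes ChemLog really assigns, run the classifier with the flag that returns ChEBI classes.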

All implementations are based on the same natural-language definitions and were developed jointly, so all three methods are expected to yield the same results. If you observe a discrepancy, please open an issue. If you are only interested in the results, we recommend the algorithmic implementation, as it is the fastest.

If you face problems using ChemLog or have other questions, feel free to open an issue.

Installation

Download the source code from this repository.

Install with

pip install .

If you want to use the MONA reasoner, you have to install it separately (the classifier expects the mona command to be available).
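
Before starting an MSOL run, it can help to check that the mona command is actually reachable. The snippet below is a small sketch using only the Python standard library; the helper name mona_available is made up for this example and is not part of ChemLog.

```python
import shutil


def mona_available(command: str = "mona") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if mona_available():
        print("mona found at", shutil.which("mona"))
    else:
        print("mona not found on PATH; install MONA before using the MSOL classifier")
```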

Run the classification

ChemLog provides a command-line interface for the classification. Each run writes its results in JSON format, alongside a log file and a config file.

Command:

python -m chemlog classify

Apply the algorithmic implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-c, --return-chebi-classes   Return ChEBI classes
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-o, --additional-output      Returns intermediate steps in output, useful
                             for explainability and verification
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.
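
The directory pattern results/%y%m%d_%H%M_{run_name}/ uses strftime-style codes (two-digit year, month, day, then hour and minute). The sketch below shows how such a path would be built for a given timestamp. The function name is hypothetical, and whether ChemLog uses local time or UTC is not stated here, so the timestamp is passed in explicitly.

```python
from datetime import datetime
from pathlib import Path


def expected_results_dir(run_name: str, when: datetime) -> Path:
    """Build a path following the documented pattern for a given timestamp."""
    stamp = when.strftime("%y%m%d_%H%M")
    return Path("results") / f"{stamp}_{run_name}"


if __name__ == "__main__":
    # A run named "demo" started on 7 March 2025 at 09:05.
    print(expected_results_dir("demo", datetime(2025, 3, 7, 9, 5)).as_posix())
```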

Command:

python -m chemlog classify-pubchem

Apply the algorithmic implementation to PubChem data.

Options:

-f, --from-batch INTEGER    Start at this PubChem batch (each batch consists of 500,000 ids)
-t, --to-batch INTEGER      End at this PubChem batch (exclusive)
-c, --return-chebi-classes  Return assigned ChEBI classes
-m, --molecules TEXT        List of PubChem IDs to classify. Default: all
                            PubChem entries.
--help                      Show this message and exit.
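
Because --to-batch is exclusive and each batch holds 500,000 ids, the size of a run follows directly from the two options. The sketch below computes it. The second function additionally assumes that batch b covers the ids from b * 500,000 up to (b + 1) * 500,000 - 1; that indexing is a guess made for illustration and is not confirmed by the help text.

```python
BATCH_SIZE = 500_000  # stated in the --from-batch help text


def ids_covered(from_batch: int, to_batch: int) -> int:
    """Number of ids in the half-open batch range [from_batch, to_batch)."""
    if to_batch <= from_batch:
        raise ValueError("to_batch is exclusive and must be greater than from_batch")
    return (to_batch - from_batch) * BATCH_SIZE


def assumed_id_bounds(from_batch: int, to_batch: int) -> tuple[int, int]:
    """First and last id, ASSUMING batch b starts at id b * BATCH_SIZE."""
    ids_covered(from_batch, to_batch)  # reuse the range validation
    return from_batch * BATCH_SIZE, to_batch * BATCH_SIZE - 1


if __name__ == "__main__":
    print(ids_covered(0, 2))
    print(assumed_id_bounds(1, 3))
```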

Command:

python -m chemlog classify-fol

Apply the FOL implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-c, --return-chebi-classes   Return ChEBI classes
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-o, --additional-output      Returns intermediate steps in output, useful
                             for explainability and verification
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.

Command:

python -m chemlog classify-msol

Apply the MSOL implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-p, --only-peptides          Only consider peptide molecules
--help                       Show this message and exit.

Command:

python -m chemlog verify

Given a results file, run the FOL classification for the same classes. This is typically used to check if the algorithmic and FOL classifications match for certain classes.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-r, --results-dir TEXT       Directory where results.json to analyse is
                             located  [required]
-d, --debug-mode             Returns additional states
-m, --molecules TEXT         List of ChEBI IDs to verify. Default: all ChEBI
                             classes.
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.
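
Conceptually, verification comes down to comparing two sets of classifications and reporting where they disagree. The sketch below shows that idea on plain dictionaries. It does not reproduce ChemLog's verify command or its results.json layout: the function name, the string keys, and the sample values are all hypothetical.

```python
def diff_classifications(first: dict, second: dict) -> dict:
    """Return the entries on which two classification mappings disagree.

    Keys are assumed to be molecule ids given as strings; a key missing
    from one mapping is reported with None on that side.
    """
    mismatches = {}
    for key in sorted(first.keys() | second.keys()):
        if first.get(key) != second.get(key):
            mismatches[key] = (first.get(key), second.get(key))
    return mismatches


if __name__ == "__main__":
    # Made-up example data, not real ChemLog output.
    algorithmic = {"A": "dipeptide", "B": "tripeptide", "C": "peptide anion"}
    fol = {"A": "dipeptide", "B": "tetrapeptide"}
    print(diff_classifications(algorithmic, fol))
```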
