Skip to main content

Peptide classifier for ChEBI / PubChem

Project description

ChemLog is a framework for rule-based ontology extension. This repository implements a classification of peptides on the ChEBI and PubChem datasets.

3 methods for classification are implemented:

  1. Using Monadic Second-Order Logic (MSOL) formulas and the MSOL reasoner MONA
  2. Using First-Order Logic (FOL) formulas and a custom FOL model checker
  3. Using an algorithmic implementation

The classification covers the following aspects:

  1. Number of amino acids (in MSOL / FOL: up to 10)
  2. Charge category (either salt, anion, cation, zwitterion or neutral)
  3. Proteinogenic amino acids present

If the corresponding flag is set, ChemLog will also return the ChEBI classes that match this classification. Currently supported are:

ChEBI ID name
16670 peptide
60194 peptide cation
60334 peptide anion
60466 peptide zwitterion
25676 oligopeptide
46761 dipeptide
47923 tripeptide
48030 tetrapeptide
48545 pentapeptide
15841 polypeptide
90799 dipeptide zwitterion
155837 tripeptide zwitterion
64372 emericellamide
65061 2,5-diketopiperazines
24866 salt
25696 organic anion
25697 organic cation
27369 zwitterion

All implementations are based on the same natural language definitions and have been developed jointly. Therefore, it is expected that all methods yield the same result. If you make a different experience, please open an issue. If you are just interested in the results, we recommend using the algorithmic implementation, as it is the fastest one.

If you face problems using ChemLog or have other questions, feel free to open an issue.

Installation

Download the source code from this repository.

Install with

pip install .

If you want to use the MONA reasoner, you have to install it separately (the classifier expects the mona command to be available).

Run the classification

ChemLog provides a command line interface for the classification. Results are in JSON format for each run, alongside a log and a config file.

Command:

python -m chemlog classify

Apply the algorithmic implementation to ChEBI data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-c, --return-chebi-classes   Return ChEBI classes
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-o, --additional-output      Returns intermediate steps in output, useful
                             for explainability and verification
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.

Command:

python -m chemlog classify-pubchem

Apply the algorithmic implementation to PubChem data.

Options:

-f, --from-batch INTEGER    Start at this PubChem batch (each batch consists of 500,000 ids)
-t, --to-batch INTEGER      End at this PubChem batch (exclusive)
-c, --return-chebi-classes  Return assigned ChEBI classes
-m, --molecules TEXT        List of PubChem IDs to classify. Default: all
                            PubChem entries.
--help                      Show this message and exit.

Command:

python -m chemlog classify-fol

Apply the FOL implementation to PubChem data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-c, --return-chebi-classes   Return ChEBI classes
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-o, --additional-output      Returns intermediate steps in output, useful
                             for explainability and verification
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.

Command:

python -m chemlog classify-msol

Apply the MSOL implementation to PubChem data.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-m, --molecules TEXT         List of ChEBI IDs to classify. Default: all
                             ChEBI classes.
-n, --run-name TEXT          Results will be stored at
                             results/%y%m%d_%H%M_{run_name}/
-d, --debug-mode             Logs at debug level
-p, --only-peptides          Only consider peptide molecules
--help                       Show this message and exit.

Command:

python -m chemlog verify

Given a results file, run the FOL classification for the same classes. This is typically used to check if the algorithmic and FOL classifications match for certain classes.

Options:

-v, --chebi-version INTEGER  ChEBI version  [required]
-r, --results-dir TEXT       Directory where results.json to analyse is
                             located  [required]
-d, --debug-mode             Returns additional states
-m, --molecules TEXT         List of ChEBI IDs to verify. Default: all ChEBI
                             classes.
-3, --only-3star             Only consider 3-star molecules
--help                       Show this message and exit.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chemlog-1.0.2.tar.gz (24.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

chemlog-1.0.2-py3-none-any.whl (24.4 kB view details)

Uploaded Python 3

File details

Details for the file chemlog-1.0.2.tar.gz.

File metadata

  • Download URL: chemlog-1.0.2.tar.gz
  • Upload date:
  • Size: 24.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for chemlog-1.0.2.tar.gz
Algorithm Hash digest
SHA256 c658e642512b78df1bacef11a60df847747567d0bb5df1a60b61fcfaad2be01c
MD5 5810782df2a4de660519a4c6ecdce36e
BLAKE2b-256 be5d6cb5f94e8e12778128aa579e8b4a14852c70cfbbd40870b557b36d2b9e32

See more details on using hashes here.

File details

Details for the file chemlog-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: chemlog-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 24.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for chemlog-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 01de8c5ea4a48216fb89665405b3a53c3579df26e3553731486b5a4c9a93ac8d
MD5 8d85f3b96f0ca93c9d397db5d7f48dfe
BLAKE2b-256 519bdb776a8235b29ab135660d669f2fe5e87a9dd54febf59ee98689f7e0348e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page