
Peptide classifier for ChEBI / PubChem

Project description

ChemLog is a framework for rule-based ontology extension. This repository implements a classification of peptides on the ChEBI and PubChem datasets.

Installation

You can install ChemLog with pip:

pip install chemlog

To get the latest development version, download the source code and install with

pip install .

If you want to use the MONA reasoner, you have to install it separately (the classifier expects the mona command to be available).
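Before selecting the MONA-based method, it can help to confirm that the mona executable is actually discoverable. The following is a minimal sketch using only the Python standard library; the helper name mona_available is hypothetical and not part of ChemLog.

```python
import shutil


def mona_available(command="mona"):
    """Return True if an executable with the given name is found on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if mona_available():
        print("mona found on PATH")
    else:
        print("mona not found; install it before using the MONA reasoner")
```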

Run the classification

ChemLog provides a command line interface for the classification. Each run writes its results in JSON format, alongside a log file and a config file. Currently, classification of ChEBI and PubChem data is supported. Download and preprocessing of the data are handled automatically. For instance, the following command classifies the 1,000 smallest peptides in ChEBI with the algorithmic method:

python -m chemlog classify-chebi --chebi-version 239 --strategy algo --only-peptides --n-molecules 1000
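If you prefer to start a run from a Python script, the same invocation can be assembled and passed to subprocess. This is a sketch, not part of ChemLog: the helper names are hypothetical, only the flags shown in the command above are used, and it is an assumption that --only-peptides can simply be left out to classify all molecules.

```python
import subprocess
import sys


def build_classify_chebi_command(chebi_version=239, strategy="algo",
                                 n_molecules=1000, only_peptides=True):
    """Assemble the argument list for the classify-chebi command."""
    cmd = [
        sys.executable, "-m", "chemlog", "classify-chebi",
        "--chebi-version", str(chebi_version),
        "--strategy", strategy,
    ]
    if only_peptides:
        # Assumption: omitting this flag disables the peptide filter.
        cmd.append("--only-peptides")
    cmd += ["--n-molecules", str(n_molecules)]
    return cmd


def run_classification(**kwargs):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_classify_chebi_command(**kwargs), check=True)
```

Calling run_classification() with no arguments corresponds to the command line shown above; it requires ChemLog to be installed in the same Python environment.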

For more details on the available command line options run

python -m chemlog --help

Publication

Flügel et al. (2025): ChemLog: Making MSOL Viable for Ontological Classification and Learning

How are peptides classified?

Four classification methods are implemented:

  1. Using Monadic Second-Order Logic (MSOL) formulas with the MSOL model finder MONA.
  2. Turning an MSOL model finding problem into a QBF satisfiability problem and solving that with CAQE or DepQBF, using the Bloqqer preprocessor.
  3. Turning an MSOL model finding problem partially into First-Order Logic (FOL) and solving that with a custom FOL model checker (since not all MSOL axioms are translatable, the non-translatable parts are calculated algorithmically).
  4. Using an algorithmic implementation.

If you are just interested in the results, we recommend choosing the algorithmic implementation, as it is the fastest and can handle complex molecules.

The classification covers the following aspects:

  1. Number of amino acids (up to 10, except for the algorithmic method, which covers arbitrary sizes)
  2. Charge category (either salt, anion, cation, zwitterion or neutral)
  3. Proteinogenic amino acids present
  4. Emericellamides and 2,5-diketopiperazines
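To make the five charge categories more concrete, here is a deliberately simplified sketch that derives a category from formal charges. It uses textbook-style definitions and is an assumption for illustration only, not ChemLog's actual rules (for example, it would label any net-neutral molecule with both a positive and a negative atom as a zwitterion, including nitro groups). The function name and input format are hypothetical.

```python
def charge_category(fragments):
    """Classify a molecule by charge.

    `fragments` is a list with one entry per connected component; each entry
    is a list of the formal charges of the atoms in that component.
    Simplified definitions (assumptions, not ChemLog's rules):
      salt       - at least one net-positive and one net-negative component
      cation     - net charge above zero
      anion      - net charge below zero
      zwitterion - net charge zero, but positive and negative atoms present
      neutral    - everything else
    """
    net_charges = [sum(fragment) for fragment in fragments]
    if any(n > 0 for n in net_charges) and any(n < 0 for n in net_charges):
        return "salt"
    total = sum(net_charges)
    if total > 0:
        return "cation"
    if total < 0:
        return "anion"
    atom_charges = [charge for fragment in fragments for charge in fragment]
    if any(c > 0 for c in atom_charges) and any(c < 0 for c in atom_charges):
        return "zwitterion"
    return "neutral"
```

For instance, a single component with one +1 atom and one -1 atom (as in the zwitterionic form of an amino acid) falls into the zwitterion category, while two separate components with net charges +1 and -1 are treated as a salt.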

ChemLog will also return the ChEBI classes that match this classification. Currently supported are:

ChEBI ID  Name
16670     peptide
60194     peptide cation
60334     peptide anion
60466     peptide zwitterion
25676     oligopeptide
46761     dipeptide
47923     tripeptide
48030     tetrapeptide
48545     pentapeptide
15841     polypeptide
90799     dipeptide zwitterion
155837    tripeptide zwitterion
64372     emericellamide
65061     2,5-diketopiperazines
24866     salt
25696     organic anion
25697     organic cation
27369     zwitterion
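As an illustration of how the size-related entries in this table relate to the number of amino acids, the sketch below maps a residue count to the matching size classes. The IDs and names are copied from the table; the function name is hypothetical, charge is ignored entirely, and the cutoff of ten residues for polypeptides is an assumption rather than something stated in this description.

```python
# ChEBI IDs and names copied from the table of supported classes.
SIZE_SPECIFIC = {
    2: (46761, "dipeptide"),
    3: (47923, "tripeptide"),
    4: (48030, "tetrapeptide"),
    5: (48545, "pentapeptide"),
}
OLIGOPEPTIDE = (25676, "oligopeptide")
POLYPEPTIDE = (15841, "polypeptide")

# Assumption: peptides with at least this many residues count as polypeptides.
POLYPEPTIDE_MIN_RESIDUES = 10


def size_classes(n_amino_acids):
    """Return the (ChEBI ID, name) pairs matching a given residue count."""
    if n_amino_acids < 2:
        return []  # a single amino acid is not a peptide
    if n_amino_acids >= POLYPEPTIDE_MIN_RESIDUES:
        return [POLYPEPTIDE]
    classes = [OLIGOPEPTIDE]
    if n_amino_acids in SIZE_SPECIFIC:
        classes.append(SIZE_SPECIFIC[n_amino_acids])
    return classes
```

How ChemLog itself combines size and charge information into the final set of ChEBI classes may differ from this sketch.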

All implementations are based on the same natural-language definitions and have been developed jointly. Therefore, all methods are expected to yield the same results. If you observe a discrepancy between methods, please open an issue.

If you face problems using ChemLog or have other questions, feel free to open an issue as well.


