chebILP

An Inductive Logic Programming (ILP) framework for classifying chemical compounds into ChEBI classes. Rules are learned with Popper and evaluated with Clingo (Answer Set Programming).

Installation

Prerequisites

SWI-Prolog must be installed and on PATH (required by Popper). Popper must be installed as well. You can either install the latest version of Popper with

pip install git+https://github.com/logic-and-learning-lab/Popper

or a forked, slightly outdated version with

pip install git+https://github.com/sfluegel05/Popper

With the latter, you can use the --mdl_weight_fn, --mdl_weight_fp and --mdl_weight_size options of the learn command.
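Before installing, it can help to confirm that SWI-Prolog is actually visible on PATH. The sketch below assumes the executable is called swipl, which is the usual name for SWI-Prolog installs; the helper missing_tools is illustrative and not part of chebILP.

```python
import shutil


def missing_tools(tools=("swipl",)):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("SWI-Prolog found.")
```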

Core package

pip install chebILP

Extras:

pip install chebILP[explain] adds xclingo and Pillow for the explain command
pip install chebILP[llm] adds anthropic, langsmith, and python-dotenv for LLM-enhanced rule learning (enhance_with_llms, experimental)
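To see which extras are usable in the current environment, one option is to probe for the modules they pull in. This is a sketch only: the mapping below uses the import names I would expect for these packages (Pillow imports as PIL, python-dotenv as dotenv), and the names EXTRA_MODULES and missing_modules are hypothetical.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages behind each extra.
EXTRA_MODULES = {
    "explain": ["xclingo", "PIL"],
    "llm": ["anthropic", "langsmith", "dotenv"],
}


def missing_modules(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    for extra, modules in EXTRA_MODULES.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"chebILP[{extra}]: {status}")
```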

The prepare_dl_preds utility (one-time DL tensor extraction) additionally requires torch, which must be installed separately in an environment that has the DL model checkpoint.

Usage

To get a list of available commands, run

python -m chebILP -h

To get help for a specific command, run

python -m chebILP {command} -h
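When several commands are chained from a script, the same invocation can be assembled in Python. The helper below is a hypothetical convenience wrapper, not something chebILP provides; it only builds the argument list for python -m chebILP and handles single-valued options.

```python
import sys


def chebilp_command(command, **options):
    """Build the argv list for `python -m chebILP <command> --name value ...`."""
    argv = [sys.executable, "-m", "chebILP", command]
    for name, value in options.items():
        # Each keyword becomes one `--name value` pair.
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    print(chebilp_command("learn", timeout=60))
```

The resulting list can be passed to subprocess.run(..., check=True). Options that take several values, such as --ilp_val_runs, would need extra handling.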

Workflows

1. Generating new data

An ILP dataset for ChEBI version 248 is available on HuggingFace. However, you can also create your own dataset.

Step 1 — Download ChEBI data and build the dataset (downloads chebi.obo and chebi.sdf.gz, builds cached graph and molecule files, selects label classes, and creates a train/val/test split):

python -m chebILP prepare_dataset \
  --chebi_version 248 \
  --min_pos_samples 25

This writes to data/chebi_v248/:

chebi_graph.pkl — hierarchy graph (networkx DiGraph)
molecules.pkl — molecule DataFrame (index = ChEBI ID)
min50/labels.txt — selected class IDs (one per line)
min50/splits.csv — molecule-level train/val/test split
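The two text outputs can be inspected with the standard library. The following sketch relies on labels.txt holding one class ID per line, as stated above; the column layout of splits.csv is not documented here, so the column name "split" is an assumption and should be adjusted to the real header.

```python
import csv
from collections import Counter


def read_labels(lines):
    """Return class IDs from an iterable of lines, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def count_by_split(csv_file, split_column="split"):
    """Count rows per split; `split_column` is an assumed column name."""
    return Counter(row[split_column] for row in csv.DictReader(csv_file))


if __name__ == "__main__":
    import io

    print(read_labels(io.StringIO("CHEBI:1\n\nCHEBI:2\n")))
    print(count_by_split(io.StringIO("mol_id,split\n1,train\n2,test\n3,train\n")))
```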

Step 2 — Build ILP example files (positive/negative molecules per class):

python -m chebILP build_samples \
  --labels_file data/chebi_v248/ChEBI25_3_STAR/labels.txt \
  --chebi_split data/chebi_v248/ChEBI25_3_STAR/splits.csv \
  --chebi_graph_path data/chebi_v248/chebi_graph.pkl \
  --molecules_path data/chebi_v248/ChEBI25_3_STAR/molecules.pkl

Step 3 — Build ILP background knowledge files (molecule features as logic facts):

python -m chebILP build_bk \
  --labels_file data/chebi_v248/ChEBI25_3_STAR/labels.txt \
  --chebi_split data/chebi_v248/ChEBI25_3_STAR/splits.csv \
  --chebi_graph_path data/chebi_v248/chebi_graph.pkl \
  --molecules_path data/chebi_v248/ChEBI25_3_STAR/molecules.pkl

Steps 2 and 3 write files into data/ilp_problems/ (one subdirectory per class). Available predicate sets: atoms, chembl_fgs, chebi_fgs, chebi_fg_rules and chebi_fg_learned_rules.
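For orientation, Popper reads examples as pos(...) and neg(...) facts over the target predicate. The sketch below shows how such facts could be generated for one class; the predicate name in_class and the chebi_<number> atom naming are assumptions made for illustration, and the files written by build_samples may use different names.

```python
def to_atom(chebi_id):
    """Turn an ID such as 'CHEBI:16236' or 16236 into a lowercase Prolog atom."""
    return "chebi_" + str(chebi_id).split(":")[-1]


def example_facts(target, positives, negatives):
    """Return Popper-style example facts for one target predicate."""
    facts = [f"pos({target}({to_atom(m)}))." for m in positives]
    facts += [f"neg({target}({to_atom(m)}))." for m in negatives]
    return facts


if __name__ == "__main__":
    for fact in example_facts("in_class", ["CHEBI:16236"], ["CHEBI:17790"]):
        print(fact)
```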

2. Learning ILP rules

Learn Prolog classification rules for each class using the examples and background knowledge from workflow 1. The learn command creates an updated bias file based on the max_vars, max_body and max_clauses parameters.
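In Popper, these limits are declared in the bias file as max_vars(N)., max_body(N). and max_clauses(N). A minimal sketch of such an update, assuming one declaration per line, is shown below; the function update_bias is hypothetical and is not the code chebILP uses.

```python
import re

# Matches an existing size declaration on a line of its own.
_SETTING = re.compile(r"^\s*(max_vars|max_body|max_clauses)\(\d+\)\.\s*$")


def update_bias(bias_text, max_vars, max_body, max_clauses):
    """Drop old size declarations and append the requested ones."""
    kept = [line for line in bias_text.splitlines() if not _SETTING.match(line)]
    kept += [
        f"max_vars({max_vars}).",
        f"max_body({max_body}).",
        f"max_clauses({max_clauses}).",
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    print(update_bias("head_pred(in_class,1).\nmax_vars(3).\n", 5, 4, 2), end="")
```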

Learn rules:

python -m chebILP learn \
  --labels_file data/chebi_v248/ChEBI25_3_STAR/labels.txt \
  --chebi_split data/chebi_v248/ChEBI25_3_STAR/splits.csv \
  --timeout 60

Output is written to a timestamped directory data/results/run_YYYYMMDD_HHMMSS/ containing results.json (one entry per class with the learned program and training score) and config.yml.
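Because run directories follow the run_YYYYMMDD_HHMMSS pattern, the most recent run can be found by parsing the directory name. The helpers below are an illustrative sketch and are not part of chebILP.

```python
from datetime import datetime
from pathlib import Path


def run_timestamp(run_dir):
    """Parse the timestamp encoded in a run_YYYYMMDD_HHMMSS directory name."""
    return datetime.strptime(Path(run_dir).name, "run_%Y%m%d_%H%M%S")


def latest_run(run_dirs):
    """Return the run directory with the most recent timestamp."""
    return max(run_dirs, key=run_timestamp)


if __name__ == "__main__":
    runs = ["data/results/run_20260101_120000", "data/results/run_20260102_090000"]
    print(latest_run(runs))
```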

Evaluate on test/validation set:

python -m chebILP test \
  --run_to_evaluate data/results/run_20260101_120000 \
  --test_on test

Optional: LLM-enhanced rules

To improve learned programs with an LLM (requires ANTHROPIC_API_KEY in .env):

python -m chebILP.enhance_with_llms \
  --input data/enhance_with_llms/best_ilp_programs_for_leaves.csv \
  --output data/enhance_with_llms/enhanced_run \
  --chebi_version 248

Input CSV must have columns chebi_id, program, run_name. The output directory is readable by the test command.
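A quick header check before spending API calls can catch a malformed input file. This sketch only verifies the three column names listed above; the function missing_columns is hypothetical.

```python
import csv

REQUIRED_COLUMNS = {"chebi_id", "program", "run_name"}


def missing_columns(csv_file):
    """Return the required column names absent from the CSV header."""
    header = csv.DictReader(csv_file).fieldnames or []
    return sorted(REQUIRED_COLUMNS - set(header))


if __name__ == "__main__":
    import io

    print(missing_columns(io.StringIO("chebi_id,program\n")))
```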

3. Building an ensemble (ILP + DL)

Combine ILP rules with a deep learning (DL) model for hierarchical multi-label classification. The ensemble uses DL predictions for non-leaf classes and selects either ILP or DL for each leaf class based on validation F1.

Step 1 — Build full ILP prediction tensors (run once per ILP run, for the validation and/or test split):

python -m chebILP build_ilp_preds_for_ensemble \
  --run_dir data/results_val/run_20260101_120000 \
  --predict_on validation \
  --chebi_split data/chebi_v248/ChEBI25_3_STAR/processed/splits.csv \
  --chebi_version 248

This writes full_val_preds.npy and full_val_preds_metadata.json into the run directory. Repeat with --predict_on test for the test split.

Step 2 — Model selection and ILP tensor assembly:

python -m chebILP ensemble_construct \
  --chebi_split data/chebi_v248/ChEBI25_3_STAR/processed/splits.csv \
  --dl_val_preds_npy data/preds/val_preds.npy \
  --dl_val_preds_meta data/preds/val_preds_metadata.json \
  --ilp_val_runs data/results_val/run_A data/results_val/run_B \
  --label_stats data/chebi_v248/ChEBI25_3_STAR/processed/class_stats.csv \
  --predict_on test \
  --output data/ensemble_predictions/ensemble

For each leaf class, this selects the ILP run with the highest ensemble F1 (ILP prediction AND all DL parent predictions >= 0.5) and falls back to DL if no ILP run beats the DL model. Outputs:

ensemble_trusted_models.csv — which model is used per class
ensemble_ilp_preds.npy + ensemble_ilp_preds_metadata.json — ILP tensor for the target split
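The selection rule can be sketched in plain Python. This is one reading of the description above, not the package's implementation: an ILP prediction counts only if every DL parent score is at least 0.5, and an ILP run is chosen only if its F1 is strictly higher than the DL F1. All function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 over two equally long sequences of booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def gated_ilp_prediction(ilp_pred, dl_parent_scores, threshold=0.5):
    """ILP prediction AND all DL parent predictions >= threshold."""
    return bool(ilp_pred) and all(s >= threshold for s in dl_parent_scores)


def select_model(dl_f1, ilp_f1_by_run):
    """Return the best ILP run name, or 'DL' if no run beats the DL score."""
    best_run = max(ilp_f1_by_run, key=ilp_f1_by_run.get, default=None)
    if best_run is not None and ilp_f1_by_run[best_run] > dl_f1:
        return best_run
    return "DL"


if __name__ == "__main__":
    print(select_model(0.80, {"run_A": 0.70, "run_B": 0.85}))
```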

Step 3 — Aggregate into final predictions:

python -m chebILP ensemble_aggregate \
  --dl_preds_npy data/preds/test_preds.npy \
  --dl_preds_meta data/preds/test_preds_metadata.json \
  --ilp_preds_npy data/ensemble_predictions/ensemble_ilp_preds.npy \
  --ilp_preds_meta data/ensemble_predictions/ensemble_ilp_preds_metadata.json \
  --trusted_models data/ensemble_predictions/ensemble_trusted_models.csv \
  --label_stats data/chebi_v248/ChEBI25_3_STAR/processed/class_stats.csv \
  --output data/ensemble_predictions/final_predictions.npy

DL predictions propagate freely through the class hierarchy; ILP and always-positive classes only predict a class if all label-set parents are already predicted positive. Output is a boolean NumPy array with a matching _metadata.json.
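The parent gating can be illustrated with a small sketch that visits classes parents-first. It reflects one interpretation of the rule above: DL classes keep their own prediction, while ILP and always-positive classes are positive only if all of their parents within the label set are positive. The function aggregate and the source labels "DL" and "ILP" are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def aggregate(raw, source, parents):
    """Combine per-class predictions for one molecule.

    raw: class -> bool prediction, source: class -> "DL" or another model tag,
    parents: class -> iterable of parent classes.
    """
    # Keep only parents that belong to the label set.
    label_parents = {c: [p for p in parents.get(c, ()) if p in raw] for c in raw}
    final = {}
    # static_order() yields every class after all of its parents.
    for cls in TopologicalSorter(label_parents).static_order():
        if source[cls] == "DL":
            final[cls] = raw[cls]
        else:
            final[cls] = raw[cls] and all(final[p] for p in label_parents[cls])
    return final


if __name__ == "__main__":
    raw = {"A": False, "B": True, "C": True}
    source = {"A": "DL", "B": "ILP", "C": "DL"}
    print(aggregate(raw, source, {"B": ["A"], "C": ["A"]}))
```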

Other utilities

Translate a rule to natural language:

python -m chebILP rule_to_nl --rule_file my_rule.pl --class_parents data/class_parents.json

Explain why a molecule satisfies a rule:

python -m chebILP explain \
  --smiles "CCO" \
  --rule_file my_rule.pl \
  --label_parents_json data/class_parents.json \
  --output explanation.png

Release history

1.0.2 (Jun 5, 2026)
1.0.1 (May 27, 2026)
1.0.0 (May 27, 2026), this version

