Translation of Natural Language to First-Order Logic for ChEBI.

Project description

chebai-NL2FOL

AI workflow for natural language to First-Order Logic (FOL) translation for ChEBI.

Data Files

The learning and validation pipelines expect the C3PO slim dataset files under data/ by default:

data/classes_slim.csv
data/structures.csv
dataset.json

Download them from the C3PO dataset on Hugging Face: https://huggingface.co/datasets/MonarchInit/C3PO/tree/main

These are the same source links referenced in nl_2_fol/inference/cli.py and nl_2_fol/inference/preprocessing/c3po_slim_data.py. The C3PO dataset is associated with https://github.com/chemkg/c3p.
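
Before starting a run, it can help to confirm that the default files are where the pipelines look for them. The following is a minimal sketch and not part of the project: the helper name missing_data_files is hypothetical, and the relative paths are simply copied from the list above.

```python
from pathlib import Path

# Default locations as listed in this README, relative to the repository root.
EXPECTED_FILES = (
    "data/classes_slim.csv",
    "data/structures.csv",
    "dataset.json",
)


def missing_data_files(repo_root="."):
    """Return the expected data files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_data_files() from the repository root returns an empty list when every file is in place; otherwise it names the files that still need to be downloaded.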

If your files live somewhere else, pass explicit paths to the learning or validation commands:

python nl_2_fol/inference/cli.py learn \
  --slim_dataset_path "/path/to/classes_slim.csv" \
  --structures_data_path "/path/to/structures.csv"

python nl_2_fol/inference/cli.py validate \
  --defs_file_path "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  --class_name "all" \
  --slim_dataset_path "/path/to/classes_slim.csv" \
  --structures_data_path "/path/to/structures.csv"

The C3P comparison utilities also expect score JSON files from the C3P train/validation score output referenced in the utility help text: https://github.com/chemkg/c3p/pull/23

Start the Learning Pipeline

Run commands from the repository root so the default data/ and prompt-template paths resolve correctly.

To learn definitions with the default Anthropic configuration:

python nl_2_fol/inference/cli.py learn

To learn definitions with the local Ollama Mistral configuration:

python nl_2_fol/inference/cli.py learn_mistral

To learn a single ChEBI class instead of all classes:

python nl_2_fol/inference/cli.py learn --class_name "ethanol"
python nl_2_fol/inference/cli.py learn_mistral --class_name "ethanol"

Useful options:

python nl_2_fol/inference/cli.py learn \
  --api_platform "anthropic" \
  --model_name "claude-opus-4-6" \
  --max_attempts 3 \
  --f1_threshold 0.8

Learning output is saved under:

nl_2_fol/inference/learner/learned/<model_name>/learned_definitions_a<max_attempts>.pkl

For example, with model_name="claude-opus-4-6" and max_attempts=3, the definitions file is:

nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl
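
When scripting around the pipeline, the output location can be derived from the two options rather than typed by hand. This is a small sketch of the naming pattern shown above; the function name learned_definitions_path is hypothetical and the project itself may build the path differently.

```python
from pathlib import Path

# Output directory as stated in this README, relative to the repository root.
LEARNED_DIR = Path("nl_2_fol/inference/learner/learned")


def learned_definitions_path(model_name, max_attempts):
    """Build <LEARNED_DIR>/<model_name>/learned_definitions_a<max_attempts>.pkl."""
    return LEARNED_DIR / model_name / f"learned_definitions_a{max_attempts}.pkl"
```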

Start the Validation Pipeline

After learning has produced a definitions pickle, validate the learned definitions with:

python nl_2_fol/inference/cli.py validate \
  --defs_file_path "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  --class_name "all"

To validate only one class:

python nl_2_fol/inference/cli.py validate \
  --defs_file_path "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  --class_name "ethanol"

Single-class validation writes a small result pickle named after the resolved class in the current working directory, for example ethanol.pkl.
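
The internal structure of the result pickle is not documented here, so the sketch below only loads the file and reports what kind of object it holds. The helper names load_pickle and describe are hypothetical. Loading may also require the project's modules to be importable if the pickle stores project-specific classes.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickle file and return the stored object.

    Only unpickle files you produced yourself: pickle can execute code on load.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Return the type name and, for sized containers, the number of entries."""
    size = f", {len(obj)} entries" if hasattr(obj, "__len__") else ""
    return f"{type(obj).__name__}{size}"
```

For example, print(describe(load_pickle("ethanol.pkl"))) gives a first impression of the result before digging into its fields.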

For HPC or long validation runs, split the work across jobs by passing a text file with one class name per line. Use a unique file_save_index for each job:

python nl_2_fol/inference/cli.py validate \
  --defs_file_path "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  --class_names_txt_file_path "classes_0.txt" \
  --file_save_index 0
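
The per-job class files can be generated with a few lines of Python. This is a sketch under stated assumptions: the function name write_class_chunks and the round-robin split are illustrative choices, and the classes_<index>.txt naming merely follows the example above. The position of each file in the returned list can be used as that job's file_save_index.

```python
from pathlib import Path


def write_class_chunks(class_names, n_jobs, out_dir=".", prefix="classes_"):
    """Write class names round-robin into n_jobs text files, one name per line.

    Returns the written paths; the list position is the job index.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    names = list(class_names)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(n_jobs):
        chunk = names[index::n_jobs]  # every n_jobs-th name, starting at index
        path = out / f"{prefix}{index}.txt"
        path.write_text("".join(f"{name}\n" for name in chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

A round-robin split keeps the chunk sizes within one class of each other, which helps when jobs share the same wall-time limit.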

Full or split validation writes a new definitions pickle next to the input file, for example:

nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3_with_val_file_idx_None_.pkl

When class_names_txt_file_path is used together with file_save_index, the index replaces None in the file name, for example:

nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3_with_val_file_idx_0_.pkl
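
For job scripts that need to locate the validated output afterwards, the file name can be reconstructed from the input path and the index. The sketch below is inferred only from the two example names above; the function name validated_definitions_path is hypothetical, and the project's own naming code may differ in details.

```python
from pathlib import Path


def validated_definitions_path(defs_file_path, file_save_index=None):
    """Return the validated pickle path written next to defs_file_path.

    The index is formatted as-is, so the default None yields '..._idx_None_.pkl'.
    """
    defs = Path(defs_file_path)
    name = f"{defs.stem}_with_val_file_idx_{file_save_index}_{defs.suffix}"
    return defs.with_name(name)
```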

Use --help to inspect the full set of options:

python nl_2_fol/inference/cli.py learn --help
python nl_2_fol/inference/cli.py learn_mistral --help
python nl_2_fol/inference/cli.py validate --help

Utility Scripts

Helper scripts for inspecting, editing, merging, and comparing learned definitions live in:

nl_2_fol/inference/utils/

Most scripts expect paths to learned definition pickles produced by the learning or validation pipeline.

Inspect or Edit Learned Definitions

Use show_learned_content.py to inspect a learned definitions pickle:

python nl_2_fol/inference/utils/show_learned_content.py \
  --pickle-file "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  show

Show one class:

python nl_2_fol/inference/utils/show_learned_content.py \
  --pickle-file "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  show \
  --class-name "ethanol"

Include prompt history while inspecting a class:

python nl_2_fol/inference/utils/show_learned_content.py \
  --pickle-file "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  show \
  --class-name "ethanol" \
  --system-prompt \
  --conversation-history

Merge Validation Metrics

Use merge_validation_metrics.py to merge validation metrics from one validated pickle into another definitions pickle:

python nl_2_fol/inference/utils/merge_validation_metrics.py \
  "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3.pkl" \
  "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3_with_val_file_idx_0_.pkl" \
  "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3_merged.pkl"

The first path is the target/base pickle, the second path is the source pickle containing validation metrics, and the third path is the output pickle.
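
After a split validation there are usually several validated pickles to fold into one file. One possible approach, sketched below, is to chain the script: the output of each merge becomes the target of the next. This is an assumption, not documented behaviour. In particular, it assumes the script can read its target from the same path it writes its output to. The function name chained_merge_commands is hypothetical, and the sketch only builds the argument lists so they can be reviewed before running.

```python
import sys

# Script location as given in this README, relative to the repository root.
MERGE_SCRIPT = "nl_2_fol/inference/utils/merge_validation_metrics.py"


def chained_merge_commands(base_pickle, validated_pickles, output_pickle):
    """Build one merge command per validated pickle.

    The first command merges into base_pickle; later commands merge into the
    output of the previous step. Argument order: target, source, output.
    """
    commands = []
    target = base_pickle
    for source in validated_pickles:
        commands.append([sys.executable, MERGE_SCRIPT, target, source, output_pickle])
        target = output_pickle
    return commands
```

Each command can then be executed from the repository root with subprocess.run(command, check=True), which stops the chain at the first failing merge.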

Compare With C3P

Use compare_with_c3p.py to compare validated learned definitions against C3P score JSON files and export a CSV:

python nl_2_fol/inference/utils/compare_with_c3p.py \
  --ensemble-c3p-json "c3p_ensemble_train_val_scores.json" \
  --o3-mini-c3p-json "c3p_o3_mini_train_val_scores.json" \
  --learned-pickle "nl_2_fol/inference/learner/learned/claude-opus-4-6/learned_definitions_a3_with_val_file_idx_0_.pkl" \
  --output-csv "comparison_with_c3p_ensemble_o3_mini.csv"

Guide: Run a custom model with Ollama on a computing cluster

This example uses the Mistral FOL model: https://huggingface.co/fvossel/Mistral-Small-24B-Instruct-2501-nl-to-fol

1. Prepare model weights for conversion

Convert the Mistral model to a merged format by calling convert_mistral_to_gguf from: nl_2_fol/prompting/custom_api/_to_gguf.py

Why this step matters:

Hugging Face checkpoints are often split across multiple files.
The conversion pipeline expects a clean merged model directory as input.

What this step does:

Collects and organizes model artifacts into a local mistral-merged folder.
Ensures the tokenizer/config/weights are in a format that llama.cpp conversion can read.

Expected result:

A mistral-merged directory exists in your workspace and is ready for GGUF conversion.
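
A quick sanity check of the merged directory can catch an incomplete merge before the longer GGUF conversion starts. The sketch below assumes the usual Hugging Face file layout (config.json, tokenizer files whose names start with "tokenizer", and weights stored as .safetensors files); these names are assumptions and are not taken from the project's code, and the function name check_merged_dir is hypothetical.

```python
from pathlib import Path


def check_merged_dir(model_dir="mistral-merged"):
    """Report which of the three artifact kinds are present in model_dir."""
    root = Path(model_dir)
    return {
        "config": (root / "config.json").is_file(),
        "tokenizer": any(root.glob("tokenizer*")),
        "weights": any(root.glob("*.safetensors")),
    }
```

If any value in the returned dictionary is False, the corresponding artifact is missing or stored under a different name than assumed here.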

2. Build tools and install local Ollama (no root required)

This step prepares two required components:

llama.cpp, which provides the convert_hf_to_gguf.py conversion script.
A user-local Ollama installation, useful on clusters where you do not have sudo access.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

If you do not have root access on the HPC cluster, install Ollama in your home directory:

mkdir -p "$HOME/ollama"
cd "$HOME/ollama"
curl -L -o ollama-linux-amd64.tar.zst https://ollama.com/download/ollama-linux-amd64.tar.zst
unzstd ollama-linux-amd64.tar.zst
tar -xf ollama-linux-amd64.tar

echo 'export PATH=$HOME/ollama/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
ollama --version

Expected result:

ollama --version prints a version string.
You can run ollama commands without system-wide installation.

3. Convert model to GGUF

From inside llama.cpp:

python convert_hf_to_gguf.py ../mistral-merged --outfile mistral.gguf

Why this step matters:

Ollama loads local models through GGUF files.
This command translates the merged Hugging Face model into a runtime format Ollama can serve.

Expected result:

A file named mistral.gguf is created.
The conversion may take time and use significant CPU/RAM depending on model size.

4. Start Ollama server

Run the Ollama server in the background with the commands below so that it keeps running while you execute scripts or commands in the same terminal.

export OLLAMA_HOST=http://localhost:<your_custom_port>
export OLLAMA_TIMEOUT=180 # in seconds
ollama serve > ollama.log 2>&1 &
OLLAMA_PID=$!

When you are done, stop the Ollama server cleanly with the commands below:

kill $OLLAMA_PID 2>/dev/null
wait $OLLAMA_PID 2>/dev/null
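
While the server is still running (that is, before the stop commands above), a short reachability check can confirm that it answers on the configured address. This is a sketch using only the Python standard library; the helper names are hypothetical. It reads OLLAMA_HOST as exported above and falls back to Ollama's default port 11434 when the variable is unset.

```python
import http.client
import os
import urllib.error
import urllib.request


def ollama_base_url(default="http://localhost:11434"):
    """Return the server URL from OLLAMA_HOST, adding a scheme if it is missing."""
    host = os.environ.get("OLLAMA_HOST", default)
    if "://" not in host:
        host = f"http://{host}"
    return host.rstrip("/")


def ollama_is_up(base_url, timeout=5.0):
    """Return True if an HTTP GET on the server root answers with status 200."""
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, proxy error, or malformed reply.
        return False
```

A False result right after ollama serve usually just means the server is still starting; ollama.log shows whether it bound to the expected port.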

5. Register the model in Ollama

Create a Modelfile in the directory containing mistral.gguf with:

FROM ./mistral.gguf

Then run:

ollama create my-mistral -f Modelfile
ollama list

Why this step matters:

ollama create registers your GGUF file under a model name (my-mistral).
After registration, you can refer to the model by name in CLI calls.

Expected result:

ollama list shows my-mistral.
You only need to run ollama create ... once per model build.
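
The same check that ollama list performs can be done from Python through the server's /api/tags endpoint, which returns the locally available models as JSON. The sketch below is illustrative and the helper names are hypothetical. It assumes the model was created without an explicit tag, in which case Ollama lists it as my-mistral:latest.

```python
import json
import urllib.request


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_registered(tags_payload, model):
    """True if model is listed; a name without a tag is matched as '<name>:latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in model_names(tags_payload)


def fetch_tags(base_url, timeout=5.0):
    """GET <base_url>/api/tags from a running server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)
```

With the server running, is_registered(fetch_tags("http://localhost:<your_custom_port>"), "my-mistral") should return True once ollama create has finished.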

6. Run NL-to-FOL inference with Ollama

This final step sends requests from your project CLI to the locally running Ollama server. On some clusters, proxy variables can interfere with localhost routing, so exclude local addresses from proxying first if needed.

export NO_PROXY=127.0.0.1,localhost,.local
export no_proxy=127.0.0.1,localhost,.local

Then run:

python nl_2_fol/inference/cli.py --api_platform="ollama" --model_name="my-mistral"

IMPORTANT: Ensure ollama serve and the inference command run on the same compute node or same allocated job/session if applicable. For example, if ollama serve started on hpc3-52 but the inference command runs on hpc3-54, the connection might fail.

Expected result:

The CLI connects to your local Ollama instance.
The my-mistral model is used for NL-to-FOL inference.

Project details

Release history

This version

0.0.1

Jun 30, 2026

Download files

Source Distribution

chebai_nl2fol-0.0.1.tar.gz (85.6 kB)

Uploaded Jun 30, 2026 Source

Built Distribution

chebai_nl2fol-0.0.1-py3-none-any.whl (89.3 kB)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file chebai_nl2fol-0.0.1.tar.gz.

File metadata

Download URL: chebai_nl2fol-0.0.1.tar.gz
Upload date: Jun 30, 2026
Size: 85.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chebai_nl2fol-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`bfe14c2f196de2f098df9014b9de6d78dd06b42c6b66d611ab93d4755820da0b`
MD5	`312d42adc8c007e2a9438ab3f4b9095f`
BLAKE2b-256	`36e1f0c4ee2009adb1295d9316ab827e150f7e6ce53635e95575bb84c7c65d2e`

Provenance

The following attestation bundles were made for chebai_nl2fol-0.0.1.tar.gz:

Publisher: python-publish.yml on ChEB-AI/chebai-NL2FOL

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chebai_nl2fol-0.0.1.tar.gz
- Subject digest: bfe14c2f196de2f098df9014b9de6d78dd06b42c6b66d611ab93d4755820da0b
- Sigstore transparency entry: 2022266491
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: ChEB-AI/chebai-NL2FOL@7b7f75384e8e3053f30a184390850c8335018bb3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ChEB-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@7b7f75384e8e3053f30a184390850c8335018bb3
- Trigger Event: release

File details

Details for the file chebai_nl2fol-0.0.1-py3-none-any.whl.

File metadata

Download URL: chebai_nl2fol-0.0.1-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 89.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chebai_nl2fol-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2ea0b8be53f94e0634d1264791038e0c509dc944e9a59ea0fecd0ed69264a0d`
MD5	`54522638b96ab00f0423b80e84015638`
BLAKE2b-256	`5d9c787d7692ac715cb6bd2bf714d11689d5122e64bc6df679d6177970a28b4d`

Provenance

The following attestation bundles were made for chebai_nl2fol-0.0.1-py3-none-any.whl:

Publisher: python-publish.yml on ChEB-AI/chebai-NL2FOL

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chebai_nl2fol-0.0.1-py3-none-any.whl
- Subject digest: f2ea0b8be53f94e0634d1264791038e0c509dc944e9a59ea0fecd0ed69264a0d
- Sigstore transparency entry: 2022266566
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: ChEB-AI/chebai-NL2FOL@7b7f75384e8e3053f30a184390850c8335018bb3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ChEB-AI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@7b7f75384e8e3053f30a184390850c8335018bb3
- Trigger Event: release
