Formal verification as agentic training signal — CLI + self-hosted runner

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

Athanor

athanor-ai

Lean 4 proof verification as agentic training signal.
Turn formal proofs into reward functions. Score agent output with compilers, not judges.

athanor-ai.com

Your agent writes code. Then it writes a proof that the code is correct. The Lean 4 compiler checks the proof. The result is a training signal with no ambiguity.

import athanor

# Verify a Lean 4 proof
result = athanor.verify_proof("""
theorem add_comm (a b : Nat) : a + b = b + a := by
  omega
""")

print(result.compiles)    # True
print(result.has_sorry)   # False
print(result.score)       # 1.0

Install

pip install git+https://github.com/athanor-ai/athanor-sdk.git@main

What this solves

You have domain expertise. You know what correct code looks like. You want an AI agent to produce verified solutions, not guesses.

The problem: LLM judges are noisy. Unit tests are brittle. Benchmarks don't produce training signal.

The solution: Lean 4 formal proofs are deterministic, machine-checked, and produce continuous reward signal (full proof = 1.0, partial = 0.35, broken = 0.25).

Verify proofs

Check if a Lean 4 proof compiles. Detect sorry placeholders. Catch banned constructs (axiom, import Mathlib, unsafe).

from athanor import verify_proof, check_sorry, score_proof

# Full verification with detailed result
result = verify_proof(proof_code)
result.compiles      # did it compile?
result.has_sorry     # any incomplete proof markers?
result.sorry_count   # how many sorry placeholders?
result.score         # 0.0 - 1.0
result.status        # "full_proof" | "partial_proof" | "compile_error" | "banned"
result.errors        # compiler error messages

# Quick score (just the float)
score = score_proof(proof_code)  # 1.0, 0.35, 0.25, or 0.0

# Check for sorry without full compilation
has_sorry, count = check_sorry(proof_code)

Works with local Lean 4 installation or via Docker (ghcr.io/leanprover/lean4).

Score agent output

Pair code with a proof. Score both. Use the result as reward.

import athanor

env = athanor.make("my-environment", task="my-task")
env.reset()

result = env.score({
    "kernel.py": agent_code,
    "proof.lean": agent_proof,
})

# Scoring layers:
# 1. Does the code work? (verifier checks)
# 2. Does the proof compile? (Lean compiler)
# 3. Is the proof complete? (no sorry)
print(result.score)        # combined score
print(result.lean_status)  # proof status

Agent retry with verifier feedback

Agent gets the scoring output and tries again. No human in the loop. The verifier feedback is the teacher.

results = env.run(
    model="anthropic/claude-sonnet-4-6",
    api_key="...",
    max_retries=3,
    target_score=0.95,
)
# Attempt 1: 0.35 (code correct, proof has sorry)
# Attempt 2: 0.72 (proof compiles, 2 sorry remaining)
# Attempt 3: 0.98 (full proof, verified)

RL training

Use proof scores as reward signal in any RL framework.

from trl import PPOTrainer

env = athanor.make("my-environment")
trainer = PPOTrainer(
    reward_fn=lambda completions: env.reward_fn(completions),
    ...
)

Compatible with TRL, veRL, NeMo-RL, or any custom training loop.

Proof scoring

proof_multiplier:
  1.00  full proof (compiles, no sorry)
  0.35  partial proof (compiles with sorry)
  0.25  broken proof (does not compile)
  0.15  no proof submitted
  0.00  banned construct (axiom, Mathlib, unsafe)

Partial proofs produce gradient. An agent that proves 4 of 7 theorems scores higher than one that proves 0. This is the training signal.

Getting environments

The verify_proof and score_proof functions work standalone with any Lean 4 code. For full environment scoring (code + proof + property tests), contact athanor-ai.com.

CLI

# Container sanity checks — 7 gates including shim re-export + reward-leak probe
athanor preflight --env <env_slug>

# Batch evaluate all tasks in an environment (or filter/patch)
athanor evaluate --env hw-cbmc --model anthropic/claude-sonnet-4-6
athanor evaluate --env hw-cbmc --tasks fix_arb_lock,fix_tlb_ctrl    # subset
athanor evaluate --env hw-cbmc --patch runs/prev_run1.json          # resume

# Score, solve, lint, estimate
athanor score --env <env> --task <slug> --file solution.py
athanor solve --env <env> --task <slug> --model claude-opus-4-6
athanor lint Proof.lean                   # positional
athanor estimate --model claude-sonnet-4-6 --tasks 26

# Run/calibrate/stats
athanor runs --env <env>
athanor calibrate --run runs/<model>_run1.json
athanor stats runs/*.json
athanor compare runs/a.json runs/b.json
athanor eval-status [env_dir]

Results save incrementally after each task (crash-resilient). --patch merges into an existing run file so you can resume interrupted sweeps.

Requirements

Python >= 3.9
Lean 4 or Docker (for proof verification)

License

Apache-2.0

Project details

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Release history Release notifications | RSS feed

0.8.12

Jun 9, 2026

0.8.11

Jun 8, 2026

0.8.10

Jun 6, 2026

0.8.9

Jun 6, 2026

0.8.8

Jun 5, 2026

0.8.7

Jun 5, 2026

0.8.6

Jun 2, 2026

0.8.5

Jun 2, 2026

0.8.3

May 30, 2026

0.8.2

May 30, 2026

0.8.1

May 29, 2026

0.7.5

May 25, 2026

0.7.1

May 19, 2026

0.7.0 yanked

May 19, 2026

0.6.4

May 16, 2026

0.6.3

May 15, 2026

0.6.2

May 15, 2026

0.6.1

May 15, 2026

0.6.0

May 15, 2026

0.5.4

May 9, 2026

0.5.3

May 9, 2026

0.5.2

May 9, 2026

0.5.1

May 9, 2026

0.5.0

May 9, 2026

0.4.9

May 9, 2026

0.4.8

May 9, 2026

0.4.7

May 9, 2026

0.4.6

May 9, 2026

0.4.5

May 9, 2026

0.4.4

May 9, 2026

0.4.3

May 9, 2026

0.4.2

May 8, 2026

0.4.1

Apr 25, 2026

This version

0.4.0

Apr 20, 2026

0.3.5 yanked

Apr 16, 2026

0.3.2 yanked

Apr 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

athanor_sdk-0.4.0.tar.gz (380.2 kB view details)

Uploaded Apr 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

athanor_sdk-0.4.0-py3-none-any.whl (205.7 kB view details)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file athanor_sdk-0.4.0.tar.gz.

File metadata

Download URL: athanor_sdk-0.4.0.tar.gz
Upload date: Apr 20, 2026
Size: 380.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for athanor_sdk-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`2abe9e0a029e7566aa74a3755cd8d859a74d30d2c25ec58b6761a08a76566ea5`
MD5	`4be43e309c2ca189751a130afde1cc7e`
BLAKE2b-256	`b3e1e58fdc3d26cbd37d430a1d8cfe2c8f03f2ab22b771a78b97a1e5b0738b21`

See more details on using hashes here.

File details

Details for the file athanor_sdk-0.4.0-py3-none-any.whl.

File metadata

Download URL: athanor_sdk-0.4.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 205.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for athanor_sdk-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c5cef2a2355a2ecba7c259913d3d3424b0a511e1482a8315b67284d0902faecc`
MD5	`7194f70c7c81c37b5ae71d570926ec8c`
BLAKE2b-256	`ddb22809b2388876acd553d61bab297f6895e66e8329fb4b227d696d3df14266`

See more details on using hashes here.

athanor-sdk 0.4.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

athanor-ai

Install

What this solves

Verify proofs

Score agent output

Agent retry with verifier feedback

RL training

Proof scoring

Getting environments

CLI

Requirements

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes