LLM-based reasoning using Z3 theorem proving

These details have not been verified by PyPI

Project links

Project description

ProofOfThought

LLM-based reasoning using Z3 theorem proving with multiple backend support (SMT2 and JSON).

Features

Dual Backend Support: Choose between SMT2 (default) or JSON execution backends
Azure OpenAI Integration: Native support for Azure GPT-4o and GPT-5 models
Comprehensive Benchmarks: Evaluated on 5 reasoning datasets (ProntoQA, FOLIO, ProofWriter, ConditionalQA, StrategyQA)
High-level API: Simple Python interface for reasoning tasks
Batch Evaluation Pipeline: Built-in tools for dataset evaluation and metrics
Postprocessing Techniques: Self-Refine, Self-Consistency, Decomposed Prompting, and Least-to-Most Prompting for enhanced reasoning quality

Release Channels

Stable: pip install proofofthought
Nightly: pip install --pre proofofthought

Nightly releases are built from the current main branch and may contain breaking changes. They use PyPI prerelease versions in the form BASE.devYYYYMMDDHHMM.

Installation

Stable From PyPI

Install the latest stable release:

pip install proofofthought

Nightly From PyPI

Install the latest nightly prerelease:

pip install --pre proofofthought

To pin a specific nightly once it has been published:

pip install "proofofthought==1.0.1.dev202604011230"

Nightly builds are intentionally unstable and may change behavior without a stable compatibility guarantee.

Note: Package name is proofofthought, but imports use z3adapter:

from z3adapter.reasoning import ProofOfThought

From Source (Development)

For contributing or using the latest development version:

git clone https://github.com/debarghaG/proofofthought.git
cd proofofthought
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

Prerequisites

Python 3.13
An OpenAI API key or Azure OpenAI endpoint
Z3 solver (z3-solver installs the z3 binary into the active virtual environment)
For nightly documentation, see https://debarghaG.github.io/proofofthought/nightly/

Setup

Environment Variables

Create a .env file in your project directory:

For OpenAI:

OPENAI_API_KEY=your-api-key-here

For Azure OpenAI:

AZURE_OPENAI_ENDPOINT=https://your-endpoint.openai.azure.com/
AZURE_OPENAI_KEY=your-azure-key-here
AZURE_DEPLOYMENT_NAME=gpt-5  # or gpt-4o
AZURE_API_VERSION=2024-12-01-preview

You can also set these as system environment variables instead of using a .env file.

Quick Start

Using OpenAI

import os
from dotenv import load_dotenv
from openai import OpenAI
from z3adapter.reasoning import ProofOfThought

# Load environment variables
load_dotenv()

# Create OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Initialize ProofOfThought
pot = ProofOfThought(llm_client=client, model="gpt-4o")

# Ask a question
result = pot.query("Would Nancy Pelosi publicly denounce abortion?")
print(result.answer)  # False

Using Azure OpenAI

import os
from dotenv import load_dotenv
from openai import AzureOpenAI
from z3adapter.reasoning import ProofOfThought

# Load environment variables
load_dotenv()

# Create Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version=os.getenv("AZURE_API_VERSION")
)

# Initialize ProofOfThought with your deployment name
pot = ProofOfThought(
    llm_client=client,
    model=os.getenv("AZURE_DEPLOYMENT_NAME")  # e.g., "gpt-4o" or "gpt-5"
)

# Ask a question
result = pot.query("Would Nancy Pelosi publicly denounce abortion?")
print(result.answer)  # False

Batch Evaluation

from z3adapter.reasoning import EvaluationPipeline, ProofOfThought

evaluator = EvaluationPipeline(proof_of_thought=pot, output_dir="results/")
result = evaluator.evaluate(
    dataset="data/strategyQA_train.json",
    question_field="question",
    answer_field="answer",
    max_samples=10
)
print(f"Accuracy: {result.metrics.accuracy:.2%}")

Backend Selection

ProofOfThought supports two execution backends:

# SMT2 backend (default) - Standard SMT-LIB 2.0 via Z3 CLI
pot = ProofOfThought(llm_client=client, backend="smt2")

# JSON backend - Custom DSL via Python Z3 API
pot = ProofOfThought(llm_client=client, backend="json")

See docs/backends.md for details on choosing a backend.

Postprocessing Techniques

Enhance reasoning quality with advanced postprocessing techniques:

# Enable Self-Refine for iterative refinement
pot = ProofOfThought(
    llm_client=client,
    postprocessors=["self_refine"],
    postprocessor_configs={"self_refine": {"num_iterations": 2}}
)

# Use Self-Consistency for improved reliability via majority voting
pot = ProofOfThought(
    llm_client=client,
    postprocessors=["self_consistency"],
    postprocessor_configs={"self_consistency": {"num_samples": 5}}
)

# Chain multiple postprocessors
pot = ProofOfThought(
    llm_client=client,
    postprocessors=["self_refine", "self_consistency"]
)

Available techniques:

Self-Refine: Iterative refinement through self-critique
Self-Consistency: Majority voting across multiple reasoning paths
Decomposed Prompting: Breaking complex questions into sub-questions
Least-to-Most Prompting: Progressive problem solving from simple to complex

See docs/postprocessors.md for complete documentation and usage examples.

Architecture

The system has two layers:

High-level API (z3adapter.reasoning) - Simple Python interface for reasoning tasks
Low-level execution (z3adapter.backends) - JSON DSL or SMT2 backend for Z3

Most users should use the high-level API.

Examples

The examples/ directory contains complete working examples for various use cases:

simple_usage.py - Basic usage with OpenAI
azure_simple_example.py - Simple Azure OpenAI integration
backend_comparison.py - Comparing SMT2 vs JSON backends
batch_evaluation.py - Evaluating on datasets
postprocessor_example.py - Using postprocessing techniques

Running Examples After pip Install

If you installed via pip install proofofthought, you can create your own scripts anywhere using the Quick Start examples above. The examples directory is primarily for development and testing.

Running Examples in Development Mode

If you cloned the repository:

cd /path/to/proofofthought
source venv/bin/activate
python examples/simple_usage.py

Note: Run examples from the repository root with the project virtual environment activated so both z3adapter and venv/bin/z3 are available.

Running Experiments

You can use this repository as a strong baseline for LLM+Solver methods. This code is generally benchmarked with GPT-5 on the first 100 samples of 5 datasets, as an indicator of whether we broke something during development. These numbers are not the best, and you can certainly get better numbers with better prompt engineering with this same tooling. Please feel free to put in a PR if you get better numbers with modified prompts.

To run all benchmarks with both backends and generate results:

python experiments_pipeline.py

This will:

Run all 5 benchmarks (ProntoQA, FOLIO, ProofWriter, ConditionalQA, StrategyQA)
Test both SMT2 and JSON backends
Generate results tables in results/
Automatically update the benchmark results section below

Benchmark Results

Last Updated: 2025-10-16 18:14:07

Benchmark	Backend	Samples	Accuracy	Precision	Recall	F1 Score	Success Rate
PRONTOQA	SMT2	100	100.00%	1.0000	1.0000	1.0000	100.00%
FOLIO	SMT2	100	69.00%	0.6949	0.7736	0.7321	99.00%
PROOFWRITER	SMT2	96	98.96%	1.0000	1.0000	1.0000	98.96%
CONDITIONALQA	SMT2	100	83.00%	0.9375	0.8219	0.8759	100.00%
STRATEGYQA	SMT2	100	84.00%	0.8205	0.7805	0.8000	100.00%
PRONTOQA	JSON	100	99.00%	1.0000	0.9815	0.9907	100.00%
FOLIO	JSON	100	76.00%	0.7619	0.9412	0.8421	94.00%
PROOFWRITER	JSON	96	95.83%	1.0000	1.0000	1.0000	95.83%
CONDITIONALQA	JSON	100	76.00%	0.9180	0.8750	0.8960	89.00%
STRATEGYQA	JSON	100	68.00%	0.7500	0.7895	0.7692	86.00%

Citations

Please consider citing our work if you find this useful.

@inproceedings{
ganguly2024proof,
title={{PROOF} {OF} {THOUGHT} : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning},
author={Debargha Ganguly and Srinivasan Iyengar and Vipin Chaudhary and Shivkumar Kalyanaraman},
booktitle={The First Workshop on System-2 Reasoning at Scale, NeurIPS'24},
year={2024},
url={https://openreview.net/forum?id=Pxx3r14j3U}
}

@inproceedings{
ganguly2025grammars,
title={Grammars of Formal Uncertainty: When to Trust {LLM}s in Automated Reasoning Tasks},
author={Debargha Ganguly and Vikash Singh and Sreehari Sankar and Biyao Zhang and Xuecen Zhang and Srinivasan Iyengar and Xiaotian Han and Amit Sharma and Shivkumar Kalyanaraman and Vipin Chaudhary},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=QfKpJ00t2L}
}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.0.0rc1 pre-release

Jun 12, 2026

1.0.1

Oct 18, 2025

This version

1.0.1.dev202604011558 pre-release

Apr 1, 2026

1.0.0

Oct 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

proofofthought-1.0.1.dev202604011558.tar.gz (78.0 kB view details)

Uploaded Apr 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

proofofthought-1.0.1.dev202604011558-py3-none-any.whl (96.4 kB view details)

Uploaded Apr 1, 2026 Python 3

File details

Details for the file proofofthought-1.0.1.dev202604011558.tar.gz.

File metadata

Download URL: proofofthought-1.0.1.dev202604011558.tar.gz
Upload date: Apr 1, 2026
Size: 78.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for proofofthought-1.0.1.dev202604011558.tar.gz
Algorithm	Hash digest
SHA256	`4d5a05dfc52e822af73e0d65e1ed64dc2da2518850f3c4391c8289c62ab3ae11`
MD5	`95a9dafcba5ea178df1d275d4035a1bf`
BLAKE2b-256	`a199be67877ebe5417d5b5bfebe3e0c400c0d30eff7d75de0d4a2a538d5510fb`

See more details on using hashes here.

File details

Details for the file proofofthought-1.0.1.dev202604011558-py3-none-any.whl.

File metadata

Download URL: proofofthought-1.0.1.dev202604011558-py3-none-any.whl
Upload date: Apr 1, 2026
Size: 96.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for proofofthought-1.0.1.dev202604011558-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8466ea0a315481ed339d70d920e475bc740eac1d3db8c5fbf163584392aec47c`
MD5	`60008355586544f702b9eb49182161f5`
BLAKE2b-256	`d33fad81cffcf93e1fd8bd1e35f5441998f5de2c79285f3e35701b6f1a9c89ce`

See more details on using hashes here.

proofofthought 1.0.1.dev202604011558

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ProofOfThought

Features

Release Channels

Installation

Stable From PyPI

Nightly From PyPI

From Source (Development)

Prerequisites

Setup

Environment Variables

Quick Start

Using OpenAI

Using Azure OpenAI

Batch Evaluation

Backend Selection

Postprocessing Techniques

Architecture

Examples

Running Examples After pip Install

Running Examples in Development Mode

Running Experiments

Benchmark Results

Citations

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes