Skip to main content

Python library for running proof search using CoPra

Project description

Build Status PyPI version PyPI downloads

COPRA

COPRA: An in-COntext PRoof Agent which uses LLMs like GPTs to prove theorems in formal languages.

Table of Contents


What's New

🤖 vLLM Support for Open Source Models (NEW!)

COPRA now supports vLLM for running open-source LLMs locally! This enables you to use models like Llama, Mistral, DeepSeek, and more for theorem proving.

Features:

  • OpenAI-Compatible API: Works with existing COPRA code via OpenAI-compatible interface
  • GPU Acceleration: Full CUDA support with multi-GPU tensor parallelism
  • Automatic Server Management: Start/stop vLLM servers programmatically
  • Model Flexibility: Support for any Hugging Face model compatible with vLLM
  • ⚠️ Python Requirement: vLLM requires Python ≤ 3.12 (not compatible with 3.14t)

Jump to vLLM Setup →

🚀 Parallel Theorem Execution

COPRA now supports executing proof search for multiple theorems in parallel! This significantly speeds up evaluation on multi-core systems.

Features:

  • Automatic Python Version Detection: Uses threading for Python 3.14t+ and multiprocessing for older versions
  • Configurable Worker Count: Control parallelism with environment variables
  • Backward Compatible: Disabled by default - opt-in via environment variable
  • Better Resource Utilization: Leverage all CPU cores for faster evaluations

Jump to Parallel Execution Setup →

🐍 Python 3.14t Free-Threading Support

COPRA now fully supports Python 3.14+ with free-threaded (GIL-free) execution for improved performance!

Automatic Execution Mode Selection:

  • Python 3.14t+ with GIL disabled: Native free-threading for true parallel execution
  • Python 3.14t+ with GIL enabled: Threading (still benefits from improved performance)
  • Python < 3.14: Multiprocessing (traditional approach)

Setup

Quick Setup for Lean 4

  1. Install itp-interface using the following command: (Our package is available on PyPI: https://pypi.org/project/copra-theorem-prover/)
pip install copra-theorem-prover
  1. Run the following command to prepare the REPL for Lean 4. (The default version is 4.7.0-rc2. You can change the version by setting the LEAN_VERSION environment variable. If no version is set, then 4.7.0-rc2 is used.)

NOTE: The Lean 4 version must match the version of the Lean 4 project you are working with.

export LEAN_VERSION="4.15.0"
install-lean-repl
  1. Run the following command to build the REPL for Lean 4. Make sure that lean --version returns the correct version before running the command below. If not then check if $HOME/.elan/bin is in your path. Recommended to run source $HOME/.elan/env before running the command below.
install-itp-interface

NOTE: These steps are only tested on Linux. For Windows, you can use WSL. These steps will not setup the Coq interface.

Python 3.14t Setup (Free-threaded Python - Optional)

🆕 NEW: COPRA now supports Python 3.14+ with free-threaded (GIL-free) support for improved performance!

  1. Create a Conda environment with Python 3.14t (free-threaded):
# Create environment with free-threaded Python 3.14
conda create -n py314-ft python=3.14 python-freethreading -c conda-forge

# Activate the environment
conda activate py314-ft

# Verify Python version and free-threading support
python --version  # Should show Python 3.14.x
  1. Install COPRA theorem prover:
# Install from PyPI
pip install copra-theorem-prover

# OR for development, install from source
pip install -e .
  1. Run experiments with Python 3.14t:
# Use run.py which automatically detects Python 3.14+ and uses Hydra-free mode
python -m copra.main.run --config-name lean4_simple_experiment

# Or if installed from source
python src/copra/main/run.py --config-name lean4_simple_experiment

Note: Python 3.14t is experimental. Some packages may show compatibility warnings (especially Pydantic and OpenAI), but COPRA has been refactored to work with Python 3.14t.

🚀 Automatic Parallel Execution Selection: COPRA automatically detects your Python version and chooses the best parallel execution strategy:

  • Python 3.14t+ with GIL disabled: Uses native free-threading for true parallel execution
  • Python 3.14t+ with GIL enabled: Uses threading (still benefits from improved performance)
  • Python < 3.14: Uses multiprocessing (traditional approach)

Check your execution mode:

python demo_parallel_execution.py

Full Setup for Coq and Lean

  1. Install OCaml first. Use the instructions here: https://opam.ocaml.org/doc/Install.html. Note that OCaml officially only supports Linux installations. One can use WSL on Windows machines.

  2. Run the following to install Coq on Linux.

sudo apt install build-essential unzip bubblewrap
sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)
  1. Add the following to your .bashrc file: (sometimes the path ~/.opam/default might not exist, so use the directory with version number present in the ~/.opam directory)
export PATH="/home/$USER/.opam/default/bin:$PATH"
  1. Create a Miniconda environment and activate it.

  2. Run the commands for installing the Lean 4 interface as mentioned in Quick Setup for Lean 4.

  3. Add the following to your .bashrc file for Lean:

export PATH="/home/$USER/.elan/bin:$PATH"

vLLM Setup for Open Source Models

🆕 NEW: Run open-source LLMs locally with GPU acceleration via vLLM!

Quick Setup:

# Install with vLLM support
pip install copra-theorem-prover[os_models]

# Or for development
pip install -e .[os_models]

Usage:

Create a config file with vllm: prefix:

# config/eval_settings/my_vllm_config.yaml
gpt_model_name: vllm:codellama/CodeLlama-7b-Instruct-hf
temperature: 0.0
max_tokens_per_action: 100
# ... other settings

Run your experiment:

python -m copra.main.eval_benchmark eval_settings=my_vllm_config benchmark=miniF2F

The vLLM server starts automatically on port 48000. Override with VLLM_PORT environment variable if needed.

Supported Models:

  • vllm:codellama/CodeLlama-7b-Instruct-hf (open-source code model)
  • vllm:meta-llama/Llama-2-7b-chat-hf (general LLM)
  • vllm:EleutherAI/llemma_7b (math-focused LLM)
  • vllm:deepseek-ai/deepseek-math-7b-instruct (math reasoning)
  • Any HuggingFace model compatible with vLLM

Note: vLLM requires Python ≤ 3.12 and CUDA-capable GPU


Running Experiments

Setting up OpenAI API and Running Experiments

  1. You need to create a file .secrets/openai_key.json in the root directory of the project with the OpenAI API key. The file should contain the following:
{
    "organization": "<your-organization-id>",
    "api_key": "<your-api-key>"
}
  1. The experiments are not necessarily thread safe. So, it is recommended to run them sequentially. The commands to run the desired experiments can be found in the file ./src/copra/main/config/experiments.yaml.

  2. Run the following command to run the experiments:

For Python 3.14+ (with free-threaded support):

python src/copra/main/run.py --config-name lean4_simple_experiment
# This uses a Hydra-free implementation compatible with Python 3.14+
# You can change the config name to run different experiments

For Python < 3.14:

python src/copra/main/eval_benchmark.py
# This will run the experiments mentioned in the file `./src/copra/main/config/experiments.yaml`.
# Change the file path in the command above to run other experiments.

Universal command (auto-detects Python version):

python src/copra/main/run.py --config-name lean4_simple_experiment
# This automatically uses Hydra-free mode for Python 3.14+ and Hydra mode for older versions

Note: run.py is the recommended entry point for all Python versions. It automatically detects your Python version and uses the appropriate implementation (Hydra-free for 3.14+, standard Hydra for older versions).

Starting Required Services

Before running COPRA, you need to ensure the required services are running based on your proof language:

For Lean 4 Projects:

No additional services are required. Lean 4 uses a built-in REPL that COPRA manages automatically.

For Isabelle Projects:

You need to start the PISA (Portal for Isabelle) service:

  1. Set the PISA port (optional, defaults to 17000):
export PISA_PORT=17000
  1. Start the PISA service:
# COPRA will automatically start the PISA service when needed
# You can also manually start it if required by your setup
  1. Check if PISA is running:
# COPRA will log whether PISA service is up or needs to be restarted

Note: COPRA automatically manages the PISA service lifecycle, including health checks and restarts. You'll see log messages about PISA service status during execution.

For Coq Projects:

Ensure the correct Coq version is active:

  1. Check your Coq version:
coqc --version
  1. Switch Coq version if needed (using opam):
opam switch <your-coq-version>
eval $(opam env)
  1. Build your Coq project before running COPRA:
cd /path/to/your/coq/project
make  # or your project's build command

For LLM Services (Llama models):

If using non-OpenAI models (like Llama):

  1. COPRA will automatically initialize the Llama service when needed
  2. Check the logs in .log/evals/benchmark/<benchmark-name>/<timestamp>/llama.log
  3. If the service goes down, COPRA will automatically restart it

Important: The ITP projects must be built before running COPRA. Make sure the correct version/switch is active for Coq projects, as different projects may use different Coq versions.

Parallel Theorem Execution (NEW!)

COPRA now supports executing proof search for multiple theorems in parallel, significantly speeding up evaluations on multi-core systems.

Quick Start

Enable parallel execution:

export ENABLE_PARALLEL_THEOREMS=True
python src/copra/main/run.py --config-name lean4_simple_experiment

Control number of workers:

export ENABLE_PARALLEL_THEOREMS=True
export MAX_PARALLEL_WORKERS=4  # Use 4 parallel workers
python src/copra/main/run.py --config-name lean4_simple_experiment

Configuration Options

Environment Variable Description Default Example
ENABLE_PARALLEL_THEOREMS Enable/disable parallel theorem execution False True or False
MAX_PARALLEL_WORKERS Maximum number of parallel workers Auto (CPU cores / 2) 4, 8, 16

How It Works

Sequential Execution (Default):

  • Theorems are processed one at a time
  • Each theorem proof runs in isolation with timeout
  • Original behavior is preserved

Parallel Execution (Enabled):

  • Multiple theorems are processed concurrently
  • Each theorem proof still runs in its own process/thread with timeout
  • Results are collected as they complete
  • Automatic Python version detection:
    • Python < 3.14: Uses ProcessPoolExecutor (multiprocessing)
    • Python 3.14t+: Uses ThreadPoolExecutor (threading with free-threading support)

Examples

Example 1: Basic parallel execution

# Enable parallel execution with auto-detected worker count
export ENABLE_PARALLEL_THEOREMS=True
python -m copra.main.run --config-name lean4_simple_experiment

Example 2: Custom worker count

# Use 8 parallel workers for large theorem sets
export ENABLE_PARALLEL_THEOREMS=True
export MAX_PARALLEL_WORKERS=8
python -m copra.main.run --config-name lean4_simple_experiment

Example 3: Python 3.14t with free-threading

# On Python 3.14t, this will automatically use threading
conda activate py314-ft
export ENABLE_PARALLEL_THEOREMS=True
python -m copra.main.run --config-name lean4_simple_experiment

Example 4: Conservative parallel execution

# Use only 2 workers for resource-constrained systems
export ENABLE_PARALLEL_THEOREMS=True
export MAX_PARALLEL_WORKERS=2
python -m copra.main.run --config-name lean4_simple_experiment

Performance Tips

  1. Match workers to CPU cores: Set MAX_PARALLEL_WORKERS to your number of physical CPU cores
  2. Consider theorem complexity: Complex theorems benefit more from parallelism
  3. Monitor resources: Watch CPU, memory, and I/O usage during execution
  4. Service capacity: Ensure LLM services can handle concurrent requests
  5. Sequential for small sets: Use sequential execution for fewer than 4 theorems (less overhead)

Limitations

  • Memory usage: Parallel execution uses more memory (multiple processes/threads running simultaneously)
  • Shared resources: Services (Llama, Isabelle) must handle concurrent requests
  • Time budget tracking: Time budgets are tracked per-theorem, not globally across parallel executions

Troubleshooting

Issue: Out of memory errors

  • Solution: Reduce MAX_PARALLEL_WORKERS to 2 or 4

Issue: Service connection errors

  • Solution: Reduce parallel workers or use sequential execution for service-heavy workloads

Issue: Slower than sequential execution

  • Cause: Small theorem sets, I/O-bound workloads, or limited CPU cores
  • Solution: Disable parallel execution or reduce worker count

Important Notes

  • The ITP projects must be built before running COPRA
  • For Coq projects, ensure the correct switch/version is active
  • Services (PISA, Llama) are automatically managed by COPRA
  • Check logs in .log/evals/benchmark/<name>/<timestamp>/ for debugging

Citation

You can cite our paper:

@inproceedings{thakur2024context,
  title={An in-context learning agent for formal theorem-proving},
  author={Thakur, Amitayush and Tsoukalas, George and Wen, Yeming and Xin, Jimmy and Chaudhuri, Swarat},
  booktitle={First Conference on Language Modeling},
  year={2024}
}

Our paper can be found here: OpenReview and ArXiv

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

copra_theorem_prover-1.1.14.tar.gz (5.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

copra_theorem_prover-1.1.14-py3-none-any.whl (215.1 kB view details)

Uploaded Python 3

File details

Details for the file copra_theorem_prover-1.1.14.tar.gz.

File metadata

  • Download URL: copra_theorem_prover-1.1.14.tar.gz
  • Upload date:
  • Size: 5.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.24

File hashes

Hashes for copra_theorem_prover-1.1.14.tar.gz
Algorithm Hash digest
SHA256 9ebc2f6689868eb9ddf50949e306eca67e45700c1be7c9e7b3958953117ca06c
MD5 720dbc4d129563fc58298ff960d3601c
BLAKE2b-256 9f8e5875bb357d7b9795205d1811a5a45b22d91ae812ea9459a4b53534a81962

See more details on using hashes here.

File details

Details for the file copra_theorem_prover-1.1.14-py3-none-any.whl.

File metadata

File hashes

Hashes for copra_theorem_prover-1.1.14-py3-none-any.whl
Algorithm Hash digest
SHA256 c2161a07f767c25e0ab20fc210e15486c56d5f4eff98f16eacc7b23a84fc5b6d
MD5 caf164b5dd8e60ba96d0de4af12835ac
BLAKE2b-256 90ae90873d100f83520e68512415cef755a9dc3755c2f4ad0672f7322ba8d50a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page