Python library for training LLMs with Interactive Theorem Proving data.
Project description
ProofWala
Introduction
Neural networks have shown substantial promise at automatic theorem-proving in interactive proof assistants (ITPs) like Lean and Coq. However, most neural theorem-proving models are restricted to specific ITPs, leaving out opportunities for cross-lingual transfer between ITPs. We address this weakness with a multilingual proof framework, ProofWala, that allows a standardized form of interaction between neural theorem-provers and two established ITPs (Coq and Lean). It enables the collection of multilingual proof step data---data recording the result of proof actions on ITP states---for training neural provers. ProofWala allows the systematic evaluation of a model's performance across different ITPs and problem domains via efficient parallel proof search algorithms. We show that multilingual training enabled by ProofWala can lead to successful transfer across ITPs.
Framework Details
ProofWala, a unified framework for extracting and organizing training data for formal theorem proving in Lean and Coq. The framework supports the generation of training data for these ITPs from any formal theorem-proving Git repository (such as Mathlib, CompCert, MathComp, GeoCoq, & Category Theory) and the subsequent training of LLMs for single proof step generation. Our data collection format is standardized across ITPs, and we have created generic prompt formatting schemes across different ITPs and domains. The framework also helps with end-to-end proof search and collection of the annotated proof trees which can be further used for analysis and visualization.
Our framework has three components: (i) the interface module: for the execution of proof steps (tactics) on various ITPs (see ITP Interface Repository), (ii) the proof step generation & training module: for generating proof step data and training proof step generation model, and (iii) the parallel proof search module: for using the guidance from proof step generation model to do end-to-end the proof search. The Figure below shows the interaction between our different modules.
Quick Installation
- Pip install the package:
pip install proof-wala
- Install the itp-interface dependencies: (follow the instructions in the itp-interface repository for more details and custom version of Lean 4 and Coq installation)
install-lean-repl
install-itp-interface
Note: The above step will only install Lean 4 and interface with Lean 4. To enable both Lean 4 and Coq interface, follow the instructions in the itp-interface repository.
- Install pytorch dependency. This cannot be installed via
pip install proof-walaas it requires the user to install the correct version of torch based on the CUDA version and the GPU availability. (See the official pytorch website for more details). For example, to install pytorch of CUDA 12.4, run the following command:
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
NOTE: The above command is for CUDA 12.4, for other CUDA versions, please refer to the pytorch website.
NOTE: For example, to install pytorch for CUDA 12.2, run the following command:
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121
- Test the installation by running the following command:
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
- If you don't want to train the model and just want to download the pre-trained model, you can download next proof step prediction model from huggingface model hub:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers import pipeline
model_name = "amitayusht/ProofWala-Multilingual"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0) # device=0 for GPU, -1 for CPU
# Example usage
state = """
Goals to prove:
[GOALS]
[GOAL] 1
forall n : nat, n + 1 = 1 + n
[END]
"""
result = pipeline(state, max_length=100, num_return_sequences=1)
print(result[0]['generated_text'])
# Output:
# [RUN TACTIC]
# induction n; trivial.
# [END]
NOTE: Our proof step generation model is trained on the multilingual proof step data is available at the huggingface model hub.
- You need to install
graphvizfor visualization of the proof trees. You can install it using the following command:
sudo apt-get install graphviz
Running the Proof Search
- After installing the
proof-wala package, you can run the parallel proof search by first starting the ray cluster. This step is needed if the ray cluster is not already running. To start the ray cluster, run the following command:
proof-wala-init-ray --num_cpus 20 --object_store_memory 53687091200 --memory 53687091200 --metrics_report_interval_ms 300000000 &
# ^This process needs to run in background. It will start the ray cluster and save the session details in the .log/ray/session_latest file
# ^You can also create the session file with info of an existing ray cluster without starting a new one, by providing the session details in the .log/ray/session_latest file
NOTE: The command
proof-wala-init-rayactually runssrc/proof_wala/main/init_ray.py.
NOTE: Make sure to delete the
.log/ray/session_latestfile if you want to start a new ray cluster. Otherwise the new ray cluster will use the existing session details and might cause issues.
NOTE: for the
session_latestfile, we use the following json format:
{
"node_ip_address": "128.83.141.189",
"raylet_ip_address": "128.83.141.189",
"redis_address": null,
"object_store_address": "/tmp/ray/session_2025-01-30_18-30-47_764457_139622/sockets/plasma_store",
"raylet_socket_name": "/tmp/ray/session_2025-01-30_18-30-47_764457_139622/sockets/raylet", "webui_url": "127.0.0.1:8265",
"session_dir": "/tmp/ray/session_2025-01-30_18-30-47_764457_139622", "metrics_export_port": 64063,
"gcs_address": "128.83.141.189:61362",
"address": "128.83.141.189:61362", "dashboard_agent_listen_port": 52365,
"node_id":"1d44a4b8ce6d2bbf51077e7289b648315c291bbaba1d7dc9aa4c6497",
"main_pid": 139622
}
- Build the Lean/Coq project before running the proof search. For example, to build the example Lean project, run the following command:
cd src/proof_wala/data/proofs/lean/lean4_proj
lake build
- To run the parallel proof search module, run the following command from the root directory:
export FOLLOW_SEED="True"
export CUDA_VISIBLE_DEVICES="0,1,2,3" # Based on the number of GPUs available
proof-wala-search --config-dir=src/proof_wala/main/config --config-name=eval_simple_lean_test_multilingual.yaml
#^ This will automatically start the proof search on the ray cluster mentioned in the session_latest file
#^ For a distributed run we assume that each node has access to the same data, models, essentially the same file system (NFS, SMB, etc)
NOTE: The command
proof-wala-searchactually runssrc/proof_wala/main/run_proof_search.py.
NOTE: The config file
eval_simple_lean_test_multilingual.yaml(src/proof_wala/main/config/eval_simple_lean_test_multilingual.yaml). You can change the config file based on your requirements. These config files are based on hydra configuration system. For more details on the hydra configuration system, refer to the hydra documentation.
NOTE: The
<root>macro mentioned in the config files can be set using the environment variableROOTor one can directly replace the<root>with the path to relevant directory in the config file.
Training the Proof Step Generation Model
-
First generate the proof step data by following the instructions in the itp-interface repository.
-
To train the model for proof step generation, run the following command from the root directory:
torchrun --nproc-per-node 2 --master-port 31052 proof-wala-train --config-dir=src/proof_wala/main/config --config-name multilingual_random_base_experiment.yaml
# ^ This will run training on 2 GPUs, on the same node
# ^ For a single node or no GPU training just remove the --nproc-per-node 2 and --master-port 31052 and torchrun
# The above training job can also be run on a slurm cluster/(or any distributed cluster), for that refer the per_node_job.sh and tacc_slurm.sh script in the root directory
NOTE: The command
proof-wala-trainactually runssrc/proof_wala/main/run.py.
NOTE: Check the
multilingual_random_base_experiment.yamlconfiguration in the src/proof_wala/main/config directory for the exact details of the training configuration and where the model will be saved, the location where it expects the data to be present, etc. These config files are based on hydra configuration system. For more details on the hydra configuration system, refer to the hydra documentation.
NOTE: The
<root>macro mentioned in the config files can be set using the environment variableROOTor one can directly replace the<root>with the path to relevant directory in the config file.
Our Paper
For more details, please refer to our paper: ProofWala: Multilingual Proof Data Synthesis and Theorem-Proving.
@misc{thakur2025proofwala,
title={${\rm P{\small ROOF}W{\small ALA}}$: Multilingual Proof Data Synthesis and Theorem-Proving},
author={Amitayush Thakur and George Tsoukalas and Greg Durrett and Swarat Chaudhuri},
year={2025},
eprint={2502.04671},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2502.04671},
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file proof_wala-1.1.9.tar.gz.
File metadata
- Download URL: proof_wala-1.1.9.tar.gz
- Upload date:
- Size: 406.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a8455a7ddd99eb32a418122197da5d503fc978016ddd1b9a762e4363d1f2cfde
|
|
| MD5 |
23b54ea56a7b1ec69f4b88d86a3819a3
|
|
| BLAKE2b-256 |
5428493f2a62995d34ae11902e3be5a6c51caf42b1e98dac5fe0fba3728bb611
|
File details
Details for the file proof_wala-1.1.9-py3-none-any.whl.
File metadata
- Download URL: proof_wala-1.1.9-py3-none-any.whl
- Upload date:
- Size: 252.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
41fc9e582bf1120701270729be167def13dbae6a3a2b16069b7aaf414fe5e749
|
|
| MD5 |
98e9b3ad6e6e35146ddcf150d2ee9766
|
|
| BLAKE2b-256 |
5f02f8024949739adbcd32c9ca40ca89b2f04c581c28dfb22cb80456a1126d55
|