SATQuest
Project description
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Preprint Paper | Datasets | PyPI
🚀 Quickstart
# Install dependencies
# pip install datasets satquest
from datasets import load_dataset
from satquest import CNF, create_problem, create_question
cnf = CNF(dimacs=load_dataset('sdpkjc/SATQuest', split='test')[0]['sat_dimacs'])
# cnf.shuffle()
P, Q = create_problem('SATSP', cnf), create_question('math')
prompt = P.accept(Q)
answer = P.solution # LLM(prompt)
reward = int(P.check(answer))
Dataset Generation
# Install dependencies
pip install datasets numpy tyro
# Run dataset generation
python gen_cnf_dataset.py --hf-entity {YOUR_HF_ENTITY} --seed 9527
python gen_cnf_rft_dataset.py --hf-entity {YOUR_HF_ENTITY} --seed 9527
Evaluation
# Install dependencies
pip install datasets weave wandb tyro openai
# Run evaluation
python eval_model.py \
--exp-name {YOUR_EXP_NAME} \
--wandb-project "SATQuest-Eval" \
--hf-dataset-name "sdpkjc/SATQuest" \
--p-type-list SATSP \
--q-type-list math \
--llm-model "gpt-4o" \
--max-tokens 16384 \
--temperature 0.6 \
--num-example 10 \
--stream True \
--cnf-shuffle False \
--n-repeat 1
Evaluation Parameters
exp-name: Name of your experimentwandb-project: Weights & Biases project namehf-dataset-name: HuggingFace dataset namep-type-list: Problem types to evaluate (e.g., "SATSP", "MaxSAT", "MCS", "MUS", "SATDP_SAT", "SATDP_UNSAT")q-type-list: Question types to evaluate (e.g., "math", "dimacs", "story", "dualstory")llm-model: LLM model to use for evaluationmax-tokens: Maximum tokens for LLM responsetemperature: Temperature for LLM generationnum-example: Number of examples to evaluatestream: Whether to stream LLM responsescnf-shuffle: Whether to shuffle CNF formulasn-repeat: Number of times to repeat evaluation
The evaluation results will be logged to Weights & Biases.
Reinforcement Fine-Tuning (RFT)
# Install dependencies
pip install git+https://github.com/huggingface/trl.git@aaf396 # for reproducibility
pip install datasets vllm tyro
# Run RFT training
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup trl vllm-serve --model "Qwen/Qwen2.5-7B-Instruct" --tensor_parallel_size 4 --max_model_len 16384 --gpu_memory_utilization 0.9 --enable_prefix_caching True &
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --num-processes 4 --config-file zero3.yaml rft.py --model-id "Qwen/Qwen2.5-7B-Instruct" --p-list SATSP --q-list math --exp-name "test" --server-ip "0.0.0.0"
RFT Parameters
model-id: Base model to fine-tune (default: "Qwen/Qwen2.5-7B-Instruct")p-list: Problem types for training (e.g., "SATSP", "MaxSAT", "MCS", "MUS", "SATDP_SAT", "SATDP_UNSAT")q-list: Question formats for training (e.g., "math", "story")exp-name: Name of your experimentserver-ip: IP address for VLLM server
Citation
@misc{satquest,
author = {Yanxiao Zhao, Yaqian Li, Zihao Bo, Rinyoichi Takezoe, Haojia Hui, Mo Guang, Lei Ren, Xiaolin Qin, Kaiwen Long},
title = {SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/sdpkjc/SATQuest}},
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
satquest-0.1.1.tar.gz
(12.0 kB
view details)
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
satquest-0.1.1-py3-none-any.whl
(11.5 kB
view details)
File details
Details for the file satquest-0.1.1.tar.gz.
File metadata
- Download URL: satquest-0.1.1.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fea50bd7cae5611e693cb65a39b8f97f4c714ad644bfecf4514851c6123b5f76
|
|
| MD5 |
18106fb4eb9ce528a6fab4334aa6a656
|
|
| BLAKE2b-256 |
95ee2aae8fdc5b80328d0455841d3b2833359c38268d1886e9219af49d45887c
|
File details
Details for the file satquest-0.1.1-py3-none-any.whl.
File metadata
- Download URL: satquest-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d81ee5a9c7a90789a3977f00824ca20bcd4831191ae1bbc0b722529d8ecb21ff
|
|
| MD5 |
d3688cacde690a2700b8f69f7f813836
|
|
| BLAKE2b-256 |
65b96deb3ce1c63fd6ec278b6c4aa0f21774c18161bfe773458a661f80b256f8
|