SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
🚀 Quickstart
# Install dependencies:
#   pip install datasets satquest
from datasets import load_dataset
from satquest import CNF, create_problem, create_question

# Load one CNF instance from the SATQuest dataset
item = load_dataset('sdpkjc/SATQuest', split='test')[0]
cnf = CNF(dimacs=item['sat_dimacs'])
# cnf.shuffle()  # optional: randomize variable/clause order

problem = create_problem('SATSP', cnf)  # or 'SATDP', 'MaxSAT', 'MCS', 'MUS'
question = create_question('math')  # or 'dimacs', 'story', 'dualstory'

prompt = problem.accept(question)  # render the problem as a prompt
answer = problem.solution  # reference answer
reward = int(problem.check(answer))  # 1 if the answer is correct, 0 otherwise
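To make the verification step concrete, here is a self-contained sketch of what a checker like `problem.check` plausibly does for the SATSP task: confirm that a claimed assignment satisfies every clause of a CNF given in DIMACS form. The bitstring answer format and the helper names below are illustrative assumptions, not SATQuest's actual internals.

```python
# Minimal illustration of binary verification for a SAT solution.
# Assumption: the answer is a bitstring where bits[i] gives the value
# of variable i+1. SATQuest's real answer format may differ.

def parse_dimacs(text):
    """Parse a DIMACS CNF string into a list of clauses (lists of ints)."""
    clauses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(('c', 'p')):
            continue  # skip comments and the problem line
        lits = [int(tok) for tok in line.split()]
        if lits and lits[-1] == 0:
            lits.pop()  # drop the trailing 0 clause terminator
        if lits:
            clauses.append(lits)
    return clauses

def check_assignment(dimacs, bits):
    """Return 1 if the bitstring satisfies every clause, else 0."""
    assignment = {i + 1: b == '1' for i, b in enumerate(bits)}
    for clause in parse_dimacs(dimacs):
        # A clause holds if at least one literal evaluates to true.
        if not any(assignment[abs(l)] == (l > 0) for l in clause):
            return 0  # this clause is falsified
    return 1

cnf = "p cnf 3 2\n1 -2 0\n2 3 0"
print(check_assignment(cnf, "110"))  # -> 1 (both clauses satisfied)
print(check_assignment(cnf, "010"))  # -> 0 (clause "1 -2" falsified)
```

Because the check is mechanical, it yields an exact 0/1 reward with no LLM judge in the loop.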
Dataset Generation
# Run dataset generation
uv run --group gen gen_cnf_dataset.py --hf-entity {YOUR_HF_ENTITY} --seed 9527
uv run --group gen gen_cnf_rft_dataset.py --hf-entity {YOUR_HF_ENTITY} --seed 9527
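The generation scripts are not shown here, but the core of any such generator is sampling random CNF instances and serializing them in DIMACS. A hypothetical sketch (the function name, clause width, and sampling distribution are assumptions, not what `gen_cnf_dataset.py` actually does):

```python
import random

def random_3cnf_dimacs(n_vars, n_clauses, seed=9527):
    """Emit a random 3-CNF instance in DIMACS format (illustrative only)."""
    rng = random.Random(seed)  # fixed seed for reproducible datasets
    lines = [f"p cnf {n_vars} {n_clauses}"]
    for _ in range(n_clauses):
        # Pick 3 distinct variables; negate each with probability 0.5.
        vs = rng.sample(range(1, n_vars + 1), 3)
        clause = [v if rng.random() < 0.5 else -v for v in vs]
        lines.append(" ".join(map(str, clause)) + " 0")
    return "\n".join(lines)

print(random_3cnf_dimacs(5, 3))
```

Seeding the generator (mirroring the `--seed 9527` flag above) makes repeated runs produce identical datasets.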
Evaluation
# Run evaluation
uv run --group eval eval_model.py \
--exp-name {YOUR_EXP_NAME} \
--wandb-project "SATQuest-Eval" \
--hf-dataset-name "sdpkjc/SATQuest" \
--p-type-list SATSP \
--q-type-list math \
--llm-model "gpt-4o" \
--max-tokens 16384 \
--temperature 0.6 \
--num-example 10 \
--stream True \
--cnf-shuffle False \
--n-repeat 1
Evaluation Parameters
- exp-name: Name of your experiment
- wandb-project: Weights & Biases project name
- hf-dataset-name: HuggingFace dataset name
- p-type-list: Problem types to evaluate (e.g., "SATSP", "MaxSAT", "MCS", "MUS", "SATDP_SAT", "SATDP_UNSAT")
- q-type-list: Question types to evaluate (e.g., "math", "dimacs", "story", "dualstory")
- llm-model: LLM model to use for evaluation
- max-tokens: Maximum tokens for LLM response
- temperature: Temperature for LLM generation
- num-example: Number of examples to evaluate
- stream: Whether to stream LLM responses
- cnf-shuffle: Whether to shuffle CNF formulas
- n-repeat: Number of times to repeat evaluation
The evaluation results will be logged to Weights & Biases.
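Before a response can be scored, the evaluation loop has to extract a final answer from the model's free-form output. A minimal sketch under an assumed convention (the last `\boxed{...}` span wins; SATQuest's real parsing may differ):

```python
import re

def extract_boxed(response):
    r"""Return the contents of the last \boxed{...} in the response,
    or None if the model never produced one."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1] if matches else None

def binary_reward(response, reference):
    """1 if the extracted answer matches the reference string, else 0."""
    return int(extract_boxed(response) == reference)

print(binary_reward(r"reasoning... so the answer is \boxed{110}", "110"))  # -> 1
print(binary_reward("no boxed answer here", "110"))                        # -> 0
```

Taking the last match tolerates intermediate boxed expressions produced during chain-of-thought reasoning.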
Reinforcement Fine-Tuning (RFT)
# Run RFT training
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup uv run --group rft trl vllm-serve \
    --model "Qwen/Qwen2.5-7B-Instruct" \
    --tensor_parallel_size 4 \
    --max_model_len 16384 \
    --gpu_memory_utilization 0.9 \
    --enable_prefix_caching True &

CUDA_VISIBLE_DEVICES=4,5,6,7 uv run --group rft accelerate launch \
    --num-processes 4 \
    --config-file zero3.yaml \
    rft.py \
    --model-id "Qwen/Qwen2.5-7B-Instruct" \
    --p-list SATSP \
    --q-list math \
    --exp-name "test" \
    --server-ip "0.0.0.0"
RFT Parameters
- model-id: Base model to fine-tune (default: "Qwen/Qwen2.5-7B-Instruct")
- p-list: Problem types for training (e.g., "SATSP", "MaxSAT", "MCS", "MUS", "SATDP_SAT", "SATDP_UNSAT")
- q-list: Question formats for training (e.g., "math", "story")
- exp-name: Name of your experiment
- server-ip: IP address for VLLM server
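During RFT, the SATQuest verifier becomes the reward signal. A sketch of wrapping a per-item checker into a batched reward callable, following TRL's GRPO convention of a function that takes completions and returns one float per completion; the `problems` keyword and the stub verifier are illustrative assumptions, not `rft.py`'s actual code:

```python
def make_reward_fn(check):
    """Wrap a per-item verifier `check(problem, completion) -> bool`
    into a batched reward function returning one float per completion."""
    def reward_fn(completions, problems, **kwargs):
        return [float(check(p, c)) for p, c in zip(problems, completions)]
    return reward_fn

# Stub verifier for demonstration: a completion is "correct" iff it
# ends with the reference answer. SATQuest's check is the real verifier.
check = lambda problem, completion: completion.strip().endswith(problem["answer"])

reward_fn = make_reward_fn(check)
rewards = reward_fn(
    completions=["... so the assignment is 110", "... I give up"],
    problems=[{"answer": "110"}, {"answer": "011"}],
)
print(rewards)  # -> [1.0, 0.0]
```

Because the reward is a deterministic 0/1 check, the training signal cannot be gamed the way a learned reward model can.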
Citation
@misc{satquest2025,
  author = {Yanxiao Zhao and Yaqian Li and Zihao Bo and Rinyoichi Takezoe and Haojia Hui and Mo Guang and Lei Ren and Xiaolin Qin and Kaiwen Long},
title = {SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/sdpkjc/SATQuest}},
}
Download files
Source Distribution
satquest-0.1.2.tar.gz (13.6 kB)
Built Distribution
satquest-0.1.2-py3-none-any.whl (12.5 kB)
File details
Details for the file satquest-0.1.2.tar.gz.
File metadata
- Download URL: satquest-0.1.2.tar.gz
- Upload date:
- Size: 13.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | bc41fdc7714844eced1dc46a9d76fb820df02880b29f2beeb2430c1a1ced69d3 |
| MD5 | 6ec8b1f64e386383c2a55e7a99c5b798 |
| BLAKE2b-256 | 2f312ac149667fa8d39611dd7ea462de0414e76566853b25801fca115015ac68 |
File details
Details for the file satquest-0.1.2-py3-none-any.whl.
File metadata
- Download URL: satquest-0.1.2-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8840be8f670dae92fbd1a965ae1dd3899c54a5ac6ad6bf5ddae22074e16fcaf4 |
| MD5 | fabefd473ca5515a3869c2b671b40c35 |
| BLAKE2b-256 | 4fcd15dd3ad856fc88b76b96d49cccdd3f83d97da5b63e530be2e9fd539cf066 |