
Project description

SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Preprint Paper · MIT License · GitHub Repo · PyPI

[Figure: SATQuest pipeline overview]

🚀 Quickstart

# Install dependencies first: pip install datasets satquest

from datasets import load_dataset
from satquest import CNF, create_problem, create_question

item = load_dataset('sdpkjc/SATQuest', split='test')[0]
cnf = CNF(dimacs=item['sat_dimacs'])
# cnf.shuffle()  # optional: shuffle the CNF formula

problem = create_problem('SATSP', cnf) # or 'SATDP', 'MaxSAT', 'MCS', 'MUS'
question = create_question('math')  # or 'dimacs', 'story', 'dualstory'
prompt = problem.accept(question)
answer = problem.solution  # reference answer
reward = int(problem.check(answer))  # 1 if answer is correct, 0 otherwise
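The reward above is the binary signal SATQuest verifies. Conceptually, for SATSP the check reduces to testing whether a proposed assignment satisfies every clause of the DIMACS formula. The sketch below illustrates that check on a toy instance; the function names are illustrative, not satquest internals.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into a list of clauses (lists of signed ints)."""
    clauses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("c", "p")):  # skip comments and the p-line
            continue
        lits = [int(tok) for tok in line.split()]
        if lits and lits[-1] == 0:  # DIMACS clauses are 0-terminated
            lits = lits[:-1]
        if lits:
            clauses.append(lits)
    return clauses

def check_assignment(clauses, assignment):
    """assignment maps variable index (1-based) -> bool.
    The formula is satisfied iff every clause has at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

dimacs = """c toy instance
p cnf 3 2
1 -2 0
2 3 0"""
clauses = parse_dimacs(dimacs)
reward = int(check_assignment(clauses, {1: True, 2: True, 3: False}))  # -> 1
```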

Dataset Generation

Dataset on HF

# Run dataset generation
uv run --group gen gen_cnf_dataset.py --hf-entity {YOUR_HF_ENTITY} --seed 9527

uv run --group gen gen_cnf_rft_dataset.py --hf-entity {YOUR_HF_ENTITY} --seed 9527
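The generation scripts themselves are not reproduced on this page. As a rough illustration of the underlying data format only, here is a minimal, hypothetical generator of a random k-CNF instance in the DIMACS text that the Quickstart's `CNF(dimacs=...)` consumes:

```python
import random

def random_kcnf_dimacs(n_vars, n_clauses, k=3, seed=9527):
    """Emit a random k-CNF instance as DIMACS text (illustrative sketch,
    not the actual gen_cnf_dataset.py logic)."""
    rng = random.Random(seed)
    lines = [f"p cnf {n_vars} {n_clauses}"]
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), k)            # k distinct variables
        lits = [v if rng.random() < 0.5 else -v for v in variables]  # random polarity
        lines.append(" ".join(map(str, lits)) + " 0")              # clauses end in 0
    return "\n".join(lines)

print(random_kcnf_dimacs(5, 8))
```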

Evaluation

# Run evaluation
uv run --group eval eval_model.py \
    --exp-name {YOUR_EXP_NAME} \
    --wandb-project "SATQuest-Eval" \
    --hf-dataset-name "sdpkjc/SATQuest" \
    --p-type-list SATSP \
    --q-type-list math \
    --llm-model "gpt-4o" \
    --max-tokens 16384 \
    --temperature 0.6 \
    --num-example 10 \
    --stream True \
    --cnf-shuffle False \
    --n-repeat 1

Evaluation Parameters

  • exp-name: Name of your experiment
  • wandb-project: Weights & Biases project name
  • hf-dataset-name: HuggingFace dataset name
  • p-type-list: Problem types to evaluate (e.g., "SATSP", "MaxSAT", "MCS", "MUS", "SATDP_SAT", "SATDP_UNSAT")
  • q-type-list: Question types to evaluate (e.g., "math", "dimacs", "story", "dualstory")
  • llm-model: LLM model to use for evaluation
  • max-tokens: Maximum tokens for LLM response
  • temperature: Temperature for LLM generation
  • num-example: Number of examples to evaluate
  • stream: Whether to stream LLM responses
  • cnf-shuffle: Whether to shuffle CNF formulas
  • n-repeat: Number of times to repeat evaluation

The evaluation results will be logged to Weights & Biases.
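As an illustration only (not the actual eval_model.py logging code), the binary rewards from `n-repeat` runs can be aggregated into a mean accuracy per example and then averaged over examples:

```python
from statistics import mean

def aggregate(rewards_per_example):
    """rewards_per_example: one list of 0/1 rewards per example,
    with one entry per repeated run. Returns the macro-averaged accuracy."""
    per_example = [mean(rewards) for rewards in rewards_per_example]
    return mean(per_example)

# Three examples, three repeats each: per-example accuracies 2/3, 0, 1.
acc = aggregate([[1, 1, 0], [0, 0, 0], [1, 1, 1]])
```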

Reinforcement Fine-Tuning (RFT)

# Run RFT training
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup uv run --group rft trl vllm-serve \
    --model "Qwen/Qwen2.5-7B-Instruct" \
    --tensor_parallel_size 4 \
    --max_model_len 16384 \
    --gpu_memory_utilization 0.9 \
    --enable_prefix_caching True &

CUDA_VISIBLE_DEVICES=4,5,6,7 uv run --group rft accelerate launch \
    --num-processes 4 \
    --config-file zero3.yaml \
    rft.py \
    --model-id "Qwen/Qwen2.5-7B-Instruct" \
    --p-list SATSP \
    --q-list math \
    --exp-name "test" \
    --server-ip "0.0.0.0"

RFT Parameters

  • model-id: Base model to fine-tune (default: "Qwen/Qwen2.5-7B-Instruct")
  • p-list: Problem types for training (e.g., "SATSP", "MaxSAT", "MCS", "MUS", "SATDP_SAT", "SATDP_UNSAT")
  • q-list: Question formats for training (e.g., "math", "story")
  • exp-name: Name of your experiment
  • server-ip: IP address for VLLM server
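The reward logic inside rft.py is not shown above. As a hedged sketch, a binary verifier reward can be wrapped in the shape TRL's GRPOTrainer accepts (a callable taking `completions` and returning one float per completion). Extracting the answer from `\boxed{...}` and assuming plain-text completions are both assumptions here, not necessarily what SATQuest's prompts use:

```python
import re

def make_reward_fn(check):
    """`check` is a verifier callable, e.g. problem.check from satquest.
    Returns a TRL-style reward function (illustrative sketch)."""
    def reward_fn(completions, **kwargs):
        rewards = []
        for text in completions:
            # Assumption: the final answer appears in \boxed{...}.
            match = re.search(r"\\boxed\{([^}]*)\}", text)
            answer = match.group(1) if match else ""
            rewards.append(float(check(answer)))  # 1.0 if verified, else 0.0
        return rewards
    return reward_fn
```

For example, `make_reward_fn(problem.check)` would score each sampled completion against the SATQuest verifier during GRPO training.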

Citation

@misc{satquest2025,
  author = {Yanxiao Zhao and Yaqian Li and Zihao Bo and Rinyoichi Takezoe and Haojia Hui and Mo Guang and Lei Ren and Xiaolin Qin and Kaiwen Long},
  title = {SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/sdpkjc/SATQuest}},
}

Download files

Download the file for your platform.

Source Distribution

satquest-0.1.2.tar.gz (13.6 kB)


Built Distribution


satquest-0.1.2-py3-none-any.whl (12.5 kB)


File details

Details for the file satquest-0.1.2.tar.gz.

File metadata

  • Download URL: satquest-0.1.2.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.14

File hashes

Hashes for satquest-0.1.2.tar.gz

  • SHA256: bc41fdc7714844eced1dc46a9d76fb820df02880b29f2beeb2430c1a1ced69d3
  • MD5: 6ec8b1f64e386383c2a55e7a99c5b798
  • BLAKE2b-256: 2f312ac149667fa8d39611dd7ea462de0414e76566853b25801fca115015ac68


File details

Details for the file satquest-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: satquest-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.14

File hashes

Hashes for satquest-0.1.2-py3-none-any.whl

  • SHA256: 8840be8f670dae92fbd1a965ae1dd3899c54a5ac6ad6bf5ddae22074e16fcaf4
  • MD5: fabefd473ca5515a3869c2b671b40c35
  • BLAKE2b-256: 4fcd15dd3ad856fc88b76b96d49cccdd3f83d97da5b63e530be2e9fd539cf066

