A Text2SQL benchmark for evaluation of Large Language Models

LLMSQL

A patched and improved version of WikiSQL, the original large crowd-sourced dataset for developing natural language interfaces for relational databases.

Our datasets are available for different scenarios on our HuggingFace page.

Overview

Install

pip3 install llmsql

This repository provides the LLMSQL Benchmark — a modernized, cleaned, and extended version of WikiSQL, designed for evaluating and fine-tuning large language models (LLMs) on Text-to-SQL tasks.

Note

The package does not include the dataset; it is hosted on our HuggingFace page.

This package provides:

  • Support for modern LLMs.
  • Tools for inference and evaluation.
  • Support for Hugging Face models out-of-the-box.
  • A structure designed for reproducibility and benchmarking.

Usage Recommendations

Modern LLMs are already strong at producing SQL queries without finetuning. We therefore recommend that most users:

  1. Run inference directly on the full benchmark:

    • Use llmsql.inference_transformers (the function for transformers inference) to generate SQL predictions with your model. For vLLM-based inference, use llmsql.inference_vllm. The model can be given either as a HF model ID, e.g. Qwen/Qwen2.5-1.5B-Instruct, or as a model instance passed directly, e.g. inference_transformers(model_or_model_name_or_path=model, ...)
    • Evaluate the results against the benchmark with the llmsql.LLMSQLEvaluator class.
  2. Optional finetuning:

    • For research or domain adaptation, we provide a finetuning version for HF models. Use the Finetune Ready dataset from HuggingFace.

[!Tip] You can find additional manuals in the README files of each folder (Inference Readme, Evaluation Readme).

[!Tip] vLLM-based inference requires the vllm optional dependency group to be installed: pip install llmsql[vllm]
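
Since the vLLM path depends on an optional dependency group, it can help to check whether vllm is importable before deciding which inference function to call. The snippet below is a minimal sketch of that check using only the standard library. The helper name pick_inference_backend is hypothetical and not part of llmsql; only the two function names it returns come from this README.

```python
import importlib.util


def pick_inference_backend() -> str:
    """Return the name of the llmsql inference function to use.

    Hypothetical helper (not part of llmsql): prefers the vLLM path when the
    optional `vllm` package is importable, otherwise falls back to the
    transformers path.
    """
    # find_spec returns None when a top-level module cannot be located.
    has_vllm = importlib.util.find_spec("vllm") is not None
    return "inference_vllm" if has_vllm else "inference_transformers"


backend_name = pick_inference_backend()
print(f"Selected backend: llmsql.{backend_name}")
```

With llmsql installed, the returned name could then be resolved with getattr(llmsql, backend_name).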


Repository Structure


llmsql/
├── evaluation/          # Scripts for downloading DB + evaluating predictions
└── inference/           # Generate SQL queries with your LLM

Quickstart

Install

Make sure you have the package installed (we used Python 3.11):

pip3 install llmsql

1. Run Inference

from llmsql import inference_transformers

# Run generation directly with transformers
results = inference_transformers(
    model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="path_to_your_outputs.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    do_sample=False,
    model_args={
        "torch_dtype": "bfloat16",
    }
)
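
The predictions are written to output_file as JSON Lines, so they can be inspected before evaluation. The following is a small sketch of a reader for such a file. The helper load_jsonl is hypothetical (not part of llmsql), and it makes no assumption about which fields each record contains, since the README does not document the output schema.

```python
import json
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines.

    Hypothetical helper for inspecting an outputs file; not part of llmsql.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate empty lines between records
                records.append(json.loads(line))
    return records
```

For example, load_jsonl("path_to_your_outputs.jsonl") would return one dict per generated prediction, which makes it easy to spot-check a few queries by hand.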

2. Evaluate Results

from llmsql import LLMSQLEvaluator

evaluator = LLMSQLEvaluator(workdir_path="llmsql_workdir")
report = evaluator.evaluate(outputs_path="path_to_your_outputs.jsonl")
print(report)
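
To give an intuition for what evaluating predictions against a database can look like, here is a conceptual sketch of execution-based matching with sqlite3. It assumes that a prediction counts as correct when it returns the same rows as the reference query; this README does not confirm that LLMSQLEvaluator works this way, and the execution_match function, the players table, and the queries are all made up for illustration.

```python
import sqlite3


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True when both queries yield the same multiset of rows.

    Illustrative only; not the llmsql evaluator.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Sort by repr so row order does not matter and mixed types compare safely.
    return sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))


# Toy in-memory database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bo", "Blue", 7), ("Cy", "Red", 3)],
)

gold = "SELECT name FROM players WHERE team = 'Red'"
# Same rows in a different order: match.
print(execution_match(conn, "SELECT name FROM players WHERE team = 'Red' ORDER BY name DESC", gold))
# Different rows: no match.
print(execution_match(conn, "SELECT name FROM players WHERE points > 5", gold))
# Invalid SQL: no match.
print(execution_match(conn, "SELEC name FROM players", gold))
```

Comparing result sets rather than query strings means that syntactically different but equivalent queries are treated as equal, which is the usual motivation for execution-based scoring in Text-to-SQL.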

vLLM inference (Recommended)

To speed up inference, we recommend using vLLM. It is available through the optional llmsql[vllm] dependency group:

pip install llmsql[vllm]

After that, run

from llmsql import inference_vllm
results = inference_vllm(
    "Qwen/Qwen2.5-1.5B-Instruct",
    "test_results.jsonl",
    do_sample=False,
    batch_size=20000
)

for fast inference.

Suggested Workflow

  • Primary: Run inference on dataset/questions.jsonl with vllm → Evaluate with evaluation/.
  • Secondary (optional): Fine-tune on train/val → Test on test_questions.jsonl.

Contributing

Check out our open issues and feel free to submit pull requests!

We also encourage you to submit new issues!

To get started with development, first fork the repository and install the dev dependencies.

For more information on contributing, see CONTRIBUTING.md and our documentation page.

License & Citation

Please cite LLMSQL if you use it in your work:

@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}
