A Text2SQL benchmark for evaluation of Large Language Models
LLMSQL
A patched and improved version of WikiSQL, the original large crowd-sourced dataset for developing natural language interfaces for relational databases.
Our datasets are available for different scenarios on our HuggingFace page.
Overview
Install
pip3 install llmsql
This repository provides the LLMSQL Benchmark — a modernized, cleaned, and extended version of WikiSQL, designed for evaluating large language models (LLMs) on Text-to-SQL tasks.
Note
The package does not include the dataset; it is hosted on our HuggingFace page.
This package provides:
- Support for modern LLMs.
- Tools for inference and evaluation.
- Support for Hugging Face models out-of-the-box.
- A structure designed for reproducibility and benchmarking.
Latest News 📣
- [2026/03] Fully functional CLI commands for inference and evaluation. See this guide.
- [2026/03] Added support for API inference, for now only for OpenAI-compatible APIs; see the inference_api() function.
- [2026/03] The page now contains the first version of the leaderboard!
- [2026/02] The new LLMSQL 2.0 version is out now! See the dataset. Support is already available through the version parameter of each inference function.
Usage Recommendations
Modern LLMs are already strong at producing SQL queries without finetuning. We therefore recommend that most users:
- Run inference directly on the full benchmark:
  - Use llmsql.inference_transformers (the function for transformers inference) to generate SQL predictions with your model. For vLLM-based inference, use llmsql.inference_vllm. The model can be given either as an HF model id, e.g. Qwen/Qwen2.5-1.5B-Instruct, or as a model instance passed directly, e.g. inference_transformers(model_or_model_name_or_path=model, ...). API inference is also supported; see inference_api().
  - Evaluate results against the benchmark with the llmsql.evaluate function.
- Optional finetuning:
  - For research or domain adaptation, we provide a finetuning-ready version for HF models. Use the Finetune Ready datasets from HuggingFace.
[!Tip] You can find additional manuals in the README files of each folder (Inference Readme, Evaluation Readme).
[!Tip] vLLM-based inference requires the vllm optional dependency group to be installed:
pip install llmsql[vllm]
Repository Structure
llmsql/
├── evaluation/ # Scripts for evaluation
└── inference/ # Generate SQL queries with your LLM
Quickstart
For the full tutorial, check out the Colab notebook: Open in Colab
Install
Make sure you have the package installed (we used Python 3.11):
pip3 install llmsql
1. Run Inference
Transformers inference
from llmsql import inference_transformers

# Run generation directly with transformers
results = inference_transformers(
    model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="path_to_your_outputs.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    do_sample=False,
    model_kwargs={
        "torch_dtype": "bfloat16",
    },
)
vLLM inference (recommended)
To speed up inference, we recommend using vLLM. It requires the optional llmsql[vllm] dependency group:
pip install llmsql[vllm]
After that, run the following for fast inference:
from llmsql import inference_vllm

results = inference_vllm(
    "Qwen/Qwen2.5-1.5B-Instruct",
    "test_results.jsonl",
    do_sample=False,
    batch_size=20000,
)
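Both inference calls above write predictions to a .jsonl file. The sketch below shows one way to look at such a file before evaluating it. It assumes only that the file is in JSON Lines format (one JSON object per line), as the extension suggests; the helper names read_jsonl and summarize_keys are hypothetical and not part of llmsql, and no particular record fields are assumed.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def summarize_keys(records):
    """Collect every key seen across the records, sorted alphabetically."""
    keys = set()
    for record in records:
        keys.update(record)
    return sorted(keys)
```

For example, `records = read_jsonl("path_to_your_outputs.jsonl")` followed by `print(len(records), summarize_keys(records))` reports how many predictions were written and which fields each record carries.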
2. Evaluate Results
from llmsql import evaluate
report = evaluate(outputs="path_to_your_outputs.jsonl")
print(report)
Or with the results returned by inference:
from llmsql import evaluate
# results = inference_transformers(...) or inference_vllm(...)
report = evaluate(outputs=results)
print(report)
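This README does not spell out how evaluate scores a prediction. A common approach in Text-to-SQL benchmarks is execution accuracy: run the predicted and the reference query against the table and compare the returned rows. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is an illustration under that assumption, not the llmsql implementation; the functions run_query and execution_match and the toy table are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same rows (order ignored)."""
    gold = run_query(conn, gold_sql)
    predicted = run_query(conn, predicted_sql)
    return gold is not None and predicted == gold


# Toy in-memory table standing in for one benchmark table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Table" ("Brand" TEXT, "Model" TEXT, "Price" REAL)')
conn.executemany(
    'INSERT INTO "Table" VALUES (?, ?, ?)',
    [("Apple", "iPhone 14", 899.99), ("Samsung", "Galaxy S23", 799.99)],
)

gold = """SELECT "Price" FROM "Table" WHERE "Model" = 'Galaxy S23'"""
pred = """SELECT "Price" FROM "Table" WHERE "Brand" = 'Samsung' AND "Model" = 'Galaxy S23'"""
wrong = """SELECT "Price" FROM "Table" WHERE "Brand" = 'Apple'"""

print(execution_match(conn, pred, gold))
print(execution_match(conn, wrong, gold))
```

Comparing result sets rather than query strings means that two differently written queries count as equal when they return the same rows, and a query that fails to execute never counts as a match.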
For more examples, check the examples folder.
Prompt Template
The prompt defines explicit constraints on the generated output.
The model is instructed to output only a valid SQL SELECT query, to use a fixed table name ("Table") (which will be replaced with the actual table name during evaluation), to quote all table and column names, and to restrict generation to the specified SQL functions, condition operators, and keywords.
The full prompt specification is provided in the prompt template.
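As noted above, the fixed table name "Table" is replaced with the actual table name during evaluation. A minimal sketch of such a substitution is shown here; it is an assumption about how this step could be done, not the package's own code, and bind_table_name and the table name phones_2023 are hypothetical.

```python
import re

# Matches the placeholder right after FROM, quoted or bare, in any letter case.
PLACEHOLDER = re.compile(r'FROM\s+(?:"Table"|Table\b)', flags=re.IGNORECASE)


def bind_table_name(sql, table_name):
    """Replace the first FROM "Table" with a quoted real table name."""
    # Double any embedded quotes so the identifier stays valid SQL.
    quoted = '"' + table_name.replace('"', '""') + '"'
    # A function replacement avoids backslash interpretation in the name.
    return PLACEHOLDER.sub(lambda match: "FROM " + quoted, sql, count=1)


query = 'SELECT COUNT(*) FROM "Table" WHERE "Cuisine" = "Italian";'
print(bind_table_name(query, "phones_2023"))
```

Anchoring the pattern on FROM keeps a column or value that happens to be called Table untouched.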
Below is an example of the 5-shot prompt template used during inference.
Your task: Given a question and a table schema, output ONLY a valid SQL SELECT query.
⚠️ STRICT RULES:
- Output ONLY SQL (no explanations, no markdown, no ``` fences)
- Use table name "Table"
- Allowed functions: ['MAX', 'MIN', 'COUNT', 'SUM', 'AVG']
- Allowed condition operators: ['=', '>', '<', '!=']
- Allowed SQL keywords: ['SELECT', 'WHERE', 'AND']
- Always use "" with all column names and table name, even one word: "Price", "General column", "Something #"
### EXAMPLE 1:
Question: What is the price of the Samsung Galaxy S23?
Columns: ['Brand', 'Model', 'Price', 'Storage', 'Color']
Types: ['text', 'text', 'real', 'text', 'text']
Sample row: ['Apple', 'iPhone 14', 899.99, '128GB', 'White']
SQL: SELECT "Price" FROM "Table" WHERE "Brand" = "Samsung" AND "Model" = "Galaxy S23";
### EXAMPLE 2:
Question: How many books did Maya Chen publish?
Columns: ['Author', 'Books Published', 'Genre', 'Country', 'Years Active']
Types: ['text', 'real', 'text', 'text', 'text']
Sample row: ['John Smith', 3, 'Non-fiction', 'Canada', '2005–2015']
SQL: SELECT "Books Published" FROM "Table" WHERE "Author" = "Maya Chen";
### EXAMPLE 3:
Question: What is the total population of cities in California?
Columns: ['City', 'State', 'Population', 'Area', 'Founded']
Types: ['text', 'text', 'real', 'real', 'text']
Sample row: ['Houston', 'Texas', 2304580, 1651.1, '1837']
SQL: SELECT SUM("Population") FROM "Table" WHERE "State" = "California";
### EXAMPLE 4:
Question: How many restaurants serve Italian cuisine?
Columns: ['Restaurant', 'Cuisine', 'Rating', 'City', 'Price Range']
Types: ['text', 'text', 'real', 'text', 'text']
Sample row: ['Golden Dragon', 'Chinese', 4.2, 'Boston', '$$']
SQL: SELECT COUNT(*) FROM "Table" WHERE "Cuisine" = "Italian";
### EXAMPLE 5:
Question: What is the average salary for Software Engineers?
Columns: ['Job Title', 'Salary', 'Experience', 'Location', 'Company Size']
Types: ['text', 'real', 'text', 'text', 'text']
Sample row: ['Data Analyst', 70000, 'Junior', 'Chicago', '200–500']
SQL: SELECT AVG("Salary") FROM "Table" WHERE "Job Title" = "Software Engineer";
### NOW ANSWER:
Question: {question}
Columns: {headers}
Types: {types}
Sample row: {sample_row}
SQL:"""
Implementations of 0-shot, 1-shot, and 5-shot prompt templates are available here: 👉 link-to-file
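The template ends with four placeholders: {question}, {headers}, {types}, and {sample_row}. The sketch below shows one way they could be filled with Python's str.format, using only the final "NOW ANSWER" part of the prompt. How llmsql fills them internally is not stated here, so PROMPT_TAIL and fill_prompt are hypothetical names used for illustration.

```python
# Final section of the prompt shown above, with its four placeholders.
PROMPT_TAIL = (
    "### NOW ANSWER:\n"
    "Question: {question}\n"
    "Columns: {headers}\n"
    "Types: {types}\n"
    "Sample row: {sample_row}\n"
    "SQL:"
)


def fill_prompt(question, headers, types, sample_row):
    """Fill the placeholders; lists are rendered with their Python repr."""
    return PROMPT_TAIL.format(
        question=question,
        headers=headers,
        types=types,
        sample_row=sample_row,
    )


prompt = fill_prompt(
    question="What is the price of the Samsung Galaxy S23?",
    headers=["Brand", "Model", "Price"],
    types=["text", "text", "real"],
    sample_row=["Apple", "iPhone 14", 899.99],
)
print(prompt)
```

Passing Python lists directly renders them as ['Brand', 'Model', 'Price'], which matches the layout of the few-shot examples. Because str.format only parses the template, braces inside a question are inserted unchanged.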
Contributing
Check out our open issues, fork this repo and feel free to submit pull requests!
We also encourage you to submit new issues!
To get started with development, first fork the repository and install the basic dependencies together with the dev dependencies.
For more information on contributing, check CONTRIBUTING.md and our documentation page.
License & Citation
Please cite LLMSQL if you use it in your work:
@inproceedings{llmsql_bench,
title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
year={2025},
organization={IEEE}
}