A Text2SQL benchmark for evaluation of Large Language Models

Maintainers

pihul

LLMSQL

A patched and improved version of WikiSQL, the original large crowd-sourced dataset for developing natural language interfaces to relational databases.

Our datasets, prepared for different scenarios, are available on our Hugging Face page.

Overview

Install

pip3 install llmsql

This repository provides the LLMSQL Benchmark — a modernized, cleaned, and extended version of WikiSQL, designed for evaluating large language models (LLMs) on Text-to-SQL tasks.

Note

The package does not include the dataset; the data is stored on our Hugging Face page.

This package contains:

- Support for modern LLMs.
- Tools for inference and evaluation.
- Support for Hugging Face models out of the box.
- A structure designed for reproducibility and benchmarking.

Latest News 📣

- [2025/12] The evaluation class was converted to a function; see the new evaluate(...) function.
- A new version of the project page is available at https://llmsql.github.io/llmsql-benchmark/
- vLLM inference now supports chat templates; see inference_vllm(...).
- Transformers inference now supports custom chat templates via the chat_template argument; see inference_transformers(...).
- inference_vllm(...) is now more stable and deterministic, achieved by setting some environment variables.
- A padding_side argument was added to inference_transformers(...); it defaults to left.
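Why left padding is the default: decoder-only models continue generating from the last position of each row, so in a batched forward pass every prompt should end in the same column. The toy sketch below is plain Python with made-up token IDs and a hypothetical pad ID of 0 (it does not call a real tokenizer). It shows that with left padding the final column of every row holds a real prompt token, whereas with right padding a shorter prompt ends in padding.

```python
from typing import List

PAD_ID = 0  # hypothetical pad token ID, for illustration only


def pad_batch(seqs: List[List[int]], side: str = "left") -> List[List[int]]:
    """Pad token-ID sequences to the same length on the given side."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        pad = [PAD_ID] * (width - len(s))
        padded.append(pad + s if side == "left" else s + pad)
    return padded


def last_column(batch: List[List[int]]) -> List[int]:
    """The token each row conditions on when generating its next token."""
    return [row[-1] for row in batch]


prompts = [[11, 12, 13, 14], [21, 22]]  # two prompts of different lengths
print(last_column(pad_batch(prompts, "left")))   # [14, 22]: real tokens
print(last_column(pad_batch(prompts, "right")))  # [14, 0]: second row ends in padding
```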

Usage Recommendations

Modern LLMs are already strong at producing SQL queries without finetuning. We therefore recommend that most users:

Run inference directly on the full benchmark:
- Use llmsql.inference_transformers (the function for transformers inference) to generate SQL predictions with your model. For vLLM-based inference, use llmsql.inference_vllm. The model can be given either as a Hugging Face model ID, e.g. Qwen/Qwen2.5-1.5B-Instruct, or as a model instance passed directly, e.g. inference_transformers(model_or_model_name_or_path=model, ...).
- Evaluate the results against the benchmark with the llmsql.evaluate function.
Optional finetuning:
- For research or domain adaptation, we provide a finetuning version for HF models. Use the Finetune Ready dataset from Hugging Face.

[!Tip] Additional manuals are available in the README file of each folder (Inference README, Evaluation README).

[!Tip] vLLM-based inference requires the optional vllm dependency group: pip install llmsql[vllm]
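Because vLLM is optional, a script may want to use it only when it is installed. The sketch below checks for the vllm package with importlib.util.find_spec, which locates a module without importing it. The helper name choose_backend is hypothetical and not part of llmsql. It checks only that the package is installed, not GPU availability or version compatibility.

```python
import importlib.util


def choose_backend(prefer_vllm: bool = True) -> str:
    """Return "vllm" if the optional vllm package is installed, else "transformers".

    Hypothetical helper: it only checks that the package can be found; it does
    not import vllm, check for a GPU, or validate the installed version.
    """
    if prefer_vllm and importlib.util.find_spec("vllm") is not None:
        return "vllm"
    return "transformers"


backend = choose_backend()
print(f"Selected inference backend: {backend}")
# Then call llmsql.inference_vllm(...) or llmsql.inference_transformers(...) accordingly.
```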

Repository Structure


llmsql/
├── evaluation/          # Scripts for downloading DB + evaluating predictions
└── inference/           # Generate SQL queries with your LLM

Quickstart

For the full tutorial, check out the Colab notebook: Open in Colab

Install

Make sure the package is installed (we used Python 3.11):

pip3 install llmsql

1. Run Inference

Transformers inference

from llmsql import inference_transformers

# Run generation directly with transformers
results = inference_transformers(
    model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="path_to_your_outputs.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    do_sample=False,
    model_kwargs={
        "torch_dtype": "bfloat16",
    }
)

vLLM inference (recommended)

To speed up inference, we recommend using vLLM. It requires the optional llmsql[vllm] dependency group:

pip install "llmsql[vllm]"

After that, run the following for fast inference:

from llmsql import inference_vllm

results = inference_vllm(
    "Qwen/Qwen2.5-1.5B-Instruct",
    "test_results.jsonl",
    do_sample=False,
    batch_size=20000
)
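Both inference functions take a .jsonl output path. If the file follows the usual JSON Lines convention (one JSON object per line), you can load it with the standard library to inspect the predictions. The field names inside each record are not documented here, so the sketch below does not rely on any of them. load_jsonl is a hypothetical helper, not part of llmsql.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def load_jsonl(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line instead of a bare decode error.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
    return records
```

Usage: rows = load_jsonl("test_results.jsonl"), then for example print(len(rows), sorted(rows[0])) to see how many predictions were written and which fields they carry.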

2. Evaluate Results

from llmsql import evaluate

report = evaluate(outputs="path_to_your_outputs.jsonl")
print(report)

Or pass the results returned by inference directly:

from llmsql import evaluate

# results = inference_transformers(...) or inference_vllm(...)

report = evaluate(outputs=results)
print(report)
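Text-to-SQL benchmarks derived from WikiSQL are commonly scored by execution: a prediction counts as correct when it returns the same result as the gold query. The sketch below illustrates that idea using the standard-library sqlite3 module on a toy table. It is not the implementation of llmsql.evaluate, whose exact metrics and database setup are covered in the Evaluation README. The table, its data, and the execution_match helper are hypothetical.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """True if pred_sql returns the same multiset of rows as gold_sql.

    A prediction that fails to execute counts as incorrect. Row order is ignored,
    which suits queries without ORDER BY (as in the prompt's allowed keywords).
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False
    return Counter(gold_rows) == Counter(pred_rows)


# Toy in-memory database (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "cities" ("City" TEXT, "State" TEXT, "Population" REAL)')
conn.executemany(
    'INSERT INTO "cities" VALUES (?, ?, ?)',
    [("Los Angeles", "California", 3898747),
     ("San Diego", "California", 1386932),
     ("Houston", "Texas", 2304580)],
)

gold = """SELECT SUM("Population") FROM "cities" WHERE "State" = 'California'"""
good = """SELECT SUM("Population") FROM "cities" WHERE "State" = 'California';"""
bad = """SELECT MAX("Population") FROM "cities" WHERE "State" = 'California'"""
print(execution_match(conn, gold, good))  # True
print(execution_match(conn, gold, bad))   # False
```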

Prompt Template

The prompt defines explicit constraints on the generated output. The model is instructed to output only a valid SQL SELECT query, to use the fixed table name "Table" (replaced with the actual table name during evaluation), to quote all table and column names, and to restrict generation to the specified SQL functions, condition operators, and keywords. The full prompt specification is provided in the prompt template.
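Because predictions use the placeholder "Table", they must be rewritten before they can run against the real database. The sketch below shows that substitution. It assumes the placeholder always appears as the quoted identifier "Table", as the prompt rules require. bind_table_name and the table name in the example are hypothetical, and llmsql's own replacement logic may differ.

```python
def bind_table_name(sql: str, table_name: str, placeholder: str = "Table") -> str:
    """Replace the quoted placeholder identifier with the real, quoted table name.

    Only the exact quoted token "Table" is replaced, so an identifier such as
    "Table Number" is left untouched. Double quotes inside the real name are
    escaped by doubling them, as in standard SQL quoted identifiers.
    """
    quoted_placeholder = f'"{placeholder}"'
    quoted_real = '"' + table_name.replace('"', '""') + '"'
    return sql.replace(quoted_placeholder, quoted_real)


pred = 'SELECT COUNT(*) FROM "Table" WHERE "Cuisine" = "Italian";'
print(bind_table_name(pred, "restaurants"))
# SELECT COUNT(*) FROM "restaurants" WHERE "Cuisine" = "Italian";
```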

Below is an example of the 5-shot prompt template used during inference.

Your task: Given a question and a table schema, output ONLY a valid SQL SELECT query.
⚠️ STRICT RULES:
 - Output ONLY SQL (no explanations, no markdown, no ``` fences)
 - Use table name "Table"
 - Allowed functions: ['MAX', 'MIN', 'COUNT', 'SUM', 'AVG']
 - Allowed condition operators: ['=', '>', '<', '!=']
 - Allowed SQL keywords: ['SELECT', 'WHERE', 'AND']
 - Always use "" with all column names and table name, even one word: "Price", "General column", "Something #"

### EXAMPLE 1:
Question: What is the price of the Samsung Galaxy S23?
Columns: ['Brand', 'Model', 'Price', 'Storage', 'Color']
Types: ['text', 'text', 'real', 'text', 'text']
Sample row: ['Apple', 'iPhone 14', 899.99, '128GB', 'White']
SQL: SELECT "Price" FROM "Table" WHERE "Brand" = "Samsung" AND "Model" = "Galaxy S23";

### EXAMPLE 2:
Question: How many books did Maya Chen publish?
Columns: ['Author', 'Books Published', 'Genre', 'Country', 'Years Active']
Types: ['text', 'real', 'text', 'text', 'text']
Sample row: ['John Smith', 3, 'Non-fiction', 'Canada', '2005–2015']
SQL: SELECT "Books Published" FROM "Table" WHERE "Author" = "Maya Chen";

### EXAMPLE 3:
Question: What is the total population of cities in California?
Columns: ['City', 'State', 'Population', 'Area', 'Founded']
Types: ['text', 'text', 'real', 'real', 'text']
Sample row: ['Houston', 'Texas', 2304580, 1651.1, '1837']
SQL: SELECT SUM("Population") FROM "Table" WHERE "State" = "California";

### EXAMPLE 4:
Question: How many restaurants serve Italian cuisine?
Columns: ['Restaurant', 'Cuisine', 'Rating', 'City', 'Price Range']
Types: ['text', 'text', 'real', 'text', 'text']
Sample row: ['Golden Dragon', 'Chinese', 4.2, 'Boston', '$$']
SQL: SELECT COUNT(*) FROM "Table" WHERE "Cuisine" = "Italian";

### EXAMPLE 5:
Question: What is the average salary for Software Engineers?
Columns: ['Job Title', 'Salary', 'Experience', 'Location', 'Company Size']
Types: ['text', 'real', 'text', 'text', 'text']
Sample row: ['Data Analyst', 70000, 'Junior', 'Chicago', '200–500']
SQL: SELECT AVG("Salary") FROM "Table" WHERE "Job Title" = "Software Engineer";

### NOW ANSWER:
Question: {question}
Columns: {headers}
Types: {types}
Sample row: {sample_row}
SQL:
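Models do not always follow these rules. One way to clean and screen a raw completion before evaluation is sketched below: it strips stray markdown fences, keeps only the first statement, and reports functions, comparison operators, or keywords that fall outside the whitelists in the prompt. clean_completion, rule_violations, and the keyword blacklist are hypothetical and not part of llmsql. The helpers are deliberately shallow and regex-based rather than a real SQL parser. For example, clean_completion assumes no ";" appears inside quoted text.

```python
import re
from typing import List

ALLOWED_FUNCTIONS = {"MAX", "MIN", "COUNT", "SUM", "AVG"}
ALLOWED_OPERATORS = {"=", ">", "<", "!="}
# Keywords outside the prompt's whitelist (SELECT, WHERE, AND; FROM is implied).
# This blacklist is illustrative, not exhaustive.
DISALLOWED_KEYWORDS = {"OR", "JOIN", "GROUP", "ORDER", "LIMIT", "HAVING", "UNION",
                       "LIKE", "IN", "NOT", "BETWEEN", "INSERT", "UPDATE", "DELETE", "DROP"}

FENCE_RE = re.compile(r"```(?:sql)?", re.IGNORECASE)
QUOTED_RE = re.compile(r'"[^"]*"' + r"|'[^']*'")  # quoted identifiers and literals
OPERATOR_RE = re.compile(r"<=|>=|<>|!=|=|<|>")     # longest operators tried first


def clean_completion(raw: str) -> str:
    """Strip markdown fences and keep only the first SQL statement."""
    text = FENCE_RE.sub("", raw).strip()
    first = text.split(";")[0].strip()
    return first + ";" if first else ""


def rule_violations(sql: str) -> List[str]:
    """List the prompt rules that a cleaned query appears to break."""
    problems = []
    if not sql.upper().startswith("SELECT"):
        problems.append("does not start with SELECT")
    # Blank out quoted text so column names and values are not checked as SQL.
    bare = QUOTED_RE.sub('""', sql)
    for func in re.findall(r"\b([A-Za-z_]+)\s*\(", bare):
        if func.upper() not in ALLOWED_FUNCTIONS:
            problems.append(f"function not allowed: {func}")
    for op in OPERATOR_RE.findall(bare):
        if op not in ALLOWED_OPERATORS:
            problems.append(f"operator not allowed: {op}")
    for word in re.findall(r"\b[A-Za-z_]+\b", bare):
        if word.upper() in DISALLOWED_KEYWORDS:
            problems.append(f"keyword not allowed: {word.upper()}")
    return problems


raw = '```sql\nSELECT AVG("Salary") FROM "Table" WHERE "Job Title" = "Software Engineer";\n```'
sql = clean_completion(raw)
print(sql)  # SELECT AVG("Salary") FROM "Table" WHERE "Job Title" = "Software Engineer";
print(rule_violations(sql))  # []
print(rule_violations('SELECT "Price" FROM "Table" WHERE "Brand" = "Apple" OR "Price" >= 500'))
# ['operator not allowed: >=', 'keyword not allowed: OR']
```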

Implementations of 0-shot, 1-shot, and 5-shot prompt templates are available here: 👉 link-to-file
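The template above appears to be a Python format string with four placeholders: {question}, {headers}, {types}, and {sample_row}. The sketch below shows how a k-shot prompt could be assembled from that layout. It assumes the list fields are rendered with Python's list repr, which matches the Columns, Types, and Sample row lines in the examples. build_prompt, RULES, SHOT, and QUERY are hypothetical and abbreviate the real template, which lives in the linked file. The example records are taken from the template above.

```python
from typing import Any, Dict, List

# Abbreviated header; the full rule list is shown in the template above.
RULES = "Your task: Given a question and a table schema, output ONLY a valid SQL SELECT query.\n"

SHOT = (
    "### EXAMPLE {n}:\n"
    "Question: {question}\n"
    "Columns: {headers}\n"
    "Types: {types}\n"
    "Sample row: {sample_row}\n"
    "SQL: {sql}\n\n"
)

QUERY = (
    "### NOW ANSWER:\n"
    "Question: {question}\n"
    "Columns: {headers}\n"
    "Types: {types}\n"
    "Sample row: {sample_row}\n"
    "SQL:"
)


def build_prompt(item: Dict[str, Any], shots: List[Dict[str, Any]], k: int) -> str:
    """Assemble a k-shot prompt (k = 0, 1, 5, ...) from the first k example records."""
    if k > len(shots):
        raise ValueError(f"requested {k} shots but only {len(shots)} examples given")
    parts = [RULES, "\n"]
    for n, shot in enumerate(shots[:k], start=1):
        parts.append(SHOT.format(n=n, **shot))  # lists render as ['a', 'b']
    parts.append(QUERY.format(**item))
    return "".join(parts)


shots = [{
    "question": "What is the total population of cities in California?",
    "headers": ["City", "State", "Population", "Area", "Founded"],
    "types": ["text", "text", "real", "real", "text"],
    "sample_row": ["Houston", "Texas", 2304580, 1651.1, "1837"],
    "sql": 'SELECT SUM("Population") FROM "Table" WHERE "State" = "California";',
}]
item = {
    "question": "How many restaurants serve Italian cuisine?",
    "headers": ["Restaurant", "Cuisine", "Rating", "City", "Price Range"],
    "types": ["text", "text", "real", "text", "text"],
    "sample_row": ["Golden Dragon", "Chinese", 4.2, "Boston", "$$"],
}
print(build_prompt(item, shots, k=1))
```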

Suggested Workflow

Primary: Run inference on all questions with vLLM or transformers → Evaluate with evaluate().
Secondary (optional): Fine-tune on train/val → Test on test_questions.jsonl. The datasets are available on Hugging Face (HF Finetune Ready).

Contributing

Check out our open issues, fork this repo and feel free to submit pull requests!

We also encourage you to submit new issues!

To get started with development, fork the repository and install the basic dependencies together with the dev dependencies.

For more information on contributing, see CONTRIBUTING.md and our documentation page.

License & Citation

Please cite LLMSQL if you use it in your work:

@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}

Release history

0.1.16 (Mar 5, 2026)
0.1.15 (Feb 24, 2026)
0.1.14 (Dec 15, 2025)
0.1.13 (Dec 2, 2025)
0.1.11 (Oct 16, 2025)
0.1.10 (Oct 16, 2025)
0.1.9 (Oct 16, 2025)
0.1.7 (Oct 16, 2025)
0.1.6 (Oct 16, 2025)
0.1.5 (Oct 15, 2025)
0.1.4 (Oct 13, 2025)
0.1.3 (Sep 25, 2025)
0.1.2 (Sep 24, 2025)
0.1.1 (Sep 24, 2025)

Download files

Source Distribution

llmsql-0.1.15.tar.gz (41.0 kB)

Uploaded Feb 24, 2026 Source

Built Distribution

llmsql-0.1.15-py3-none-any.whl (32.0 kB)

Uploaded Feb 24, 2026 Python 3

File details

Details for the file llmsql-0.1.15.tar.gz.

File metadata

Download URL: llmsql-0.1.15.tar.gz
Upload date: Feb 24, 2026
Size: 41.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmsql-0.1.15.tar.gz
Algorithm	Hash digest
SHA256	`555d70ef19837c0f2c7ce9a01b5b7a8cbf60a9c3ebfde10178c56960357c5eb6`
MD5	`28a58605d8b0dedb46a2edae0820085b`
BLAKE2b-256	`b2b780c97b98a0047e3a2b7b85fc002c66dbd26ea3500386e62e4cef4a6ab2f0`
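The published digests let you check that a downloaded file is intact. The sketch below uses hashlib to stream the file in chunks and compare its SHA256 digest with the value listed above for the source distribution. The local file path in the usage comment is an assumption. pip can also enforce hashes itself when requirements files carry --hash entries.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 published above for llmsql-0.1.15.tar.gz.
EXPECTED_SHA256 = "555d70ef19837c0f2c7ce9a01b5b7a8cbf60a9c3ebfde10178c56960357c5eb6"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


# Usage (assumes the sdist was downloaded into the current directory):
# print(verify("llmsql-0.1.15.tar.gz"))
```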

Provenance

The following attestation bundles were made for llmsql-0.1.15.tar.gz:

Publisher: publish.yml on LLMSQL/llmsql-benchmark

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmsql-0.1.15.tar.gz
- Subject digest: 555d70ef19837c0f2c7ce9a01b5b7a8cbf60a9c3ebfde10178c56960357c5eb6
- Sigstore transparency entry: 984909216
- Sigstore integration time: Feb 24, 2026
Source repository:
- Permalink: LLMSQL/llmsql-benchmark@79175212c90b1fc094abd2c9666c23d903060014
- Branch / Tag: refs/tags/v0.1.15
- Owner: https://github.com/LLMSQL
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@79175212c90b1fc094abd2c9666c23d903060014
- Trigger Event: push

File details

Details for the file llmsql-0.1.15-py3-none-any.whl.

File metadata

Download URL: llmsql-0.1.15-py3-none-any.whl
Upload date: Feb 24, 2026
Size: 32.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmsql-0.1.15-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8daf3b8d7c41d9159ba738ffae810eb84d6778a3553f379fb85a7315369c8301`
MD5	`eff618a3beeccfb086d3e024bf0c9a61`
BLAKE2b-256	`b0bdeb35e2fec2b3ca3c94db855a7d2aaa6b40e0ac2a6848d6469c7509a7880d`

Provenance

The following attestation bundles were made for llmsql-0.1.15-py3-none-any.whl:

Publisher: publish.yml on LLMSQL/llmsql-benchmark

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmsql-0.1.15-py3-none-any.whl
- Subject digest: 8daf3b8d7c41d9159ba738ffae810eb84d6778a3553f379fb85a7315369c8301
- Sigstore transparency entry: 984909220
- Sigstore integration time: Feb 24, 2026
Source repository:
- Permalink: LLMSQL/llmsql-benchmark@79175212c90b1fc094abd2c9666c23d903060014
- Branch / Tag: refs/tags/v0.1.15
- Owner: https://github.com/LLMSQL
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@79175212c90b1fc094abd2c9666c23d903060014
- Trigger Event: push

llmsql 0.1.15
