T2S-Metrics provides a modular abstraction layer that decouples metric specification from implementation, enabling consistent, transparent, and reproducible evaluation of SPARQL-based QA systems.
Project description
T2S-Metrics
A small evaluation toolkit for text-to-SPARQL systems. It runs a configurable set of metrics over JSONL datasets and can execute queries against local RDF files or SPARQL endpoints.
Features
- Full evaluation pipeline for Text-to-SPARQL systems from JSONL inputs to exportable JSON results.
- Rich metric coverage for answer-set quality, text similarity, structural similarity, ranking, distance, and execution validity.
- Two execution backend families:
- Local RDF graphs with RDFLib.
- Remote SPARQL endpoints (QLever, Corese, Fuseki, GraphDB, Virtuoso, Blazegraph, etc.).
- Simple Python API for research workflows and reproducible experiment scripts.
- CLI to run evaluations and launch dashboards.
- Static dashboard export support for sharing reports without running a server.
Demo video ⏯️
Teaser
Full tutorial
Prerequisites
- Python 3.12 or later.
- uv (recommended for local development) or pip.
- A SPARQL endpoint only if you use execution metrics with a remote KG (for example QLever/Corese).
- Ollama only if you enable LLM-based metrics.
- QCan jar only if you use qcan-related metrics. The repository includes it under third_party_lib.
- NLTK data only if you use BLEU- and METEOR-related metrics.
Installation
PyPI
For users who only want to run the library and CLI:
pip install t2s-metrics
After installation, the CLI entry point is available as:
t2s --help
[!TIP] If you would like to follow the examples below, you may wish to check the GitHub repository to obtain the evaluation datasets and knowledge graphs in the datasets folder. These files are not included in the PyPI package.
For development (editable install):
- Clone the repository:
git clone https://github.com/Wimmics/t2s-metrics.git
- Navigate to the project directory:
cd t2s-metrics
- Install dependencies
Using uv:
uv sync
# With dev dependencies (pytest, ruff, twine)
uv sync --all-extras
Using pip:
pip install -e .
# With dev dependencies (pytest, ruff, twine)
pip install -e ".[dev]"
Add NLTK data (see Prerequisites):
python -c "import nltk; nltk.download('punkt_tab'); nltk.download('wordnet')"
Usage
This section focuses on practical usage for both PyPI users and repository users.
1. Prepare your evaluation data
Input files must be JSON Lines (.jsonl) with one object per line.
Required keys:
- id: unique query/case identifier.
- golden: reference SPARQL query.
- generated: system-generated SPARQL query.
- order_matters: whether result ordering must be preserved.
Example (from datasets/ck25/eval/AIFB.jsonl):
{"id": "ck25:1-en", "golden": "PREFIX pv: <http://ld.company.org/prod-vocab/>\nSELECT DISTINCT ?result\nWHERE\n{\n <http://ld.company.org/prod-instances/empl-Karen.Brant%40company.org> pv:memberOf ?result .\n ?result a pv:Department .\n}\n", "generated": "SELECT ?department WHERE { ?person :name \"Ms. Brant\"; :worksIn ?department. }", "order_matters": false}
2. Choose your execution backend
You must provide one execution backend when running execution-aware metrics:
- Local graph file with --execution_backend_graph_path or -eg.
- SPARQL endpoint URL with --execution_backend_endpoint_url or -ee.
Python examples:
from t2smetrics.execution.rdflib_backend import RDFLibBackend
from t2smetrics.execution.sparql_endpoint_backend import SparqlEndpointBackend
# Local file backend
local_backend = RDFLibBackend("./datasets/example/kg/example.ttl")
# Remote endpoint backend
endpoint_backend = SparqlEndpointBackend("http://localhost:8886/")
3. Run from Python (based on run_example.py)
The minimal complete workflow from t2smetrics/run_example.py:
from t2smetrics import run_experiments
from t2smetrics.metrics import (
    AnswerSetPrecision,
    AnswerSetRecall,
    AnswerSetF1,
    Bleu,
    CodeBLEU,
    QueryExecution,
    QueryExactMatch,
)

run_experiments.run(
    dataset="example",
    jsonl_evals=["./datasets/example/eval/example.jsonl"],
    metrics_list=[
        AnswerSetPrecision(),
        AnswerSetRecall(),
        AnswerSetF1(),
        Bleu(),
        CodeBLEU(),
        QueryExecution(),
        QueryExactMatch(),
    ],
    execution_backend_graph_path="./datasets/example/kg/example.ttl",
    verbose=True,
)
4. Run from Python on ck25 (based on run_text2sparql.py)
t2smetrics/run_text2sparql.py demonstrates a multi-system run on the ck25 dataset with an endpoint backend and parallel execution.
uv run ./t2smetrics/run_text2sparql.py
This generates timestamped JSON results under:
datasets/ck25/results/ck25-YYYYMMDD-HHMMSS.json
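If you prefer to adapt this to your own script rather than running run_text2sparql.py directly, here is a hedged sketch modelled on the run_example.py call above. The systems_name, execution_backend_endpoint_url, and parallel keyword arguments are assumptions mirroring the CLI flags -s, -ee, and -p; check t2smetrics/run_text2sparql.py for the exact signature.
from t2smetrics import run_experiments
from t2smetrics.metrics import AnswerSetF1, QueryExecution

run_experiments.run(
    dataset="ck25",
    jsonl_evals=["./datasets/ck25/eval/AIFB.jsonl"],
    systems_name=["my-system"],                       # assumed kwarg, mirrors -s/--systems_name
    metrics_list=[AnswerSetF1(), QueryExecution()],
    execution_backend_endpoint_url="http://localhost:8886/",  # assumed kwarg, mirrors -ee
    parallel=True,                                    # assumed kwarg, mirrors -p/--parallel
    verbose=True,
)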
5. Run from CLI (recommended for daily use)
Show command help:
t2s --help
t2s run --help
Here is an example command with specific metrics, a SPARQL endpoint, verbose output, and parallel processing:
t2s run -d ck25 -j ./datasets/ck25/eval/ -m 'hit@1' 'answerset_f1' 'answerset_precision' 'answerset_recall' 'bleu' 'codebleu' 'cosine_sim' 'euclidean' 'f1_qald' 'f1_spinach' 'jaccard' 'levenshtein' 'meteor' 'mrr' 'ndcg' 'p@1' 'precision_qald' 'query_exact_match' 'recall_qald' 'rouge_4' 'sp-bleu' 'sp-f1' 'token_f1' 'token_precision' 'token_recall' 'uri_hallucination' 'query_execution' -ee http://localhost:8886/ -v -p
[!NOTE] By default, the results are automatically exported to the ./datasets/{dataset}/results/ directory. You can change this behaviour with the -ep flag.
Here is an example with all metrics, a local TTL backend, an explicit export path, and detailed per-query output:
t2s run -d ck25 -j ./datasets/ck25/eval/ \
-m __all__ \
-eg ./datasets/ck25/kg/dataset.ttl \
-ep ./custom_results_folder \
-eq -v -p
[!IMPORTANT] If you use the LLM-as-a-judge metric, either directly or via the __all__ keyword, you will need to have Ollama running with either the requested model or the default gemma3:4b model. After installing Ollama, you can use the following commands:
ollama serve # run the server
ollama pull gemma3:4b # default model used by t2s-metrics
Common useful flags:
- -s/--systems_name for explicit system names.
- -p/--parallel for multiprocessing.
- -eq/--export_per_query to include per-query values in output JSON.
- -ep/--export_path to control output location.
- -eg/--execution_backend_graph_path to run on local RDF files instead of endpoint mode.
6. Launch the dashboard
Auto-discover results under datasets/*/results/*.json:
t2s dashboard
[!IMPORTANT] The results are discovered automatically relative to the folder in which the command is executed. If you are not in the root directory of the cloned GitHub project, use the -f flag to avoid a FileNotFoundError.
Load explicit result files:
t2s dashboard -f \
datasets/ck25/results/ck25-20260306-133227.json \
datasets/db25/results/db25-20260306-132100.json
Generate a static dashboard snapshot:
t2s dashboard --static --output static_dashboard_snapshot
Once the dashboard server is running, open:
http://127.0.0.1:8050
Development
Build
uv build
Tests
Run the test suite with:
uv run pytest
Release updates
For full details by version, see CHANGELOG.md.
License
t2s-metrics
t2s-metrics is provided under the terms of the GNU Affero General Public License 3.0 (AGPL-3.0).
Redistribution of third-party software and data
This repository provides several third-party contributions redistributed with their original licenses.
CK25 Dataset
t2s-metrics reuses the CK25 Corporate Knowledge Reference Dataset for Benchmarking Text-2-SPARQL QA Approaches, which we modified to meet the toolkit's file format requirements (JSONL).
The modified version is redistributed in directory datasets/ck25 under the terms of the Creative Commons Attribution 4.0 International license (CC-BY-4.0).
QCan library
t2s-metrics reuses the QCan software for canonicalising SPARQL queries.
QCan is written in Java. In this repository, we distribute the compiled jar of QCan v1.1, third_party_lib/qcan-1.1-jar-with-dependencies.jar, under the terms of the Apache 2.0 license.
Cite this work
Yousouf Taghzouti, et al. T2S-Metrics: Unified Library for Evaluating SPARQL Queries Generated From Natural Language. ELMKE 2026: The Third International Workshop on Evaluation of Language Models in Knowledge Engineering, co-located with ESWC 2026, May 2026, Dubrovnik, Croatia. ⟨hal-05598018⟩
See BibTeX
@inproceedings{taghzouti:hal-05598018,
  TITLE     = {{T2S-Metrics: Unified Library for Evaluating SPARQL Queries Generated From Natural Language}},
  AUTHOR    = {Taghzouti, Yousouf and Jiang, Tao and Juign{\'e}, Camille and Navet, Benjamin and Gandon, Fabien and Michel, Franck and Nothias, Louis-Felix},
  URL       = {https://inria.hal.science/hal-05598018},
  BOOKTITLE = {{Proceedings of the Third International Workshop on Evaluation of Language Models in Knowledge Engineering (ELMKE 2026) co-located with the 23rd European Semantic Web Conference (ESWC 2026)}},
  ADDRESS   = {Dubrovnik, Croatia},
  EDITOR    = {CEUR},
  YEAR      = {2026},
  MONTH     = May
}
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file t2s_metrics-1.1.1.tar.gz.
File metadata
- Download URL: t2s_metrics-1.1.1.tar.gz
- Upload date:
- Size: 57.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04119d20132c3b80b0d340d1519af0398172628f1c44756302e21cbcb894d493 |
| MD5 | f311dc56c603323542e1969f640e5e07 |
| BLAKE2b-256 | e0d71684f265b296ce92b3522c07bb482409bada2e36a7fc68f4e998b579db37 |
File details
Details for the file t2s_metrics-1.1.1-py3-none-any.whl.
File metadata
- Download URL: t2s_metrics-1.1.1-py3-none-any.whl
- Upload date:
- Size: 55.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4871c3d90fdd3ce17976a78eb5d424dfa8fba04d33b25d4462e627fd1baef011 |
| MD5 | 34615b359c91ab15b45e29e2717721e8 |
| BLAKE2b-256 | cb6e203d801784433541599dae79a2cfe758cca0f0d712c3e1eea293edd906e7 |