T2S-Metrics

A small evaluation toolkit for text-to-SPARQL systems. It runs a configurable set of metrics over JSONL datasets and can execute queries against local RDF files or SPARQL endpoints.

Features

Metrics for query exact match, token overlap, answer-set quality, BLEU/ROUGE, CodeBLEU, and more.
Execution backends for local RDF (RDFLib) and remote SPARQL endpoints.
Pluggable LLM-based judging via an Ollama backend.
Python API for quick experiments.

Installation

The package is available on PyPI and can be installed directly with pip:

pip install t2s-metrics

For development (editable install), you can use:

uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install -e .

Usage

Expected JSONL format

Input evaluation files must be JSON Lines (.jsonl) with one object per line. Each object must include:

id (string): unique query/case identifier
golden (string): reference SPARQL query
generated (string): system-generated SPARQL query
order_matters (boolean): whether answer order must be preserved

This is exactly what JsonlEval expects in t2smetrics/core/eval.py.

Example (from datasets/ck25/eval/AIFB.jsonl):

{"id": "ck25:1-en", "golden": "PREFIX pv: <http://ld.company.org/prod-vocab/>\nSELECT DISTINCT ?result\nWHERE\n{\n  <http://ld.company.org/prod-instances/empl-Karen.Brant%40company.org> pv:memberOf ?result .\n  ?result a pv:Department .\n}\n", "generated": "SELECT ?department WHERE { ?person :name \"Ms. Brant\"; :worksIn ?department. }", "order_matters": false}
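Before running an experiment, it can help to check that every line of an input file carries the four required fields with the expected types. The following is a minimal standalone sketch using only the standard library. The helper name validate_jsonl_lines is hypothetical and is not part of t2smetrics, and skipping blank lines is an assumption of this sketch rather than documented behaviour of JsonlEval.

```python
import json

# Field names and types are taken from the list of required fields above.
REQUIRED_FIELDS = {
    "id": str,
    "golden": str,
    "generated": str,
    "order_matters": bool,
}


def validate_jsonl_lines(lines):
    """Return a list of (line_number, message) problems; an empty list means valid."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption of this sketch: blank lines are tolerated
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "expected a JSON object"))
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in record:
                problems.append((line_number, f"missing field '{field}'"))
            elif not isinstance(record[field], expected_type):
                problems.append(
                    (line_number, f"field '{field}' should be {expected_type.__name__}")
                )
    return problems


sample = [
    '{"id": "q1", "golden": "ASK { ?s ?p ?o }", "generated": "ASK { ?s ?p ?o }", "order_matters": false}',
    '{"id": "q2", "golden": "ASK { ?s ?p ?o }"}',
]
for line_number, message in validate_jsonl_lines(sample):
    print(f"line {line_number}: {message}")
```

To check a real file, pass an open file handle, for example validate_jsonl_lines(open(path, encoding="utf-8")).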

Execution backends

The library supports two execution backend families:

Local RDF file execution with RDFLibBackend
Remote SPARQL endpoint execution with SparqlEndpointBackend

SparqlEndpointBackend targets generic SPARQL 1.1 endpoints, so it works with QLever and Corese as well as GraphDB, Fuseki, Virtuoso, Blazegraph, and similar servers.

from t2smetrics.execution.rdflib_backend import RDFLibBackend
from t2smetrics.execution.sparql_endpoint_backend import SparqlEndpointBackend

# Option 1: local KG file
local_backend = RDFLibBackend("./datasets/example/kg/example.ttl")

# Option 2: remote endpoint (e.g., QLever/Corese)
endpoint_backend = SparqlEndpointBackend("http://localhost:8886/")
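For context on what "generic SPARQL 1.1" means at the HTTP level, the sketch below builds (without sending) a query request in the style of the SPARQL 1.1 Protocol "query via GET" operation. It illustrates the protocol only and is not a description of how SparqlEndpointBackend is implemented. The function name build_sparql_get_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_get_request(endpoint_url: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol 'query via GET' request."""
    # The query text travels URL-encoded in the 'query' parameter.
    separator = "&" if "?" in endpoint_url else "?"
    url = f"{endpoint_url}{separator}{urlencode({'query': query})}"
    # Ask for the standard JSON serialization of SELECT/ASK results.
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


request = build_sparql_get_request(
    "http://localhost:8886/",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it to whichever endpoint is listening at that URL.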

LLM backend (local Ollama + extensible)

For LLM-based metrics (for example LLMJudge), the library currently provides OllamaBackend for local inference.

from t2smetrics.llm.ollama_backend import OllamaBackend

llm_backend = OllamaBackend(model="gemma3:4b")

The LLM layer is extensible via LLMBackend (t2smetrics/llm/base.py). To plug another provider, implement judge(prompt: str, timeout: int = 30) -> dict and return a dictionary with a numeric score (recommended in [0, 1]).

from t2smetrics.llm.base import LLMBackend


class MyLLMBackend(LLMBackend):
    def judge(self, prompt: str, timeout: int = 30) -> dict:
        # Call your provider/client here
        return {"score": 0.85, "raw": "optional provider response"}

Then pass your backend to Experiment(..., llm_backend=...).
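Since judge should return a numeric score, ideally in [0, 1], a custom backend usually needs a step that turns a free-text model reply into that number. The sketch below shows one possible way to do so with the standard library. The helpers extract_score and to_judge_result are hypothetical, and taking the first number found in the reply is an assumption that may not suit every prompt design.

```python
import re


def extract_score(raw_text: str, default: float = 0.0) -> float:
    """Take the first number in a model reply and clamp it to [0, 1]."""
    match = re.search(r"-?\d+(?:\.\d+)?", raw_text)
    if match is None:
        return default  # no number found: fall back to the default score
    return min(1.0, max(0.0, float(match.group())))


def to_judge_result(raw_text: str) -> dict:
    """Shape the reply like the dictionary returned by judge() above."""
    return {"score": extract_score(raw_text), "raw": raw_text}


print(to_judge_result("Score: 0.85. The queries are nearly equivalent."))
```

Inside a custom judge method, the provider's reply text could be passed through to_judge_result before returning.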

Python (minimal example)

from t2smetrics.core.experiment import Experiment
from t2smetrics.core.eval import JsonlEval
from t2smetrics.metrics.text_metrics import Bleu
from t2smetrics.metrics.token import TokenF1


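# Load the golden/generated query pairs from a JSONL file
jsonl_eval = JsonlEval("./datasets/example/eval/example.jsonl")
# Text-based metrics only, so no execution or LLM backend is passed
metrics = [Bleu(), TokenF1()]
experiment = Experiment(jsonl_eval, metrics)
# run() returns the per-query results and an aggregated summary
_, summary = experiment.run()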

print("\n=== SUMMARY ===")
for k, v in summary.items():
    print(f"{k}: {v:.4f}")
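To give a sense of what a token-overlap metric measures, here is a small standalone sketch of a token-level F1 between a golden and a generated query. It splits on whitespace, which is an assumption made for illustration. The tokenization and exact definition used by TokenF1 in t2smetrics may differ.

```python
from collections import Counter


def token_f1(golden: str, generated: str) -> float:
    """Token-level F1 using whitespace tokens and multiset overlap."""
    golden_tokens = Counter(golden.split())
    generated_tokens = Counter(generated.split())
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both queries.
    overlap = sum((golden_tokens & generated_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(generated_tokens.values())
    recall = overlap / sum(golden_tokens.values())
    return 2 * precision * recall / (precision + recall)


golden_query = "SELECT ?s WHERE { ?s a :Person }"
generated_query = "SELECT ?s WHERE { ?s a :Agent }"
print(f"{token_f1(golden_query, generated_query):.4f}")
```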

Python (full example with execution backends)

from t2smetrics.core.experiment import Experiment
from t2smetrics.core.eval import JsonlEval
from t2smetrics.execution.rdflib_backend import RDFLibBackend

from t2smetrics.llm.ollama_backend import OllamaBackend
from t2smetrics.metrics.answer_set.f1 import AnswerSetF1
from t2smetrics.metrics.answer_set.precision import AnswerSetPrecision
from t2smetrics.metrics.answer_set.precision_qald import PrecisionQALD
from t2smetrics.metrics.answer_set.recall import AnswerSetRecall
from t2smetrics.metrics.answer_set.recall_qald import RecallQALD
from t2smetrics.metrics.exact import QueryExactMatch
from t2smetrics.metrics.codebleu.codebleu import CodeBLEU
from t2smetrics.metrics.answer_set.f1_qald import F1QALD
from t2smetrics.metrics.answer_set.f1_spinach import F1Spinach
from t2smetrics.metrics.answer_set.mrr import MRR
from t2smetrics.metrics.answer_set.hit_at_k import HitAtK
from t2smetrics.metrics.answer_set.ndcg import NDCG
from t2smetrics.metrics.answer_set.p_at_k import PrecisionAtK
from t2smetrics.metrics.distance import (
    LevenshteinDistance,
    JaccardSimilarity,
    CosineSimilarity,
    EuclideanDistance,
)
from t2smetrics.metrics.llm_judge import LLMJudge
from t2smetrics.metrics.text_metrics import Bleu, RougeN, Meteor, SPBleu
from t2smetrics.metrics.uri.uri_hallucination import URIHallucination
from t2smetrics.metrics.query_execution import QueryExecution
from t2smetrics.metrics.token import SPF1, TokenRecall, TokenPrecision, TokenF1


jsonl_eval = JsonlEval("./datasets/example/eval/example.jsonl")

execution_backend = RDFLibBackend("./datasets/example/kg/example.ttl")

llm_backend = OllamaBackend()

metrics = [
    AnswerSetPrecision(),
    AnswerSetRecall(),
    AnswerSetF1(),
    Bleu(),
    SPBleu(),
    CodeBLEU(),
    CosineSimilarity(),
    EuclideanDistance(),
    F1QALD(),
    PrecisionQALD(),
    RecallQALD(),
    F1Spinach(),
    HitAtK(k=5),
    JaccardSimilarity(),
    LLMJudge(),
    LevenshteinDistance(),
    MRR(),
    Meteor(),
    NDCG(),
    PrecisionAtK(k=1),
    QueryExecution(),
    QueryExactMatch(),
    RougeN(1),
    RougeN(2),
    RougeN(3),
    RougeN(4),
    TokenF1(),
    SPF1(),
    TokenPrecision(),
    TokenRecall(),
    URIHallucination(),
]

experiment = Experiment(
    jsonl_eval=jsonl_eval,
    metrics=metrics,
    execution_backend=execution_backend,
    llm_backend=llm_backend,
    verbose=True,
)

results, summary = experiment.run()

print("=== PER QUERY RESULTS ===")
for r in results:
    print(r)

print("\n=== SUMMARY ===")
for k, v in summary.items():
    print(f"{k}: {v:.4f}")
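The answer-set metrics compare the results returned by executing the golden and generated queries. As a rough illustration of the idea, the sketch below computes precision, recall, and F1 over two unordered sets of result rows. It is a simplified stand-in, not the library's implementation: the function name answer_set_scores is hypothetical, rows are modelled as tuples, and the order_matters flag is deliberately ignored.

```python
def answer_set_scores(golden_rows, generated_rows):
    """Return (precision, recall, f1) for two collections of result rows."""
    golden, generated = set(golden_rows), set(generated_rows)
    true_positives = len(golden & generated)
    # Guard against empty result sets to avoid division by zero.
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(golden) if golden else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


golden_rows = [("a",), ("b",), ("c",)]
generated_rows = [("a",), ("b",), ("d",), ("e",)]
precision, recall, f1 = answer_set_scores(golden_rows, generated_rows)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```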

Full workflow example (dataset + endpoint + export)

For a complete run over multiple systems and export of aggregated metrics to JSON, see t2smetrics/run_text2sparql.py.

Typical workflow:

Choose a dataset folder (for example datasets/ck25).
Put input files under datasets/<dataset>/eval/*.jsonl.
Start your SPARQL endpoint (for example QLever/Corese).
Set the endpoint URL in the script (for example http://localhost:8886/).
Run:

python -m t2smetrics.run_text2sparql

The script writes timestamped summary files under:

datasets/<dataset>/results/<dataset>-YYYYMMDD-HHMMSS.json
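The naming pattern above can be reproduced with the standard library, which is handy when writing result files from a custom script. This is a minimal sketch. The helper result_path is hypothetical and is not taken from run_text2sparql.py.

```python
from datetime import datetime
from pathlib import Path


def result_path(dataset: str, now: datetime, root: Path = Path("datasets")) -> Path:
    """Build datasets/<dataset>/results/<dataset>-YYYYMMDD-HHMMSS.json."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return root / dataset / "results" / f"{dataset}-{stamp}.json"


path = result_path("ck25", datetime(2026, 3, 6, 13, 32, 27))
print(path.as_posix())  # datasets/ck25/results/ck25-20260306-133227.json
```

In a real run, datetime.now() would be passed instead of a fixed timestamp.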

These result files are then directly consumable by the dashboard.

Dashboard

The dashboard reads the JSON result files (datasets/*/results/*.json) and serves an interactive UI with Radar, Bar, Correlation Heatmap, Parallel Coordinates, and Scatter Matrix views.

Launch with auto-discovery:

python -m t2smetrics.cli dashboard
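Auto-discovery presumably amounts to finding files that match datasets/*/results/*.json. The sketch below shows how such a lookup could be written with pathlib. It is an assumption about the general approach, not the dashboard's actual code, and discover_result_files is a hypothetical name. The demo runs against a temporary directory so that it does not depend on a local checkout.

```python
import tempfile
from pathlib import Path


def discover_result_files(root):
    """Return all <root>/<dataset>/results/*.json files, sorted by path."""
    return sorted(Path(root).glob("*/results/*.json"))


with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp) / "ck25" / "results"
    results_dir.mkdir(parents=True)
    (results_dir / "ck25-20260306-133227.json").write_text("{}")
    # A JSON file outside a results/ folder is not picked up.
    (Path(tmp) / "ck25" / "notes.json").write_text("{}")
    found = [p.name for p in discover_result_files(tmp)]

print(found)
```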

Launch with explicit files:

python -m t2smetrics.cli dashboard \
    datasets/ck25/results/ck25-20260306-133227.json \
    datasets/db25/results/db25-20260306-132100.json

Then open:

http://127.0.0.1:8050

Development

Build

python setup.py sdist bdist_wheel

Tests

There are no automated tests yet. If you add tests, run them with:

python -m pytest

License

t2s-metrics

t2s-metrics is provided under the terms of the GNU Affero General Public License 3.0 (AGPL-3.0).

Redistribution of third-party software and data

This repository includes several third-party contributions, each redistributed under its original license.

CK25 Dataset

t2s-metrics reuses the CK25 Corporate Knowledge Reference Dataset for Benchmarking Text-2-SPARQL QA Approaches, which we converted to JSONL to meet the toolkit's input format requirements.

The modified version is redistributed in the datasets/ck25 directory under the terms of the Creative Commons Attribution 4.0 International license (CC-BY-4.0).

QCan library

t2s-metrics reuses the QCan software for canonicalising SPARQL queries.

QCan is written in Java. In this repository, we distribute the compiled jar of QCan v1.1, third_party_lib/qcan-1.1-jar-with-dependencies.jar, under the terms of the Apache 2.0 license.

Release history

1.1.2: May 6, 2026
1.1.1: May 5, 2026
1.1.0: Mar 27, 2026
1.0.2: Mar 16, 2026
1.0.1: Mar 13, 2026
1.0.0: Mar 12, 2026 (this version)
