✨ SPARQL query generation with LLMs 🦜

This project provides tools to enhance the capabilities of Large Language Models (LLMs) in generating SPARQL queries for specific endpoints.

The system integrates Retrieval-Augmented Generation (RAG) and SPARQL query validation against endpoint schemas to ensure more accurate and relevant query generation over large-scale knowledge graphs.

The components are designed to work either independently or as part of a full chat-based system that can be deployed for a set of SPARQL endpoints. It requires endpoints to include metadata such as SPARQL query examples and endpoint descriptions using the Vocabulary of Interlinked Datasets (VoID), which can be automatically generated using the void-generator.
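As an illustration, a minimal VoID description for an endpoint might look like the sketch below (the class and entity count are made up for the example; void-generator produces a much richer description, including property partitions linking classes together):

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix up: <http://purl.uniprot.org/core/> .

<#dataset> a void:Dataset ;
    void:sparqlEndpoint <https://sparql.uniprot.org/sparql/> ;
    void:classPartition [
        void:class up:Protein ;
        void:entities 500000 ;
        void:propertyPartition [ void:property up:annotation ]
    ] .
```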

🌈 Features

  • Metadata Extraction: Functions to extract and load relevant metadata from SPARQL endpoints. These loaders are compatible with LangChain but are flexible enough to be used independently, providing metadata as JSON for custom vector store integration.
  • SPARQL Query Validation: A function to automatically parse and validate federated SPARQL queries against the VoID description of the target endpoints.
  • MCP Server: A server exposing tools through the Model Context Protocol to help LLMs write SPARQL queries for a set of endpoints.
  • Deployable Chat System: A reusable and containerized system for deploying an LLM-based chat service with a web UI, API, and vector database. This system helps users write SPARQL queries by leveraging endpoint metadata (WIP).
  • Live Example: Configuration for expasy.org/chat, an LLM-powered chat system supporting SPARQL query generation for endpoints maintained by the SIB.

[!TIP]

You can quickly check if an endpoint contains the expected metadata at sib-swiss.github.io/sparql-editor/check

🔌 MCP server

The server exposes a Model Context Protocol (MCP) endpoint at chat.expasy.org/mcp to access biodata resources at the SIB through their SPARQL endpoints, such as UniProt, Bgee, OMA, SwissLipids, and Cellosaurus.

🛠️ Available tools

  • 📝 Retrieve relevant documents (query examples and classes schema) to help write SPARQL queries to access SIB biodata resources
    • Arguments:
      • question (string): the user's question
      • potential_classes (list[string]): high-level concepts and potential classes that could be found in the SPARQL endpoints
      • steps (list[string]): the question split into smaller standalone parts, if relevant
  • 🏷️ Retrieve relevant classes schema to help write SPARQL queries to access SIB biodata resources
    • Arguments:
      • classes (list[string]): high-level concepts and potential classes that could be found in the SPARQL endpoints
  • 📡 Execute a SPARQL query against a SPARQL endpoint
    • Arguments:
      • query (string): a valid SPARQL query string
      • endpoint (string): the SPARQL endpoint URL to execute the query against
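Under the hood, MCP clients invoke these tools through JSON-RPC tools/call requests. A call to the query-execution tool might look roughly like this (the tool name "execute_sparql_query" is a guess for illustration; check the server's tool listing for the exact name):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "execute_sparql_query",
    "arguments": {
      "query": "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
      "endpoint": "https://sparql.uniprot.org/sparql/"
    }
  }
}
```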

⚡️ Connect client to MCP server

Follow the instructions of your client, and use the URL of the public server: https://chat.expasy.org/mcp

For example, for GitHub Copilot in VSCode, to add a new MCP server through the VSCode UI:

  • Open side panel chat (ctrl+shift+i or cmd+shift+i), and make sure the mode is set to Agent in the bottom right
  • Open the command palette (ctrl+shift+p or cmd+shift+p) and search for MCP: Open User Configuration; this will open an mcp.json file

📡 Use streamable HTTP server

Connect to a running streamable HTTP MCP server, such as the publicly available chat.expasy.org/mcp.

In your VSCode mcp.json you should have the following:

{
	"servers": {
		"expasy-mcp-http": {
			"url": "https://chat.expasy.org/mcp",
			"type": "http"
		}
	}
}

⌨️ Use stdio transport

Run the server locally over stdio with:

uvx sparql-llm

Optionally, you can provide the path to a custom settings JSON file to configure the server (e.g. the list of endpoints that will be indexed and made available through the server); see the Settings class for all available settings.

Example settings file for your MCP server deployment:

{
    "app_org": "Your organization",
    "app_topics": "genes, proteins, lipids, chemical reactions, and metabolomics data",
    "endpoints" : [
        {
            "label": "UniProt",
            "endpoint_url": "https://sparql.uniprot.org/sparql/",
            "description": "UniProt is a comprehensive resource for protein sequence and annotation data."
        },
        {
            "label": "Bgee",
            "description": "Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species.",
            "endpoint_url": "https://www.bgee.org/sparql/",
            "homepage_url": "https://www.bgee.org/"
        }
    ]
}

Example mcp.json file to add and configure the MCP server in a client (e.g. VSCode):

{
  "servers": {
    "expasy-mcp": {
      "type": "stdio",
      "command": "uvx",
      "args": ["sparql-llm"],
      "env": {
        "SETTINGS_FILEPATH": "/Users/you/sparql-mcp.json"
      }
    }
  }
}

[!IMPORTANT]

Click Start just above the server name (e.g. "expasy-mcp") in mcp.json to start the connection to the MCP server.

You can click the wrench-and-screwdriver button 🛠️ (Configure Tools...) to enable or disable specific tools.

[!NOTE]

More details available in the VSCode MCP official docs.

📦️ Reusable components

Installation

Requires Python >=3.10

pip install sparql-llm

Or with uv:

uv add sparql-llm

SPARQL query examples loader

Load SPARQL query examples defined using the SHACL ontology from a SPARQL endpoint. See github.com/sib-swiss/sparql-examples for more details on how to define the examples.

from sparql_llm import SparqlExamplesLoader

loader = SparqlExamplesLoader("https://sparql.uniprot.org/sparql/")
docs = loader.load()
print(len(docs))
print(docs[0].metadata)
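For context, a query example in the sparql-examples convention is essentially a SHACL SPARQL executable resource carrying a human-readable comment and the query text. A simplified sketch (the exact properties and IRI scheme may differ; see the sparql-examples repository for the authoritative format):

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <https://sparql.uniprot.org/.well-known/sparql-examples/> .

ex:001 a sh:SPARQLSelectExecutable ;
    rdfs:comment "Select all taxa from the UniProt taxonomy" ;
    sh:select """SELECT ?taxon
WHERE { ?taxon a <http://purl.uniprot.org/core/Taxon> }""" .
```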

You can provide the examples as a file if they are not integrated into the endpoint, e.g.:

loader = SparqlExamplesLoader("https://sparql.uniprot.org/sparql/", examples_file="uniprot_examples.ttl")

Refer to the LangChain documentation to figure out how to best integrate document loaders into your system.

[!NOTE]

You can check the completeness of your examples against the endpoint schema using this notebook.

SPARQL endpoint schema loader

Generate a human-readable schema in the ShEx format describing all classes of a SPARQL endpoint, based on the VoID description present in the endpoint. Ideally, the endpoint should also contain the ontology describing the classes, so that the rdfs:label and rdfs:comment of the classes can be used to generate embeddings and improve semantic matching.

[!TIP]

Check out the void-generator project to automatically generate a VoID description for your endpoint.

from sparql_llm import SparqlVoidShapesLoader

loader = SparqlVoidShapesLoader("https://sparql.uniprot.org/sparql/")
docs = loader.load()
print(len(docs))
print(docs[0].metadata)

You can provide the VoID description as a file if it is not integrated into the endpoint, e.g.:

loader = SparqlVoidShapesLoader("https://sparql.uniprot.org/sparql/", void_file="uniprot_void.ttl")

The generated shapes are well-suited for use with an LLM or a human, as they provide clear information about which predicates are available for a class, and which classes or datatypes those predicates point to. Each object property references a list of classes rather than another shape, making each shape self-contained and interpretable on its own, e.g. for a Disease Annotation in UniProt:

up:Disease_Annotation {
    a [ up:Disease_Annotation ] ;
    up:sequence [ up:Chain_Annotation up:Modified_Sequence ] ;
    rdfs:comment xsd:string ;
    up:disease IRI
}

Generate complete ShEx shapes from VoID description

You can also generate the complete ShEx shapes for a SPARQL endpoint with:

from sparql_llm import get_shex_from_void

shex_str = get_shex_from_void("https://sparql.uniprot.org/sparql/")
print(shex_str)

Validate a SPARQL query based on VoID description

This takes a SPARQL query and validates that the predicates and types used are compliant with the VoID description present in the SPARQL endpoint the query is executed against.

This function supports:

  • federated queries (VoID description will be automatically retrieved for each SERVICE call in the query),
  • path patterns (e.g. orth:organism/obo:RO_0002162/up:scientificName)

This function requires at least one type to be defined per endpoint, but it can infer the types of subjects that are connected to a subject whose type is defined.
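For example, in the hypothetical pattern below only ?protein has an explicit type, yet ?ann's predicates can still be checked, since the VoID description records which classes up:annotation points to:

```sparql
SELECT ?comment WHERE {
    ?protein a up:Protein ;       # explicit type
        up:annotation ?ann .      # type of ?ann inferred from the VoID description
    ?ann rdfs:comment ?comment .
}
```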

It returns a list of issues described in natural language, with hints on how to fix them (by listing the available classes/predicates), which can be passed to an LLM as context to help it figure out how to fix the query.

from sparql_llm import validate_sparql_with_void

sparql_query = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX genex: <http://purl.org/genex#>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT DISTINCT ?humanProtein ?orthologRatProtein ?orthologRatGene
WHERE {
    ?humanProtein a orth:Protein ;
        lscr:xrefUniprot <http://purl.uniprot.org/uniprot/Q9Y2T1> .
    ?orthologRatProtein a orth:Protein ;
        sio:SIO_010078 ?orthologRatGene ;
        orth:organism/obo:RO_0002162/up:name 'Rattus norvegicus' .
    ?cluster a orth:OrthologsCluster .
    ?cluster orth:hasHomologousMember ?node1 .
    ?cluster orth:hasHomologousMember ?node2 .
    ?node1 orth:hasHomologousMember* ?humanProtein .
    ?node2 orth:hasHomologousMember* ?orthologRatProtein .
    FILTER(?node1 != ?node2)
    SERVICE <https://www.bgee.org/sparql/> {
        ?orthologRatGene a orth:Gene ;
            genex:expressedIn ?anatEntity ;
            orth:organism ?ratOrganism .
        ?anatEntity rdfs:label 'brain' .
        ?ratOrganism obo:RO_0002162 taxon:10116 .
    }
}"""

issues = validate_sparql_with_void(sparql_query, "https://sparql.omabrowser.org/sparql/")
print("\n".join(issues))
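Since the returned issues are plain strings, they can be folded into a follow-up prompt asking the LLM to repair its query. A minimal sketch (the prompt wording below is ours, not part of the library):

```python
def build_fix_prompt(query: str, issues: list[str]) -> str:
    """Build a follow-up prompt asking an LLM to repair an invalid SPARQL query."""
    issues_block = "\n".join(f"- {issue}" for issue in issues)
    return (
        "The following SPARQL query has validation issues:\n\n"
        f"{query}\n\n"
        f"Issues found:\n{issues_block}\n\n"
        "Please return a corrected SPARQL query."
    )

# Example with a made-up issue message:
prompt = build_fix_prompt(
    "SELECT * WHERE { ?s up:wrongPredicate ?o }",
    ["Predicate up:wrongPredicate not found for class up:Protein"],
)
print(prompt)
```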

🚀 Complete chat system

[!WARNING]

To deploy the complete chat system right now you will need to fork/clone this repository, change the configuration in src/sparql-llm/config.py and compose.yml, then deploy with docker/podman compose. It can easily be adapted to use any LLM served through an OpenAI-compatible API.

Requirements: Docker, Node.js (to build the frontend), and optionally uv if you want to run scripts outside of Docker.

  1. Explore and change the system configuration in src/sparql-llm/config.py

  2. Create a .env file at the root of the repository to provide secrets and API keys:

    CHAT_API_KEY=NOT_SO_SECRET_API_KEY_USED_BY_FRONTEND_TO_AVOID_SPAM_FROM_CRAWLERS
    LOGS_API_KEY=SECRET_PASSWORD_TO_EASILY_ACCESS_LOGS_THROUGH_THE_API
    
    OPENROUTER_API_KEY=sk-YYY
    OPENAI_API_KEY=sk-proj-YYY
    
    LANGFUSE_HOST=https://cloud.langfuse.com
    LANGFUSE_PUBLIC_KEY=
    LANGFUSE_SECRET_KEY=
    
  3. Optionally, if you made changes to it, build the chat UI webpage:

    cd chat-with-context
    npm i
    npm run build:demo
    cd ..
    

    You can change the UI around the chat in chat-with-context/demo/index.html

  4. Start the vector database and web server locally for development, with code from the src folder mounted in the container and automatic API reload on changes to the code:

    docker compose up
    

    In production, you will need to make some changes to the compose.prod.yml file to adapt it to your server/proxy setup:

    docker compose -f compose.prod.yml up
    

    Then run the indexing script manually from within the container to index the SPARQL endpoints (this only needs to be done once):

    docker compose -f compose.prod.yml exec api uv run src/sparql_llm/indexing/index_resources.py
    

    All data from the containers is stored persistently in the data folder (e.g. vector database indexes and endpoint metadata).

[!NOTE]

Query the chat API:

curl -X POST http://localhost:8000/chat -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "What is the HGNC symbol for the P68871 protein?"}], "model": "mistralai/mistral-small-latest", "stream": true}'
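With "stream": true the response arrives incrementally. Assuming the API streams OpenAI-style server-sent events with data: lines (an assumption for illustration; check the API's actual stream format), the message can be reassembled like this:

```python
import json

def collect_stream(lines) -> str:
    """Reassemble the assistant message from OpenAI-style SSE 'data:' lines."""
    content = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        content.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(content)

# Example with two fabricated chunks:
sample = [
    'data: {"choices": [{"delta": {"content": "HGNC symbol is "}}]}',
    'data: {"choices": [{"delta": {"content": "HBB"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # HGNC symbol is HBB
```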

[!WARNING]

Experimental entities indexing: generating embeddings for millions of entities can take a long time, so we recommend running the script on a machine with a GPU (it does not need to be a powerful one, but it should have a GPU; check out the fastembed GPU docs to install the GPU drivers and dependencies):

docker compose up vectordb -d
VECTORDB_URL=http://localhost:6334 nohup uv run --extra gpu src/sparql_llm/indexing/index_entities.py --gpu &

Then move the entities collection containing the embeddings to data/qdrant/collections/entities before starting the stack.

🥇 Benchmarks

There are a few benchmarks available for the system:

  • The tests/benchmark.py script runs a list of questions and compares their results to reference SPARQL queries, with and without query validation, against a list of LLM providers. You will need to change the list of queries to use it for different endpoints, and start the stack in development mode to run it:

    uv run --env-file .env tests/benchmark.py
    

    It takes time to run, and logs the output and results in data/benchmarks.

  • Follow these instructions to run the Text2SPARQL Benchmark.

  • For the biodata benchmark:

    docker compose up -d
    VECTORDB_URL=http://localhost:6334 uv run tests/benchmark_biodata.py
    

🧑‍🏫 Tutorial

There is a step-by-step tutorial showing how an LLM-based chat system for generating SPARQL queries can be built: https://sib-swiss.github.io/sparql-llm

🧑‍💻 Contributing

Check out the CONTRIBUTING.md page.

🪶 How to cite this work

If you reuse any part of this work, please cite at least one of our articles below:

@misc{smeros2025sparqlllmrealtimesparqlquery,
    title = {SPARQL-LLM: Real-Time SPARQL Query Generation from Natural Language Questions},
    author = {Panayiotis Smeros and Vincent Emonet and Ruijie Wang and Ana-Claudia Sima and Tarcisio Mendes de Farias},
    year = {2025},
    eprint = {2512.14277},
    archivePrefix = {arXiv},
    primaryClass = {cs.IR},
    url = {https://arxiv.org/abs/2512.14277}
}
@conference{emonet2025llm,
    title = {LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs},
    author = {Emonet, Vincent and Bolleman, Jerven and Duvaud, Severine and Mendes de Farias, Tarcisio and Sima, Ana Claudia},
    year = {2025},
    booktitle = {ISWC 2024 Special Session on Harmonising Generative AI and Semantic Web Technologies, November 13, 2024, Baltimore, Maryland},
    volume = {3953},
    series = {CEUR Workshop Proceedings},
    note = {CEUR-WS.org, online \url{https://ceur-ws.org/Vol-3953/355.pdf}}
}
