Refactx package
ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation
Riccardo Pozzi, Matteo Palmonari, Andrea Coletta, Luigi Bellomarini, Jens Lehmann, Sahar Vahdati
The implementation corresponding to the ISWC 2025 paper is available at the ISWC2025 branch.
A preprint that has not undergone peer review is available at https://arxiv.org/abs/2508.16983.
We present ReFactX, a scalable method that enables LLMs to access external knowledge without depending on retrievers or auxiliary models. Our approach uses constrained generation with a pre-built prefix-tree index. Triples from Wikidata are verbalized into 800 million textual facts, tokenized, and indexed in a prefix tree for efficient access. During inference, to acquire external knowledge, the LLM generates facts under constrained generation, which allows only token sequences that form an existing fact.
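The idea of constraining generation with a prefix tree can be illustrated with a minimal sketch. This is not the ReFactX implementation: the `PrefixTree` class, the word-level toy vocabulary, the `<eof>` end-of-fact marker, and the fixed `scores` list (standing in for LLM logits) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Toy word-level vocabulary (assumption: a real system uses the LLM tokenizer).
VOCAB = ["<eof>", "Rome", "Paris", "is", "capital", "of", "Italy", "France", "in"]
TOK = {word: idx for idx, word in enumerate(VOCAB)}
EOF = TOK["<eof>"]


def encode(text: str) -> List[int]:
    """Tokenize a fact and append the end-of-fact marker."""
    return [TOK[word] for word in text.split()] + [EOF]


def decode(token_ids: Sequence[int]) -> str:
    """Turn token ids back into text, dropping the end-of-fact marker."""
    return " ".join(VOCAB[t] for t in token_ids if t != EOF)


class PrefixTree:
    """Hypothetical in-memory prefix tree over tokenized facts."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    def add(self, token_ids: Sequence[int]) -> None:
        node = self.root
        for tok in token_ids:
            node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Return the tokens that extend `prefix` towards an indexed fact."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node)


def constrained_greedy(tree: PrefixTree, scores: Sequence[float], max_len: int = 16) -> List[int]:
    """Greedy decoding where only tokens allowed by the tree can be chosen."""
    out: List[int] = []
    while len(out) < max_len:
        allowed = tree.allowed_next(out)
        if not allowed:
            break
        best = max(allowed, key=lambda t: scores[t])
        out.append(best)
        if best == EOF:
            break
    return out


tree = PrefixTree()
for fact in ["Rome is capital of Italy", "Paris is capital of France", "Rome is in Italy"]:
    tree.add(encode(fact))

# Fixed scores standing in for model logits: the "model" likes "in" best,
# but after "Paris is" the tree only allows "capital".
scores = [0.0] * len(VOCAB)
scores[TOK["Paris"]] = 2.0
scores[TOK["in"]] = 3.0
scores[TOK["Italy"]] = 1.5

print(decode(constrained_greedy(tree, scores)))
```

In this toy setup the decoder can only emit one of the three indexed facts, regardless of what the scores would otherwise prefer.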
This repository contains the source code for using ReFactX and reproducing our work accepted at ISWC 2025.
Setup
- install the requirements: pip install -r requirements.txt
- prepare the .env file: cp env-sample.txt .env, then edit .env (can be skipped if using the simple index in the try_refactx notebook)
Try ReFactX
To quickly try ReFactX with an in-memory prefix tree (derived from a 31k-fact knowledge base), use the notebook try_refactx.ipynb.
Wikidata Prefix Tree
Refer to PrefixTree.md for creating the Wikidata prefix tree we used in our work.
Experiments
To reproduce our experiments, use the eval.py script, replacing INDEX, MODEL, and DATASET according to your needs (each of them is a Python file to import):
python eval.py --index INDEX --model MODEL --dataset DATASET
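Since INDEX, MODEL, and DATASET are Python files to import, the following sketch shows one way a script could load such a file from a path with the standard library. How eval.py actually loads these files is not described here; the helper `load_python_file`, the file name `toy_dataset.py`, and its `name` and `questions` attributes are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_python_file(path: str) -> ModuleType:
    """Import a Python file from an arbitrary path and return it as a module."""
    file_path = Path(path)
    spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway config file (contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "toy_dataset.py"
    cfg.write_text("name = 'toy'\nquestions = ['Who wrote Hamlet?']\n")
    dataset = load_python_file(str(cfg))

print(dataset.name, len(dataset.questions))
```

Keeping configuration in importable Python files lets each file build arbitrary objects (a tokenizer, a dataset loader) rather than only static values.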
Throughput
For the throughput experiment, run:
python throughput.py --model MODEL --index INDEX --max-tokens 4001 --output out.json [--unconstrained-generation]
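As a rough illustration of what a throughput figure means, the sketch below times a generation call and reports tokens per second. It does not reproduce throughput.py: `tokens_per_second` and `dummy_generate` are hypothetical names, and the dummy generator merely stands in for a model call.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[int], List[int]], max_tokens: int) -> float:
    """Time one generation call and return generated tokens per second."""
    start = time.perf_counter()
    tokens = generate(max_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed if elapsed > 0 else float("inf")


def dummy_generate(max_tokens: int) -> List[int]:
    """Stand-in for a model: emits one fake token id per step."""
    out: List[int] = []
    for step in range(max_tokens):
        out.append(step)
    return out


rate = tokens_per_second(dummy_generate, 4001)
print(f"{rate:.0f} tokens/s")
```

Running the same measurement with and without constraints is what makes the two figures comparable.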
Cite
@InProceedings{10.1007/978-3-032-09527-5_16,
author="Pozzi, Riccardo
and Palmonari, Matteo
and Coletta, Andrea
and Bellomarini, Luigi
and Lehmann, Jens
and Vahdati, Sahar",
editor="Garijo, Daniel
and Kirrane, Sabrina
and Salatino, Angelo
and Shimizu, Cogan
and Acosta, Maribel
and Nuzzolese, Andrea Giovanni
and Ferrada, Sebasti{\'a}n
and Soulard, Thibaut
and Kozaki, Kouji
and Takeda, Hideaki
and Gentile, Anna Lisa",
title="ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation",
booktitle="The Semantic Web -- ISWC 2025",
year="2026",
publisher="Springer Nature Switzerland",
address="Cham",
pages="290--308",
isbn="978-3-032-09527-5",
doi="10.1007/978-3-032-09527-5_16",
url="https://doi.org/10.1007/978-3-032-09527-5_16"
}
File details
Details for the file refactx-0.1.0.tar.gz.
File metadata
- Download URL: refactx-0.1.0.tar.gz
- Upload date:
- Size: 31.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 968c0abb18e9adb182edafb4655f5680d77d4e17caf7c640efd95e85f17fad15 |
| MD5 | 5cf33187185a50f74c0399dfaa09eb0a |
| BLAKE2b-256 | 3e5b47d7523178174c1ea8cf4b4789fee9ee17981a2e9017b959de68e73058bc |
File details
Details for the file refactx-0.1.0-py3-none-any.whl.
File metadata
- Download URL: refactx-0.1.0-py3-none-any.whl
- Upload date:
- Size: 34.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6bac4df9a43d780039258e0fb0960a2e2e3bb4e6802d6186a9af520df820523 |
| MD5 | 05dcb141b6e7e4e01c5476fe5492cb87 |
| BLAKE2b-256 | 82b595822e22cff5b0276394747b97f07a322601333cbc06e998c057a89a829b |