ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation

Riccardo Pozzi, Matteo Palmonari, Andrea Coletta, Luigi Bellomarini, Jens Lehmann, Sahar Vahdati

The implementation corresponding to the ISWC 2025 paper is available at the ISWC2025 branch.

A preprint that has not undergone peer review is available at https://arxiv.org/abs/2508.16983.

We present ReFactX, a scalable method that enables LLMs to access external knowledge without depending on retrievers or auxiliary models. Our approach uses constrained generation with a pre-built prefix-tree index. Triples from Wikidata are verbalized into 800 million textual facts, tokenized, and indexed in a prefix tree for efficient access. During inference, to acquire external knowledge, the LLM generates facts under constrained generation, which allows only token sequences that form an existing fact.
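The idea of allowing only token sequences that form an existing fact can be illustrated with a toy in-memory prefix tree. This is a minimal sketch, not the ReFactX implementation: the class `PrefixTree`, the `END` sentinel, the function `constrained_greedy`, and the integer token ids are all hypothetical, and `scores_fn` stands in for the LLM's next-token scores. The Wikidata index used in the paper is pre-built and far larger than a Python dictionary can hold.

```python
from typing import Callable, Dict, List, Sequence

END = -1  # hypothetical sentinel: marks that a complete fact ends at this node


class PrefixTree:
    """Toy trie over token-id sequences (illustrative, not the ReFactX index)."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    def add(self, tokens: Sequence[int]) -> None:
        """Index one tokenized fact."""
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}

    def allowed_next(self, prefix: Sequence[int]) -> List[int]:
        """Return the token ids that extend `prefix` towards an indexed fact."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []  # the prefix does not belong to any indexed fact
            node = node[tok]
        return sorted(node.keys())


def constrained_greedy(
    tree: PrefixTree,
    scores_fn: Callable[[List[int]], Dict[int, float]],
    max_len: int = 16,
) -> List[int]:
    """Greedy decoding restricted to continuations present in the tree."""
    generated: List[int] = []
    for _ in range(max_len):
        allowed = tree.allowed_next(generated)
        if not allowed:
            break
        scores = scores_fn(generated)
        # Tokens outside `allowed` are never considered, whatever their score.
        best = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        if best == END:
            break
        generated.append(best)
    return generated


# Three toy "facts", already tokenized into integer ids.
tree = PrefixTree()
for fact in ([1, 2, 3], [1, 2, 4], [5, 6]):
    tree.add(fact)

# Stand-in for model scores: token 7 scores highest but is in no fact.
toy_scores = {7: 10.0, 1: 1.0, 5: 0.5, 2: 1.0, 3: 1.0, 4: 2.0, END: 0.0}
print(constrained_greedy(tree, lambda prefix: toy_scores))
```

In this sketch the decoder can only ever emit one of the three indexed sequences, because at every step the candidate set is the set of children of the current tree node rather than the full vocabulary.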

Figure: ReFactX example.

This repository contains the source code for using ReFactX and reproducing our work accepted at ISWC 2025.

Setup

  • Install the requirements: pip install -r requirements.txt
  • Prepare the .env file: run cp env-sample.txt .env, then edit .env. This step can be skipped if you use the simple index in the try_refactx notebook.

Try ReFactX

To try ReFactX quickly with an in-memory prefix tree (derived from a knowledge base of 31k facts), use the notebook try_refactx.ipynb.

Wikidata Prefix Tree

See PrefixTree.md for instructions on creating the Wikidata prefix tree used in our work.

Experiments

To reproduce our experiments, use the eval.py script, replacing INDEX, MODEL, and DATASET according to your needs (each of them is a Python file to import):

python eval.py --index INDEX --model MODEL --dataset DATASET
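Since each argument names a Python file to import, the following sketch shows one generic way a script can load such a file as a module using the standard library. How eval.py actually resolves its arguments is not shown here, so the function name `load_config_module` and the attribute `NAME` in the usage note are assumptions for illustration only.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_config_module(path: str) -> ModuleType:
    """Import the Python file at `path` and return it as a module object.

    Hypothetical helper: it illustrates loading a configuration written as
    Python code, and is not taken from the ReFactX code base.
    """
    file_path = Path(path).resolve()
    spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a Python module from {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module
```

With a file my_dataset.py containing, say, NAME = "toy", calling load_config_module("my_dataset.py").NAME would return that value. Writing configurations as Python files lets them build objects directly instead of describing them in a separate format.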

Throughput

For the throughput experiment, run:

python throughput.py --model MODEL --index INDEX --max-tokens 4001 --output out.json [--unconstrained-generation]
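As a rough illustration of what a throughput measurement involves, the sketch below times a generation call and reports tokens per second. It is an assumption-laden example, not the logic of throughput.py: `measure_throughput` and `generate_fn` are hypothetical names, and the returned dictionary does not reflect the format of out.json.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(
    generate_fn: Callable[[int], List[int]], max_tokens: int
) -> Dict[str, float]:
    """Time one call to `generate_fn` and report tokens per second.

    `generate_fn` is a hypothetical stand-in for a model's generation call;
    it receives the token budget and returns the generated token ids.
    """
    start = time.perf_counter()
    tokens = generate_fn(max_tokens)
    elapsed = time.perf_counter() - start
    rate = len(tokens) / elapsed if elapsed > 0 else float("inf")
    return {
        "tokens": float(len(tokens)),
        "seconds": elapsed,
        "tokens_per_second": rate,
    }
```

Running the same measurement once with constrained generation and once without it gives a direct comparison of the overhead introduced by the constraints.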

Cite

@InProceedings{10.1007/978-3-032-09527-5_16,
      author="Pozzi, Riccardo
      and Palmonari, Matteo
      and Coletta, Andrea
      and Bellomarini, Luigi
      and Lehmann, Jens
      and Vahdati, Sahar",
      editor="Garijo, Daniel
      and Kirrane, Sabrina
      and Salatino, Angelo
      and Shimizu, Cogan
      and Acosta, Maribel
      and Nuzzolese, Andrea Giovanni
      and Ferrada, Sebasti{\'a}n
      and Soulard, Thibaut
      and Kozaki, Kouji
      and Takeda, Hideaki
      and Gentile, Anna Lisa",
      title="ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation",
      booktitle="The Semantic Web -- ISWC 2025",
      year="2026",
      publisher="Springer Nature Switzerland",
      address="Cham",
      pages="290--308",
      isbn="978-3-032-09527-5",
      doi="10.1007/978-3-032-09527-5_16",
      url="https://doi.org/10.1007/978-3-032-09527-5_16"
}
