An Open-source Factuality Evaluation Demo for LLMs

Overview

OpenFactCheck is an open-source repository designed to facilitate the evaluation and enhancement of factuality in responses generated by large language models (LLMs). This project aims to integrate various fact-checking tools into a unified framework and provide comprehensive evaluation pipelines, with built-in solvers for English and five additional languages (Arabic, Bulgarian, Chinese, Italian, and Urdu).

Supported Solvers

OpenFactCheck ships with several fact-checking pipelines you can use out of the box:

English

  • factool — pipeline from FacTool
  • factcheckgpt — pipeline from FactCheck-GPT
  • rarr — Retrieval-Augmented Research and Revision

Multilingual (Arabic, Bulgarian, Chinese, and Italian are new in v1.1.0)

  • arabicfactcheck — Arabic claim verification
  • bulgarianfactcheck — Bulgarian claim verification
  • chinesefactcheck — Chinese claim verification
  • italianfactcheck — Italian claim verification
  • urdufactcheck — Urdu claim verification

Each multilingual solver follows the same five-stage pattern: cp (claim processing) → rtv / rtv_tr / rtv_thtr (retrieval variants) → vfr (verification).
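The staged pattern above can be pictured as a chain of functions that each read and extend a shared state. The following is only a minimal sketch of that idea, not OpenFactCheck's implementation: every name in it (claim_processor, make_retriever, verifier, run_pipeline) is hypothetical, and the retriever is a toy in-memory lookup rather than a real search backend.

```python
from typing import Callable, Dict, List

# All names below are hypothetical stand-ins, not OpenFactCheck classes.
State = Dict[str, object]
Stage = Callable[[State], State]


def claim_processor(state: State) -> State:
    """cp: split the response into sentence-level claims (toy splitter)."""
    text = str(state["response"])
    claims = [part.strip() for part in text.split(".") if part.strip()]
    return {**state, "claims": claims}


def make_retriever(knowledge: Dict[str, bool]) -> Stage:
    """rtv: attach evidence to each claim from a toy in-memory lookup table."""
    def retriever(state: State) -> State:
        evidence = {claim: knowledge.get(claim) for claim in state["claims"]}
        return {**state, "evidence": evidence}
    return retriever


def verifier(state: State) -> State:
    """vfr: label each claim from its evidence (None means nothing was found)."""
    labels = {}
    for claim, found in state["evidence"].items():
        if found is None:
            labels[claim] = "unknown"
        else:
            labels[claim] = "supported" if found else "refuted"
    return {**state, "labels": labels}


def run_pipeline(stages: List[Stage], response: str) -> State:
    """Run the stages in order, passing the growing state from one to the next."""
    state: State = {"response": response}
    for stage in stages:
        state = stage(state)
    return state
```

Swapping the retrieval stage for another variant only changes one element of the stage list, which is presumably why the retrieval step comes in several flavors.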

Utility

  • dummy — passthrough/no-op (testing)
  • tutorial — minimal example for building your own solver
  • webservice — wrap any HTTP API as a solver

Installation

You can install the package from PyPI using pip:

pip install openfactcheck
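To confirm that the install succeeded, you can query the installed version with the standard library. This is a small sketch; the helper name installed_version is ours and is not part of OpenFactCheck.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "openfactcheck") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "openfactcheck is not installed")
```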

Usage

First, create an OpenFactCheckConfig object, then pass it to OpenFactCheck.

from openfactcheck import OpenFactCheck, OpenFactCheckConfig

# Build the configuration, then the OpenFactCheck object
config = OpenFactCheckConfig()
ofc = OpenFactCheck(config)

Response Evaluation

You can evaluate a response using the ResponseEvaluator class.

# Evaluate a response (the text below is a placeholder)
response = "<text generated by an LLM>"
result = ofc.ResponseEvaluator.evaluate(response=response)

LLM Evaluation

We provide FactQA, a dataset of 6480 questions for evaluating LLMs. Once you have the responses from the LLM, you can evaluate them using the LLMEvaluator class.

# Evaluate an LLM (both values are placeholders)
result = ofc.LLMEvaluator.evaluate(model_name="<model_name>",
                                   input_path="<path_to_responses>")
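The layout of the file passed as input_path is not described here, so check the project documentation before relying on the following. As a sketch only, assuming the evaluator reads a CSV with an index column and a response column (both column names are assumptions), the LLM's answers could be collected like this. The helper write_responses is hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable


def write_responses(responses: Iterable[str], path: str) -> Path:
    """Write one response per row to a CSV file and return its path."""
    out = Path(path)
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Assumed header; adjust to whatever the evaluator actually expects.
        writer.writerow(["index", "response"])
        for i, text in enumerate(responses):
            writer.writerow([i, text])
    return out
```

The csv module handles quoting, so responses containing commas or quotation marks remain a single field.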

Checker Evaluation

We provide FactBench, a dataset of 4507 claims for evaluating fact-checkers. Once you have the responses from the fact-checker, you can evaluate them using the CheckerEvaluator class.

# Evaluate a fact-checker (both values are placeholders)
result = ofc.CheckerEvaluator.evaluate(checker_name="<checker_name>",
                                       input_path="<path_to_responses>")
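Evaluating a fact-checker comes down to comparing its verdicts with gold labels. The sketch below shows the standard precision, recall, F1, and accuracy computation for binary labels. It illustrates the general idea and is not the metric code used by CheckerEvaluator; binary_metrics is a hypothetical helper.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare predicted True/False verdicts with gold labels (True = claim is factual)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives
    correct = sum(1 for g, p in pairs if g == p)

    # Fall back to 0.0 whenever a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```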

Cite

If you use OpenFactCheck in your research, please cite the following:

@article{wang2024openfactcheck,
  title        = {OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs},
  author       = {Wang, Yuxia and Wang, Minghan and Iqbal, Hasan and Georgiev, Georgi and Geng, Jiahui and Nakov, Preslav},
  journal      = {arXiv preprint arXiv:2405.05583},
  year         = {2024}
}

@article{iqbal2024openfactcheck,
  title        = {OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs},
  author       = {Iqbal, Hasan and Wang, Yuxia and Wang, Minghan and Georgiev, Georgi and Geng, Jiahui and Gurevych, Iryna and Nakov, Preslav},
  journal      = {arXiv preprint arXiv:2408.11832},
  year         = {2024}
}

@software{hasan_iqbal_2024_13358665,
  author       = {Hasan Iqbal},
  title        = {hasaniqbal777/OpenFactCheck: v1.1.0},
  month        = {aug},
  year         = {2024},
  publisher    = {Zenodo},
  version      = {v1.1.0},
  doi          = {10.5281/zenodo.13358665},
  url          = {https://doi.org/10.5281/zenodo.13358665}
}
