
An Open-source Factuality Evaluation Demo for LLMs

License: Apache-2.0

Overview

OpenFactCheck is an open-source repository for evaluating and improving the factuality of responses generated by large language models (LLMs). The project integrates various fact-checking tools into a unified framework and provides evaluation pipelines for individual responses, for LLMs, and for fact-checkers themselves.

Installation

You can install the package from PyPI using pip:

pip install openfactcheck
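To confirm that the installation succeeded, the standard library's importlib.metadata can report the installed version. This is a minimal sketch: installed_version is a hypothetical helper, not part of openfactcheck, and it assumes only that the distribution is registered under the name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "openfactcheck") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    installed_version is a hypothetical helper, not part of openfactcheck.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "openfactcheck is not installed")
```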

Usage

First, create an OpenFactCheckConfig object, then pass it to the OpenFactCheck constructor.

from openfactcheck import OpenFactCheck, OpenFactCheckConfig

# Create a default configuration, then the OpenFactCheck object
config = OpenFactCheckConfig()
ofc = OpenFactCheck(config)

Response Evaluation

You can evaluate a response using the ResponseEvaluator class.

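# Evaluate a response (replace the example string with the text to check)
response = "The Eiffel Tower is located in Paris."
result = ofc.ResponseEvaluator.evaluate(response=response)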
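Fact-checking a free-form response generally means splitting it into individual claims, verifying each claim, and aggregating the verdicts into a score. The toy sketch below illustrates only that general idea under heavy simplifying assumptions: it is not how ResponseEvaluator is implemented, the names split_into_claims, verify_claim, and evaluate_response are hypothetical, sentences stand in for claims, and a small trusted set of sentences stands in for evidence retrieval.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ClaimVerdict:
    claim: str
    supported: bool


def split_into_claims(response: str) -> List[str]:
    # Naive stand-in for claim extraction: one claim per sentence.
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def verify_claim(claim: str, knowledge: Set[str]) -> ClaimVerdict:
    # Toy verifier: a claim is supported only if it appears verbatim in the
    # trusted set. Real fact-checkers retrieve and weigh evidence instead.
    return ClaimVerdict(claim, claim in knowledge)


def evaluate_response(response: str, knowledge: Set[str]) -> dict:
    verdicts = [verify_claim(c, knowledge) for c in split_into_claims(response)]
    n_supported = sum(v.supported for v in verdicts)
    return {
        "verdicts": verdicts,
        # Fraction of supported claims; None when no claims were found.
        "factuality": n_supported / len(verdicts) if verdicts else None,
    }


if __name__ == "__main__":
    trusted = {"Paris is the capital of France."}
    result = evaluate_response(
        "Paris is the capital of France. The Moon is made of cheese.", trusted
    )
    print(result["factuality"])  # 0.5: one of the two sentences is supported
```

Scoring at the claim level, rather than giving the whole response a single true/false label, is what lets a partly correct answer receive partial credit.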

LLM Evaluation

We provide FactQA, a dataset of 6480 questions for evaluating LLMs. Once you have the LLM's responses to these questions, you can evaluate them using the LLMEvaluator class.

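# Evaluate an LLM (placeholder values: use your model's name and the
# path to the file containing its responses)
result = ofc.LLMEvaluator.evaluate(model_name="my-llm",
                                   input_path="path/to/llm_responses")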
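The text above does not specify the format of the response file. As a hedged sketch of the "collect the LLM's responses" step, the snippet below asks a model each question and writes the answers to a CSV file. collect_responses and the generate callable are hypothetical, and the two-column layout (index, response) is an assumption; check the OpenFactCheck documentation for the format LLMEvaluator actually expects before passing the file as input_path.

```python
import csv
from typing import Callable, Iterable


def collect_responses(
    questions: Iterable[str],
    generate: Callable[[str], str],
    output_path: str,
) -> int:
    """Ask the model each question and write one response per CSV row.

    Hypothetical helper. The header ("index", "response") is an assumed
    layout, not a documented LLMEvaluator input format.
    """
    count = 0
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "response"])
        for i, question in enumerate(questions):
            writer.writerow([i, generate(question)])
            count += 1
    return count


# Usage with a stand-in model (replace the lambda with a real LLM call):
# n = collect_responses(["Who wrote Hamlet?"], lambda q: "Shakespeare.", "responses.csv")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client library.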

Checker Evaluation

We provide FactBench, a dataset of 4507 claims for evaluating fact-checkers. Once you have run your fact-checker on these claims and collected its verdicts, you can evaluate them using the CheckerEvaluator class.

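# Evaluate a fact-checker (placeholder values: use your checker's name
# and the path to the file containing its verdicts)
result = ofc.CheckerEvaluator.evaluate(checker_name="my-checker",
                                       input_path="path/to/checker_verdicts")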
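Fact-checkers are commonly scored by comparing each predicted verdict with the gold label and computing precision, recall, and F1, often separately for factual and non-factual claims, since detecting false claims is usually the harder case. The sketch below shows that computation for boolean labels (True meaning the claim is factual). It is not CheckerEvaluator's code: label_prf is a hypothetical helper, and the metrics OpenFactCheck reports may be defined differently.

```python
from typing import Dict, List


def label_prf(gold: List[bool], pred: List[bool], positive: bool) -> Dict[str, float]:
    """Precision, recall and F1 for one label (True = claim is factual).

    Hypothetical helper; treats `positive` as the class being detected.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [True, True, True, False, False]  # reference labels
    pred = [True, True, False, False, True]  # checker verdicts
    for label in (True, False):
        print(label, label_prf(gold, pred, positive=label))
```

Reporting both labels separately prevents a checker that labels almost everything as true from looking strong on a dataset where most claims are true.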

Cite

If you use OpenFactCheck in your research, please cite the following:

@article{wang2024openfactcheck,
  title        = {OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs},
  author       = {Wang, Yuxia and Wang, Minghan and Iqbal, Hasan and Georgiev, Georgi and Geng, Jiahui and Nakov, Preslav},
  journal      = {arXiv preprint arXiv:2405.05583},
  year         = {2024}
}

@article{iqbal2024openfactcheck,
  title        = {OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs},
  author       = {Iqbal, Hasan and Wang, Yuxia and Wang, Minghan and Georgiev, Georgi and Geng, Jiahui and Gurevych, Iryna and Nakov, Preslav},
  journal      = {arXiv preprint arXiv:2408.11832},
  year         = {2024}
}

@software{hasan_iqbal_2024_13358665,
  author       = {Hasan Iqbal},
  title        = {hasaniqbal777/OpenFactCheck: v0.3.0},
  month        = {aug},
  year         = {2024},
  publisher    = {Zenodo},
  version      = {v0.3.0},
  doi          = {10.5281/zenodo.13358665},
  url          = {https://doi.org/10.5281/zenodo.13358665}
}

Download files

Source distribution

openfactcheck-0.3.10rc1.tar.gz (6.0 MB)

Built distribution

openfactcheck-0.3.10rc1-py3-none-any.whl (6.1 MB, Python 3)

File details

Details for the file openfactcheck-0.3.10rc1.tar.gz.

File metadata

  • Download URL: openfactcheck-0.3.10rc1.tar.gz
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for openfactcheck-0.3.10rc1.tar.gz
Algorithm Hash digest
SHA256 a8cab935a7da6593fa5b3f489d51d0b7801abd95a1b65dc89b2f86c5dd5e02a3
MD5 8d685b18d87d9ef48a59d9ec2ec63502
BLAKE2b-256 dd9cc9c2101f793037d4ff911604530bdaad3fc43edf82d15e5343b17916f39e
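The SHA256 digest above can be checked locally before installing. A minimal sketch using Python's hashlib follows; sha256_of and matches_expected are hypothetical helper names, and the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest listed above for openfactcheck-0.3.10rc1.tar.gz
EXPECTED_SHA256 = "a8cab935a7da6593fa5b3f489d51d0b7801abd95a1b65dc89b2f86c5dd5e02a3"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing
    return sha256_of(path) == expected.lower()


# Usage, assuming the sdist was downloaded to the current directory:
# print(matches_expected("openfactcheck-0.3.10rc1.tar.gz"))
```

pip can also enforce this automatically: pin `openfactcheck==0.3.10rc1 --hash=sha256:<digest>` in a requirements file (one `--hash` option per distribution file listed on this page) and install with `pip install --require-hashes -r requirements.txt`, which rejects any download whose digest differs.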

File details

Details for the file openfactcheck-0.3.10rc1-py3-none-any.whl.

File hashes

Hashes for openfactcheck-0.3.10rc1-py3-none-any.whl
Algorithm Hash digest
SHA256 a72b976eeddaea7cd68d106466b6b7443e73133a0fcf406eac5d9ca0a01da51a
MD5 5ce325262b3dadf37402dff6ce84d53e
BLAKE2b-256 992b6b9e61af2a76e640d7fc4b7219546070d9005ebdd5a07ab3f13eeaf5add1
