
CLI tool to evaluate ChatGPT factuality on the MMLU benchmark.

Project description


Factly is a modern CLI tool designed to evaluate the factuality of Large Language Models (LLMs) on the MMLU (Massive Multitask Language Understanding) benchmark. It provides a robust framework for prompt engineering experiments and factual accuracy assessment.

Features

  • Evaluate LLM factuality on the MMLU benchmark with detailed results

  • Support for various prompt engineering experiments via configurable system instructions

  • Generate comparative visualizations of factuality scores across models and prompts

  • Structured output for easy analysis and comparison

  • Built with modern Python tooling (Python 3.12, uv, click, pydantic)

  • Extensible and reproducible evaluation workflows
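To make the idea of a factuality score concrete, here is a minimal sketch of how per-subject accuracy on MMLU-style multiple-choice questions could be computed. It is not Factly's actual implementation: the `Question` class and the `format_prompt` and `accuracy_by_subject` helpers are hypothetical names chosen for illustration, and the model's answers are assumed to arrive as plain strings beginning with the chosen letter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical container for one MMLU-style item (not Factly's API)."""
    subject: str
    prompt: str
    choices: tuple
    answer: str  # correct letter, e.g. "B"


def format_prompt(question):
    """Render a question and its lettered choices as a single prompt string."""
    lines = [question.prompt]
    for letter, choice in zip("ABCD", question.choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy_by_subject(questions, predictions):
    """Return the fraction of correct answers per subject.

    A prediction counts as correct when its first non-space character,
    upper-cased, equals the expected letter.
    """
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, prediction in zip(questions, predictions):
        total[question.subject] += 1
        if prediction.strip().upper()[:1] == question.answer:
            correct[question.subject] += 1
    return {subject: correct[subject] / total[subject] for subject in total}
```

Running the same questions under different system instructions and comparing the resulting dictionaries is one simple way to think about the prompt engineering experiments listed above.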

Quick Start

# Run factuality evaluation with default settings
factly evaluate

# Run evaluation and generate plots
factly evaluate --plot

# Get help on all available options
factly evaluate --help

That’s it! The tool uses optimized default parameters and saves all outputs to the output directory.

For more advanced usage, such as saving results and running evaluations with custom options, see the Usage Guide.

Project Information

Factly is released under the MIT License. Its documentation lives at Read the Docs, the code is on GitHub, and the latest release is on PyPI. It is tested on Python 3.12+.

If you’d like to contribute to Factly, you’re most welcome!

Support

If you have a question or a remark, find a bug, or run into something you can’t do with Factly, please open an issue.

Download files


Source Distribution

factly_eval-1.0.1.tar.gz (148.5 kB view details)

Uploaded Source

Built Distribution


factly_eval-1.0.1-py3-none-any.whl (19.4 kB view details)

Uploaded Python 3

File details

Details for the file factly_eval-1.0.1.tar.gz.

File metadata

  • Download URL: factly_eval-1.0.1.tar.gz
  • Upload date:
  • Size: 148.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for factly_eval-1.0.1.tar.gz
Algorithm Hash digest
SHA256 fd8ab44dad4e380adb596335e5a04904113e6c85a74f978bffef337f7cf1fcec
MD5 f6b9387574f79b7e338b9f824dbd54c8
BLAKE2b-256 616fb58dd73e64039f713b49f2197a85fdf9626faf0a0f2dab88b4a86be014f5
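As a minimal sketch of how a downloaded archive could be checked against the SHA256 digest listed above, the snippet below uses only Python's standard `hashlib` module. The `sha256_of` and `matches` helper names and the local file path in the usage comment are assumptions for illustration; the expected digest is copied from the table.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for factly_eval-1.0.1.tar.gz
EXPECTED_SHA256 = "fd8ab44dad4e380adb596335e5a04904113e6c85a74f978bffef337f7cf1fcec"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the archive was downloaded to the current directory):
# print(matches("factly_eval-1.0.1.tar.gz", EXPECTED_SHA256))
```

pip can perform the same check automatically when a requirements file pins hashes and is installed with `--require-hashes`.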


Provenance

The following attestation bundles were made for factly_eval-1.0.1.tar.gz:

Publisher: cd.yml on sergeyklay/factly

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file factly_eval-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: factly_eval-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for factly_eval-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 46e9f380b387c40a3aacf666b4645707448589bb061e01acc04b00af5ca22d02
MD5 582ee13131e5a04a50ed4560cf525960
BLAKE2b-256 a82484304b0e7eb012541031b4187668157510d31b95bbb3ce7ed2f16238ed4f


Provenance

The following attestation bundles were made for factly_eval-1.0.1-py3-none-any.whl:

Publisher: cd.yml on sergeyklay/factly

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
