CLI tool to evaluate ChatGPT factuality on the MMLU benchmark.
Project description
Factly is a modern CLI tool designed to evaluate the factuality of Large Language Models (LLMs) on the MMLU (Massive Multitask Language Understanding) benchmark. It provides a robust framework for prompt engineering experiments and factual accuracy assessment.
Features
Evaluate LLM factuality on the MMLU benchmark with detailed results
Support for various prompt engineering experiments via configurable system instructions
Generate comparative visualizations of factuality scores across models and prompts
Structured output for easy analysis and comparison
Built with modern Python tooling (Python 3.12, uv, click, pydantic)
Extensible and reproducible evaluation workflows
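As a rough illustration of what a per-subject factuality score on a multiple-choice benchmark such as MMLU can look like, here is a minimal sketch. It is not Factly's implementation: the function name `accuracy_by_subject`, the record layout, and the sample data are all hypothetical and exist only for this example.

```python
from collections import defaultdict


def accuracy_by_subject(records):
    """Return the fraction of correct answers per subject.

    `records` is an iterable of (subject, predicted, expected) tuples,
    where the answers are option letters such as "A".."D".
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for subject, predicted, expected in records:
        totals[subject] += 1
        # Compare letters case-insensitively and ignore stray whitespace.
        if predicted.strip().upper() == expected.strip().upper():
            correct[subject] += 1
    return {subject: correct[subject] / totals[subject] for subject in totals}


# Made-up sample answers, not real benchmark results.
sample = [
    ("astronomy", "A", "A"),
    ("astronomy", "b", "C"),
    ("virology", "D", "D"),
]
print(accuracy_by_subject(sample))  # {'astronomy': 0.5, 'virology': 1.0}
```

Grouping by subject keeps weak areas visible that a single overall score would hide.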
Quick Start
# Run factuality evaluation with default settings
factly evaluate
# Run evaluation and generate plots
factly evaluate --plot
# Get help on all available options
factly evaluate --help
That’s it! The tool uses optimized default parameters and saves all outputs to the output directory.
For more advanced usage, including saving results and evaluation, see the Usage Guide.
Project Information
Factly is released under the MIT License. Its documentation lives at Read the Docs, the code on GitHub, and the latest release on PyPI. It’s rigorously tested on Python 3.12+.
If you’d like to contribute to Factly, you’re most welcome!
Support
If you have a question or a remark, find a bug, or run into something you can’t do with Factly, please open an issue.
Download files
File details
Details for the file factly_eval-1.0.1.tar.gz.
File metadata
- Download URL: factly_eval-1.0.1.tar.gz
- Upload date:
- Size: 148.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd8ab44dad4e380adb596335e5a04904113e6c85a74f978bffef337f7cf1fcec |
| MD5 | f6b9387574f79b7e338b9f824dbd54c8 |
| BLAKE2b-256 | 616fb58dd73e64039f713b49f2197a85fdf9626faf0a0f2dab88b4a86be014f5 |
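A downloaded archive can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_digest` are hypothetical, and the sketch assumes the file sits at whatever local path you pass in.

```python
import hashlib

# SHA256 digest copied from the table above for factly_eval-1.0.1.tar.gz.
EXPECTED_SHA256 = "fd8ab44dad4e380adb596335e5a04904113e6c85a74f978bffef337f7cf1fcec"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_digest("factly_eval-1.0.1.tar.gz")` should return `True` for an intact copy of the source distribution. The same check works for the wheel when its own digest is passed as `expected`.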
Provenance
The following attestation bundles were made for factly_eval-1.0.1.tar.gz:
- Publisher: cd.yml on sergeyklay/factly
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: factly_eval-1.0.1.tar.gz
- Subject digest: fd8ab44dad4e380adb596335e5a04904113e6c85a74f978bffef337f7cf1fcec
- Sigstore transparency entry: 204914267
- Sigstore integration time:
- Permalink: sergeyklay/factly@423a9c7d9417d662d6e59aacb3333535c9698b42
- Branch / Tag: refs/tags/1.0.1
- Owner: https://github.com/sergeyklay
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@423a9c7d9417d662d6e59aacb3333535c9698b42
- Trigger Event: push
File details
Details for the file factly_eval-1.0.1-py3-none-any.whl.
File metadata
- Download URL: factly_eval-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46e9f380b387c40a3aacf666b4645707448589bb061e01acc04b00af5ca22d02 |
| MD5 | 582ee13131e5a04a50ed4560cf525960 |
| BLAKE2b-256 | a82484304b0e7eb012541031b4187668157510d31b95bbb3ce7ed2f16238ed4f |
Provenance
The following attestation bundles were made for factly_eval-1.0.1-py3-none-any.whl:
- Publisher: cd.yml on sergeyklay/factly
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: factly_eval-1.0.1-py3-none-any.whl
- Subject digest: 46e9f380b387c40a3aacf666b4645707448589bb061e01acc04b00af5ca22d02
- Sigstore transparency entry: 204914276
- Sigstore integration time:
- Permalink: sergeyklay/factly@423a9c7d9417d662d6e59aacb3333535c9698b42
- Branch / Tag: refs/tags/1.0.1
- Owner: https://github.com/sergeyklay
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@423a9c7d9417d662d6e59aacb3333535c9698b42
- Trigger Event: push