A CLI for checking model hallucinations using Hugging Face datasets.
Model Hallucination Evaluator CLI
A Python CLI tool for evaluating model hallucinations on various datasets, such as FEVER, SimpleQuestions (SimpQ), TruthfulQA, and FactCC, using the Entropix API and the Hugging Face ecosystem. This tool is designed to help researchers and developers assess the factual accuracy of their models and identify potential hallucinations.
Features
- Evaluate hallucination rates for local or hosted models.
- Supports multiple datasets:
  - FEVER: Fact verification claims and evidence dataset.
  - SimpleQuestions (SimpQ): Fact-based QA dataset.
  - TruthfulQA: Evaluates truthfulness of models.
  - FactCC: Factual consistency in summarization.
- Beautiful and informative CLI interface using rich.
- Customizable dataset size for quick evaluations.
- Hugging Face API integration for user-friendly evaluation.
Installation
From PyPI
Install the package directly from PyPI:
pip install model-hallucination-cli
From Source
- Clone the repository:
git clone https://github.com/abhijit-without-h/model-hallucination.git
cd model-hallucination
- Install the dependencies:
pip install -r requirements.txt
- Install the package:
pip install .
Usage
The CLI provides a simple interface for evaluating models on supported datasets.
Available Commands
Evaluate Dataset
Evaluate a model’s hallucination likelihood on a specific dataset.
model-hallucination evaluate-dataset --dataset DATASET_NAME --max-samples SAMPLE_COUNT --api-key YOUR_API_KEY
Arguments
| Argument | Description |
|---|---|
| --dataset | Dataset to evaluate: fever, simpq, truthfulqa, or factcc. |
| --max-samples | Maximum number of samples to evaluate (default: 100). |
| --api-key | Your Hugging Face API key. |
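To make the option semantics above concrete, here is a minimal sketch of a parser that accepts the same subcommand and flags, written with the standard-library argparse module. It is not the tool's actual implementation, which is not shown here and may use a different CLI library. Treating --dataset and --api-key as required is an assumption; the default of 100 for --max-samples comes from the table.

```python
import argparse

# Dataset keys listed in the arguments table.
DATASETS = ("fever", "simpq", "truthfulqa", "factcc")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented interface (illustrative only)."""
    parser = argparse.ArgumentParser(prog="model-hallucination")
    subcommands = parser.add_subparsers(dest="command", required=True)

    evaluate = subcommands.add_parser(
        "evaluate-dataset",
        help="Evaluate a model's hallucination likelihood on a dataset.",
    )
    # Assumption: --dataset is mandatory and restricted to the four keys.
    evaluate.add_argument("--dataset", required=True, choices=DATASETS)
    # Documented default: 100 samples.
    evaluate.add_argument("--max-samples", type=int, default=100)
    # Assumption: the API key must always be supplied.
    evaluate.add_argument("--api-key", required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["evaluate-dataset", "--dataset", "fever", "--api-key", "hf_your_api_key"]
    )
    print(args.dataset, args.max_samples)  # prints: fever 100
```

Restricting --dataset with choices means a typo such as "truthful_qa" would be rejected before any data is loaded.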
Example Usage
- Evaluate FEVER dataset:
model-hallucination evaluate-dataset --dataset fever --max-samples 50 --api-key hf_your_api_key
- Evaluate TruthfulQA dataset:
model-hallucination evaluate-dataset --dataset truthfulqa --max-samples 100 --api-key hf_your_api_key
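The commands above can also be driven from a script, for example to evaluate several datasets in one run. The following is a rough sketch using only the standard library. The helper names (evaluate_command, redact, run_all) and the HF_API_KEY environment variable are hypothetical and chosen for illustration; only the command line itself comes from the examples above.

```python
import os
import subprocess


def evaluate_command(dataset: str, max_samples: int, api_key: str) -> list[str]:
    """Build the argv list for one evaluate-dataset call."""
    return [
        "model-hallucination", "evaluate-dataset",
        "--dataset", dataset,
        "--max-samples", str(max_samples),
        "--api-key", api_key,
    ]


def redact(cmd: list[str]) -> list[str]:
    """Return a copy of cmd with the value after --api-key hidden."""
    shown, hide_next = [], False
    for part in cmd:
        shown.append("***" if hide_next else part)
        hide_next = part == "--api-key"
    return shown


def run_all(datasets, max_samples, api_key, dry_run=True):
    """Print (dry run) or execute one evaluation per dataset."""
    commands = [evaluate_command(d, max_samples, api_key) for d in datasets]
    for cmd in commands:
        if dry_run:
            print(" ".join(redact(cmd)))
        else:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # HF_API_KEY is a hypothetical variable name, not one the tool defines.
    key = os.environ.get("HF_API_KEY", "hf_your_api_key")
    run_all(["fever", "truthfulqa"], max_samples=50, api_key=key)
```

Passing the arguments as a list avoids shell quoting issues, and reading the key from the environment keeps it out of the script. Calling run_all with dry_run=False executes the commands instead of printing them.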
Supported Datasets
1. FEVER
- Fact verification dataset.
- Contains claims and corresponding evidence.
2. SimpleQuestions (SimpQ)
- Fact-based QA dataset.
- Includes simple questions with a single fact-based answer.
3. TruthfulQA
- Evaluates model truthfulness.
- Questions designed to test a model’s ability to avoid generating false or misleading information.
4. FactCC
- Evaluates factual consistency in text summarization.
- Includes claim-evidence pairs.
Example Output
When evaluating a dataset, the CLI displays the results as a table:
Loading truthfulqa dataset...
Evaluating 50 samples...
Hallucination Evaluation
+-----------------------------+-----------------------+--------------------+
| Input | Reference | Hallucination Score|
+-----------------------------+-----------------------+--------------------+
| What is the capital of Mars?| None (fictional) | 0.95 |
| What is 2 + 2? | 4 | 0.01 |
+-----------------------------+-----------------------+--------------------+
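Per-sample scores like those in the table can be condensed into a single hallucination rate. The sketch below shows one possible way to do that. It assumes a score at or above 0.5 counts as a hallucination; that threshold and the summarize helper are illustrative choices, not values or functions defined by the tool.

```python
from statistics import mean


def summarize(scores, threshold=0.5):
    """Aggregate per-sample hallucination scores.

    threshold is an assumed cut-off: scores at or above it are
    counted as hallucinations.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for score in scores if score >= threshold)
    return {
        "samples": len(scores),
        "mean_score": mean(scores),
        "hallucination_rate": flagged / len(scores),
    }


if __name__ == "__main__":
    # The two scores from the example table above.
    summary = summarize([0.95, 0.01])
    print(summary["samples"], summary["hallucination_rate"])  # prints: 2 0.5
```

With the two example rows, one of the two samples exceeds the assumed threshold, so the rate is 0.5. A stricter or looser threshold changes the rate but not the mean score.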
Development
Testing
Run unit tests using pytest:
pytest tests/
Building the Package
To build the package for distribution:
python setup.py sdist bdist_wheel
Publishing to PyPI
Upload the package to PyPI:
twine upload dist/*
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature:
git checkout -b feature-name
- Commit your changes:
git commit -m "Add your message here"
- Push to your branch:
git push origin feature-name
- Open a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Hugging Face for their datasets library.
- Entropix API for enabling hallucination evaluation.
- Open-source contributors for inspiration and guidance.
Contact
For support or feedback, open an issue on the GitHub repository or email abhijitsr92@gmail.com.
Download files
File details
Details for the file model_hallucination_cli-1.0.0.tar.gz.
File metadata
- Download URL: model_hallucination_cli-1.0.0.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4568d50c5e6abdb59bb069e06c16799a8c05f7f838cd48b84e4e5ed82ca6bd8e |
| MD5 | f5cdb3067006bd19998ad2d6cbd29a50 |
| BLAKE2b-256 | 08a6603d962ea495b9bd3917dd64f8ca75152dadee3364faa3dad946628c0aff |
File details
Details for the file model_hallucination_cli-1.0.0-py3-none-any.whl.
File metadata
- Download URL: model_hallucination_cli-1.0.0-py3-none-any.whl
- Upload date:
- Size: 6.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4a9369a578513a5f7205397367071bc5276c7df28af47c56db6b4ef4214fb13 |
| MD5 | d6241b8c7ef020589d98dd80fd277fac |
| BLAKE2b-256 | 3bcc4b72e0425ec3d33f341dbd729a3818661eeb9abfbedb20b59da96372c6da |