A CLI for checking model hallucinations using Hugging Face datasets.

Project description

Model Hallucination Evaluator CLI

A Python CLI tool for evaluating model hallucinations on various datasets, such as FEVER, SimpleQuestions (SimpQ), TruthfulQA, and FactCC, using the Entropix API and the Hugging Face ecosystem. This tool is designed to help researchers and developers assess the factual accuracy of their models and identify potential hallucinations.

Features

- Evaluates hallucination rates for local or hosted models.
- Supports multiple datasets:
  - FEVER: Fact verification claims and evidence dataset.
  - SimpleQuestions (SimpQ): Fact-based QA dataset.
  - TruthfulQA: Evaluates truthfulness of models.
  - FactCC: Factual consistency in summarization.
- Renders readable, informative command-line output using `rich`.
- Lets you cap the dataset size for quick evaluations.
- Integrates with the Hugging Face API for user-friendly evaluation.

Installation

From PyPI

Install the package directly from PyPI:

```
pip install model-hallucination-cli
```

From Source

Clone the repository:

```
git clone https://github.com/abhijit-without-h/model-hallucination.git
cd model-hallucination
```

Install the dependencies:
```
pip install -r requirements.txt
```
Install the package:
```
pip install .
```

Usage

The CLI provides a simple interface for evaluating models on supported datasets.

Available Commands

Evaluate Dataset

Evaluate a model’s hallucination likelihood on a specific dataset.

```
model-hallucination evaluate-dataset --dataset DATASET_NAME --max-samples SAMPLE_COUNT --api-key YOUR_API_KEY
```

Arguments

| Argument | Description |
| --- | --- |
| `--dataset` | Dataset to evaluate: `fever`, `simpq`, `truthfulqa`, or `factcc`. |
| `--max-samples` | Maximum number of samples to evaluate (default: `100`). |
| `--api-key` | Your Hugging Face API key. |
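
To make the argument semantics concrete, here is a minimal sketch of how flags like these could be declared with Python's standard `argparse` module. It is not taken from the tool's source, which may use a different CLI framework. The `build_parser` helper is a hypothetical name, and treating `--dataset` and `--api-key` as required is an assumption, since the table does not say.

```python
import argparse

# Dataset names and the default of 100 come from the table above.
SUPPORTED_DATASETS = ("fever", "simpq", "truthfulqa", "factcc")


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags (illustrative only, not the tool's code)."""
    parser = argparse.ArgumentParser(prog="model-hallucination")
    subcommands = parser.add_subparsers(dest="command", required=True)

    evaluate = subcommands.add_parser("evaluate-dataset")
    # Assumption: --dataset and --api-key are mandatory.
    evaluate.add_argument("--dataset", choices=SUPPORTED_DATASETS, required=True)
    evaluate.add_argument("--max-samples", type=int, default=100)
    evaluate.add_argument("--api-key", required=True)
    return parser


# Parse an explicit argument list so the sketch runs without a real shell call.
args = build_parser().parse_args(
    ["evaluate-dataset", "--dataset", "fever", "--api-key", "hf_your_api_key"]
)
print(args.dataset, args.max_samples)
```

With `choices` set this way, an unsupported dataset name would be rejected at parse time rather than partway through an evaluation run.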

Example Usage

Evaluate FEVER dataset:

```
model-hallucination evaluate-dataset --dataset fever --max-samples 50 --api-key hf_your_api_key
```

Evaluate TruthfulQA dataset:

```
model-hallucination evaluate-dataset --dataset truthfulqa --max-samples 100 --api-key hf_your_api_key
```
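
If you want to run the same evaluation across every supported dataset, a small wrapper script is one option. The sketch below only assembles the documented command line and passes it to `subprocess`. The helper names `build_command` and `run_all`, and the `HF_API_KEY` environment variable, are assumptions made for illustration and are not part of the tool.

```python
import os
import shlex
import subprocess

DATASETS = ["fever", "simpq", "truthfulqa", "factcc"]


def build_command(dataset, max_samples, api_key):
    """Return the documented CLI invocation as an argument list."""
    return [
        "model-hallucination", "evaluate-dataset",
        "--dataset", dataset,
        "--max-samples", str(max_samples),
        "--api-key", api_key,
    ]


def run_all(max_samples=50):
    """Run the CLI once per dataset; requires the package to be installed."""
    # HF_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ["HF_API_KEY"]
    for dataset in DATASETS:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(build_command(dataset, max_samples, api_key), check=True)


# Preview one command without executing anything.
print(shlex.join(build_command("fever", 50, "hf_your_api_key")))
```

Reading the key from the environment keeps it out of shell history and out of the script itself.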

Supported Datasets

1. FEVER

Fact verification dataset.
Contains claims and corresponding evidence.

2. SimpleQuestions (SimpQ)

Fact-based QA dataset.
Includes simple questions with a single fact-based answer.

3. TruthfulQA

Evaluates model truthfulness.
Questions designed to test a model’s ability to avoid generating false or misleading information.

4. FactCC

Evaluates factual consistency in text summarization.
Includes claim-evidence pairs.

Example Output

When evaluating a dataset, the CLI will display results in a table format:

```
Loading truthfulqa dataset...
Evaluating 50 samples...

Hallucination Evaluation
+-----------------------------+-----------------------+--------------------+
| Input                       | Reference             | Hallucination Score|
+-----------------------------+-----------------------+--------------------+
| What is the capital of Mars?| None (fictional)      | 0.95               |
| What is 2 + 2?              | 4                     | 0.01               |
+-----------------------------+-----------------------+--------------------+
```
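
Per-sample scores like those in the table are often easier to compare once aggregated. The sketch below shows one way to do that, assuming scores lie between 0 and 1 and that higher means more likely hallucinated. The README does not define the scale, so both points are assumptions, as are the `summarize` helper and the `0.5` threshold.

```python
from statistics import mean


def summarize(scores, threshold=0.5):
    """Aggregate per-sample scores into a mean and a flagged fraction.

    The threshold is an arbitrary illustrative cut-off, not a value
    defined by the tool.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for score in scores if score >= threshold)
    return {
        "mean_score": mean(scores),
        "hallucination_rate": flagged / len(scores),
    }


# The two scores shown in the example table above.
summary = summarize([0.95, 0.01])
print(summary)
```

The flagged fraction depends entirely on the chosen threshold, so report the threshold alongside any rate you quote.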

Development

Testing

Run unit tests using pytest:

```
pytest tests/
```

Building the Package

To build the package for distribution:

```
python setup.py sdist bdist_wheel
```

Publishing to PyPI

Upload the package to PyPI:

```
twine upload dist/*
```

Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature:
   ```
   git checkout -b feature-name
   ```
3. Commit your changes:
   ```
   git commit -m "Add your message here"
   ```
4. Push to your branch:
   ```
   git push origin feature-name
   ```
5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

- Hugging Face for their datasets library.
- Entropix API for enabling hallucination evaluation.
- Open-source contributors for inspiration and guidance.

Contact

For support or feedback, open an issue on the GitHub repository or email abhijitsr92@gmail.com.

Project details

Release history

This version

1.0.0

Nov 30, 2024

Download files

Source Distribution

model_hallucination_cli-1.0.0.tar.gz (5.2 kB)

Uploaded Nov 30, 2024 Source

Built Distribution

model_hallucination_cli-1.0.0-py3-none-any.whl (6.1 kB)

Uploaded Nov 30, 2024 Python 3

File details

Details for the file model_hallucination_cli-1.0.0.tar.gz.

File metadata

Download URL: model_hallucination_cli-1.0.0.tar.gz
Upload date: Nov 30, 2024
Size: 5.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for model_hallucination_cli-1.0.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `4568d50c5e6abdb59bb069e06c16799a8c05f7f838cd48b84e4e5ed82ca6bd8e` |
| MD5 | `f5cdb3067006bd19998ad2d6cbd29a50` |
| BLAKE2b-256 | `08a6603d962ea495b9bd3917dd64f8ca75152dadee3364faa3dad946628c0aff` |
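
A published digest is only useful if you compare it with the file you actually downloaded. The sketch below uses Python's standard `hashlib` to compute a SHA256 digest and check it against the value listed for the source distribution. The helper names `sha256_of_stream` and `matches_expected` are illustrative, and the path you pass in is whatever location you saved the archive to.

```python
import hashlib
import io

# SHA256 digest listed above for model_hallucination_cli-1.0.0.tar.gz.
EXPECTED_SHA256 = "4568d50c5e6abdb59bb069e06c16799a8c05f7f838cd48b84e4e5ed82ca6bd8e"


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle) == expected.lower()


# In-memory demonstration, so the sketch runs without the downloaded file.
print(sha256_of_stream(io.BytesIO(b"abc")))
```

For a real check, call `matches_expected` with the path to the downloaded archive. The same approach works for the wheel if you substitute its listed digest.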

File details

Details for the file model_hallucination_cli-1.0.0-py3-none-any.whl.

File metadata

Download URL: model_hallucination_cli-1.0.0-py3-none-any.whl
Upload date: Nov 30, 2024
Size: 6.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for model_hallucination_cli-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `c4a9369a578513a5f7205397367071bc5276c7df28af47c56db6b4ef4214fb13` |
| MD5 | `d6241b8c7ef020589d98dd80fd277fac` |
| BLAKE2b-256 | `3bcc4b72e0425ec3d33f341dbd729a3818661eeb9abfbedb20b59da96372c6da` |
