
A CLI for checking model hallucinations using Hugging Face datasets.



Model Hallucination Evaluator CLI

A Python command-line tool for evaluating model hallucinations on datasets such as FEVER, SimpleQuestions (SimpQ), TruthfulQA, and FactCC, using the Entropix API and the Hugging Face ecosystem. It helps researchers and developers assess the factual accuracy of their models and identify likely hallucinations.


Features

  • Evaluate hallucination rates for local or hosted models.
  • Supports multiple datasets:
    • FEVER: Fact verification claims and evidence dataset.
    • SimpleQuestions (SimpQ): Fact-based QA dataset.
    • TruthfulQA: Measures how truthfully models answer questions.
    • FactCC: Factual consistency in summarization.
  • Readable, formatted terminal output built with the rich library.
  • Configurable sample count (--max-samples) for quick evaluations.
  • Hugging Face integration: evaluations draw on Hugging Face datasets and use your Hugging Face API key.

Installation

From PyPI

Install the package directly from PyPI:

pip install model-hallucination-cli

From Source

  1. Clone the repository:

    git clone https://github.com/abhijit-without-h/model-hallucination.git
    cd model-hallucination
    
  2. Install the dependencies:

    pip install -r requirements.txt
    
  3. Install the package:

    pip install .
    

Usage

The CLI provides a simple interface for evaluating models on supported datasets.

Available Commands

Evaluate Dataset

Evaluate a model’s hallucination likelihood on a specific dataset.

model-hallucination evaluate-dataset --dataset DATASET_NAME --max-samples SAMPLE_COUNT --api-key YOUR_API_KEY

Arguments

Argument        Description
--dataset       Dataset to evaluate: fever, simpq, truthfulqa, or factcc.
--max-samples   Maximum number of samples to evaluate (default: 100).
--api-key       Your Hugging Face API key.
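
The argument table can be mirrored with Python's standard argparse module. The sketch below is a hypothetical reconstruction of the documented interface, not the package's actual source; the subcommand layout, the positivity check on --max-samples, and treating --api-key as required are assumptions.

```python
import argparse

# Dataset names accepted by --dataset, as listed in the argument table.
DATASETS = ("fever", "simpq", "truthfulqa", "factcc")


def positive_int(value: str) -> int:
    """Parse --max-samples; rejecting values below 1 is an assumption."""
    number = int(value)  # argparse reports a ValueError here as a usage error
    if number < 1:
        raise argparse.ArgumentTypeError("--max-samples must be at least 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented CLI surface."""
    parser = argparse.ArgumentParser(prog="model-hallucination")
    subcommands = parser.add_subparsers(dest="command", required=True)

    evaluate = subcommands.add_parser(
        "evaluate-dataset", help="Evaluate hallucinations on one dataset."
    )
    evaluate.add_argument("--dataset", choices=DATASETS, required=True)
    evaluate.add_argument("--max-samples", type=positive_int, default=100)
    # Assumption: the key is mandatory in this sketch.
    evaluate.add_argument("--api-key", required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["evaluate-dataset", "--dataset", "fever", "--api-key", "hf_your_api_key"]
    )
    print(args.dataset, args.max_samples)  # fever 100
```

Declaring choices= lets argparse reject an unsupported dataset name before any API call is made.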

Example Usage

  1. Evaluate FEVER dataset:

    model-hallucination evaluate-dataset --dataset fever --max-samples 50 --api-key hf_your_api_key
    
  2. Evaluate TruthfulQA dataset:

    model-hallucination evaluate-dataset --dataset truthfulqa --max-samples 100 --api-key hf_your_api_key
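
The example commands can also be scripted, for instance to run all four datasets in one go. A minimal sketch, assuming the model-hallucination command is on your PATH and that you keep your key in an HF_TOKEN environment variable (a convention chosen here, not something the CLI is documented to read); the flags passed are exactly the documented ones.

```python
import os
import shutil
import subprocess
import sys

# Dataset names accepted by --dataset.
DATASETS = ("fever", "simpq", "truthfulqa", "factcc")


def evaluate_command(dataset: str, max_samples: int, api_key: str) -> list[str]:
    """Build the documented evaluate-dataset invocation as an argument list."""
    return [
        "model-hallucination",
        "evaluate-dataset",
        "--dataset", dataset,
        "--max-samples", str(max_samples),
        "--api-key", api_key,
    ]


def run_all(api_key: str, max_samples: int = 50) -> None:
    """Evaluate every supported dataset in turn, stopping at the first failure."""
    for dataset in DATASETS:
        # check=True raises CalledProcessError if a run exits with an error.
        subprocess.run(evaluate_command(dataset, max_samples, api_key), check=True)


if __name__ == "__main__":
    # Assumption: the key lives in HF_TOKEN; the CLI itself only takes --api-key.
    token = os.environ.get("HF_TOKEN")
    if not token:
        print("Set HF_TOKEN to your Hugging Face API key.", file=sys.stderr)
    elif shutil.which("model-hallucination") is None:
        print("model-hallucination is not on your PATH.", file=sys.stderr)
    else:
        run_all(token)
```

Reading the key from the environment keeps it out of your shell history, although it is still passed to the CLI as a command-line argument and may be visible in the process list while each run is active.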
    

Supported Datasets

1. FEVER

  • Fact verification dataset.
  • Contains claims and corresponding evidence.

2. SimpleQuestions (SimpQ)

  • Fact-based QA dataset.
  • Includes simple questions with a single fact-based answer.

3. TruthfulQA

  • Evaluates model truthfulness.
  • Questions designed to test a model’s ability to avoid generating false or misleading information.

4. FactCC

  • Evaluates factual consistency in text summarization.
  • Includes claim-evidence pairs.
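
Each dataset stores its input and reference under different field names. A minimal sketch of normalizing raw records into the Input and Reference columns of the CLI's results table; it is not the package's own code, and every field name is an assumption: the FEVER and TruthfulQA names are assumed to match the Hugging Face fever (v1.0) and truthful_qa (generation) schemas, while the SimpleQuestions and FactCC mappings are hypothetical placeholders.

```python
from typing import Any

# Hypothetical (input_field, reference_field) pairs per --dataset value.
# FEVER and TruthfulQA names are assumed from the Hugging Face "fever" (v1.0)
# and "truthful_qa" (generation) schemas; the other two are placeholders.
FIELD_MAP: dict[str, tuple[str, str]] = {
    "fever": ("claim", "label"),  # label: SUPPORTS, REFUTES, or NOT ENOUGH INFO
    "truthfulqa": ("question", "best_answer"),
    "simpq": ("question", "answer"),  # placeholder field names
    "factcc": ("claim", "label"),  # placeholder field names
}


def to_input_reference(dataset: str, record: dict[str, Any]) -> tuple[str, str]:
    """Map one raw record to the (Input, Reference) pair shown in the results table."""
    if dataset not in FIELD_MAP:
        raise ValueError(f"unsupported dataset: {dataset!r}")
    input_field, reference_field = FIELD_MAP[dataset]
    return str(record[input_field]), str(record[reference_field])


if __name__ == "__main__":
    sample = {"question": "What is 2 + 2?", "best_answer": "4"}
    print(to_input_reference("truthfulqa", sample))  # ('What is 2 + 2?', '4')
```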

Example Output

When evaluating a dataset, the CLI displays the results as a table, for example:

Loading truthfulqa dataset...
Evaluating 50 samples...

Hallucination Evaluation
+------------------------------+------------------+---------------------+
| Input                        | Reference        | Hallucination Score |
+------------------------------+------------------+---------------------+
| What is the capital of Mars? | None (fictional) | 0.95                |
| What is 2 + 2?               | 4                | 0.01                |
+------------------------------+------------------+---------------------+
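
The per-sample scores can be summarized into a single hallucination rate. A minimal sketch, assuming scores run from 0 (grounded) to 1 (hallucinated), as the two example rows suggest, and using a hypothetical cutoff of 0.5 to count a sample as a hallucination; the CLI itself is not documented to compute either figure.

```python
from statistics import mean

# Hypothetical cutoff at or above which a sample counts as a hallucination.
THRESHOLD = 0.5


def summarize(scores: list[float], threshold: float = THRESHOLD) -> dict[str, float]:
    """Aggregate per-sample scores, assumed to run from 0 (grounded) to 1 (hallucinated)."""
    if not scores:
        raise ValueError("no scores to summarize")
    flagged = sum(1 for score in scores if score >= threshold)
    return {
        "mean_score": mean(scores),
        "hallucination_rate": flagged / len(scores),
    }


if __name__ == "__main__":
    # Scores from the two example rows above.
    print(summarize([0.95, 0.01]))
```

Reporting both figures is a design choice: the mean reflects how confident the scores are, while the thresholded rate answers how many samples would be flagged.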

Development

Testing

Run unit tests using pytest:

pytest tests/

Building the Package

To build the source distribution and wheel (requires the build package, installable with pip install build):

python -m build

Publishing to PyPI

Upload the package to PyPI:

twine upload dist/*

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature:
    git checkout -b feature-name
    
  3. Commit your changes:
    git commit -m "Add your message here"
    
  4. Push to your branch:
    git push origin feature-name
    
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • Hugging Face for their datasets library.
  • Entropix API for enabling hallucination evaluation.
  • Open-source contributors for inspiration and guidance.

Contact

For support or feedback, open an issue on the GitHub repository or email abhijitsr92@gmail.com.





Download files


Source Distribution

model_hallucination_cli-1.0.0.tar.gz (5.2 kB)

Uploaded Source

Built Distribution


model_hallucination_cli-1.0.0-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file model_hallucination_cli-1.0.0.tar.gz.

File metadata

  • Download URL: model_hallucination_cli-1.0.0.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for model_hallucination_cli-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        4568d50c5e6abdb59bb069e06c16799a8c05f7f838cd48b84e4e5ed82ca6bd8e
MD5           f5cdb3067006bd19998ad2d6cbd29a50
BLAKE2b-256   08a6603d962ea495b9bd3917dd64f8ca75152dadee3364faa3dad946628c0aff


File details

Details for the file model_hallucination_cli-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for model_hallucination_cli-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        c4a9369a578513a5f7205397367071bc5276c7df28af47c56db6b4ef4214fb13
MD5           d6241b8c7ef020589d98dd80fd277fac
BLAKE2b-256   3bcc4b72e0425ec3d33f341dbd729a3818661eeb9abfbedb20b59da96372c6da
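
The published digests let you confirm that a downloaded file is intact before installing it. A minimal sketch using only the standard library's hashlib; the file name and expected SHA256 digest come from the tables above, and the script assumes the wheel sits in the current directory.

```python
import hashlib
from collections.abc import Iterable, Iterator
from pathlib import Path


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path: Path, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks so large files stay out of memory."""
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def verify_file(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA-256 digest matches the published one."""
    return sha256_hex(file_chunks(path)) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # SHA256 digest published for the wheel.
    wheel = Path("model_hallucination_cli-1.0.0-py3-none-any.whl")
    expected = "c4a9369a578513a5f7205397367071bc5276c7df28af47c56db6b4ef4214fb13"
    if wheel.exists():
        print("OK" if verify_file(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found in the current directory")
```

pip can enforce the same check at install time: add --hash=sha256:<digest> to the package's line in a requirements file and install with pip install --require-hashes -r requirements.txt.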

