Python SDK for HallucinationGuard-Env — evaluate and train LLMs to avoid hallucination using OpenEnv
openenv-halluguard
Python SDK for HallucinationGuard-Env — evaluate and train any LLM to avoid hallucinations using OpenEnv.
Install
```bash
pip install openenv-halluguard
```
Quick Start
```python
from openenv_halluguard import HallucinationGuardEnv

# Define your model function
def my_model(question, context):
    # Call your LLM here; it must answer from the context only
    return "your answer based on context"

# Evaluate
env = HallucinationGuardEnv()
results = env.evaluate(my_model, episodes=5, model_name="my-model")
env.print_report(results)
```
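Because `evaluate` takes a plain function of `(question, context)` that returns a string, you can sanity-check the pipeline before wiring up an API key. The sketch below is a hypothetical, LLM-free baseline (`overlap_model` is not part of the SDK): it returns the context sentence that shares the most words with the question, so it can only repeat what the context already says.

```python
import re

def overlap_model(question: str, context: str) -> str:
    """Return the context sentence sharing the most words with the question.

    Hypothetical baseline: it only echoes text already present in the
    context, so it never introduces outside facts.
    """
    # Split the context into sentences after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not sentences:
        return ""
    # Bag-of-words overlap between the question and each sentence.
    q_words = set(re.findall(r"\w+", question.lower()))

    def score(sentence: str) -> int:
        return len(q_words & set(re.findall(r"\w+", sentence.lower())))

    # max() keeps the earliest sentence when scores tie.
    return max(sentences, key=score)
```

It would be passed in exactly like `my_model`, e.g. `env.evaluate(overlap_model, episodes=5, model_name="overlap-baseline")`. Treat its report only as a floor to compare real models against.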
What It Does
The SDK connects to the live HallucinationGuard-Env, an OpenEnv reinforcement-learning (RL) environment with more than 100,000 QA examples drawn from 15 real-world datasets. Your model's answers are scored on six weighted components:
| Component | Weight |
|---|---|
| Factual correctness | 30% |
| Source grounding | 20% |
| Citation accuracy | 15% |
| Confidence calibration | 15% |
| Semantic consistency | 10% |
| Hallucination penalty | 10% |
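The table lists the weights but not how they are combined. A natural reading, stated here as an assumption, is a weighted sum of per-component scores. The sketch below assumes each component is reported as a score in [0, 1] where higher is better (so the hallucination component would be 1.0 when no hallucination is detected); the dictionary keys and `combined_score` are hypothetical names, not the environment's actual fields.

```python
import math

# Weights from the table above; the key names are hypothetical.
WEIGHTS = {
    "factual_correctness": 0.30,
    "source_grounding": 0.20,
    "citation_accuracy": 0.15,
    "confidence_calibration": 0.15,
    "semantic_consistency": 0.10,
    "hallucination": 0.10,  # assumed: 1.0 means no hallucination detected
}
assert math.isclose(sum(WEIGHTS.values()), 1.0)

def combined_score(components: dict) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {components[name]}")
    # Extra keys in `components` are ignored.
    return sum(weight * components[name] for name, weight in WEIGHTS.items())
```

Under this reading, an answer that is perfect on every component except factual correctness (scored 0.5) would come out at 1.0 - 0.30 * 0.5 = 0.85.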
Works With Any LLM
```python
# OpenAI
import openai
from openenv_halluguard import HallucinationGuardEnv

client = openai.OpenAI(api_key="sk-...")

def openai_model(question, context):
    r = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."}],
    )
    return r.choices[0].message.content

env = HallucinationGuardEnv()
results = env.evaluate(openai_model, episodes=5, model_name="gpt-4")
```
```python
# Anthropic Claude
import anthropic
from openenv_halluguard import HallucinationGuardEnv

client = anthropic.Anthropic(api_key="sk-ant-...")

def claude_model(question, context):
    msg = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."}],
    )
    return msg.content[0].text

env = HallucinationGuardEnv()
results = env.evaluate(claude_model, episodes=5, model_name="claude-haiku")
```
```python
# Groq (fast inference)
from groq import Groq
from openenv_halluguard import HallucinationGuardEnv

client = Groq(api_key="gsk_...")

def groq_model(question, context):
    r = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."}],
    )
    return r.choices[0].message.content

env = HallucinationGuardEnv()
results = env.evaluate(groq_model, episodes=5, model_name="llama3-groq")
```
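The three provider examples send the same single user message. A small helper keeps that prompt in one place so every model is evaluated on identical instructions; `PROMPT_TEMPLATE` and `build_messages` are hypothetical names, not part of the SDK.

```python
# Same wording as the inline f-strings in the provider examples.
PROMPT_TEMPLATE = "Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."

def build_messages(question: str, context: str) -> list:
    """Return a single-turn chat message list in role/content form."""
    # str.format only parses braces in the template itself, so braces
    # inside the question or context pass through unchanged.
    content = PROMPT_TEMPLATE.format(context=context, question=question)
    return [{"role": "user", "content": content}]
```

Each `messages=[...]` argument could then become `messages=build_messages(question, context)`; the OpenAI, Anthropic, and Groq chat APIs all accept this list of role/content dicts.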
Optional Dependencies
```bash
pip install "openenv-halluguard[openai]"     # OpenAI
pip install "openenv-halluguard[anthropic]"  # Anthropic Claude
pip install "openenv-halluguard[groq]"       # Groq
```
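To confirm which extras are installed before running an evaluation, you can check whether each provider's SDK is importable. This sketch assumes the extras install the packages whose top-level modules are `openai`, `anthropic`, and `groq`, matching the imports in the examples above; `available_providers` is a hypothetical helper.

```python
import importlib.util

# Extra name -> top-level module it is expected to provide (assumption).
PROVIDER_MODULES = {"openai": "openai", "anthropic": "anthropic", "groq": "groq"}

def available_providers() -> dict:
    """Map each extra to whether its SDK is importable, without importing it."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in PROVIDER_MODULES.items()
    }

if __name__ == "__main__":
    for extra, ok in available_providers().items():
        if not ok:
            print(f'missing SDK; install with: pip install "openenv-halluguard[{extra}]"')
```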
Links
- 🤗 Live Environment: https://huggingface.co/spaces/SamSankar/hallucination-guard-env
- 📖 API Docs: https://samsankar-hallucination-guard-env.hf.space/docs
- 📦 OpenEnv: https://github.com/meta-pytorch/OpenEnv
License
MIT
Download files
File details
Details for the file openenv_halluguard-1.0.0.tar.gz.
File metadata
- Download URL: openenv_halluguard-1.0.0.tar.gz
- Upload date:
- Size: 2.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | febf3f606327492a82b58f8178565f975a0340b88cda003bc82f5b325f668b74 |
| MD5 | 4b3981e46b45daf659059a2e37fa71f1 |
| BLAKE2b-256 | 4a376b891c187a00a6b2a233a8b6bfca9cda3e515829b5ba51adcdc934db546d |
File details
Details for the file openenv_halluguard-1.0.0-py3-none-any.whl.
File metadata
- Download URL: openenv_halluguard-1.0.0-py3-none-any.whl
- Upload date:
- Size: 3.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b13947c57c6a7e058ea0427746f0b6d874e4fb76f3671ea1039b6fe88d34114d |
| MD5 | 1787f3dee5b48a73836ec65e07aed57d |
| BLAKE2b-256 | c910abfeb713e949758f1b3c2ee5591f1c54abfbf8f4f9d0d75768f7d240407a |
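To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal sketch using only the standard library; `sha256_of`, `verify`, and `EXPECTED` are illustrative names.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the file details above.
EXPECTED = {
    "openenv_halluguard-1.0.0.tar.gz": "febf3f606327492a82b58f8178565f975a0340b88cda003bc82f5b325f668b74",
    "openenv_halluguard-1.0.0-py3-none-any.whl": "b13947c57c6a7e058ea0427746f0b6d874e4fb76f3671ea1039b6fe88d34114d",
}

def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Union[str, Path]) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```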