Python SDK for HallucinationGuard-Env — evaluate and train LLMs to avoid hallucination using OpenEnv

openenv-halluguard

Python SDK for HallucinationGuard-Env — evaluate and train any LLM to avoid hallucinations using OpenEnv.

Install

pip install openenv-halluguard

Quick Start

from openenv_halluguard import HallucinationGuardEnv

# Define your model function
def my_model(question, context):
    # Call your LLM here — must answer from context only
    return "your answer based on context"

# Evaluate
env = HallucinationGuardEnv()
results = env.evaluate(my_model, episodes=5, model_name="my-model")
env.print_report(results)
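
For a quick smoke test without an API key, the model function can be any callable that takes `question` and `context` and returns a string. The sketch below is a hypothetical extractive baseline (`overlap_baseline` is not part of the SDK): it returns the context sentence that shares the most words with the question, so the answer always comes from the context.

```python
import re

def overlap_baseline(question, context):
    """Hypothetical baseline: pick the context sentence closest to the question."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not sentences:
        return ""
    q_words = set(re.findall(r"\w+", question.lower()))

    def overlap(sentence):
        # Number of distinct question words that also appear in the sentence.
        return len(q_words & set(re.findall(r"\w+", sentence.lower())))

    # max() returns the first sentence when several tie on overlap.
    return max(sentences, key=overlap)
```

It can be passed to `env.evaluate(overlap_baseline, episodes=5, model_name="overlap-baseline")` in the same way as `my_model` above.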

What It Does

The SDK connects to the live HallucinationGuard-Env, an OpenEnv RL environment with 100,000+ QA examples across 15 real-world datasets. Your model is scored on six weighted components:

Component                 Weight
Factual correctness       30%
Source grounding          20%
Citation accuracy         15%
Confidence calibration    15%
Semantic consistency      10%
Hallucination penalty     10%
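
To make the weights concrete, the sketch below combines per-component scores into one number. It assumes each component is reported in the range 0 to 1 (with 1.0 on the hallucination component meaning no hallucination detected) and that the overall score is a plain weighted sum. The environment's actual aggregation may differ, and the names `WEIGHTS` and `weighted_score` are illustrative, not SDK API.

```python
# Weights copied from the table above; the dictionary keys are hypothetical names.
WEIGHTS = {
    "factual_correctness": 0.30,
    "source_grounding": 0.20,
    "citation_accuracy": 0.15,
    "confidence_calibration": 0.15,
    "semantic_consistency": 0.10,
    "hallucination_penalty": 0.10,
}

def weighted_score(components):
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

Under these assumptions, a model that scores 1.0 on every component gets an overall score of 1.0, and losing half the factual-correctness score costs 0.15 overall.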

Works With Any LLM

# OpenAI
import openai
from openenv_halluguard import HallucinationGuardEnv

client = openai.OpenAI(api_key="sk-...")

def openai_model(question, context):
    r = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."}]
    )
    return r.choices[0].message.content

env = HallucinationGuardEnv()
results = env.evaluate(openai_model, episodes=5, model_name="gpt-4")

# Anthropic Claude
import anthropic
from openenv_halluguard import HallucinationGuardEnv

client = anthropic.Anthropic(api_key="sk-ant-...")

def claude_model(question, context):
    msg = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=256,
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."}]
    )
    return msg.content[0].text

env = HallucinationGuardEnv()
results = env.evaluate(claude_model, episodes=5, model_name="claude-haiku")

# Groq (fast inference)
from groq import Groq
from openenv_halluguard import HallucinationGuardEnv

client = Groq(api_key="gsk_...")

def groq_model(question, context):
    r = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."}]
    )
    return r.choices[0].message.content

env = HallucinationGuardEnv()
results = env.evaluate(groq_model, episodes=5, model_name="llama3-groq")
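
The three examples share the same prompt and differ only in the client call. One way to factor that out, sketched here with hypothetical helper names (`build_prompt`, `make_model`), is to wrap any `complete(prompt) -> str` callable into the `(question, context)` function that `evaluate` expects.

```python
def build_prompt(question, context):
    """Same prompt text as the provider examples above."""
    return f"Context: {context}\n\nQuestion: {question}\n\nAnswer from context only."

def make_model(complete):
    """Wrap a prompt-to-text callable into a (question, context) model function."""
    def model(question, context):
        answer = complete(build_prompt(question, context))
        # A client may return None for an empty completion; normalize to a string.
        return (answer or "").strip()
    return model

# Stand-in for a real client call so the sketch runs offline:
# it simply returns the first line of the prompt.
def echo_complete(prompt):
    return prompt.splitlines()[0]

echo_model = make_model(echo_complete)
```

With a real client, `complete` would be a small function that sends the prompt and returns the response text, for example the body of `openai_model` above with the prompt passed in.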

Optional Dependencies

pip install "openenv-halluguard[openai]"      # OpenAI
pip install "openenv-halluguard[anthropic]"   # Anthropic Claude
pip install "openenv-halluguard[groq]"        # Groq
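
Before picking a provider example, it can help to check which client libraries are importable. This sketch uses only the standard library. It assumes each extra installs the import name used in the examples above (`openai`, `anthropic`, `groq`), and `available_providers` is a hypothetical helper, not part of the SDK.

```python
import importlib.util

def available_providers(packages=("openai", "anthropic", "groq")):
    """Return the names from `packages` that can be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in packages if importlib.util.find_spec(name) is not None]

print("Installed provider clients:", available_providers())
```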

License

MIT
