syncreus-eval

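Open-source AI evaluation toolkit for hallucination detection, safety scanning, bias analysis, and industry-specific evals. It runs locally without the Syncreus platform, with two exceptions listed in the tables below: LLM-as-judge evaluators call the Gemini API and need an API key, and REGRESSION is platform-only.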

Installation

# Core (LLM-as-judge evaluators via Gemini)
pip install syncreus-eval

# With optional extras (quoted so shells such as zsh do not expand the brackets)
pip install "syncreus-eval[accuracy]"          # fastembed for semantic similarity
pip install "syncreus-eval[safety]"            # Presidio PII scanning
pip install "syncreus-eval[prompt-injection]"  # LLM Guard injection detection
pip install "syncreus-eval[upload]"            # Upload results to Syncreus platform
pip install "syncreus-eval[all]"               # Everything
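
Because several evaluators depend on optional extras, it can help to check which ones are importable before running an eval. The sketch below is not part of syncreus-eval. The helper `missing_extras` is hypothetical, and the import names (`fastembed`, `presidio_analyzer`, `llm_guard`) are assumptions based on how those third-party packages are usually imported, not on anything this README states.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the top-level module it installs.
# These import names are assumptions, not documented by syncreus-eval.
EXTRA_MODULES = {
    "accuracy": "fastembed",
    "safety": "presidio_analyzer",
    "prompt-injection": "llm_guard",
}


def missing_extras(modules=EXTRA_MODULES):
    """Return the sorted names of extras whose module cannot be found."""
    return sorted(
        extra for extra, module in modules.items() if find_spec(module) is None
    )


if __name__ == "__main__":
    for extra in missing_extras():
        print(f'missing: pip install "syncreus-eval[{extra}]"')
```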

Quick Start

from syncreus_eval import evaluate, EvalType

# Hallucination detection
result = evaluate(
    EvalType.HALLUCINATION,
    ai_input="The Eiffel Tower is in Paris, France. It was built in 1889.",
    ai_output="The Eiffel Tower is in Paris, France. It was built in 1889 and is 300 meters tall.",
    gemini_key="your-gemini-api-key",
)
print(result.passed)    # True/False/None
print(result.details)   # Claim-level verdicts

# Performance tracking (no LLM needed)
result = evaluate(
    EvalType.PERFORMANCE,
    trace={"latency_ms": 150, "token_count_input": 100, "token_count_output": 50},
)
print(result.details)   # {"latency_ms": 150, "total_tokens": 150, ...}

# Run multiple evals at once
results = evaluate(
    [EvalType.HALLUCINATION, EvalType.IDEOLOGY],
    ai_input="Context here",
    ai_output="Response here",
    gemini_key="your-key",
)
for r in results:
    print(f"{r.eval_type.value}: passed={r.passed}")

Evaluation Types

General Purpose

| Type | Description | Requires |
|---|---|---|
| HALLUCINATION | Detects unsupported factual claims | Gemini API key |
| ACCURACY | Golden dataset comparison via semantic similarity | [accuracy] extra |
| CONSISTENCY | Pairwise similarity across repeated prompts | [accuracy] extra |
| PERFORMANCE | Extracts latency, tokens, cost metrics | Nothing |
| AGENT_TASK | Verifies agent completion claim honesty | Gemini API key |
| REGRESSION | Baseline comparison (platform only) | Syncreus platform |
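
To make "pairwise similarity across repeated prompts" concrete, here is a minimal sketch of one way such a score could be computed: the mean cosine similarity over every pair of output embeddings, compared against a threshold. This is an illustration under stated assumptions, not the library's implementation. The function names are hypothetical, the vectors are hand-written stand-ins for real embeddings, and the 0.85 threshold is borrowed from the `threshold` example in the API reference.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # a single output is trivially consistent with itself
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings: two identical responses and one unrelated response.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
score = mean_pairwise_similarity(embeddings)
passed = score >= 0.85  # assumed threshold, see lead-in
print(f"score={score:.3f} passed={passed}")
```

A single divergent response pulls the mean down sharply, which is why a consistency check of this kind is usually run over several repeated outputs rather than two.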

Safety & Compliance

| Type | Description | Requires |
|---|---|---|
| SAFETY | PII/sensitive data detection + content safety | [safety] extra |
| BIAS | Demographic parity / EEOC four-fifths rule | Nothing |
| IDEOLOGY | Political neutrality (OMB M-26-04) | Gemini API key |
| PROMPT_INJECTION | Injection attempt detection | [prompt-injection] extra |
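
The four-fifths rule behind the BIAS eval can be worked through by hand: compute each group's pass rate, divide it by the highest group's rate, and flag any group whose ratio falls below 0.8. The sketch below shows that arithmetic on records shaped like the `traces` argument in the API reference. It is an assumption about how such a check could look, not the library's code, and `four_fifths_check` is a hypothetical name.

```python
from collections import defaultdict


def four_fifths_check(traces, threshold=0.8):
    """Return (pass rates, impact ratios, ok) for traces grouped by demographic."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for trace in traces:
        group = trace["metadata"]["demographic_group"]
        totals[group] += 1
        if trace["passed"]:
            passes[group] += 1

    rates = {group: passes[group] / totals[group] for group in totals}
    best = max(rates.values())
    # Impact ratio: each group's rate relative to the best-performing group.
    ratios = {
        group: (rate / best if best > 0 else 1.0) for group, rate in rates.items()
    }
    ok = all(ratio >= threshold for ratio in ratios.values())
    return rates, ratios, ok


# Group A passes 8 of 10 traces, group B passes 6 of 10.
traces = (
    [{"metadata": {"demographic_group": "A"}, "passed": True}] * 8
    + [{"metadata": {"demographic_group": "A"}, "passed": False}] * 2
    + [{"metadata": {"demographic_group": "B"}, "passed": True}] * 6
    + [{"metadata": {"demographic_group": "B"}, "passed": False}] * 4
)
rates, ratios, ok = four_fifths_check(traces)
print(rates, ratios, ok)
```

Here group B's rate (0.6) is 75% of group A's (0.8), which is below four fifths, so the check reports a disparity.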

Industry-Specific

| Type | Description | Requires |
|---|---|---|
| HEALTHCARE | Medical accuracy, drug safety, PHI detection | Gemini API key |
| LEGAL | Citation validity, holding fidelity | Gemini API key |
| FINANCE | Regulatory accuracy, numerical precision | Gemini API key |
| CODE_ACCURACY | API existence, function signatures | Gemini API key |

API Reference

evaluate()

from syncreus_eval import evaluate, EvalType

result = evaluate(
    eval_type=EvalType.HALLUCINATION,  # or a list of types
    ai_input="...",
    ai_output="...",
    gemini_key="...",                  # or set GEMINI_API_KEY env var
    # Accuracy-specific:
    test_cases=[{"input_text": "...", "expected_output": "..."}],
    threshold=0.85,
    # Performance-specific:
    trace={"latency_ms": 100, ...},
    # Agent task-specific:
    verification_result="exit code 0",
    # Bias-specific:
    traces=[{"metadata": {"demographic_group": "A"}, "passed": True}],
    # Consistency-specific:
    outputs=["response1", "response2", "response3"],
    # Safety-specific:
    entity_whitelist=["aspirin"],
    enable_gemini_content_safety=True,
)

Returns a single EvalResult, or a list of EvalResult objects when eval_type is a list:

class EvalResult:
    eval_type: EvalType
    passed: bool | None      # True/False/None (None = error or skipped)
    score: float | None       # Numeric score where applicable
    details: dict[str, Any]   # Evaluator-specific details
    error: bool               # Whether an error occurred
    error_message: str | None # Error description
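
Because `passed` has three states, a plain truthiness test such as `if not result.passed` would count an errored or skipped eval as a failure. The sketch below shows one way to keep the states apart. `StandInResult` is a hypothetical stand-in that only mirrors the fields listed above so the example runs without the library, and `status` and `summarize` are illustrative helper names, not part of syncreus-eval.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StandInResult:
    """Hypothetical stand-in carrying the same fields as EvalResult."""
    eval_type: str
    passed: Optional[bool]
    score: Optional[float] = None
    details: dict[str, Any] = field(default_factory=dict)
    error: bool = False
    error_message: Optional[str] = None


def status(result) -> str:
    """Map a result to one of: 'error', 'skipped', 'pass', 'fail'."""
    if result.error:
        return "error"
    if result.passed is None:
        return "skipped"
    return "pass" if result.passed else "fail"


def summarize(results) -> dict[str, int]:
    """Count results by status; accepts a single result or a list."""
    if not isinstance(results, list):
        results = [results]
    counts = {"pass": 0, "fail": 0, "skipped": 0, "error": 0}
    for result in results:
        counts[status(result)] += 1
    return counts


results = [
    StandInResult("hallucination", passed=True),
    StandInResult("ideology", passed=False),
    StandInResult("regression", passed=None),
    StandInResult("safety", passed=None, error=True, error_message="timeout"),
]
print(summarize(results))
```

Checking `error` before `passed` is a design choice in this sketch: an errored eval says nothing about the output under test, so it is reported separately from both failures and skips.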

upload_results() (optional)

from syncreus_eval import upload_results

upload_results(
    results=result,           # EvalResult or list
    api_key="syn_...",        # Syncreus API key
    endpoint="https://api.syncreus.com",
    trace_id="trace-123",     # optional
)

Requires: pip install "syncreus-eval[upload]"

Environment Variables

| Variable | Description |
|---|---|
| GEMINI_API_KEY | Google Gemini API key for LLM-as-judge evaluators |
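
If you prefer to resolve the key yourself and fail early with a clear message, a small helper like the one below can sit in front of `evaluate()`. This is a sketch only. `resolve_gemini_key` is a hypothetical name, and letting an explicitly passed key take precedence over the environment variable is an assumption, not documented library behavior.

```python
import os


def resolve_gemini_key(explicit=None, env=None):
    """Return the explicit key if given, else GEMINI_API_KEY from the environment."""
    env = os.environ if env is None else env
    key = explicit or env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "No Gemini API key found: pass one explicitly or set GEMINI_API_KEY"
        )
    return key


# Demonstrated with an injected mapping so the example does not depend on
# the real environment.
print(resolve_gemini_key(env={"GEMINI_API_KEY": "from-env"}))
print(resolve_gemini_key("from-argument", env={"GEMINI_API_KEY": "from-env"}))
```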

License

MIT
