Open-source AI evaluation toolkit — hallucination detection, safety, industry-specific evals
syncreus-eval
Open-source AI evaluation toolkit for hallucination detection, safety scanning, bias analysis, and industry-specific evals. Runs locally without the Syncreus platform.
Installation
```bash
# Core (LLM-as-judge evaluators via Gemini)
pip install syncreus-eval

# With optional extras
pip install syncreus-eval[accuracy]          # fastembed for semantic similarity
pip install syncreus-eval[safety]            # Presidio PII scanning
pip install syncreus-eval[prompt-injection]  # LLM Guard injection detection
pip install syncreus-eval[upload]            # Upload results to Syncreus platform
pip install syncreus-eval[all]               # Everything
```
Quick Start
```python
from syncreus_eval import evaluate, EvalType

# Hallucination detection
result = evaluate(
    EvalType.HALLUCINATION,
    ai_input="The Eiffel Tower is in Paris, France. It was built in 1889.",
    ai_output="The Eiffel Tower is in Paris, France. It was built in 1889 and is 300 meters tall.",
    gemini_key="your-gemini-api-key",
)
print(result.passed)   # True/False/None
print(result.details)  # Claim-level verdicts

# Performance tracking (no LLM needed)
result = evaluate(
    EvalType.PERFORMANCE,
    trace={"latency_ms": 150, "token_count_input": 100, "token_count_output": 50},
)
print(result.details)  # {"latency_ms": 150, "total_tokens": 150, ...}

# Run multiple evals at once
results = evaluate(
    [EvalType.HALLUCINATION, EvalType.IDEOLOGY],
    ai_input="Context here",
    ai_output="Response here",
    gemini_key="your-key",
)
for r in results:
    print(f"{r.eval_type.value}: passed={r.passed}")
```
Evaluation Types
General Purpose
| Type | Description | Requires |
|---|---|---|
| `HALLUCINATION` | Detects unsupported factual claims | Gemini API key |
| `ACCURACY` | Golden dataset comparison via semantic similarity | `[accuracy]` extra |
| `CONSISTENCY` | Pairwise similarity across repeated prompts | `[accuracy]` extra |
| `PERFORMANCE` | Extracts latency, tokens, cost metrics | Nothing |
| `AGENT_TASK` | Verifies agent completion claim honesty | Gemini API key |
| `REGRESSION` | Baseline comparison (platform only) | Syncreus platform |
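
To make the `CONSISTENCY` row concrete, here is a minimal sketch of pairwise similarity across repeated outputs. It is not the library's implementation: the function names are hypothetical, and bag-of-words cosine similarity is swapped in for the embedding-based similarity that the `[accuracy]` extra presumably provides, so the example runs with the standard library alone.

```python
import math
from collections import Counter
from itertools import combinations


def bow_vector(text: str) -> Counter:
    """Lowercased bag-of-words counts; a crude stand-in for an embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty output shares nothing with anything
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(outputs: list[str]) -> float:
    """Average similarity over every unordered pair of outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # zero or one output: nothing to disagree with
    return sum(cosine(bow_vector(x), bow_vector(y)) for x, y in pairs) / len(pairs)


outputs = ["Paris is the capital", "The capital is Paris", "I do not know"]
score = mean_pairwise_similarity(outputs)
print(f"consistency score: {score:.2f}")
```

A score near 1.0 means the repeated responses agree with each other; comparing it against a threshold (such as the `threshold` argument shown in the API reference) would turn it into a pass/fail verdict.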
Safety & Compliance
| Type | Description | Requires |
|---|---|---|
| `SAFETY` | PII/sensitive data detection + content safety | `[safety]` extra |
| `BIAS` | Demographic parity / EEOC four-fifths rule | Nothing |
| `IDEOLOGY` | Political neutrality (OMB M-26-04) | Gemini API key |
| `PROMPT_INJECTION` | Injection attempt detection | `[prompt-injection]` extra |
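
The four-fifths rule named in the `BIAS` row flags possible adverse impact when one group's pass rate is below 80% of the highest group's rate. The sketch below works through that arithmetic on trace dictionaries shaped like the `traces` argument in the API reference. The helper names are hypothetical and this is an illustration of the rule, not the evaluator's actual code.

```python
from collections import defaultdict


def pass_rates(traces: list[dict]) -> dict[str, float]:
    """Pass rate per group, keyed by metadata['demographic_group']."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for t in traces:
        group = t["metadata"]["demographic_group"]
        totals[group] += 1
        if t["passed"]:
            passes[group] += 1
    return {g: passes[g] / totals[g] for g in totals}


def four_fifths_check(traces: list[dict], threshold: float = 0.8) -> tuple[float, bool]:
    """Return (impact_ratio, ok), where impact_ratio = lowest rate / highest rate."""
    rates = pass_rates(traces)
    if not rates:
        raise ValueError("at least one trace is required")
    highest = max(rates.values())
    if highest == 0:
        return 1.0, True  # no group passes at all, so the rates are equal
    ratio = min(rates.values()) / highest
    return ratio, ratio >= threshold


# Group A passes 8 of 10 traces, group B passes 5 of 10.
traces = (
    [{"metadata": {"demographic_group": "A"}, "passed": i < 8} for i in range(10)]
    + [{"metadata": {"demographic_group": "B"}, "passed": i < 5} for i in range(10)]
)
ratio, ok = four_fifths_check(traces)
print(f"impact ratio={ratio:.3f}, within four-fifths rule={ok}")
```

Here group B's rate (0.5) is 62.5% of group A's rate (0.8), which falls under the 80% line, so the check reports a disparity.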
Industry-Specific
| Type | Description | Requires |
|---|---|---|
| `HEALTHCARE` | Medical accuracy, drug safety, PHI detection | Gemini API key |
| `LEGAL` | Citation validity, holding fidelity | Gemini API key |
| `FINANCE` | Regulatory accuracy, numerical precision | Gemini API key |
| `CODE_ACCURACY` | API existence, function signatures | Gemini API key |
API Reference
evaluate()
```python
from syncreus_eval import evaluate, EvalType

result = evaluate(
    eval_type=EvalType.HALLUCINATION,  # or a list of types
    ai_input="...",
    ai_output="...",
    gemini_key="...",  # or set GEMINI_API_KEY env var
    # Accuracy-specific:
    test_cases=[{"input_text": "...", "expected_output": "..."}],
    threshold=0.85,
    # Performance-specific:
    trace={"latency_ms": 100, ...},
    # Agent task-specific:
    verification_result="exit code 0",
    # Bias-specific:
    traces=[{"metadata": {"demographic_group": "A"}, "passed": True}],
    # Consistency-specific:
    outputs=["response1", "response2", "response3"],
    # Safety-specific:
    entity_whitelist=["aspirin"],
    enable_gemini_content_safety=True,
)
```
Returns an EvalResult (or list of them):
```python
class EvalResult:
    eval_type: EvalType
    passed: bool | None        # True/False/None (None = error or skipped)
    score: float | None        # Numeric score where applicable
    details: dict[str, Any]    # Evaluator-specific details
    error: bool                # Whether an error occurred
    error_message: str | None  # Error description
```
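
Because `passed` can be `None` as well as `True` or `False`, a plain `if not result.passed` check would count errored and skipped evals as failures. The sketch below shows one way to keep the outcomes separate. `StubResult`, `tally`, and `gate` are hypothetical names; `StubResult` only mirrors the fields listed above so that the example runs without the package installed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StubResult:
    """Hypothetical stand-in mirroring the EvalResult fields listed above."""
    eval_type: str
    passed: Optional[bool] = None
    score: Optional[float] = None
    details: dict[str, Any] = field(default_factory=dict)
    error: bool = False
    error_message: Optional[str] = None


def tally(results: list[StubResult]) -> dict[str, int]:
    """Count outcomes, keeping errors and skips apart from real failures."""
    counts = {"passed": 0, "failed": 0, "errored": 0, "skipped": 0}
    for r in results:
        if r.error:
            counts["errored"] += 1
        elif r.passed is None:
            counts["skipped"] += 1
        elif r.passed:
            counts["passed"] += 1
        else:
            counts["failed"] += 1
    return counts


def gate(results: list[StubResult]) -> bool:
    """True only when nothing failed or errored; skipped evals do not block."""
    counts = tally(results)
    return counts["failed"] == 0 and counts["errored"] == 0


results = [
    StubResult("hallucination", passed=True),
    StubResult("ideology", passed=False),
    StubResult("regression"),  # passed=None, no error: treated as skipped
    StubResult("safety", error=True, error_message="timeout"),
]
print(tally(results))
print("gate ok:", gate(results))
```

Whether a skipped eval should block a pipeline is a policy choice; this sketch lets it through and blocks only on failures and errors.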
upload_results() (optional)
```python
from syncreus_eval import upload_results

upload_results(
    results=result,      # EvalResult or list
    api_key="syn_...",   # Syncreus API key
    endpoint="https://api.syncreus.com",
    trace_id="trace-123",  # optional
)
```

Requires: `pip install syncreus-eval[upload]`
Environment Variables
| Variable | Description |
|---|---|
| `GEMINI_API_KEY` | Google Gemini API key for LLM-as-judge evaluators |
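
If you would rather resolve the key in your own code and fail early when it is missing, a small helper along these lines may help. `resolve_gemini_key` is a hypothetical function, not part of the package, and the precedence it applies (explicit argument first, then `GEMINI_API_KEY`) is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit key if given, else GEMINI_API_KEY from the environment."""
    key = explicit or env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "No Gemini API key found: pass one explicitly or set GEMINI_API_KEY"
        )
    return key


# The env mapping is injectable, so the lookup can be exercised without
# touching the real process environment.
print(resolve_gemini_key(env={"GEMINI_API_KEY": "from-env"}))
```

The returned value can then be passed as `gemini_key=` to `evaluate()`.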
License
MIT