Domain-Aware Evaluation Framework — intelligent LLM evaluation with domain-specific metrics
DAEF — Domain-Aware Evaluation Framework
Intelligent LLM evaluation that adapts to your domain. Instead of generic metrics, DAEF researches your domain, understands industry-specific requirements, and generates tailored evaluation criteria using a multi-agent pipeline powered by Google's Agent Development Kit.
Installation
pip install daef
Quick Start
import asyncio

from daef import DAEFClient, EvaluationRequest

client = DAEFClient(api_key="YOUR_GOOGLE_API_KEY")

# Describe the task and supply the prompt/response pair to evaluate.
request = EvaluationRequest(
    domain="Healthcare",
    task_description="Medical Q&A chatbot answering patient questions",
    task_type="single_call",
    focus_areas=["Security and Guardrails", "Content Generation Quality"],
    prompt="What is the recommended dosage of ibuprofen for adults?",
    llm_output="The standard adult dose of ibuprofen is 200-400mg every 4-6 hours...",
)

# evaluate() is a coroutine, so run it on an event loop.
result = asyncio.run(client.evaluate(request))
print(f"Overall Score: {result.overall_score}/100")
for metric in result.metrics:
    print(f" {metric.metric_name}: {metric.score}/100")
Features
- Domain-aware: Researches domain-specific standards (Healthcare, Legal, Finance, Education, and more)
- Adaptive metrics: Selects from 30+ built-in metrics and supports custom metrics
- Task-type support: RAG, Fine-tuning, Single LLM Call
- Version comparison: Compare two evaluation results to understand regressions and improvements
- Async-first: Built on asyncio for high-throughput pipelines
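To illustrate what the async-first design makes possible, here is a minimal sketch of running several evaluations concurrently with a concurrency cap. It does not call DAEF: `fake_evaluate` and `evaluate_many` are hypothetical names, and `fake_evaluate` stands in for an awaited call such as `client.evaluate(request)`.

```python
import asyncio


async def fake_evaluate(request_id: int) -> dict:
    """Hypothetical stand-in for an evaluation call (not part of DAEF)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"request_id": request_id, "overall_score": 50 + request_id}


async def evaluate_many(request_ids: list[int], max_concurrency: int = 4) -> list[dict]:
    """Run evaluations concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(request_id: int) -> dict:
        async with semaphore:
            return await fake_evaluate(request_id)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(r) for r in request_ids))


results = asyncio.run(evaluate_many([1, 2, 3]))
print([r["overall_score"] for r in results])
```

The semaphore is there because evaluation backends are usually rate limited; an unbounded `gather` over a large batch may trigger throttling.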
Configuration
Set your Google API key via environment variable or constructor:
export GOOGLE_API_KEY="your-key"
# Or pass directly
client = DAEFClient(api_key="your-key")
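If you want both options available in your own code, one possible pattern is to resolve the key once and pass it to the constructor. This is a sketch under assumptions: `resolve_api_key` is a hypothetical helper, and the precedence shown (explicit argument first, then `GOOGLE_API_KEY`) is a choice made here, not documented DAEF behavior.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else GOOGLE_API_KEY from the environment."""
    key = explicit_key or os.environ.get("GOOGLE_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError("Set GOOGLE_API_KEY or pass api_key explicitly")
    return key
```

The result can then be handed to the client, for example `DAEFClient(api_key=resolve_api_key())`.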
EvaluationRequest Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `domain` | `str` | Yes | Domain (e.g. "Healthcare", "Finance") |
| `task_description` | `str` | Yes | What the LLM task does |
| `task_type` | `str` | Yes | "rag", "tuning", or "single_call" |
| `focus_areas` | `list[str]` | No | Up to 3 priority areas |
| `prompt` | `str` | No | The input prompt sent to the LLM |
| `llm_output` | `str` | No | The LLM's response to evaluate |
| `context_data` | `str` | No | Retrieved context (for RAG) |
| `mandatory_metrics` | `list[str]` | No | Metrics that must be included |
| `avoided_metrics` | `list[str]` | No | Metrics to exclude |
| `custom_metrics` | `list[str]` | No | User-defined metric names |
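The constraints in the table (three required fields, three allowed `task_type` values, at most three focus areas) can be checked before a request is sent. The sketch below is an illustration only: `RequestSketch` and `validate` are hypothetical names that mirror a few of the fields, and DAEF's own validation, if any, may behave differently.

```python
from dataclasses import dataclass, field

ALLOWED_TASK_TYPES = {"rag", "tuning", "single_call"}
MAX_FOCUS_AREAS = 3


@dataclass
class RequestSketch:
    """Hypothetical stand-in mirroring a few EvaluationRequest fields."""
    domain: str
    task_description: str
    task_type: str
    focus_areas: list[str] = field(default_factory=list)


def validate(request: RequestSketch) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not request.domain.strip():
        problems.append("domain is required")
    if not request.task_description.strip():
        problems.append("task_description is required")
    if request.task_type not in ALLOWED_TASK_TYPES:
        problems.append(f"task_type must be one of {sorted(ALLOWED_TASK_TYPES)}")
    if len(request.focus_areas) > MAX_FOCUS_AREAS:
        problems.append(f"focus_areas accepts at most {MAX_FOCUS_AREAS} entries")
    return problems
```

Returning every problem at once, rather than raising on the first, makes it easier to fix a batch of malformed requests in one pass.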
Version Comparison
result_v1 = asyncio.run(client.evaluate(request_v1))
result_v2 = asyncio.run(client.evaluate(request_v2))
comparison = asyncio.run(client.compare(result_v1, result_v2))
print(f"Overall change: {comparison.overall_change}")
print(f"Summary: {comparison.summary}")
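For a rough idea of what a regression check between two runs involves, the sketch below computes per-metric score changes from plain dictionaries. It is not how `client.compare` is implemented; `metric_deltas` and the metric names are hypothetical, and the dictionaries stand in for the `metric_name` and `score` values on each result.

```python
def metric_deltas(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Score change per metric, restricted to metrics present in both runs."""
    return {name: new[name] - old[name] for name in old if name in new}


# Hypothetical scores for two versions of the same task.
v1 = {"Factual Accuracy": 70, "Safety": 90}
v2 = {"Factual Accuracy": 82, "Safety": 85, "Tone": 60}

for name, delta in metric_deltas(v1, v2).items():
    label = "improved" if delta > 0 else "regressed" if delta < 0 else "unchanged"
    print(f"{name}: {delta:+} ({label})")
```

Metrics that appear in only one run are skipped here, since adaptive metric selection may pick different metrics for each evaluation.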
License
MIT
File details
Details for the file daef-0.1.0.tar.gz.
File metadata
- Download URL: daef-0.1.0.tar.gz
- Upload date:
- Size: 15.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df2468b8f39ff97e6e67c67f8a97ec3bda8fe9df54c6b390ed996ee10a791c23 |
| MD5 | edd9b61a0e9e66654453a5ea76ca1150 |
| BLAKE2b-256 | acf1a7ea66ffeb125139e48a8d5a406ee7d0b765a94abf9108c4c8e8927b8203 |
File details
Details for the file daef-0.1.0-py3-none-any.whl.
File metadata
- Download URL: daef-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f722287d2b2276b5572d2f6ca8b6e3e4cac6eb9658fc878187b0eab6eb5e1788 |
| MD5 | 53ba91afba6817fb86d7a7f2b015ea5c |
| BLAKE2b-256 | ae8569fcdec3b996514568e4c99e27255a28e622ffe96db2dba860c25c80e36e |
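A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch; `sha256_of` and `matches` are hypothetical helper names, and the file path is whatever location you saved the download to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("daef-0.1.0.tar.gz"), "<SHA256 from the table>")` should return `True` for an unmodified source distribution.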