Domain-Aware Evaluation Framework — intelligent LLM evaluation with domain-specific metrics

Project description

DAEF — Domain-Aware Evaluation Framework

Intelligent LLM evaluation that adapts to your domain. Instead of generic metrics, DAEF researches your domain, understands industry-specific requirements, and generates tailored evaluation criteria using a multi-agent pipeline powered by Google's Agent Development Kit.

Installation

pip install daef
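
To confirm that the install succeeded, the installed version can be read with the standard library. This is a minimal sketch; the helper name `installed_version` is made up for illustration and is not part of DAEF.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed daef version, or None when the package is missing.
    print(installed_version("daef"))
```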

Quick Start

import asyncio
from daef import DAEFClient, EvaluationRequest

client = DAEFClient(api_key="YOUR_GOOGLE_API_KEY")

# Describe the task and supply the prompt/response pair to be scored
request = EvaluationRequest(
    domain="Healthcare",
    task_description="Medical Q&A chatbot answering patient questions",
    task_type="single_call",
    focus_areas=["Security and Guardrails", "Content Generation Quality"],
    prompt="What is the recommended dosage of ibuprofen for adults?",
    llm_output="The standard adult dose of ibuprofen is 200-400mg every 4-6 hours...",
)

# evaluate() is a coroutine, so asyncio.run() drives it from synchronous code
result = asyncio.run(client.evaluate(request))
print(f"Overall Score: {result.overall_score}/100")
# Per-metric breakdown
for metric in result.metrics:
    print(f"  {metric.metric_name}: {metric.score}/100")

Features

  • Domain-aware: Researches domain-specific standards (Healthcare, Legal, Finance, Education, and more)
  • Adaptive metrics: Selects from 30+ built-in metrics and supports custom metrics
  • Task-type support: RAG, Fine-tuning, Single LLM Call
  • Version comparison: Compare two evaluation results to understand regressions and improvements
  • Async-first: Built on asyncio for high-throughput pipelines
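
Because `evaluate` is awaited, several requests can in principle run concurrently. The sketch below shows one common asyncio pattern for that: `asyncio.gather` with a semaphore to cap concurrency. `StubClient`, `StubResult`, and `evaluate_all` are hypothetical stand-ins so the example runs without network access; whether DAEFClient is safe to share across concurrent calls is an assumption to verify.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    overall_score: int


class StubClient:
    """Hypothetical stand-in for DAEFClient; it only mimics an async evaluate()."""

    async def evaluate(self, request: str) -> StubResult:
        await asyncio.sleep(0.01)  # simulate network latency
        return StubResult(overall_score=len(request))


async def evaluate_all(client, requests, max_concurrency=4):
    """Evaluate all requests, running at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(request):
        async with semaphore:
            return await client.evaluate(request)

    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(r) for r in requests))


results = asyncio.run(evaluate_all(StubClient(), ["a", "bbb", "cc"], max_concurrency=2))
scores = [r.overall_score for r in results]
print(scores)
```

With a real client, the semaphore limit would typically be chosen to stay within the API's rate limits.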

Configuration

Set your Google API key through the GOOGLE_API_KEY environment variable, or pass it to the constructor:

# Shell: set the key in the environment
export GOOGLE_API_KEY="your-key"

# Python: or pass the key directly
client = DAEFClient(api_key="your-key")
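
If the key is resolved in application code before the client is built, a small helper can make the lookup explicit and fail early. This is a sketch only: `resolve_api_key` is a hypothetical helper, and the precedence (explicit argument first, then the environment) is an assumption, not documented DAEF behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the API key: explicit argument first, then GOOGLE_API_KEY (assumed order)."""
    key = explicit or env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set GOOGLE_API_KEY"
        )
    return key


# Example usage with an injected environment mapping
print(resolve_api_key(env={"GOOGLE_API_KEY": "your-key"}))
```

The resolved key could then be passed as `DAEFClient(api_key=...)`.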

EvaluationRequest Fields

Field              Type       Required  Description
domain             str        Yes       Domain (e.g. "Healthcare", "Finance")
task_description   str        Yes       What the LLM task does
task_type          str        Yes       "rag", "tuning", or "single_call"
focus_areas        list[str]  No        Up to 3 priority areas
prompt             str        No        The input prompt sent to the LLM
llm_output         str        No        The LLM's response to evaluate
context_data       str        No        Retrieved context (for RAG)
mandatory_metrics  list[str]  No        Metrics that must be included
avoided_metrics    list[str]  No        Metrics to exclude
custom_metrics     list[str]  No        User-defined metric names
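
The constraints in the table (three required fields, a fixed set of task types, at most 3 focus areas) can be checked locally before a request is sent. The sketch below uses a hypothetical `RequestDraft` dataclass that mirrors a few of the fields; it is not DAEF's `EvaluationRequest`, and whether the library performs the same checks itself is not stated here.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_TASK_TYPES = {"rag", "tuning", "single_call"}
MAX_FOCUS_AREAS = 3


@dataclass
class RequestDraft:
    """Hypothetical local mirror of some EvaluationRequest fields."""

    domain: str
    task_description: str
    task_type: str
    focus_areas: List[str] = field(default_factory=list)


def validate(draft: RequestDraft) -> List[str]:
    """Return a list of human-readable problems; an empty list means the draft is valid."""
    problems = []
    if not draft.domain.strip():
        problems.append("domain is required")
    if not draft.task_description.strip():
        problems.append("task_description is required")
    if draft.task_type not in ALLOWED_TASK_TYPES:
        problems.append(f"task_type must be one of {sorted(ALLOWED_TASK_TYPES)}")
    if len(draft.focus_areas) > MAX_FOCUS_AREAS:
        problems.append(f"focus_areas allows at most {MAX_FOCUS_AREAS} entries")
    return problems


draft = RequestDraft(
    domain="Healthcare",
    task_description="Medical Q&A chatbot answering patient questions",
    task_type="single_call",
    focus_areas=["Security and Guardrails", "Content Generation Quality"],
)
print(validate(draft))
```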

Version Comparison

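# Assumes asyncio is imported and that client, request_v1, and request_v2
# (one EvaluationRequest per version) were created as in Quick Start.
result_v1 = asyncio.run(client.evaluate(request_v1))
result_v2 = asyncio.run(client.evaluate(request_v2))

comparison = asyncio.run(client.compare(result_v1, result_v2))
print(f"Overall change: {comparison.overall_change}")
print(f"Summary: {comparison.summary}")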
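
For a quick per-metric view alongside the comparison summary, score differences can also be computed locally from the `metric_name` and `score` fields shown in Quick Start. This is an illustrative sketch, not how `client.compare` works internally; the `Metric` dataclass and `metric_deltas` helper are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Metric:
    """Hypothetical stand-in exposing the two fields used in Quick Start."""

    metric_name: str
    score: float


def metric_deltas(old: List[Metric], new: List[Metric]) -> Dict[str, float]:
    """Return new-minus-old score for every metric present in both results."""
    old_scores = {m.metric_name: m.score for m in old}
    new_scores = {m.metric_name: m.score for m in new}
    shared = old_scores.keys() & new_scores.keys()
    return {name: new_scores[name] - old_scores[name] for name in sorted(shared)}


v1 = [Metric("Accuracy", 80), Metric("Safety", 90), Metric("Tone", 70)]
v2 = [Metric("Accuracy", 85), Metric("Safety", 80)]

for name, delta in metric_deltas(v1, v2).items():
    # Positive values are improvements, negative values are regressions
    print(f"{name}: {delta:+}")
```

Metrics that appear in only one result are skipped here, since the metric set may differ between runs.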

License

MIT

Download files

Source Distribution

daef-0.1.0.tar.gz (15.3 kB)

Uploaded Source

Built Distribution

daef-0.1.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file daef-0.1.0.tar.gz.

File metadata

  • Download URL: daef-0.1.0.tar.gz
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for daef-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       df2468b8f39ff97e6e67c67f8a97ec3bda8fe9df54c6b390ed996ee10a791c23
MD5          edd9b61a0e9e66654453a5ea76ca1150
BLAKE2b-256  acf1a7ea66ffeb125139e48a8d5a406ee7d0b765a94abf9108c4c8e8927b8203
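
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches_digest` are illustrative, and the file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the sdist was saved to the current directory):
# matches_digest(
#     "daef-0.1.0.tar.gz",
#     "df2468b8f39ff97e6e67c67f8a97ec3bda8fe9df54c6b390ed996ee10a791c23",
# )
```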

File details

Details for the file daef-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: daef-0.1.0-py3-none-any.whl
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for daef-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       f722287d2b2276b5572d2f6ca8b6e3e4cac6eb9658fc878187b0eab6eb5e1788
MD5          53ba91afba6817fb86d7a7f2b015ea5c
BLAKE2b-256  ae8569fcdec3b996514568e4c99e27255a28e622ffe96db2dba860c25c80e36e
