
Domain-Aware Evaluation Framework — intelligent LLM evaluation with domain-specific metrics


DAEF — Domain-Aware Evaluation Framework

Intelligent LLM evaluation that adapts to your domain. Instead of generic metrics, DAEF researches your domain, understands industry-specific requirements, and generates tailored evaluation criteria using a multi-agent pipeline powered by Google's Agent Development Kit.

Installation

pip install daef

Quick Start

import asyncio
from daef import DAEFClient, EvaluationRequest

client = DAEFClient(api_key="YOUR_GOOGLE_API_KEY")

request = EvaluationRequest(
    domain="Healthcare",
    task_description="Medical Q&A chatbot answering patient questions",
    task_type="single_call",
    focus_areas=["Security and Guardrails", "Content Generation Quality"],
    prompt="What is the recommended dosage of ibuprofen for adults?",
    llm_output="The standard adult dose of ibuprofen is 200-400mg every 4-6 hours...",
)

result = asyncio.run(client.evaluate(request))
print(f"Overall Score: {result.overall_score}/100")
for metric in result.metrics:
    print(f"  {metric.metric_name}: {metric.score}/100")
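A minimal sketch of acting on the result, assuming only the attributes used in the Quick Start (`result.metrics`, `metric.metric_name`, `metric.score`). The helper `failing_metrics`, the threshold of 70, and the metric names in the stand-in object are hypothetical and are not part of DAEF.

```python
from types import SimpleNamespace


def failing_metrics(result, threshold=70):
    """Return (name, score) pairs for metrics scoring below `threshold`, lowest first."""
    low = [(m.metric_name, m.score) for m in result.metrics if m.score < threshold]
    return sorted(low, key=lambda pair: pair[1])


# Stand-in shaped like the result printed above; metric names are made up.
# A real result would come from client.evaluate(request).
demo_result = SimpleNamespace(
    overall_score=74,
    metrics=[
        SimpleNamespace(metric_name="Factual Accuracy", score=88),
        SimpleNamespace(metric_name="Safety Disclaimers", score=55),
        SimpleNamespace(metric_name="Clarity", score=65),
    ],
)

for name, score in failing_metrics(demo_result):
    print(f"Below threshold: {name} ({score}/100)")
```

A check like this could gate a CI job on the weakest metrics rather than on the overall score alone.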

Features

  • Domain-aware: Researches domain-specific standards (Healthcare, Legal, Finance, Education, and more)
  • Adaptive metrics: Selects from 30+ built-in metrics and supports custom metrics
  • Task-type support: RAG ("rag"), fine-tuning ("tuning"), and single LLM call ("single_call")
  • Version comparison: Compare two evaluation results to understand regressions and improvements
  • Async-first: Built on asyncio for high-throughput pipelines
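Because `client.evaluate` is awaited in the Quick Start, several evaluations can in principle run concurrently. The sketch below shows one way to bound that concurrency with `asyncio.Semaphore` and `asyncio.gather`. `evaluate_many` and `fake_evaluate` are hypothetical names; `fake_evaluate` stands in for `client.evaluate` so the example runs without an API key. Whether DAEF applies its own rate limiting is not stated here.

```python
import asyncio


async def evaluate_many(evaluate, requests, max_concurrency=4):
    """Run `evaluate(request)` for every request, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(request):
        async with semaphore:
            return await evaluate(request)

    # gather returns results in the same order as the input requests.
    return await asyncio.gather(*(run_one(r) for r in requests))


async def fake_evaluate(request):
    """Stand-in for client.evaluate; simulates a short network call."""
    await asyncio.sleep(0.01)
    return f"evaluated:{request}"


results = asyncio.run(
    evaluate_many(fake_evaluate, ["req-a", "req-b", "req-c"], max_concurrency=2)
)
print(results)
```

With a real client, the call would presumably look like `evaluate_many(client.evaluate, requests)`.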

Configuration

Set your Google API key via environment variable or constructor:

# Shell: set the key in the environment
export GOOGLE_API_KEY="your-key"

# Python: or pass it directly to the constructor
client = DAEFClient(api_key="your-key")
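To keep keys out of source code, the two options can be combined in application code. This is a sketch only: `resolve_api_key` is a hypothetical helper, and how `DAEFClient` itself falls back to the environment variable is not shown in this README.

```python
import os


def resolve_api_key(explicit_key=None, env_var="GOOGLE_API_KEY"):
    """Prefer an explicitly passed key, else fall back to the environment variable."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found: pass one explicitly or set {env_var}")
    return key


# Intended usage (requires daef to be installed):
# client = DAEFClient(api_key=resolve_api_key())
```

Failing early with a clear message avoids a less obvious authentication error later in the pipeline.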

EvaluationRequest Fields

Field             | Type      | Required | Description
------------------|-----------|----------|--------------------------------------
domain            | str       | Yes      | Domain (e.g. "Healthcare", "Finance")
task_description  | str       | Yes      | What the LLM task does
task_type         | str       | Yes      | "rag", "tuning", or "single_call"
focus_areas       | list[str] | No       | Up to 3 priority areas
prompt            | str       | No       | The input prompt sent to the LLM
llm_output        | str       | No       | The LLM's response to evaluate
context_data      | str       | No       | Retrieved context (for RAG)
mandatory_metrics | list[str] | No       | Metrics that must be included
avoided_metrics   | list[str] | No       | Metrics to exclude
custom_metrics    | list[str] | No       | User-defined metric names
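The constraints in the table (three required fields, three allowed `task_type` values, at most three focus areas) can be checked before a request is built. The sketch below works on a plain dict of keyword arguments. `check_request_fields` is a hypothetical helper, the sample values are invented, and whether `EvaluationRequest` performs the same validation itself is not stated here.

```python
ALLOWED_TASK_TYPES = {"rag", "tuning", "single_call"}
MAX_FOCUS_AREAS = 3


def check_request_fields(fields):
    """Return a list of problems found in a dict of EvaluationRequest keyword arguments."""
    problems = []
    for name in ("domain", "task_description", "task_type"):
        if not fields.get(name):
            problems.append(f"missing required field: {name}")
    task_type = fields.get("task_type")
    if task_type and task_type not in ALLOWED_TASK_TYPES:
        problems.append(f"unknown task_type: {task_type!r}")
    if len(fields.get("focus_areas") or []) > MAX_FOCUS_AREAS:
        problems.append(f"focus_areas has more than {MAX_FOCUS_AREAS} entries")
    return problems


# Example keyword arguments for a RAG task, including retrieved context.
rag_fields = {
    "domain": "Finance",
    "task_description": "Answer questions over quarterly filings",
    "task_type": "rag",
    "focus_areas": ["Content Generation Quality"],
    "prompt": "What was the reported revenue?",
    "llm_output": "Revenue was reported in the income statement.",
    "context_data": "Excerpt of the retrieved filing text.",
}

print(check_request_fields(rag_fields))
```

If the list comes back empty, the dict could then be passed on as `EvaluationRequest(**rag_fields)`.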

Version Comparison

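# request_v1 and request_v2 are EvaluationRequest objects describing the two
# versions under test; client is the DAEFClient created in Quick Start.
result_v1 = asyncio.run(client.evaluate(request_v1))
result_v2 = asyncio.run(client.evaluate(request_v2))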

comparison = asyncio.run(client.compare(result_v1, result_v2))
print(f"Overall change: {comparison.overall_change}")
print(f"Summary: {comparison.summary}")
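For a per-metric view alongside the summary, score changes can also be computed directly from the two results. This sketch assumes only the `metrics`, `metric_name`, and `score` attributes shown in the Quick Start. It is not how `client.compare` works internally; `metric_deltas` and the sample metric names are hypothetical. Since metrics are selected adaptively, two runs may not share the same metric set, so only metrics present in both are compared.

```python
from types import SimpleNamespace


def metric_deltas(result_old, result_new):
    """Map metric name -> score change, for metrics present in both results."""
    old = {m.metric_name: m.score for m in result_old.metrics}
    new = {m.metric_name: m.score for m in result_new.metrics}
    return {name: new[name] - old[name] for name in old if name in new}


# Stand-ins for two evaluation results; names and scores are made up.
v1 = SimpleNamespace(metrics=[
    SimpleNamespace(metric_name="Accuracy", score=80),
    SimpleNamespace(metric_name="Clarity", score=70),
])
v2 = SimpleNamespace(metrics=[
    SimpleNamespace(metric_name="Accuracy", score=85),
    SimpleNamespace(metric_name="Clarity", score=60),
    SimpleNamespace(metric_name="Tone", score=90),
])

deltas = metric_deltas(v1, v2)
regressions = [name for name, change in deltas.items() if change < 0]
print(deltas, regressions)
```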

License

MIT
