Agentic AI Evaluation reference implementation from Vector Institute AI Engineering

aieng-eval-agents

Shared library for Vector Institute's Agentic AI Evaluation Bootcamp. Provides reusable components for building, running, and evaluating LLM agents with Google ADK and Langfuse.

What's included

Agent implementations

Module | Description
aieng.agent_evals.knowledge_qa | ReAct agent that answers questions using live web search. Includes evaluation against the DeepSearchQA benchmark with LLM-as-a-judge metrics (precision/recall/F1).
aieng.agent_evals.aml_investigation | Agent that investigates Anti-Money Laundering cases by querying a SQLite database of financial transactions via a read-only SQL tool.
aieng.agent_evals.report_generation | Agent that generates structured Excel reports from a relational database based on natural language queries.
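
The precision/recall/F1 metrics mentioned for knowledge_qa can be illustrated with a small worked example. This is only a sketch of the standard formulas, assuming a judge has already counted how many returned items are correct; JudgeVerdict and precision_recall_f1 are hypothetical names and not part of the package's API.

```python
from dataclasses import dataclass


@dataclass
class JudgeVerdict:
    """Hypothetical container for one judged answer (not a type from the package)."""

    correct_items: int    # items in the agent's answer that the judge accepted
    predicted_items: int  # total items the agent returned
    expected_items: int   # total items in the ground-truth answer


def precision_recall_f1(verdict: JudgeVerdict) -> tuple[float, float, float]:
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    precision = (
        verdict.correct_items / verdict.predicted_items
        if verdict.predicted_items
        else 0.0
    )
    recall = (
        verdict.correct_items / verdict.expected_items
        if verdict.expected_items
        else 0.0
    )
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The agent returned 4 items, 3 were judged correct, and the reference lists 5.
p, r, f1 = precision_recall_f1(
    JudgeVerdict(correct_items=3, predicted_items=4, expected_items=5)
)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here precision is 3/4 and recall is 3/5, so F1 is their harmonic mean, 2/3.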

Reusable tools (aieng.agent_evals.tools)

  • search — Google Search with response grounding and citations
  • web — HTML and PDF content fetching
  • file — Download and search data files (CSV, XLSX, JSON)
  • sql_database — Read-only SQL database access via ReadOnlySqlDatabase

Evaluation harness (aieng.agent_evals.evaluation)

Wrappers around Langfuse for running agent experiments:

  • run_experiment — Run a dataset through an agent and score outputs
  • run_experiment_with_trace_evals — Run experiments with trace-level evaluation
  • run_trace_evaluations — Score existing Langfuse traces with LLM-based or heuristic graders
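
The core loop behind "run a dataset through an agent and score outputs" can be sketched in plain Python. Everything here is hypothetical (run_items, exact_match, toy_agent, and the dataset layout); it does not reproduce the signature of run_experiment or any Langfuse call, and only shows the general shape of such a harness.

```python
from statistics import mean
from typing import Callable


def run_items(
    dataset: list[dict[str, str]],
    agent: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> list[dict[str, object]]:
    """Call the agent on every dataset item and attach a score to each output."""
    results = []
    for item in dataset:
        output = agent(item["input"])
        results.append(
            {
                "input": item["input"],
                "output": output,
                "score": scorer(output, item["expected"]),
            }
        )
    return results


def exact_match(output: str, expected: str) -> float:
    """Heuristic grader: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def toy_agent(question: str) -> str:
    """Stand-in for a real agent; it only knows one answer."""
    return {"2 + 2": "4"}.get(question, "unknown")


dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
results = run_items(dataset, toy_agent, exact_match)
mean_score = mean(r["score"] for r in results)
print(mean_score)
```

Keeping the agent and the scorer as plain callables makes it easy to swap a heuristic grader for an LLM-based one without touching the loop.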

Utilities

  • display — Rich-based terminal and Jupyter display helpers for metrics and agent responses
  • progress — Progress tracking for batch evaluation runs
  • configs — Pydantic-based configuration loading from .env
  • langfuse — Langfuse client and trace utilities
  • db_manager — Database connection management

Installation

pip install aieng-eval-agents

Requires Python 3.12+.

Source

Full reference implementations and documentation are in the eval-agents repository.

Download files

Source Distribution

aieng_eval_agents-0.3.0.tar.gz (153.0 kB view details)

Uploaded Source

Built Distribution

aieng_eval_agents-0.3.0-py3-none-any.whl (196.3 kB view details)

Uploaded Python 3

File details

Details for the file aieng_eval_agents-0.3.0.tar.gz.

File metadata

  • Download URL: aieng_eval_agents-0.3.0.tar.gz
  • Upload date:
  • Size: 153.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aieng_eval_agents-0.3.0.tar.gz
Algorithm | Hash digest
SHA256 | 5343f781ff75fa4ea911f58d8c828a67d77d5bf37732532071e8b8eeff73d83a
MD5 | ed03764e5cf4031e1025665ed2d6333f
BLAKE2b-256 | c702115e57eb0d03d07a081c3bbd3f6e7dc5884fa11a1e1228ad1518ef97a280
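
A downloaded archive can be checked against a published SHA256 digest with the standard-library hashlib module. The sketch below is a minimal illustration: sha256_of and matches_digest are hypothetical helper names, and the demo hashes a temporary file rather than the real distribution, which is assumed to have been downloaded separately.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; for the real archive, pass its local path and
# the SHA256 value listed for that file.
payload = b"example payload"
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(payload)

demo_ok = matches_digest(demo_path, hashlib.sha256(payload).hexdigest())
demo_bad = matches_digest(demo_path, "0" * 64)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

pip can perform the same check itself when requirements are pinned with hashes.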

File details

Details for the file aieng_eval_agents-0.3.0-py3-none-any.whl.

File hashes

Hashes for aieng_eval_agents-0.3.0-py3-none-any.whl
Algorithm | Hash digest
SHA256 | 7a883eca75f1ed76c0c8b9143ea7969508587e8a35ee6d6029b0780efe113d72
MD5 | c1823649ca0af7da8eb5093d43643189
BLAKE2b-256 | 52533240460c29f5b45f096739b203f6ebf0a07f967df9930fcaa5c1971d432a
