Agentic AI Evaluation reference implementation from Vector Institute AI Engineering

Project description

aieng-eval-agents

Shared library for Vector Institute's Agentic AI Evaluation Bootcamp. It provides reusable components for building, running, and evaluating LLM agents with Google ADK (Agent Development Kit) and Langfuse.

What's included

Agent implementations

  • aieng.agent_evals.knowledge_qa: ReAct agent that answers questions using live web search. Includes evaluation against the DeepSearchQA benchmark with LLM-as-a-judge metrics (precision, recall, and F1).
  • aieng.agent_evals.aml_investigation: Agent that investigates anti-money laundering (AML) cases by querying a SQLite database of financial transactions through a read-only SQL tool.
  • aieng.agent_evals.report_generation: Agent that generates structured Excel reports from a relational database in response to natural-language queries.
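The precision/recall/F1 metrics mentioned for knowledge_qa can be illustrated with a small sketch. It assumes an answer is a list of items and that a judge decides whether a predicted item matches a gold item. Here `exact_judge` is a hypothetical stand-in for the LLM judge (case-insensitive string equality), and `precision_recall_f1` is an illustrative name, not the package's API; the package's actual DeepSearchQA scoring may define these metrics differently.

```python
from typing import Callable

# (predicted_item, gold_item) -> True if the judge considers them equivalent.
Judge = Callable[[str, str], bool]


def exact_judge(pred: str, gold: str) -> bool:
    """Hypothetical stand-in for an LLM judge: case/whitespace-insensitive equality."""
    return pred.strip().lower() == gold.strip().lower()


def precision_recall_f1(
    predicted: list[str], gold: list[str], judge: Judge = exact_judge
) -> tuple[float, float, float]:
    # Precision: share of predicted items the judge matches to some gold item.
    correct = sum(any(judge(p, g) for g in gold) for p in predicted)
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of gold items covered by at least one predicted item.
    covered = sum(any(judge(p, g) for p in predicted) for g in gold)
    recall = covered / len(gold) if gold else 0.0
    # F1: harmonic mean of the two, defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = precision_recall_f1(
    ["Paris", "Lyon", "Berlin"], ["paris", "lyon", "marseille", "nice"]
)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```

With a real LLM judge, the nested `any(...)` loops can make up to 2 × len(predicted) × len(gold) judge calls, so caching verdicts per (predicted, gold) pair is worth considering.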

Reusable tools (aieng.agent_evals.tools)

  • search — Google Search with response grounding and citations
  • web — HTML and PDF content fetching
  • file — Download and search data files (CSV, XLSX, JSON)
  • sql_database — Read-only SQL database access via ReadOnlySqlDatabase
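One common way to make SQL access read-only, sketched below for SQLite (the AML agent's backend), combines a leading-keyword allow-list with SQLite's `PRAGMA query_only`. This is an assumption about technique: `ReadOnlySQLiteSketch` is a hypothetical class, and the package's ReadOnlySqlDatabase may enforce read-only access differently.

```python
import sqlite3


class ReadOnlySQLiteSketch:
    """Hypothetical read-only wrapper; NOT the package's ReadOnlySqlDatabase."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        # From now on SQLite rejects CREATE/INSERT/UPDATE/DELETE/DROP on this
        # connection with an SQLITE_READONLY error (sqlite3.OperationalError).
        self._conn.execute("PRAGMA query_only = ON")

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        # Cheap allow-list on the leading keyword. It is not sufficient on its
        # own (a WITH clause can prefix a DELETE), so the pragma does the
        # actual enforcement.
        if not sql.lstrip().upper().startswith(("SELECT", "WITH")):
            raise ValueError("only SELECT/WITH statements are allowed")
        return self._conn.execute(sql, params).fetchall()


# Build a toy transactions table, then expose only a read-only view of it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO txns VALUES (?, ?)", [(1, 9500.0), (2, 120.0)])
conn.commit()

db = ReadOnlySQLiteSketch(conn)
print(db.query("SELECT id FROM txns WHERE amount > ?", (9000,)))  # [(1,)]

try:
    # Passes the keyword check, but the pragma blocks the write.
    db.query("WITH x AS (SELECT 1) DELETE FROM txns")
except sqlite3.Error as exc:
    print("blocked:", exc)
```

Enforcing read-only access at the database layer, rather than only by parsing SQL text, means a statement that slips past the keyword check still cannot modify data.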

Evaluation harness (aieng.agent_evals.evaluation)

Wrappers around Langfuse for running agent experiments:

  • run_experiment — Run a dataset through an agent and score outputs
  • run_experiment_with_trace_evals — Run experiments with trace-level evaluation
  • run_trace_evaluations — Score existing Langfuse traces with LLM-based or heuristic graders
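The core loop behind an experiment runner like run_experiment can be sketched in plain Python: run each dataset item through the agent, apply every evaluator to the output, then average each metric. This is a hypothetical illustration (`run_experiment_sketch`, `exact_match`, and the `input`/`expected_output` item keys are assumptions); it makes no Langfuse calls and does not reflect the package's actual signature.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable

# Hypothetical evaluator shape: (input, output, expected_output) -> score in [0, 1].
Evaluator = Callable[[str, str, str], float]


@dataclass
class ItemResult:
    input: str
    output: str
    scores: dict[str, float] = field(default_factory=dict)


def run_experiment_sketch(
    dataset: list[dict[str, str]],
    task: Callable[[str], str],
    evaluators: dict[str, Evaluator],
) -> tuple[list[ItemResult], dict[str, float]]:
    """Hypothetical stand-in for run_experiment; no Langfuse calls are made."""
    results = []
    for item in dataset:
        output = task(item["input"])  # run the agent on one dataset item
        scores = {
            name: evaluate(item["input"], output, item["expected_output"])
            for name, evaluate in evaluators.items()
        }
        results.append(ItemResult(item["input"], output, scores))
    # Average each metric across the dataset (empty dataset -> empty summary).
    summary = (
        {name: mean(r.scores[name] for r in results) for name in evaluators}
        if results
        else {}
    )
    return results, summary


def exact_match(_input: str, output: str, expected: str) -> float:
    return float(output.strip().lower() == expected.strip().lower())


def toy_agent(question: str) -> str:
    return "Paris"  # placeholder for a real agent call


dataset = [
    {"input": "Capital of France?", "expected_output": "Paris"},
    {"input": "Capital of Italy?", "expected_output": "Rome"},
]
results, summary = run_experiment_sketch(dataset, toy_agent, {"exact_match": exact_match})
print(summary)  # {'exact_match': 0.5}
```

Trace-level evaluation, as in run_trace_evaluations, would apply graders to recorded traces (for example, tool calls made along the way) rather than only to final outputs; that part is not shown here.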

Utilities

  • display — Rich-based terminal and Jupyter display helpers for metrics and agent responses
  • progress — Progress tracking for batch evaluation runs
  • configs — Pydantic-based configuration loading from .env
  • langfuse — Langfuse client and trace utilities
  • db_manager — Database connection management

Installation

pip install aieng-eval-agents

Requires Python 3.12+.

Source

Full reference implementations and documentation are in the eval-agents repository.

Download files

Source Distribution

aieng_eval_agents-0.2.1.tar.gz (125.1 kB)

Built Distribution

aieng_eval_agents-0.2.1-py3-none-any.whl (163.9 kB, Python 3)

File details

Details for the file aieng_eval_agents-0.2.1.tar.gz.

File metadata

  • Download URL: aieng_eval_agents-0.2.1.tar.gz
  • Size: 125.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aieng_eval_agents-0.2.1.tar.gz
Algorithm Hash digest
SHA256 3351bae4ad7d61778088c1b9e31745d867cea1145627f90cc44371ed48bccfbf
MD5 ff96199fd14b524ef3ac60f2da86b901
BLAKE2b-256 18933ccea44939d94f7b63ae7e989dc185e26ad81a209894b7487180b52e9f14

File details

Details for the file aieng_eval_agents-0.2.1-py3-none-any.whl.

File hashes

Hashes for aieng_eval_agents-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b5ed9060fd729a51f3b3d6f68b66ceff5f0f171919fe08e01d9ae0ea6e9db18d
MD5 cb6297bab5ebe39ecc22b7a74679bf19
BLAKE2b-256 3efe98b22a0d572ee1e1e7455d6a023d870ec942d8c5fb45f7700d172dfbc6bb
