Agentic AI Evaluation reference implementation from Vector Institute AI Engineering
aieng-eval-agents
Shared library for Vector Institute's Agentic AI Evaluation Bootcamp. Provides reusable components for building, running, and evaluating LLM agents with Google ADK and Langfuse.
What's included
Agent implementations
| Module | Description |
|---|---|
| `aieng.agent_evals.knowledge_qa` | ReAct agent that answers questions using live web search. Includes evaluation against the DeepSearchQA benchmark with LLM-as-a-judge metrics (precision/recall/F1). |
| `aieng.agent_evals.aml_investigation` | Agent that investigates Anti-Money Laundering cases by querying a SQLite database of financial transactions via a read-only SQL tool. |
| `aieng.agent_evals.report_generation` | Agent that generates structured Excel reports from a relational database based on natural language queries. |
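The precision/recall/F1 metrics mentioned for `knowledge_qa` can be illustrated with a small sketch. This is not the package's implementation: in the benchmark an LLM judge decides which answer items count as correct, whereas here a plain set intersection stands in for the judge, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(predicted: set[str], expected: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 over answer items.

    Assumption: exact set membership replaces the LLM judge's matching.
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Two of three predicted items are correct; two of four expected items are found.
p, r, f1 = precision_recall_f1(
    {"Paris", "Lyon", "Nice"},
    {"Paris", "Lyon", "Marseille", "Toulouse"},
)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of the two, so an agent that lists many extra items (low precision) or misses many expected ones (low recall) is penalized either way.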
Reusable tools (aieng.agent_evals.tools)
- `search`: Google Search with response grounding and citations
- `web`: HTML and PDF content fetching
- `file`: Download and search data files (CSV, XLSX, JSON)
- `sql_database`: Read-only SQL database access via `ReadOnlySqlDatabase`
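To show what read-only SQL access can look like in practice, here is a minimal sketch using only the standard-library `sqlite3` module. It does not use or reproduce `ReadOnlySqlDatabase`, whose interface is not described here; the helper names `open_read_only` and `run_query` and the `transactions` table are assumptions made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file so that any write statement is rejected by SQLite itself."""
    # A file: URI with mode=ro asks SQLite to open the database read-only.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def run_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[tuple]:
    """Execute a parameterized query and return all rows."""
    return conn.execute(sql, params).fetchall()


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "transactions.db"

    # Build a tiny hypothetical database with a normal read-write connection.
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
    writer.executemany("INSERT INTO transactions (amount) VALUES (?)", [(120.0,), (9800.0,)])
    writer.commit()
    writer.close()

    reader = open_read_only(db_path)
    rows = run_query(reader, "SELECT id, amount FROM transactions WHERE amount > ?", (1000,))

    # A write attempt on the read-only connection raises OperationalError.
    try:
        run_query(reader, "DELETE FROM transactions")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    reader.close()

print(rows, write_blocked)
```

Enforcing read-only mode at the connection level means an agent-generated `DELETE` or `UPDATE` fails in the database engine, rather than relying on string checks over the SQL text.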
Evaluation harness (aieng.agent_evals.evaluation)
Wrappers around Langfuse for running agent experiments:
- `run_experiment`: Run a dataset through an agent and score outputs
- `run_experiment_with_trace_evals`: Run experiments with trace-level evaluation
- `run_trace_evaluations`: Score existing Langfuse traces with LLM-based or heuristic graders
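The idea of running a dataset through an agent and scoring the outputs can be sketched without Langfuse. The code below is a toy illustration only: it does not call or mirror the signatures of `run_experiment`, and every name in it (`Example`, `exact_match`, `run_toy_experiment`, `stub_agent`) is hypothetical. The grader is a simple heuristic (case-insensitive exact match) rather than an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    question: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Heuristic grader: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_toy_experiment(
    dataset: list[Example],
    agent: Callable[[str], str],
    grader: Callable[[str, str], float],
) -> dict:
    """Call the agent on each example, grade the output, and aggregate."""
    scores = [grader(agent(example.question), example.expected) for example in dataset]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return {"scores": scores, "mean_score": mean_score}


def stub_agent(question: str) -> str:
    """Stand-in for a real agent: answers from a fixed lookup table."""
    answers = {"Capital of France?": "Paris"}
    return answers.get(question, "unknown")


dataset = [
    Example("Capital of France?", "paris"),
    Example("Capital of Peru?", "Lima"),
]
result = run_toy_experiment(dataset, stub_agent, exact_match)
print(result)
```

Keeping the agent and the grader as plain callables makes it easy to swap the heuristic for an LLM-based grader without changing the loop.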
Utilities
- `display`: Rich-based terminal and Jupyter display helpers for metrics and agent responses
- `progress`: Progress tracking for batch evaluation runs
- `configs`: Pydantic-based configuration loading from `.env`
- `langfuse`: Langfuse client and trace utilities
- `db_manager`: Database connection management
Installation
pip install aieng-eval-agents
Requires Python 3.12+.
Source
Full reference implementations and documentation are in the eval-agents repository.
File details
Details for the file aieng_eval_agents-0.2.1.tar.gz.
File metadata
- Download URL: aieng_eval_agents-0.2.1.tar.gz
- Upload date:
- Size: 125.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3351bae4ad7d61778088c1b9e31745d867cea1145627f90cc44371ed48bccfbf` |
| MD5 | `ff96199fd14b524ef3ac60f2da86b901` |
| BLAKE2b-256 | `18933ccea44939d94f7b63ae7e989dc185e26ad81a209894b7487180b52e9f14` |
File details
Details for the file aieng_eval_agents-0.2.1-py3-none-any.whl.
File metadata
- Download URL: aieng_eval_agents-0.2.1-py3-none-any.whl
- Upload date:
- Size: 163.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b5ed9060fd729a51f3b3d6f68b66ceff5f0f171919fe08e01d9ae0ea6e9db18d` |
| MD5 | `cb6297bab5ebe39ecc22b7a74679bf19` |
| BLAKE2b-256 | `3efe98b22a0d572ee1e1e7455d6a023d870ec942d8c5fb45f7700d172dfbc6bb` |
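The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch with the standard-library `hashlib` follows; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local path is assumed to be wherever the sdist was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, published_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == published_hex.strip().lower()


# SHA256 digest published for the 0.2.1 source distribution.
SDIST_SHA256 = "3351bae4ad7d61778088c1b9e31745d867cea1145627f90cc44371ed48bccfbf"

# Assumed download location; adjust to where the file was actually saved.
sdist_path = Path("aieng_eval_agents-0.2.1.tar.gz")
if sdist_path.exists():
    print("digest matches:", matches_published_digest(sdist_path, SDIST_SHA256))
else:
    print("file not found; download it first")
```

pip can perform the same check itself when a requirements file pins the package with `--hash=sha256:...` and is installed with `--require-hashes`.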