Agentic AI Evaluation reference implementation from Vector Institute AI Engineering

Project description

aieng-eval-agents

Shared library for Vector Institute's Agentic AI Evaluation Bootcamp. Provides reusable components for building, running, and evaluating LLM agents with Google ADK and Langfuse.

What's included

Agent implementations

Module | Description
aieng.agent_evals.knowledge_qa | ReAct agent that answers questions using live web search. Includes evaluation against the DeepSearchQA benchmark with LLM-as-a-judge metrics (precision/recall/F1).
aieng.agent_evals.aml_investigation | Agent that investigates Anti-Money Laundering cases by querying a SQLite database of financial transactions via a read-only SQL tool.
aieng.agent_evals.report_generation | Agent that generates structured Excel reports from a relational database based on natural language queries.
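
The precision/recall/F1 metrics mentioned for knowledge_qa can be illustrated with a small sketch. It assumes an LLM judge has already decided how many of the agent's answer items match the ground truth; the function name `prf1` and its parameters are hypothetical and are not part of the package's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PRF1:
    precision: float
    recall: float
    f1: float


def prf1(num_correct: int, num_predicted: int, num_expected: int) -> PRF1:
    """Compute precision, recall and F1 from judge-derived counts.

    num_correct:   predicted items the judge accepted as matching ground truth
    num_predicted: total items the agent returned
    num_expected:  total items in the ground-truth answer
    """
    # Guard each division so empty predictions or empty ground truth yield 0.0.
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_expected if num_expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return PRF1(precision, recall, f1)


# Example: the agent returned 4 items, 3 were judged correct, 5 were expected.
result = prf1(num_correct=3, num_predicted=4, num_expected=5)
print(result)
```

How the judge arrives at the counts (for example, per-item matching against a reference answer list) is left open here; the benchmark's own scoring rules may differ.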

Reusable tools (aieng.agent_evals.tools)

  • search — Google Search with response grounding and citations
  • web — HTML and PDF content fetching
  • file — Download and search data files (CSV, XLSX, JSON)
  • sql_database — Read-only SQL database access via ReadOnlySqlDatabase

Evaluation harness (aieng.agent_evals.evaluation)

Wrappers around Langfuse for running agent experiments:

  • run_experiment — Run a dataset through an agent and score outputs
  • run_experiment_with_trace_evals — Run experiments with trace-level evaluation
  • run_trace_evaluations — Score existing Langfuse traces with LLM-based or heuristic graders

Utilities

  • display — Rich-based terminal and Jupyter display helpers for metrics and agent responses
  • progress — Progress tracking for batch evaluation runs
  • configs — Pydantic-based configuration loading from .env
  • langfuse — Langfuse client and trace utilities
  • db_manager — Database connection management

Installation

pip install aieng-eval-agents

Requires Python 3.12+.
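
A quick way to confirm the environment meets the requirement and that the package is installed is sketched below, using only the standard library. The helper names `python_is_supported` and `installed_version` are hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional

MIN_PYTHON = (3, 12)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter version is at least 3.12."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "aieng-eval-agents") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported())
print("Installed version:", installed_version())
```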

Source

Full reference implementations and documentation are in the VectorInstitute/eval-agents repository.

Download files

Source Distribution

aieng_eval_agents-0.3.1.tar.gz (162.5 kB view details)

Uploaded Source

Built Distribution

aieng_eval_agents-0.3.1-py3-none-any.whl (209.1 kB view details)

Uploaded Python 3

File details

Details for the file aieng_eval_agents-0.3.1.tar.gz.

File metadata

  • Download URL: aieng_eval_agents-0.3.1.tar.gz
  • Size: 162.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for aieng_eval_agents-0.3.1.tar.gz
Algorithm Hash digest
SHA256 d62f41ae92971e57a5ff3a01b5ce570413be1f3386e12a67d5e2adb356a0355f
MD5 be3920240a625f104276cf14c004f6c8
BLAKE2b-256 1f06e1d8765eea65b3d0209727573dbf36df3dd8596774e3e34b257cb889a9a4
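
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below assumes the archive has been saved in the current directory under its original file name; the helper names `sha256_of` and `matches_published_hash` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for aieng_eval_agents-0.3.1.tar.gz.
EXPECTED_SHA256 = "d62f41ae92971e57a5ff3a01b5ce570413be1f3386e12a67d5e2adb356a0355f"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


archive = Path("aieng_eval_agents-0.3.1.tar.gz")
if archive.exists():
    print("Hash matches:", matches_published_hash(archive, EXPECTED_SHA256))
else:
    print(f"{archive} not found; download it first.")
```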

Provenance

The following attestation bundles were made for aieng_eval_agents-0.3.1.tar.gz:

Publisher: publish.yml on VectorInstitute/eval-agents

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aieng_eval_agents-0.3.1-py3-none-any.whl.

File hashes

Hashes for aieng_eval_agents-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 69367aa48b162a5f5e9c4728a0c37f9ddc134c21510737fbb305b9d8dbc6440d
MD5 c185f2266eb89c5365543416928f1827
BLAKE2b-256 b6721c0c6ed2a854571dc0aaecad36b76f4e12bbe78143f12ca3e92c7858c64f

Provenance

The following attestation bundles were made for aieng_eval_agents-0.3.1-py3-none-any.whl:

Publisher: publish.yml on VectorInstitute/eval-agents
