SAGE Benchmark Agent - Agent capability evaluation framework

Project description

SAGE Benchmark Agent

Configuration-driven experiment framework for evaluating agent capabilities.

Features

  • Tool Selection Evaluation: Tool retrieval and ranking benchmarks
  • Planning Evaluation: Multi-step planning with tool composition
  • Timing Detection: Timing judgment for tool-invocation decisions (whether and when a tool should be called)
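
The description above does not say which metrics each evaluation reports. As a rough illustration of what such benchmarks commonly measure, the sketch below computes recall@k and mean reciprocal rank for tool retrieval and ranking, a strict exact-match check for multi-step plans, and plain accuracy for call/no-call timing decisions. All function names and the input layout (ranked lists of tool names, boolean timing labels) are hypothetical and are not the package's API.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical metric helpers; not part of isage-benchmark-agent.


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant tools found among the top-k ranked candidates."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: Sequence[tuple[Sequence[str], set[str]]]) -> float:
    """Average of 1 / rank of the first relevant tool (0 when none is ranked)."""
    if not runs:
        return 0.0
    total = 0.0
    for ranked, relevant in runs:
        for position, tool in enumerate(ranked, start=1):
            if tool in relevant:
                total += 1.0 / position
                break
    return total / len(runs)


def plan_exact_match(predicted: Sequence[str], gold: Sequence[str]) -> bool:
    """Strict planning check: the tool sequence must match the reference step for step."""
    return list(predicted) == list(gold)


def timing_accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Share of decision points where the call / no-call judgment matches the label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, `recall_at_k(["weather_api", "calendar_api", "search_api"], {"search_api", "calendar_api"}, k=2)` returns 0.5, because only one of the two relevant tools appears in the top two. Exact match is deliberately strict; a real planning benchmark may also give partial credit for correct steps.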

Quick Start

# Install
pip install isage-benchmark-agent

# Run tool selection experiment
sage-agent-bench tool-selection --config config/tool_selection_exp.yaml

# Run planning experiment
sage-agent-bench planning --config config/planning_exp.yaml
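
Because every experiment is driven by a config file passed to the same CLI, a batch of experiments can be scripted. The following sketch wraps the commands shown above with Python's `subprocess` module. It assumes only the `tool-selection` and `planning` subcommands and the `--config` flag from this Quick Start; the helper names `build_command` and `run_experiment` are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Only the subcommands shown in the Quick Start; the CLI may offer others.
KNOWN_TASKS = {"tool-selection", "planning"}


def build_command(task: str, config: str | Path) -> list[str]:
    """Assemble the sage-agent-bench invocation for one experiment (hypothetical helper)."""
    if task not in KNOWN_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(KNOWN_TASKS)}")
    return ["sage-agent-bench", task, "--config", str(config)]


def run_experiment(task: str, config: str | Path) -> int:
    """Run one experiment and return the CLI's exit code (hypothetical wrapper)."""
    config_path = Path(config)
    # Fail early with a clear message instead of letting the CLI report it.
    if not config_path.is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    if shutil.which("sage-agent-bench") is None:
        raise RuntimeError("sage-agent-bench is not on PATH; is the package installed?")
    return subprocess.run(build_command(task, config_path), check=False).returncode
```

Looping `run_experiment` over pairs such as `("planning", "config/planning_exp.yaml")` gives a simple batch runner. Keeping the experiment parameters in the YAML files rather than in the script preserves the configuration-driven design.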

Documentation

See benchmark_agent/README.md for detailed documentation.

Development

# Clone
git clone https://github.com/intellistream/sage-agent-benchmark.git
cd sage-agent-benchmark

# Setup virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

License

MIT License - see LICENSE for details.

Download files

Source Distribution

isage_benchmark_agent-0.1.0.1.tar.gz (273.7 kB)

Built Distributions

isage_benchmark_agent-0.1.0.1-py2.py3-none-any.whl (306.9 kB), tagged for Python 2 and Python 3

isage_benchmark_agent-0.1.0.1-cp311-none-any.whl (144.4 kB), tagged for CPython 3.11

File details

Details for the file isage_benchmark_agent-0.1.0.1.tar.gz.

File metadata

  • Download URL: isage_benchmark_agent-0.1.0.1.tar.gz
  • Size: 273.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.11

File hashes

Hashes for isage_benchmark_agent-0.1.0.1.tar.gz
  • SHA256: e9d681ddf8311852c3f8257521496898f2ab1d45460fff6d2676f8cc42e4f3d1
  • MD5: 63b649172d9a337d834eac2227ab9227
  • BLAKE2b-256: aa973108af8eff75c0426e52ca5dae5d1135082bce81430ba09d502606df48b2
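
To confirm that a downloaded archive matches the SHA256 digest listed for the source distribution, the file can be hashed locally with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the default digest covers only the `.tar.gz`, so a wheel's digest has to be passed in explicitly.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published for isage_benchmark_agent-0.1.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "e9d681ddf8311852c3f8257521496898f2ab1d45460fff6d2676f8cc42e4f3d1"


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare the local digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For an intact download, `verify("isage_benchmark_agent-0.1.0.1.tar.gz")` should return True. Alternatively, pip can enforce digests itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file, optionally with `--require-hashes`).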

File details

Details for the file isage_benchmark_agent-0.1.0.1-py2.py3-none-any.whl.

File hashes

Hashes for isage_benchmark_agent-0.1.0.1-py2.py3-none-any.whl
  • SHA256: c2f37f3611eb676473cc3d4f4a2f29f5a86d0299c700d0f4381299e2edfb8c6f
  • MD5: 1b6eec31fcc3305a63da20c3c0573dce
  • BLAKE2b-256: 264aa2b2f99a70c2e7906d4213125e5b6362c805ef2a1d33831c44da6db01add

File details

Details for the file isage_benchmark_agent-0.1.0.1-cp311-none-any.whl.

File hashes

Hashes for isage_benchmark_agent-0.1.0.1-cp311-none-any.whl
  • SHA256: dc66c241f08b78811244d71d1776b1c0e47af5aa9780bb0837268cd05b9ccd8d
  • MD5: be267ebac10eee0053ed1b32fc2eb51e
  • BLAKE2b-256: a1fc742f92ba9af37bae6074a47546fff7db7b8d5e61dd65dc02a0410ddbaa34
