SAGE Tool Use Benchmark - Tool selection and use evaluation framework
SAGE Agentic Tool Use Benchmark
A configuration-driven experiment framework for evaluating tool selection and tool use capabilities.
Features
- Tool Selection Evaluation: Tool retrieval and ranking benchmarks
- Planning Evaluation: Multi-step planning with tool composition
- Timing Detection: Benchmarks for judging when a tool should be invoked
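To make the tool selection feature more concrete, the sketch below shows two metrics commonly used to score tool retrieval and ranking: recall@k and reciprocal rank. It is only an illustration of the general idea, not this package's API. The function names, the tool names, and the choice of metrics are all assumptions made for the example; see benchmark_agent/README.md for what the benchmark actually computes.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant tools that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant tool, or 0.0 if none is ranked."""
    for position, tool in enumerate(ranked, start=1):
        if tool in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical data: a ranking produced by some tool selector, plus the
# ground-truth tools for the query. None of these names come from the package.
ranked_tools = ["web_search", "calculator", "weather_api", "calendar"]
gold_tools = {"calculator", "calendar"}

print(recall_at_k(ranked_tools, gold_tools, k=2))  # 0.5
print(reciprocal_rank(ranked_tools, gold_tools))   # 0.5
```

Averaging the reciprocal rank over many queries gives the mean reciprocal rank (MRR), a common single-number summary for ranking quality.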
Quick Start
```bash
# Install
pip install isage-agentic-tooluse-benchmark

# Run tool selection experiment
sage-agentic-tooluse-bench tool-selection --config config/tool_selection_exp.yaml

# Run planning experiment
sage-agentic-tooluse-bench planning --config config/planning_exp.yaml
```
Documentation
See benchmark_agent/README.md for detailed documentation.
Development
```bash
# Clone
git clone https://github.com/intellistream/sage-agentic-tooluse-benchmark.git
cd sage-agentic-tooluse-benchmark

# Setup virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest
```
License
MIT License - see LICENSE for details.
File details
Details for the file isage_agentic_tooluse_benchmark-0.1.5.tar.gz.
File metadata
- Download URL: isage_agentic_tooluse_benchmark-0.1.5.tar.gz
- Upload date:
- Size: 124.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0504ac48730650719b2e74158fc18e9d7226139eed18788c5d78f6ff66a226b6 |
| MD5 | 345889a52a39a294c91acd96740f3361 |
| BLAKE2b-256 | 0b229c6430a3ea11c9443ae8ab46e5d6aa3926db76d2cfe648c76e42a113da39 |
File details
Details for the file isage_agentic_tooluse_benchmark-0.1.5-py2.py3-none-any.whl.
File metadata
- Download URL: isage_agentic_tooluse_benchmark-0.1.5-py2.py3-none-any.whl
- Upload date:
- Size: 144.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8b7b0db200f27b5b8ee32b954b171fa5ae8d8fe2973beaa209ed406ef5d7a2b |
| MD5 | 154f5eff46866228e4e3c6d3c9c82681 |
| BLAKE2b-256 | 02791a75392c8f24c883704343a7e804c184e2ff73b63a14526cbb69090e6e87 |
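The published digests can be used to check a downloaded file before installing it. The sketch below uses only Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_digest` are made up for this example, and the script assumes the source distribution has been saved in the current directory under its published file name; if the file is not there, the check is simply skipped.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# File name and SHA256 digest of the source distribution, as listed above.
# The local path is an assumption: adjust it to wherever the file was saved.
archive = Path("isage_agentic_tooluse_benchmark-0.1.5.tar.gz")
published = "0504ac48730650719b2e74158fc18e9d7226139eed18788c5d78f6ff66a226b6"

if archive.exists():
    print("digest matches:", matches_published_digest(archive, published))
else:
    print("archive not found; nothing to verify")
```

The same functions work for the wheel by swapping in its file name and SHA256 digest.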