A comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks
AgentDS-Bench Python Package
This package provides the interface for interacting with the AgentDS-Bench platform, a comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks.
Installation
pip install agentds
Authentication
Before using the package, you must authenticate with your team's API key. You have several options:
Option 1: Direct Authentication
from agentds.client import BenchmarkClient
# Initialize with credentials
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")
Option 2: Environment Variables
You can set the following environment variables:
export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"
Then initialize without parameters:
from agentds.client import BenchmarkClient
# Will use environment variables
client = BenchmarkClient()
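Because BenchmarkClient() with no arguments relies on these variables, it can help to confirm they are set before constructing the client. The sketch below uses only the standard library; `missing_credentials` is a hypothetical helper and not part of the agentds package.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("AGENTDS_API_KEY", "AGENTDS_TEAM_NAME")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Credentials found; BenchmarkClient() can be created without arguments.")
```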
Option 3: .env File
Create a .env file in your project directory:
AGENTDS_API_URL=https://api.agentds.org/api
AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name
Then:
from agentds.client import BenchmarkClient
# Will load from .env file
client = BenchmarkClient()
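To illustrate what loading a .env file amounts to, here is a minimal standard-library sketch that parses KEY=VALUE lines into a dictionary. How the package itself reads the file is not documented here, so `parse_env_text` is a hypothetical helper for illustration only; it handles plain KEY=VALUE lines, blank lines, and # comments, and nothing more.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Same keys as the .env example above, with placeholder values.
SAMPLE = """\
AGENTDS_API_URL=https://api.agentds.org/api
AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name
"""

if __name__ == "__main__":
    for name in parse_env_text(SAMPLE):
        print(f"found setting: {name}")
```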
API Key Storage
When you authenticate, the API key is stored in:
- Environment variables for the current session
- A token file at ~/.agentds_token for future sessions
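Since the key persists on disk, it may be worth checking that the token file is readable only by your own user. The sketch below assumes a POSIX system; `token_file_is_private` is a hypothetical helper, and it does not depend on the file's internal format, which this README does not specify.

```python
import stat
from pathlib import Path

# Token file location as stated above.
TOKEN_PATH = Path.home() / ".agentds_token"


def token_file_is_private(path=TOKEN_PATH):
    """Return True if the file exists and grants no group/other permissions."""
    path = Path(path)
    if not path.is_file():
        return False
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    if not TOKEN_PATH.is_file():
        print(f"No token file at {TOKEN_PATH}")
    elif token_file_is_private():
        print("Token file permissions look private.")
    else:
        print(f"Consider restricting access: chmod 600 {TOKEN_PATH}")
```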
Basic Usage
from agentds.client import BenchmarkClient
# Initialize client
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")
# Start the competition if not already started
client.start_competition()
# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")
# Get the next task for a domain
task = client.get_next_task("machine_learning")
if task:
    # Print task details
    print(f"Task ID: {task.task_id}")
    print(f"Instructions: {task.get_instructions()}")
    # Your agent's solution (replace with your implementation)
    response = {"prediction": 0.75, "confidence": 0.9}
    # Validate response format
    if task.validate_response(response):
        # Submit response
        client.submit_response(task.domain, task.task_id, response)
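The single-task flow above extends naturally to a loop that works through a domain until no task is returned. This sketch assumes get_next_task returns a falsy value once the domain is exhausted (suggested by the `if task:` check, but not confirmed here). `run_domain` and `solve` are hypothetical names, and `max_tasks` is a safety cap added for illustration.

```python
def run_domain(client, domain, solve, max_tasks=100):
    """Fetch, solve, validate, and submit tasks for one domain.

    `client` is expected to expose get_next_task and submit_response as in
    the usage example; `solve` is your own callable mapping a task to a
    response. Returns the number of responses submitted.
    """
    submitted = 0
    for _ in range(max_tasks):
        task = client.get_next_task(domain)
        if not task:  # assumption: falsy means no tasks remain
            break
        response = solve(task)
        if not task.validate_response(response):
            # Stop rather than retry a task we cannot answer correctly.
            print(f"Response for task {task.task_id} failed validation")
            break
        client.submit_response(task.domain, task.task_id, response)
        submitted += 1
    return submitted
```

With a real client this could be called as run_domain(client, "machine_learning", my_agent), where my_agent is your own function.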
Task Data
Each task contains:
- task_id: Unique identifier
- domain: The knowledge domain
- category: Scaling category (Fidelity, Volume, Noise, Complexity)
- data: The primary task data
- instructions: Task instructions
- side_info: Additional context (optional)
- response_format: Expected response format
Access task data:
# Get the main task data
data = task.get_data()
# Get task instructions
instructions = task.get_instructions()
# Get additional info
side_info = task.get_side_info()
# Get expected response format
response_format = task.get_response_format()
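When debugging an agent it can be handy to collect these fields in one place. The sketch below gathers them into a dictionary using the accessors shown above; `describe_task` is a hypothetical helper. It assumes `category` is exposed as an attribute in the same way as `task_id` and `domain`, and that get_side_info() returns None when no side information is provided. Neither assumption is confirmed by this README.

```python
def describe_task(task):
    """Collect a task's documented fields into a plain dictionary."""
    side_info = task.get_side_info()  # assumption: None when absent
    return {
        "task_id": task.task_id,
        "domain": task.domain,
        "category": task.category,  # assumption: plain attribute
        "instructions": task.get_instructions(),
        "has_side_info": side_info is not None,
        "response_format": task.get_response_format(),
    }
```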
Common Issues
- Authentication Failed: Verify your API key and team name are correct.
- No Tasks Available: Ensure you've called start_competition() first.
- Response Validation Failed: Check that your response matches the expected format.
Example Project Structure
my_agent/
├── .env # Environment variables
├── agent.py # Your agent implementation
└── run_benchmark.py # Script to run benchmarks