A comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks

AgentDS-Bench Python Package

This package provides the Python interface to AgentDS-Bench, a benchmarking platform for evaluating AI agent capabilities in data science tasks.

Installation

pip install agentds

Authentication

Before using the package, you must authenticate with your team's API key and team name. There are three ways to supply them:

Option 1: Direct Authentication

from agentds.client import BenchmarkClient

# Initialize with credentials
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

Option 2: Environment Variables

You can set the following environment variables:

export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"

Then initialize without parameters:

from agentds.client import BenchmarkClient

# Will use environment variables
client = BenchmarkClient()

Option 3: .env File

Create a .env file in your project directory:

AGENTDS_API_URL=https://api.agentds.org/api
AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name

Then:

from agentds.client import BenchmarkClient

# Will load from .env file
client = BenchmarkClient()
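
All three options supply the same two values, so it can help to check which of them are actually visible before constructing the client. The following is a minimal sketch and not part of the agentds package: find_credentials and read_dotenv are hypothetical helpers, the .env parser handles only plain KEY=value lines, and the rule that real environment variables take precedence over .env entries is an assumption rather than documented behavior.

```python
import os
from pathlib import Path

# Variable names taken from the options above.
REQUIRED = ("AGENTDS_API_KEY", "AGENTDS_TEAM_NAME")


def read_dotenv(path):
    """Parse simple KEY=value lines; skip blanks and # comments."""
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_credentials(dotenv_path=".env", environ=None):
    """Return (found, missing) for the two required variables.

    Assumption: environment variables override .env entries.
    """
    environ = os.environ if environ is None else environ
    merged = {**read_dotenv(dotenv_path), **environ}
    found = {name: merged[name] for name in REQUIRED if merged.get(name)}
    missing = [name for name in REQUIRED if name not in found]
    return found, missing
```

Calling find_credentials() from the project directory and printing the missing list gives a quick answer to "why did authentication fail" before any network request is made.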

API Key Storage

When you authenticate, the API key is stored in:

  • Environment variables for the current session
  • A token file at ~/.agentds_token for future sessions
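
Because the key is written to a file in your home directory, you may want to make sure only your user can read it. The sketch below assumes a POSIX system; restrict_token_file is a hypothetical helper, and only the path ~/.agentds_token comes from the description above. It does not read the file, since its format is not documented here.

```python
import os
import stat
from pathlib import Path

# Location named in the list above.
TOKEN_PATH = Path.home() / ".agentds_token"


def restrict_token_file(path=TOKEN_PATH):
    """Limit the token file to owner read/write (POSIX mode 0600).

    Returns False when the file does not exist, True otherwise.
    """
    path = Path(path)
    if not path.is_file():
        return False
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return True
```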

Basic Usage

from agentds.client import BenchmarkClient

# Initialize client
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

# Start the competition if not already started
client.start_competition()

# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")

# Get the next task for a domain
task = client.get_next_task("machine_learning")
if task:
    # Print task details
    print(f"Task ID: {task.task_id}")
    print(f"Instructions: {task.get_instructions()}")
    
    # Your agent's solution (replace with your implementation)
    response = {"prediction": 0.75, "confidence": 0.9}
    
    # Validate response format
    if task.validate_response(response):
        # Submit response
        client.submit_response(task.domain, task.task_id, response)
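
The example above handles a single task. To work through a whole domain, the same calls can be wrapped in a loop. This is a sketch under stated assumptions: run_domain and the solve callable are hypothetical names, get_next_task is assumed to return a falsy value once no tasks remain (as the "if task:" check above suggests), and the max_tasks cap is a safety guard added here, not a platform limit.

```python
def run_domain(client, domain, solve, max_tasks=100):
    """Fetch, solve, and submit tasks for one domain until none remain.

    client: any object exposing get_next_task() and submit_response()
            as used in the example above.
    solve:  hypothetical callable taking a task and returning a response.
    Returns the number of responses submitted.
    """
    submitted = 0
    for _ in range(max_tasks):
        task = client.get_next_task(domain)
        if not task:
            break  # assumed to mean "no more tasks"
        response = solve(task)
        if not task.validate_response(response):
            # Stop rather than re-request, in case the same task comes back.
            break
        client.submit_response(task.domain, task.task_id, response)
        submitted += 1
    return submitted
```

Passing the client as an argument keeps the loop independent of how it was authenticated, and lets you exercise the loop with a stand-in object before spending real submissions.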

Task Data

Each task contains:

  • task_id: Unique identifier
  • domain: The knowledge domain
  • category: Scaling category (Fidelity, Volume, Noise, Complexity)
  • data: The primary task data
  • instructions: Task instructions
  • side_info: Additional context (optional)
  • response_format: Expected response format

Access task data:

# Get the main task data
data = task.get_data()

# Get task instructions
instructions = task.get_instructions()

# Get additional info
side_info = task.get_side_info()

# Get expected response format
response_format = task.get_response_format()
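
For logging or debugging, the accessors above can be gathered into one plain dictionary. This is a minimal sketch: summarize_task is a hypothetical helper, and reading category as an attribute is an assumption, since only task_id and domain are shown as attributes in the examples. The primary data is left out on purpose because it may be large.

```python
def summarize_task(task):
    """Collect the documented task fields into a plain dict for logging."""
    return {
        "task_id": task.task_id,
        "domain": task.domain,
        # Attribute access for category is an assumption; default to None.
        "category": getattr(task, "category", None),
        "instructions": task.get_instructions(),
        "side_info": task.get_side_info(),  # optional, may be empty
        "response_format": task.get_response_format(),
    }
```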

Common Issues

  1. Authentication Failed: Verify your API key and team name are correct.
  2. No Tasks Available: Ensure you've called start_competition() first.
  3. Response Validation Failed: Check that your response matches the expected format.

Example Project Structure

my_agent/
├── .env                 # Environment variables
├── agent.py             # Your agent implementation
└── run_benchmark.py     # Script to run benchmarks
