A comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
- Science/Research
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

AgentDS-Bench Python Package

This package provides the Python interface to AgentDS-Bench, a benchmarking platform for evaluating AI agent capabilities on data science tasks.

Installation

pip install agentds

Authentication

Before using the package, you must authenticate with your team's API key and team name. There are three ways to supply them:

Option 1: Direct Authentication

from agentds.client import BenchmarkClient

# Initialize with credentials
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

Option 2: Environment Variables

You can set the following environment variables:

export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"

Then initialize without parameters:

from agentds.client import BenchmarkClient

# Will use environment variables
client = BenchmarkClient()
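
Because `BenchmarkClient()` without arguments depends on both variables being set, it can help to check them before constructing the client. The following is a minimal sketch using only the standard library. `require_credentials` is a hypothetical helper, not part of agentds, and it assumes the client reads exactly the two variables named above.

```python
import os


def require_credentials(environ=None):
    """Return (api_key, team_name), or raise if either variable is missing.

    Hypothetical helper for illustration; not part of the agentds package.
    """
    env = os.environ if environ is None else environ
    names = ("AGENTDS_API_KEY", "AGENTDS_TEAM_NAME")
    # Treat unset and empty values the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AGENTDS_API_KEY"], env["AGENTDS_TEAM_NAME"]
```

Calling `require_credentials()` at the top of a script surfaces a missing variable with a clear message before any request is attempted.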

Option 3: .env File

Create a .env file in your project directory:

AGENTDS_API_URL=https://api.agentds.org/api
AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name

Then:

from agentds.client import BenchmarkClient

# Will load from .env file
client = BenchmarkClient()
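
To illustrate what loading a .env file involves, here is a rough standard-library sketch that copies `KEY=VALUE` lines into the environment. How agentds actually parses the file is not documented here, so `load_env_file` is a hypothetical stand-in, and its choice not to overwrite existing variables is an assumption.

```python
import os


def load_env_file(path=".env", environ=None):
    """Copy KEY=VALUE pairs from a .env-style file into environ.

    Hypothetical helper; variables that are already set are left untouched.
    Returns a dict of the pairs that were actually added.
    """
    env = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            # Drop surrounding whitespace and optional quotes around the value.
            value = value.strip().strip("'\"")
            if key and key not in env:
                env[key] = value
                loaded[key] = value
    return loaded
```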

API Key Storage

When you authenticate, the API key is stored in:

- Environment variables for the current session
- A token file at ~/.agentds_token for future sessions
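
Since the key persists on disk, you may want to remove the saved copy when rotating keys or switching teams. The sketch below only checks for the file and deletes it. `clear_stored_token` is a hypothetical helper, the format of the token file is not described here, and how the client behaves once the file is gone is an assumption to verify.

```python
from pathlib import Path


def clear_stored_token(path=None):
    """Delete the saved token file if present; return True if it was removed.

    Hypothetical helper. Defaults to ~/.agentds_token, the location named above.
    """
    token_path = Path(path) if path is not None else Path.home() / ".agentds_token"
    if token_path.is_file():
        token_path.unlink()
        return True
    return False
```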

Basic Usage

from agentds.client import BenchmarkClient

# Initialize client
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

# Start the competition if not already started
client.start_competition()

# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")

# Get the next task for a domain
task = client.get_next_task("machine_learning")
if task:
    # Print task details
    print(f"Task ID: {task.task_id}")
    print(f"Instructions: {task.get_instructions()}")
    
    # Your agent's solution (replace with your implementation)
    response = {"prediction": 0.75, "confidence": 0.9}
    
    # Validate response format
    if task.validate_response(response):
        # Submit response
        client.submit_response(task.domain, task.task_id, response)
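
The single-task flow above extends naturally to a loop. This sketch uses only the client and task methods shown in this section. `run_domain`, `solve`, and `max_tasks` are hypothetical names, and it assumes that `get_next_task` returns a falsy value when nothing is left and moves on to a new task after each submission.

```python
def run_domain(client, domain, solve, max_tasks=100):
    """Fetch, solve, and submit tasks for one domain until none remain.

    `solve` is a hypothetical callable mapping a task to a response dict.
    Returns the IDs of the tasks whose responses were submitted.
    """
    submitted = []
    # max_tasks bounds the loop in case the server keeps returning tasks.
    for _ in range(max_tasks):
        task = client.get_next_task(domain)
        if not task:
            break  # assumed: a falsy value means no task is available
        response = solve(task)
        if not task.validate_response(response):
            break  # stop rather than keep refetching a task we cannot answer
        client.submit_response(task.domain, task.task_id, response)
        submitted.append(task.task_id)
    return submitted
```

Stopping on the first invalid response is a deliberately conservative choice: it avoids repeatedly requesting the same task when the agent's output does not match the expected format.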

Task Data

Each task contains:

- task_id: Unique identifier
- domain: The knowledge domain
- category: Scaling category (Fidelity, Volume, Noise, Complexity)
- data: The primary task data
- instructions: Task instructions
- side_info: Additional context (optional)
- response_format: Expected response format

Access task data:

# Get the main task data
data = task.get_data()

# Get task instructions
instructions = task.get_instructions()

# Get additional info
side_info = task.get_side_info()

# Get expected response format
response_format = task.get_response_format()
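
When handing a task to an agent, it can be convenient to gather these pieces into one structure. The sketch below relies on the accessors shown above plus the `task_id` and `domain` attributes used in Basic Usage. `summarize_task` is a hypothetical helper, and reading `category` as an attribute is an assumption, hence the `getattr` fallback.

```python
def summarize_task(task):
    """Collect a task's fields into a plain dict for logging or prompting.

    Hypothetical helper; not part of the agentds package.
    """
    return {
        "task_id": task.task_id,
        "domain": task.domain,
        # Assumed attribute name; falls back to None if it does not exist.
        "category": getattr(task, "category", None),
        "instructions": task.get_instructions(),
        "data": task.get_data(),
        "side_info": task.get_side_info(),  # listed as optional above
        "response_format": task.get_response_format(),
    }
```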

Common Issues

- Authentication Failed: Verify your API key and team name are correct.
- No Tasks Available: Ensure you've called start_competition() first.
- Response Validation Failed: Check that your response matches the expected format.

Example Project Structure

my_agent/
├── .env                 # Environment variables
├── agent.py             # Your agent implementation
└── run_benchmark.py     # Script to run benchmarks

More Resources

For more detailed information, check out:

Release history

1.0.0 (Apr 24, 2025)

Download files

Source Distribution

agentds-1.0.0.tar.gz (13.4 kB)

Uploaded Apr 24, 2025 Source

Built Distribution

agentds-1.0.0-py3-none-any.whl (17.0 kB)

Uploaded Apr 24, 2025 Python 3

File details

Details for the file agentds-1.0.0.tar.gz.

File metadata

Download URL: agentds-1.0.0.tar.gz
Upload date: Apr 24, 2025
Size: 13.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for agentds-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a906287304dbf4ce1e19f7c95324597aaed9303d08efc1a9804b2cda7b98ab51`
MD5	`0b6c10a3a2749d0f3e9290a153a70721`
BLAKE2b-256	`c5dbbfbace27e66fcd14a810065ed9884e2b33c26e33364d1823aa3f4fa5971d`

File details

Details for the file agentds-1.0.0-py3-none-any.whl.

File metadata

Download URL: agentds-1.0.0-py3-none-any.whl
Upload date: Apr 24, 2025
Size: 17.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for agentds-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c969119d429c07c2e9b7ded7478af2daeeec11f4ab5128c26916a3e243016110`
MD5	`935650f1bd338577289298c41cdb99bf`
BLAKE2b-256	`71334d9d7d832224c49cc7b35fe968d9619eae2a7a619eaeb1effcde0b67b77d`
