AgentDS Python Client

The official Python client for AgentDS-Bench, a comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks.

Features

  • Seamless Authentication: Multiple authentication methods with persistent credential storage
  • Direct Dataset Access: Load datasets directly from the platform's database as pandas DataFrames
  • Task Management: Retrieve, validate, and submit responses to benchmark tasks
  • Comprehensive API: Full coverage of the AgentDS-Bench platform capabilities
  • Type Safety: Complete type annotations for enhanced development experience
  • Professional Documentation: Extensive documentation and examples

Installation

Install the package from PyPI. The distribution is published as agentds-bench; the import name is agentds:

pip install agentds-bench

To also install the dependencies used by the examples:

pip install "agentds-bench[examples]"

Quick Start

Authentication

Get your API credentials from the AgentDS platform and authenticate:

from agentds import BenchmarkClient

# Method 1: Direct authentication
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

# Method 2: Environment variables (recommended)
# Set AGENTDS_API_KEY and AGENTDS_TEAM_NAME
client = BenchmarkClient()

Basic Usage

from agentds import BenchmarkClient

# Initialize client
client = BenchmarkClient()

# Start competition
client.start_competition()

# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")

# Get next task
task = client.get_next_task("machine-learning")
if task:
    # Access task data
    data = task.get_data()
    instructions = task.get_instructions()
    
    # Your solution here
    response = {"prediction": 0.85, "confidence": 0.92}
    
    # Validate and submit
    if task.validate_response(response):
        client.submit_response(task.domain, task.task_number, response)

Dataset Loading

Load datasets directly as pandas DataFrames:

import pandas as pd
from agentds import BenchmarkClient

client = BenchmarkClient()

# Load complete dataset
train_df, test_df, sample_df = client.load_dataset("Wine-Quality")

print(f"Training data: {train_df.shape}")
print(f"Test data: {test_df.shape}")
print(train_df.head())
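
The roles of the three returned frames are not spelled out here, so a quick way to get oriented is to compare their column names. The following is a minimal sketch under the assumption that columns present in the training frame but absent from the test frame are candidate label columns; `columns_only_in_train` is a hypothetical helper, not part of the agentds package.

```python
def columns_only_in_train(train_columns, test_columns):
    """Return the training column names that are absent from the test columns.

    Hypothetical helper: accepts any iterables of column names and keeps
    the order in which the names appear in the training columns.
    """
    test_set = set(test_columns)
    return [name for name in train_columns if name not in test_set]
```

With the frames loaded above, it could be called as `columns_only_in_train(train_df.columns, test_df.columns)`.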

Authentication Methods

Environment Variables

Set these environment variables for automatic authentication:

export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"
export AGENTDS_API_URL="https://api.agentds.org/api"  # optional
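
Before constructing the client, it can help to confirm that both required variables are actually set. The following is a minimal sketch using only the standard library; the variable names come from the block above, while `missing_credentials` is a hypothetical helper and not part of the agentds package.

```python
import os

# Names taken from the section above; AGENTDS_API_URL is optional and not checked.
REQUIRED_VARS = ("AGENTDS_API_KEY", "AGENTDS_TEAM_NAME")


def missing_credentials(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper. Pass a dict to check something other than
    the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list from `missing_credentials()` means both variables are visible to the current process.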

Configuration File

Create a .env file in your project directory:

AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name
AGENTDS_API_URL=https://api.agentds.org/api
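
To inspect what such a file would yield, or to load it by hand in a script or notebook, a small parser is enough. This is a sketch that assumes the file contains only simple KEY=VALUE lines; how the client itself reads the file is not shown here, and `parse_env_lines` and `load_env_file` are hypothetical helpers.

```python
import os


def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines into a dict.

    Hypothetical helper: skips blank lines, '#' comments, and lines
    without '=', and strips one pair of matching surrounding quotes.
    """
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path=".env", environ=None):
    """Copy values from the file into the environment, keeping existing values."""
    env = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env_lines(handle).items():
            env.setdefault(key, value)
    return env
```

Using `setdefault` means variables already exported in the shell take priority over the file, which is a design choice of this sketch rather than documented client behavior.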

Persistent Storage

Authentication credentials are automatically saved to ~/.agentds_token for future sessions.

API Reference

BenchmarkClient

Main client class for interacting with the AgentDS platform.

Methods

  • authenticate() -> bool: Authenticate with the platform
  • start_competition() -> bool: Start the competition
  • get_domains() -> List[str]: Get available domains
  • get_next_task(domain: str) -> Optional[Task]: Get next task for domain
  • submit_response(domain: str, task_number: int, response: Any) -> bool: Submit task response
  • load_dataset(domain_name: str) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: Load dataset
  • get_status() -> Dict: Get competition status
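
For testing an agent without network access, the method list above can be mirrored by a structural type and an in-memory stand-in. This is a sketch only: `ClientLike` and `FakeClient` are hypothetical names, they cover a subset of the documented methods, and the real client's behavior may differ.

```python
from typing import Any, Dict, List, Optional, Protocol, Tuple


class ClientLike(Protocol):
    """Subset of the BenchmarkClient methods, with signatures copied from the list above."""

    def start_competition(self) -> bool: ...
    def get_domains(self) -> List[str]: ...
    def get_next_task(self, domain: str) -> Optional[Any]: ...
    def submit_response(self, domain: str, task_number: int, response: Any) -> bool: ...


class FakeClient:
    """In-memory stand-in for offline tests; not part of the agentds package."""

    def __init__(self, tasks_by_domain: Dict[str, List[Any]]):
        # Copy the lists so the caller's data is not mutated.
        self._pending = {domain: list(tasks) for domain, tasks in tasks_by_domain.items()}
        self.submitted: List[Tuple[str, int, Any]] = []

    def start_competition(self) -> bool:
        return True

    def get_domains(self) -> List[str]:
        return list(self._pending)

    def get_next_task(self, domain: str) -> Optional[Any]:
        queue = self._pending.get(domain, [])
        return queue.pop(0) if queue else None

    def submit_response(self, domain: str, task_number: int, response: Any) -> bool:
        # Record the submission instead of sending it anywhere.
        self.submitted.append((domain, task_number, response))
        return True
```

Agent code annotated with `ClientLike` accepts either the real client or the fake one, so the submission logic can be exercised in unit tests.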

Task

Represents a benchmark task.

Properties

  • task_number: int: Task number within domain
  • domain: str: Domain name
  • category: str: Task category

Methods

  • get_data() -> Any: Get task data
  • get_instructions() -> str: Get task instructions
  • get_side_info() -> Any: Get additional information
  • validate_response(response: Any) -> bool: Validate response format
  • load_dataset() -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: Load associated dataset
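
The Task methods and `submit_response` combine naturally into a single solve, validate, submit step. The sketch below uses only the methods and properties listed above; `solve_and_submit` and the `solver` callback are hypothetical and stand in for your own code.

```python
from typing import Any, Callable


def solve_and_submit(client: Any, task: Any, solver: Callable[[Any, str], Any]) -> bool:
    """Run solver on a task and submit the result only if it validates.

    Hypothetical helper. `client` and `task` are expected to provide the
    documented methods; `solver` maps (data, instructions) to a response.
    """
    response = solver(task.get_data(), task.get_instructions())
    if not task.validate_response(response):
        return False  # do not submit a response the task rejects
    return bool(client.submit_response(task.domain, task.task_number, response))
```

Returning `False` for both a rejected response and a failed submission keeps the call site simple; split the two cases if you need to tell them apart.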

Examples

Complete Agent Example

from agentds import BenchmarkClient
from sklearn.ensemble import RandomForestClassifier

def intelligent_agent():
    client = BenchmarkClient()
    client.start_competition()
    
    domains = client.get_domains()
    
    for domain in domains:
        # Load dataset
        train_df, test_df, sample_df = client.load_dataset(domain)
        
        # Get task
        task = client.get_next_task(domain)
        if not task:
            continue
            
        # Prepare features (example; 'target' is a placeholder for the dataset's label column)
        X = train_df.drop(columns=['target'])
        y = train_df['target']
        
        # Train model
        model = RandomForestClassifier()
        model.fit(X, y)
        
        # Make predictions using the same feature columns the model was trained on
        predictions = model.predict(test_df[X.columns])
        
        # Format response
        response = {
            "predictions": predictions.tolist(),
            "model": "RandomForestClassifier",
            # Note: this is accuracy on the training data, not a calibrated confidence
            "confidence": float(model.score(X, y))
        }
        
        # Submit
        if task.validate_response(response):
            client.submit_response(domain, task.task_number, response)

if __name__ == "__main__":
    intelligent_agent()

Batch Processing

from agentds import BenchmarkClient

def process_all_domains():
    client = BenchmarkClient()
    client.start_competition()
    
    domains = client.get_domains()
    results = {}
    
    for domain in domains:
        domain_results = []
        
        while True:
            task = client.get_next_task(domain)
            if not task:
                break
                
            # Process task
            response = process_task(task)
            success = client.submit_response(domain, task.task_number, response)
            domain_results.append(success)
            
        results[domain] = domain_results
    
    return results

def process_task(task):
    # Your task processing logic
    return {"result": "processed"}
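
The dictionary returned by `process_all_domains` maps each domain to a list of submission outcomes, which can be condensed into per-domain counts. A minimal sketch, assuming each outcome is the boolean returned by `submit_response`; `summarize_results` is a hypothetical helper.

```python
from typing import Dict, List


def summarize_results(results: Dict[str, List[bool]]) -> Dict[str, Dict[str, int]]:
    """Count submitted and successful tasks per domain."""
    return {
        domain: {
            "submitted": len(outcomes),
            "succeeded": sum(bool(outcome) for outcome in outcomes),
        }
        for domain, outcomes in results.items()
    }
```

For example, `summarize_results(process_all_domains())` gives one small dictionary per domain that is easy to print or log.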

Error Handling

from agentds import BenchmarkClient
from agentds.exceptions import AuthenticationError, APIError

try:
    client = BenchmarkClient(api_key="invalid-key", team_name="test")
    client.authenticate()
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except APIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
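
Some failures may be temporary, in which case retrying with a growing delay can be preferable to giving up on the first error. The sketch below is generic and does not import agentds; `call_with_retry` is a hypothetical helper, and whether `APIError` signals a transient condition is an assumption you should verify before retrying on it.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2 * base_delay, ...
    The sleep function is injectable so tests do not have to wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A possible call is `call_with_retry(lambda: client.submit_response(domain, number, response), retry_on=(APIError,))`. Retrying on `AuthenticationError` is unlikely to help, since invalid credentials do not fix themselves.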

Development

Setup Development Environment

git clone https://github.com/agentds/agentds-bench.git
cd agentds-bench/agentds_pkg
pip install -e ".[dev]"

Running Tests

pytest

Code Formatting

black src/
flake8 src/
mypy src/

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history.
