Python client for AgentDS-Bench: A streamlined benchmarking platform for evaluating AI agent capabilities in data science tasks

AgentDS Python Client

The official Python client for AgentDS-Bench, a comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks.

Features

Seamless Authentication: Authenticate via constructor arguments, environment variables, or a .env file, with credentials persisted to ~/.agentds_token
Task Management: Retrieve, validate, and submit responses to benchmark tasks
Comprehensive API: Full coverage of the AgentDS-Bench platform capabilities
Type Safety: Complete type annotations for enhanced development experience
Professional Documentation: Extensive documentation and examples

CLI Usage

After installation, the agentds command-line interface is available:

# Authenticate and save credentials
agentds auth --api-key <API_KEY> --team-name <TEAM_NAME>

# Start competition
agentds start

# List domains and view status
agentds domains
agentds status

# Submit predictions
agentds submit --domain Insurance --task 1 --file predictions.csv --format "ID,Prediction" --expected-rows 1000 --numeric --prob-range 0 1

# View history and leaderboards (set JWT if needed)
agentds history --limit 20
agentds leaderboard
agentds leaderboard --domain Insurance --jwt <JWT_TOKEN>
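The submit flags above suggest the file is expected to have an ID,Prediction header, a fixed number of rows, and numeric values within a range. The following is a minimal sketch, using only the standard library, of writing such a file and checking it locally before running agentds submit. The helper names write_predictions and check_predictions are hypothetical and not part of the agentds package, and the checks only approximate what the CLI flags imply:

```python
import csv


def write_predictions(path, rows):
    """Write (id, prediction) pairs as a CSV with an ID,Prediction header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Prediction"])
        writer.writerows(rows)


def check_predictions(path, expected_rows, lo=0.0, hi=1.0):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["ID", "Prediction"]:
            problems.append(f"unexpected header: {header}")
        data = list(reader)
    if len(data) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(data)}")
    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(data, start=2):
        try:
            value = float(row[1])
        except (IndexError, ValueError):
            problems.append(f"line {line_no}: prediction is not numeric")
            continue
        if not lo <= value <= hi:
            problems.append(f"line {line_no}: {value} outside [{lo}, {hi}]")
    return problems
```

For the command shown above, calling check_predictions("predictions.csv", expected_rows=1000) and getting an empty list back would indicate the file matches the assumed format.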

Installation

Install the package from PyPI:

pip install agentds-bench

For development or to access the example dependencies, install the examples extra (quoted so that shells such as zsh do not expand the brackets):

pip install "agentds-bench[examples]"

Quick Start

Authentication

Get your API credentials from the AgentDS platform and authenticate:

from agentds import BenchmarkClient

# Method 1: Direct authentication
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")

# Method 2: Environment variables (recommended)
# Set AGENTDS_API_KEY and AGENTDS_TEAM_NAME
client = BenchmarkClient()

Basic Usage

from agentds import BenchmarkClient

# Initialize client
client = BenchmarkClient()

# Start competition
client.start_competition()

# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")

# Get next task
task = client.get_next_task("machine-learning")
if task:
    # Access task data
    data = task.get_data()
    instructions = task.get_instructions()
    
    # Your solution here
    response = {"prediction": 0.85, "confidence": 0.92}
    
    # Validate and submit
    if task.validate_response(response):
        client.submit_response(task.domain, task.task_number, response)

Note: The Python client does not fetch datasets from the backend. Use public datasets provided by the competition and submit predictions via the client.

Authentication Methods

Environment Variables

Set these environment variables for automatic authentication:

export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"
export AGENTDS_API_URL="https://api.agentds.org/api"  # optional
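If you want to fail early with a clear message when a variable is missing, a small check before constructing the client can help. This is a minimal sketch: read_agentds_env is a hypothetical helper, not part of the package, and treating the URL shown above as the default for AGENTDS_API_URL is an assumption:

```python
import os

REQUIRED_VARS = ("AGENTDS_API_KEY", "AGENTDS_TEAM_NAME")
# Assumption: the URL shown in this README is used when AGENTDS_API_URL is unset.
DEFAULT_API_URL = "https://api.agentds.org/api"


def read_agentds_env(environ=os.environ):
    """Collect AgentDS settings from a mapping, raising if a required one is missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": environ["AGENTDS_API_KEY"],
        "team_name": environ["AGENTDS_TEAM_NAME"],
        "api_url": environ.get("AGENTDS_API_URL", DEFAULT_API_URL),
    }
```

The returned api_key and team_name values can then be passed to BenchmarkClient(api_key=..., team_name=...) as in the direct authentication method.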

Configuration File

Create a .env file in your project directory:

AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name
AGENTDS_API_URL=https://api.agentds.org/api
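To illustrate how a file in this KEY=VALUE format maps to settings, here is a minimal parser sketch. It is not the client's own loader (how the client reads .env is not documented here), parse_env_file is a hypothetical name, and inline comments and escape sequences are deliberately not handled:

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and lines starting with '#'."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        # Surrounding single or double quotes around the value are dropped.
        settings[key.strip()] = value.strip().strip("'\"")
    return settings
```

Reading the file with pathlib.Path(".env").read_text() and passing the result to parse_env_file gives a plain dictionary you can inspect when debugging authentication problems.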

Persistent Storage

Authentication credentials are automatically saved to ~/.agentds_token for future sessions.
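Because the token file holds credentials, it can be worth checking that it exists and is not readable by other users. The sketch below only inspects file metadata, since the file's internal format is not documented here; token_file_status is a hypothetical helper, and the permission check is meaningful on POSIX systems only:

```python
import stat
from pathlib import Path

TOKEN_PATH = Path.home() / ".agentds_token"


def token_file_status(path=TOKEN_PATH):
    """Report whether the token file exists and whether it is private to its owner."""
    path = Path(path)
    if not path.exists():
        return {"exists": False, "private": None}
    mode = stat.S_IMODE(path.stat().st_mode)
    # "private" means no permission bits are set for group or other users.
    return {"exists": True, "private": (mode & 0o077) == 0}
```

If private comes back False, running chmod 600 ~/.agentds_token restricts the file to your own user.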

API Reference

BenchmarkClient

Main client class for interacting with the AgentDS platform.

Methods

authenticate() -> bool: Authenticate with the platform
start_competition() -> bool: Start the competition
get_domains() -> List[str]: Get available domains
get_next_task(domain: str) -> Optional[Task]: Get next task for domain
submit_response(domain: str, task_number: int, response: Any) -> bool: Submit task response
load_dataset(domain_name: str) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: Load dataset
get_status() -> Dict: Get competition status
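For testing agent code without network access, the signatures above can be mirrored by a structural type and an in-memory stand-in. This is a sketch under stated assumptions: ClientLike and FakeClient are hypothetical names that do not exist in the package, only a subset of the listed methods is covered, and the stub's return values are invented for illustration:

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ClientLike(Protocol):
    """Structural type mirroring a subset of the documented BenchmarkClient methods."""

    def start_competition(self) -> bool: ...
    def get_domains(self) -> List[str]: ...
    def get_next_task(self, domain: str) -> Optional[Any]: ...
    def submit_response(self, domain: str, task_number: int, response: Any) -> bool: ...
    def get_status(self) -> Dict: ...


class FakeClient:
    """In-memory stand-in that records submissions instead of calling the API."""

    def __init__(self, domains: List[str]):
        self._domains = list(domains)
        self.submissions: List[tuple] = []

    def start_competition(self) -> bool:
        return True

    def get_domains(self) -> List[str]:
        return list(self._domains)

    def get_next_task(self, domain: str) -> Optional[Any]:
        return None  # this stub serves no tasks

    def submit_response(self, domain: str, task_number: int, response: Any) -> bool:
        self.submissions.append((domain, task_number, response))
        return True

    def get_status(self) -> Dict:
        # Illustrative status payload; the real response shape is not documented here.
        return {"submitted": len(self.submissions)}
```

Functions that accept a ClientLike argument can then be exercised with FakeClient(["Insurance"]) in unit tests and with a real BenchmarkClient in production.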

Task

Represents a benchmark task.

Properties

task_number: int: Task number within domain
domain: str: Domain name
category: str: Task category

Methods

get_data() -> Any: Get task data
get_instructions() -> str: Get task instructions
get_side_info() -> Any: Get additional information
validate_response(response: Any) -> bool: Validate response format
load_dataset() -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: Load associated dataset

Examples

Complete Agent Example

from agentds import BenchmarkClient
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def intelligent_agent():
    client = BenchmarkClient()
    client.start_competition()

    domains = client.get_domains()

    for domain in domains:
        # Load dataset
        train_df, test_df, sample_df = client.load_dataset(domain)

        # Get task
        task = client.get_next_task(domain)
        if not task:
            continue

        # Prepare features (example: assumes a 'target' column and numeric features)
        X = train_df.drop(columns=['target'])
        y = train_df['target']

        # Hold out 20% of the training data to estimate accuracy
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

        # Train model
        model = RandomForestClassifier(random_state=42)
        model.fit(X_train, y_train)

        # Make predictions on the same feature columns used for training
        predictions = model.predict(test_df[X.columns])

        # Format response
        response = {
            "predictions": predictions.tolist(),
            "model": "RandomForestClassifier",
            # Hold-out accuracy, not a calibrated confidence
            "confidence": float(model.score(X_val, y_val))
        }
        
        # Submit
        if task.validate_response(response):
            client.submit_response(domain, task.task_number, response)

if __name__ == "__main__":
    intelligent_agent()

Batch Processing

from agentds import BenchmarkClient

def process_all_domains():
    client = BenchmarkClient()
    client.start_competition()
    
    domains = client.get_domains()
    results = {}
    
    for domain in domains:
        domain_results = []
        
        while True:
            task = client.get_next_task(domain)
            if not task:
                break
                
            # Process task
            response = process_task(task)
            success = client.submit_response(domain, task.task_number, response)
            domain_results.append(success)
            
        results[domain] = domain_results
    
    return results

def process_task(task):
    # Your task processing logic
    return {"result": "processed"}
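The dictionary returned by process_all_domains maps each domain to a list of submission outcomes. A short sketch of turning that into per-domain counts follows; summarize is a hypothetical helper, and the "Example" domain name in the usage note is made up for illustration:

```python
def summarize(results):
    """Map each domain to a (successful, total, success_rate) tuple."""
    summary = {}
    for domain, outcomes in results.items():
        total = len(outcomes)
        ok = sum(1 for outcome in outcomes if outcome)
        # Domains with no tasks get a rate of 0.0 rather than a division error.
        summary[domain] = (ok, total, ok / total if total else 0.0)
    return summary
```

For instance, summarize({"Insurance": [True, False], "Example": []}) gives (1, 2, 0.5) for Insurance and (0, 0, 0.0) for Example.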

Error Handling

from agentds import BenchmarkClient
from agentds.exceptions import AuthenticationError, APIError

try:
    client = BenchmarkClient(api_key="invalid-key", team_name="test")
    client.authenticate()
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except APIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
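Transient failures are often worth retrying, while authentication failures usually are not. The sketch below shows a generic retry helper with exponential backoff; with_retries is a hypothetical function, not part of the package, and whether APIError is raised for transient problems is an assumption you should confirm against the client's behavior:

```python
import time


def with_retries(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exception types with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A call such as with_retries(lambda: client.submit_response(domain, task.task_number, response), retry_on=APIError) would retry only on APIError and let AuthenticationError propagate immediately.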

Development

Setup Development Environment

git clone https://github.com/agentds/agentds-bench.git
cd agentds-bench/agentds_pkg
pip install -e ".[dev]"

Running Tests

pytest

Code Formatting, Linting, and Type Checking

black src/
flake8 src/
mypy src/

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Documentation: https://agentds.org/docs
Issues: GitHub Issues
Email: contact@agentds.org

Changelog

See CHANGELOG.md for version history.

Release history

1.3.0 (this version): Sep 27, 2025
1.2.2: Jul 9, 2025
1.2.1: Jul 9, 2025
1.2.0: Jul 9, 2025
1.1.0: Jul 9, 2025
1.0.1: Jul 9, 2025
1.0.0: Jul 9, 2025

Download files

Source Distribution

agentds_bench-1.3.0.tar.gz (27.3 kB), uploaded Sep 27, 2025, Source

Built Distribution

agentds_bench-1.3.0-py3-none-any.whl (25.7 kB), uploaded Sep 27, 2025, Python 3

File details

Details for the file agentds_bench-1.3.0.tar.gz.

File metadata

Download URL: agentds_bench-1.3.0.tar.gz
Upload date: Sep 27, 2025
Size: 27.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentds_bench-1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`1ea78d5812c300a544fb4813ed9d96d9746102815d4a167242bc779d273ff13e`
MD5	`e4ea329061b4db9bcb6bdea915de6e71`
BLAKE2b-256	`ededf3bddbaa7da17318c902814aaf85a33c115057ddab9e73e69002ae4e8d50`

File details

Details for the file agentds_bench-1.3.0-py3-none-any.whl.

File metadata

Download URL: agentds_bench-1.3.0-py3-none-any.whl
Upload date: Sep 27, 2025
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentds_bench-1.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c3e248e8b5d99ca6e53aa0256ce4acd40e2514d8f506ce6cc6276c5e6b20a51a`
MD5	`8c7b42dc1709580b1b95499c2e82e7a1`
BLAKE2b-256	`32d65f05f692cb938e4362ab84875941f5cb7d08594e8b21ac41c576a916835b`
