Python client for AgentDS-Bench: A streamlined benchmarking platform for evaluating AI agent capabilities in data science tasks
AgentDS Python Client
The official Python client for AgentDS-Bench, a comprehensive benchmarking platform for evaluating AI agent capabilities in data science tasks.
Features
- Seamless Authentication: Multiple authentication methods with persistent credential storage
- Direct Dataset Access: Load datasets directly from the platform's database as pandas DataFrames
- Task Management: Retrieve, validate, and submit responses to benchmark tasks
- Comprehensive API: Full coverage of the AgentDS-Bench platform capabilities
- Type Safety: Complete type annotations for enhanced development experience
- Professional Documentation: Extensive documentation and examples
Installation
Install the package from PyPI:
pip install agentds-bench
For development or to access example dependencies:
pip install "agentds-bench[examples]"
Quick Start
Authentication
Get your API credentials from the AgentDS platform and authenticate:
from agentds import BenchmarkClient
# Method 1: Direct authentication
client = BenchmarkClient(api_key="your-api-key", team_name="your-team-name")
# Method 2: Environment variables (recommended)
# Set AGENTDS_API_KEY and AGENTDS_TEAM_NAME
client = BenchmarkClient()
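Before relying on Method 2, it can help to confirm that both variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_credentials` is hypothetical and not part of the agentds package.

```python
import os

# Variable names as documented for environment-based authentication.
REQUIRED_VARS = ("AGENTDS_API_KEY", "AGENTDS_TEAM_NAME")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; pass a plain dict to test the helper.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_credentials()` before `BenchmarkClient()` gives a readable list of what still needs to be exported, instead of a failure later in the run.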
Basic Usage
from agentds import BenchmarkClient
# Initialize client
client = BenchmarkClient()
# Start competition
client.start_competition()
# Get available domains
domains = client.get_domains()
print(f"Available domains: {domains}")
# Get next task
task = client.get_next_task("machine-learning")
if task:
    # Access task data
    data = task.get_data()
    instructions = task.get_instructions()
    # Your solution here
    response = {"prediction": 0.85, "confidence": 0.92}
    # Validate and submit
    if task.validate_response(response):
        client.submit_response(task.domain, task.task_number, response)
Dataset Loading
Load datasets directly as pandas DataFrames:
import pandas as pd
from agentds import BenchmarkClient
client = BenchmarkClient()
# Load complete dataset
train_df, test_df, sample_df = client.load_dataset("Wine-Quality")
print(f"Training data: {train_df.shape}")
print(f"Test data: {test_df.shape}")
print(train_df.head())
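Once the three DataFrames are loaded, a common first step is to find which columns act as labels. The sketch below assumes that label columns appear in the training split but not in the test split, which this page does not state. The helper name `label_columns` is hypothetical, and the column names in the usage note are made up for illustration.

```python
def label_columns(train_columns, test_columns):
    """Return columns found in the training split but not in the test split.

    The result keeps the training column order. Whether such columns are
    really the labels is an assumption to verify for each dataset.
    """
    test_set = set(test_columns)
    return [column for column in train_columns if column not in test_set]
```

With pandas this could be called as `label_columns(train_df.columns, test_df.columns)`. For example, `label_columns(["acidity", "alcohol", "quality"], ["acidity", "alcohol"])` returns `["quality"]`.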
Authentication Methods
Environment Variables
Set these environment variables for automatic authentication:
export AGENTDS_API_KEY="your-api-key"
export AGENTDS_TEAM_NAME="your-team-name"
export AGENTDS_API_URL="https://api.agentds.org/api" # optional
Configuration File
Create a .env file in your project directory:
AGENTDS_API_KEY=your-api-key
AGENTDS_TEAM_NAME=your-team-name
AGENTDS_API_URL=https://api.agentds.org/api
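The file uses plain KEY=VALUE lines. To check such a file independently of the client, a minimal parser could look like the sketch below. It only illustrates the file format; how the client itself reads .env files is not documented here, and the function name `parse_env_text` is hypothetical.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

Reading the file with `parse_env_text(open(".env").read())` and checking for `AGENTDS_API_KEY` and `AGENTDS_TEAM_NAME` is a quick way to catch typos in variable names.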
Persistent Storage
Authentication credentials are automatically saved to ~/.agentds_token for future sessions.
API Reference
BenchmarkClient
Main client class for interacting with the AgentDS platform.
Methods
- authenticate() -> bool: Authenticate with the platform
- start_competition() -> bool: Start the competition
- get_domains() -> List[str]: Get available domains
- get_next_task(domain: str) -> Optional[Task]: Get next task for domain
- submit_response(domain: str, task_number: int, response: Any) -> bool: Submit task response
- load_dataset(domain_name: str) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: Load dataset
- get_status() -> Dict: Get competition status
Task
Represents a benchmark task.
Properties
- task_number: int: Task number within domain
- domain: str: Domain name
- category: str: Task category
Methods
- get_data() -> Any: Get task data
- get_instructions() -> str: Get task instructions
- get_side_info() -> Any: Get additional information
- validate_response(response: Any) -> bool: Validate response format
- load_dataset() -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: Load associated dataset
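The signatures above are enough to write an in-memory stand-in for testing agent logic without network access. The sketch below is not part of the package: `ClientLike`, `FakeTask`, and `FakeClient` are hypothetical names. It also assumes that a task keeps being served until a response for it is submitted, which may not match the real platform.

```python
from typing import Any, Dict, List, Optional, Protocol, Tuple


class ClientLike(Protocol):
    """Subset of the documented BenchmarkClient methods that an agent loop needs."""

    def start_competition(self) -> bool: ...
    def get_domains(self) -> List[str]: ...
    def get_next_task(self, domain: str) -> Optional[Any]: ...
    def submit_response(self, domain: str, task_number: int, response: Any) -> bool: ...


class FakeTask:
    """Minimal task object exposing the documented properties."""

    def __init__(self, domain: str, task_number: int) -> None:
        self.domain = domain
        self.task_number = task_number
        self.category = "example"  # placeholder value

    def get_instructions(self) -> str:
        return "Return a dict."

    def validate_response(self, response: Any) -> bool:
        return isinstance(response, dict)


class FakeClient:
    """In-memory stand-in; assumes a task is served until it is submitted."""

    def __init__(self, tasks_per_domain: Dict[str, int]) -> None:
        self._queues = {d: list(range(1, n + 1)) for d, n in tasks_per_domain.items()}
        self.submitted: List[Tuple[str, int, Any]] = []

    def start_competition(self) -> bool:
        return True

    def get_domains(self) -> List[str]:
        return list(self._queues)

    def get_next_task(self, domain: str) -> Optional[FakeTask]:
        queue = self._queues.get(domain, [])
        return FakeTask(domain, queue[0]) if queue else None

    def submit_response(self, domain: str, task_number: int, response: Any) -> bool:
        queue = self._queues.get(domain, [])
        if queue and queue[0] == task_number:
            queue.pop(0)
            self.submitted.append((domain, task_number, response))
            return True
        return False
```

An agent function that accepts a `ClientLike` argument can then be exercised with `FakeClient({"demo": 2})` in unit tests and with a real `BenchmarkClient` in production.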
Examples
Complete Agent Example
from agentds import BenchmarkClient
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
def intelligent_agent():
    client = BenchmarkClient()
    client.start_competition()
    domains = client.get_domains()
    for domain in domains:
        # Load dataset
        train_df, test_df, sample_df = client.load_dataset(domain)
        # Get task
        task = client.get_next_task(domain)
        if not task:
            continue
        # Prepare features (example; assumes the label column is named 'target')
        X = train_df.drop(['target'], axis=1)
        y = train_df['target']
        # Hold out 20% of the training data to estimate accuracy
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
        # Train model
        model = RandomForestClassifier()
        model.fit(X_train, y_train)
        # Make predictions on the same feature columns used for training
        predictions = model.predict(test_df[X.columns])
        # Format response
        response = {
            "predictions": predictions.tolist(),
            "model": "RandomForestClassifier",
            # Held-out accuracy, used here as a rough confidence estimate
            "confidence": float(model.score(X_val, y_val))
        }
        # Submit
        if task.validate_response(response):
            client.submit_response(domain, task.task_number, response)
if __name__ == "__main__":
    intelligent_agent()
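The example converts predictions with `.tolist()` before submitting. A more general version of that step is sketched below, assuming the response is eventually serialized as JSON (this page does not say so). The helper name `to_json_safe` is hypothetical. It relies only on the `tolist()` and `item()` methods that NumPy arrays, NumPy scalars, and pandas Series provide, so it does not import those libraries.

```python
import json


def to_json_safe(value):
    """Recursively convert a response into types that json.dumps accepts."""
    if isinstance(value, dict):
        return {str(key): to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    if value is None or isinstance(value, (str, bool, int, float)):
        return value
    # Array-like objects (NumPy arrays, pandas Series) expose tolist().
    if hasattr(value, "tolist"):
        return to_json_safe(value.tolist())
    # Scalar wrappers expose item(), which returns a native Python value.
    if hasattr(value, "item"):
        return to_json_safe(value.item())
    raise TypeError(f"Cannot convert {type(value).__name__} to a JSON-safe value")
```

Running `json.dumps(to_json_safe(response))` locally before calling `validate_response` surfaces serialization problems early.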
Batch Processing
from agentds import BenchmarkClient
def process_all_domains():
    client = BenchmarkClient()
    client.start_competition()
    domains = client.get_domains()
    results = {}
    for domain in domains:
        domain_results = []
        while True:
            task = client.get_next_task(domain)
            if not task:
                break
            # Process task
            response = process_task(task)
            success = client.submit_response(domain, task.task_number, response)
            domain_results.append(success)
        results[domain] = domain_results
    return results
def process_task(task):
    # Your task processing logic
    return {"result": "processed"}
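The dictionary returned by `process_all_domains()` maps each domain to a list of booleans. A small helper can turn that into per-domain counts. The name `summarize_results` is hypothetical and not part of the package.

```python
def summarize_results(results):
    """Count submitted, accepted, and failed submissions for each domain.

    `results` maps a domain name to a list of booleans, one per submission.
    """
    summary = {}
    for domain, outcomes in results.items():
        submitted = len(outcomes)
        accepted = sum(1 for ok in outcomes if ok)
        summary[domain] = {
            "submitted": submitted,
            "accepted": accepted,
            "failed": submitted - accepted,
        }
    return summary
```

For example, `summarize_results({"demo": [True, False, True]})` returns `{"demo": {"submitted": 3, "accepted": 2, "failed": 1}}`.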
Error Handling
from agentds import BenchmarkClient
from agentds.exceptions import AuthenticationError, APIError
try:
    client = BenchmarkClient(api_key="invalid-key", team_name="test")
    client.authenticate()
except AuthenticationError as e:
    print(f"Authentication failed: {e}")
except APIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
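For errors that may be transient, a retry with exponential backoff is a common pattern. The sketch below is generic and uses only the standard library; `call_with_retry` is a hypothetical helper, not part of the package. Whether `APIError` actually signals a retryable condition is an assumption the page does not confirm.

```python
import time


def call_with_retry(func, retry_on=(Exception,), attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on the given exception types with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2 * base_delay, and so on.
    The last failure is re-raised unchanged. `sleep` is injectable for testing.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A submission could then be wrapped as `call_with_retry(lambda: client.submit_response(domain, task.task_number, response), retry_on=(APIError,))`. Authentication failures are usually not worth retrying, so `AuthenticationError` is better left out of `retry_on`, provided it is not a subclass of `APIError`.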
Development
Setup Development Environment
git clone https://github.com/agentds/agentds-bench.git
cd agentds-bench/agentds_pkg
pip install -e ".[dev]"
Running Tests
pytest
Code Formatting
black src/
flake8 src/
mypy src/
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: https://agentds.org/docs
- Issues: GitHub Issues
- Email: contact@agentds.org
Changelog
See CHANGELOG.md for version history.
Download files
File details
Details for the file agentds_bench-1.2.0.tar.gz.
File metadata
- Download URL: agentds_bench-1.2.0.tar.gz
- Upload date:
- Size: 18.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 846c712792a258dc7f63e20903eeca6182a2b65541e073cea940d1a09036804b |
| MD5 | 462ea59227183fb8b4b7485917246a31 |
| BLAKE2b-256 | 83a443e410c97bf508d4101b34efb6bcb8fd7840fbc6cc271540888ac1f39163 |
File details
Details for the file agentds_bench-1.2.0-py3-none-any.whl.
File metadata
- Download URL: agentds_bench-1.2.0-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 836a88becdb6fe4ef25e9d6f974fbff34427f41dbd4a463f016929a59e61fc91 |
| MD5 | db714cb9ed5906c3bc6479b8f28b6116 |
| BLAKE2b-256 | 8866a09547fc46450e7ebf3a1dfcbe97945c264505ebe1c7848f24df761bc47d |