SDK for building framework adapters that integrate with TrustyAI EvalHub
EvalHub SDK
Framework Adapter SDK for TrustyAI EvalHub Integration
The EvalHub SDK provides a standardized way to create framework adapters that can be consumed by EvalHub, enabling a "Bring Your Own Framework" (BYOF) approach for evaluation frameworks.
Overview
The SDK creates a common API layer that allows EvalHub to communicate with ANY evaluation framework. Users only need to write minimal "glue" code to connect their framework to the standardized interface.
EvalHub → (Standard API) → Your Framework Adapter → Your Evaluation Framework
Architecture
graph LR
EH[EvalHub]
FA[Framework Adapter<br/>SDK + Glue Code]
YF[Your Framework<br/>LMEval, Custom,<br/>RAGAS, etc.]
API[Standard API<br/>─────────────<br/>/health<br/>/info<br/>/benchmarks<br/>/evaluations]
EH <--> FA
FA <--> YF
EH --> API
FA --> API
Package Organization
The SDK is organized into distinct, focused packages:
🏗️ Core (evalhub.models) - Shared data models and utilities
- Request/response models for API communication
- Common data structures used by both clients and adapters
🔧 Adapter SDK (evalhub.adapter) - Components for building framework adapters
- Framework adapter base class and configuration
- Server components for hosting your adapter
- API routing and endpoint implementations
- CLI tools for running and managing adapters
📡 Client SDK (evalhub.adapter.client) - Components for communicating with adapters
- HTTP client for connecting to framework adapters
- Discovery service for finding and managing multiple adapters
- Async communication patterns
Key Components
- Standard API: Common REST endpoints that all adapters must implement
- Framework Adapter Base Class: Abstract base class with the adapter contract (evalhub.adapter.models)
- Server Components: FastAPI-based server for exposing the standard API (evalhub.adapter.server)
- Client Components: HTTP client for EvalHub to communicate with adapters (evalhub.adapter.client)
- Data Models: Pydantic models for requests, responses, and metadata (evalhub.models)
Quick Start
1. Installation
# Install from PyPI (when available)
pip install evalhub-sdk
# Install from source
git clone https://github.com/trustyai-explainability/evalhub-sdk.git
cd evalhub-sdk
pip install -e .[dev]
2. Create Your Adapter
Create a new Python file for your adapter:
# my_framework_adapter.py
from typing import List

from evalhub.adapter import FrameworkAdapter, AdapterConfig
from evalhub.models import *


class MyFrameworkAdapter(FrameworkAdapter):
    async def initialize(self) -> None:
        """Initialize your framework here"""
        # Load your evaluation framework
        pass

    async def list_benchmarks(self) -> List[BenchmarkInfo]:
        """Return available benchmarks from your framework"""
        return [
            BenchmarkInfo(
                benchmark_id="my_benchmark",
                name="My Custom Benchmark",
                description="A custom benchmark",
                category="reasoning",
                metrics=["accuracy", "f1_score"],
            )
        ]

    async def submit_evaluation(self, request: EvaluationRequest) -> EvaluationJob:
        """Submit evaluation to your framework"""
        # Translate request to your framework's format
        # Run evaluation
        # Return job information
        raise NotImplementedError("placeholder: build and return an EvaluationJob")

    # Implement other required methods...
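The job-related methods left out above (submit, status lookup, cancel) need somewhere to keep job state. The following is a minimal sketch of one way to do that with only the standard library. `InMemoryJobStore` and its plain-dict job records are hypothetical stand-ins, not SDK classes; a real adapter would return the SDK's EvaluationJob model, and the lowercase status strings are an assumption.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class InMemoryJobStore:
    """Hypothetical helper: tracks evaluation jobs in a dict keyed by job ID."""

    FINISHED = {"completed", "failed", "cancelled"}

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}

    def submit(self, benchmark_id: str) -> Dict[str, Any]:
        # Record a new job in the "pending" state and hand it back to the caller.
        job = {
            "job_id": uuid.uuid4().hex,
            "benchmark_id": benchmark_id,
            "status": "pending",
            "submitted_at": datetime.now(timezone.utc),
        }
        self._jobs[job["job_id"]] = job
        return job

    def get(self, job_id: str) -> Optional[Dict[str, Any]]:
        # Unknown job IDs yield None, mirroring the Optional return types
        # in the adapter interface.
        return self._jobs.get(job_id)

    def cancel(self, job_id: str) -> bool:
        # Only jobs that exist and have not finished can be cancelled.
        job = self._jobs.get(job_id)
        if job is None or job["status"] in self.FINISHED:
            return False
        job["status"] = "cancelled"
        return True
```

An adapter could hold one such store as an attribute and delegate `submit_evaluation`, `get_job_status`, and `cancel_job` to it. State kept this way is lost when the process restarts.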
3. Run Your Adapter
# run_adapter.py
from evalhub.adapter import AdapterServer, AdapterConfig
from my_framework_adapter import MyFrameworkAdapter
config = AdapterConfig(
    framework_id="my_framework",
    adapter_name="My Framework Adapter",
    port=8080,
)
adapter = MyFrameworkAdapter(config)
server = AdapterServer(adapter)
server.run()
4. Test Your Adapter
# Run your adapter
python run_adapter.py
# Test health check
curl http://localhost:8080/api/v1/health
# Get framework info
curl http://localhost:8080/api/v1/info
# List benchmarks
curl http://localhost:8080/api/v1/benchmarks
Package Organization Guide
The EvalHub SDK is organized into distinct packages based on your use case:
📦 Which Package Should I Use?
| Use Case | Primary Package | Description |
|---|---|---|
| Building an Adapter | evalhub.adapter | You're creating a new framework adapter |
| Connecting to Adapters | evalhub.adapter.client | You're building a client to communicate with adapters |
| Data Models | evalhub.models | You need request/response models for API communication |
| CLI Tools | evalhub.adapter.cli | You want to run/manage adapters from the command line |
🎯 Import Patterns by Role
Framework Adapter Developer:
# Building your adapter
from evalhub.adapter.models import FrameworkAdapter, AdapterConfig
from evalhub.adapter.server import AdapterServer
from evalhub.models.api import EvaluationRequest, EvaluationJob
# Running your adapter
from evalhub.adapter import * # Everything you need
Client Developer (EvalHub team):
# Communicating with adapters
from evalhub.adapter.client import AdapterClient, AdapterDiscovery
from evalhub.models.api import EvaluationRequest, ModelConfig
Integration Developer:
# Using both sides of the API
from evalhub.adapter.client import AdapterClient # Client side
from evalhub.adapter.models import FrameworkAdapter # Adapter side
from evalhub.models.api import * # Shared models
Complete Examples
LightEval Framework Example
See examples/lighteval_adapter/ for a production-ready example.
Try the demo (notebook runs outside the container):
# Container: LightEval + adapter
# Notebook: External HTTP client
cd examples/
jupyter notebook lighteval_demo_external.ipynb
Standard API Endpoints
All framework adapters expose the same REST API:
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /info | GET | Framework information |
| /benchmarks | GET | List available benchmarks |
| /benchmarks/{id} | GET | Get benchmark details |
| /evaluations | POST | Submit evaluation job |
| /evaluations/{job_id} | GET | Get job status |
| /evaluations/{job_id}/results | GET | Get evaluation results |
| /evaluations/{job_id} | DELETE | Cancel job |
| /evaluations/{job_id}/stream | GET | Stream job updates |
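The endpoint table can also be read as a routing table. The sketch below expresses it as data and resolves a method and path to a handler name. The pairing of each endpoint with a FrameworkAdapter method name is an assumption inferred from the method names, not something the SDK documents here, and `resolve` is a hypothetical helper. The streaming endpoint is left out because the adapter interface lists no matching method.

```python
import re
from typing import Optional

# (HTTP method, path pattern, assumed adapter method name)
ROUTES = [
    ("GET", r"/health", "health_check"),
    ("GET", r"/info", "get_framework_info"),
    ("GET", r"/benchmarks", "list_benchmarks"),
    ("GET", r"/benchmarks/[^/]+", "get_benchmark_info"),
    ("POST", r"/evaluations", "submit_evaluation"),
    ("GET", r"/evaluations/[^/]+", "get_job_status"),
    ("GET", r"/evaluations/[^/]+/results", "get_evaluation_results"),
    ("DELETE", r"/evaluations/[^/]+", "cancel_job"),
]


def resolve(method: str, path: str) -> Optional[str]:
    """Return the handler name for a request, or None if nothing matches."""
    for route_method, pattern, handler in ROUTES:
        # fullmatch keeps "/evaluations/{job_id}" from swallowing ".../results".
        if method == route_method and re.fullmatch(pattern, path):
            return handler
    return None
```

Note that `/evaluations/{job_id}` appears twice in the table and is distinguished only by HTTP method, so the method has to be part of the lookup key.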
Example API Usage
# Submit evaluation
curl -X POST http://localhost:8080/api/v1/evaluations \
-H "Content-Type: application/json" \
-d '{
"benchmark_id": "my_benchmark",
"model": {
"name": "gpt-4",
"provider": "openai",
"parameters": {
"temperature": 0.1,
"max_tokens": 100
}
},
"num_examples": 100,
"experiment_name": "test_evaluation"
}'
# Check job status
curl http://localhost:8080/api/v1/evaluations/{job_id}
# Get results
curl http://localhost:8080/api/v1/evaluations/{job_id}/results
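The same submission can be prepared from Python without extra dependencies. This sketch only builds the request object with the standard library; `build_submit_request` is a hypothetical helper, and the payload uses a subset of the fields shown in the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # same base URL as the curl examples


def build_submit_request(
    benchmark_id: str, model_name: str, num_examples: int
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /evaluations endpoint."""
    payload = {
        "benchmark_id": benchmark_id,
        "model": {"name": model_name},
        "num_examples": num_examples,
    }
    return urllib.request.Request(
        f"{BASE_URL}/evaluations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_submit_request("my_benchmark", "gpt-4", 100)
# Sending it requires a running adapter:
#     with urllib.request.urlopen(request) as response:
#         job = json.load(response)
```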
Framework Adapter Interface
Required Methods
Your adapter must implement these abstract methods:
class FrameworkAdapter(ABC):
    @abstractmethod
    async def initialize(self) -> None:
        """Initialize the framework"""

    @abstractmethod
    async def get_framework_info(self) -> FrameworkInfo:
        """Get framework information"""

    @abstractmethod
    async def list_benchmarks(self) -> List[BenchmarkInfo]:
        """List available benchmarks"""

    @abstractmethod
    async def get_benchmark_info(self, benchmark_id: str) -> Optional[BenchmarkInfo]:
        """Get benchmark details"""

    @abstractmethod
    async def submit_evaluation(self, request: EvaluationRequest) -> EvaluationJob:
        """Submit evaluation job"""

    @abstractmethod
    async def get_job_status(self, job_id: str) -> Optional[EvaluationJob]:
        """Get job status"""

    @abstractmethod
    async def get_evaluation_results(self, job_id: str) -> Optional[EvaluationResponse]:
        """Get evaluation results"""

    @abstractmethod
    async def cancel_job(self, job_id: str) -> bool:
        """Cancel job"""

    @abstractmethod
    async def health_check(self) -> HealthResponse:
        """Perform health check"""

    @abstractmethod
    async def shutdown(self) -> None:
        """Graceful shutdown"""
Data Models
Key data models for requests and responses:
# Evaluation request from EvalHub
class EvaluationRequest(BaseModel):
    benchmark_id: str
    model: ModelConfig
    num_examples: Optional[int] = None
    num_few_shot: Optional[int] = None
    benchmark_config: Dict[str, Any] = {}
    experiment_name: Optional[str] = None

# Model configuration
class ModelConfig(BaseModel):
    name: str
    provider: Optional[str] = None
    parameters: Dict[str, Any] = {}
    device: Optional[str] = None
    batch_size: Optional[int] = None

# Evaluation job tracking
class EvaluationJob(BaseModel):
    job_id: str
    status: JobStatus  # PENDING, RUNNING, COMPLETED, FAILED, CANCELLED
    request: EvaluationRequest
    submitted_at: datetime
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    progress: Optional[float] = None  # 0.0 to 1.0
    error_message: Optional[str] = None

# Evaluation results
class EvaluationResponse(BaseModel):
    job_id: str
    benchmark_id: str
    model_name: str
    results: List[EvaluationResult]
    overall_score: Optional[float] = None
    num_examples_evaluated: int
    completed_at: datetime
    duration_seconds: float

# Individual metric result
class EvaluationResult(BaseModel):
    metric_name: str
    metric_value: Union[float, int, str, bool]
    metric_type: str = "float"
    num_samples: Optional[int] = None
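The `status` and `progress` fields carry two implicit rules: some statuses are final, and progress stays within 0.0 to 1.0. The sketch below shows one way to encode both. The enum member names come from the comment on `EvaluationJob.status`; the lowercase string values, `is_terminal`, and `clamp_progress` are assumptions made for illustration and may not match the SDK's own JobStatus.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Stand-in for the SDK's JobStatus; the string values are assumed."""

    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Statuses after which a job will not change state again.
TERMINAL_STATUSES = frozenset(
    {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.CANCELLED}
)


def is_terminal(status: JobStatus) -> bool:
    """True once a job has finished, whether successfully or not."""
    return status in TERMINAL_STATUSES


def clamp_progress(value: float) -> float:
    """Keep a reported progress value inside the documented 0.0 to 1.0 range."""
    return max(0.0, min(1.0, value))
```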
CLI Usage
The SDK includes a CLI tool for running and testing adapters:
# Run an adapter
evalhub-adapter run my_adapter:MyAdapter --port 8080
# Get adapter info
evalhub-adapter info http://localhost:8080
# Check adapter health
evalhub-adapter health http://localhost:8080
# Discover multiple adapters
evalhub-adapter discover http://adapter1:8080 http://adapter2:8081
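The `my_adapter:MyAdapter` argument follows the common `module:attribute` convention. How evalhub-adapter resolves it internally is not shown here, so the following is only a sketch of the usual technique with `importlib`; `load_object` is a hypothetical helper.

```python
import importlib
from typing import Any


def load_object(spec: str) -> Any:
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, separator, attribute_name = spec.partition(":")
    if not separator or not module_name or not attribute_name:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute_name)


# Demonstrated with a standard-library class so the example runs anywhere.
OrderedDict = load_object("collections:OrderedDict")
```

With this convention, `my_adapter` must be importable from the directory where the command runs (or be on `PYTHONPATH`).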
EvalHub Integration
Client Usage
EvalHub uses the provided client to communicate with adapters:
import asyncio

from evalhub.adapter.client import AdapterClient
# JobStatus is assumed to be exported alongside the other data models
from evalhub.models import EvaluationRequest, JobStatus, ModelConfig


async def main() -> None:
    async with AdapterClient("http://adapter:8080") as client:
        # Get framework info
        info = await client.get_framework_info()
        print(f"Framework: {info.name}")

        # List benchmarks
        benchmarks = await client.list_benchmarks()
        print(f"Available benchmarks: {len(benchmarks)}")

        # Submit evaluation
        request = EvaluationRequest(
            benchmark_id="custom_benchmark",
            model=ModelConfig(
                name="llama-7b",
                provider="vllm",
                parameters={"temperature": 0.1},
            ),
            num_examples=100,
        )
        job = await client.submit_evaluation(request)
        print(f"Job submitted: {job.job_id}")

        # Wait for completion
        final_job = await client.wait_for_completion(job.job_id)

        # Get results
        if final_job.status == JobStatus.COMPLETED:
            results = await client.get_evaluation_results(job.job_id)
            print(f"Results: {len(results.results)} metrics")


asyncio.run(main())
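Waiting for completion amounts to polling the job status until it reaches a final state. The client's own `wait_for_completion` implementation is not shown here, so this is a generic sketch of such a loop with a timeout; `wait_until_terminal`, its parameters, and the lowercase status strings are all assumptions.

```python
import asyncio
from typing import Awaitable, Callable

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


async def wait_until_terminal(
    fetch_status: Callable[[], Awaitable[str]],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Poll fetch_status() until it reports a terminal status or time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next poll would land after the deadline.
        if loop.time() + poll_interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        await asyncio.sleep(poll_interval)


async def _demo() -> str:
    # A fake status source that finishes on the third poll.
    statuses = iter(["pending", "running", "completed"])

    async def fake_fetch() -> str:
        return next(statuses)

    return await wait_until_terminal(fake_fetch, poll_interval=0.01, timeout=1.0)


final_status = asyncio.run(_demo())
```

Against a real adapter, `fetch_status` would wrap a status request for a specific job ID. The loop returns on any terminal status, so callers still need to check whether the job completed or failed.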
Discovery Service
EvalHub can automatically discover and manage multiple adapters:
from evalhub.adapter.client import AdapterDiscovery
discovery = AdapterDiscovery()
# Register adapters
discovery.register_adapter("http://lmeval-adapter:8080")
discovery.register_adapter("http://ragas-adapter:8081")
# Start health monitoring
await discovery.start_health_monitoring()
# Get healthy adapters
healthy_adapters = discovery.get_healthy_adapters()
# Find adapter for specific framework
lmeval_adapter = discovery.get_adapter_for_framework("lm_evaluation_harness")
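Health monitoring of this kind boils down to remembering when each adapter last answered a health check and treating stale entries as unhealthy. The sketch below illustrates that bookkeeping only; `AdapterRegistry` and its methods are hypothetical and are not how AdapterDiscovery is necessarily implemented.

```python
import time
from typing import Dict, List, Optional


class AdapterRegistry:
    """Hypothetical registry that tracks when each adapter was last healthy."""

    def __init__(self, stale_after_seconds: float = 60.0) -> None:
        self._stale_after = stale_after_seconds
        # URL -> timestamp of the last successful health check (None if never).
        self._last_healthy: Dict[str, Optional[float]] = {}

    def register(self, url: str) -> None:
        self._last_healthy.setdefault(url, None)

    def record_health(
        self, url: str, healthy: bool, now: Optional[float] = None
    ) -> None:
        if url not in self._last_healthy:
            raise KeyError(f"adapter not registered: {url}")
        timestamp = time.monotonic() if now is None else now
        # A failed check clears the timestamp, marking the adapter unhealthy.
        self._last_healthy[url] = timestamp if healthy else None

    def healthy(self, now: Optional[float] = None) -> List[str]:
        current = time.monotonic() if now is None else now
        return [
            url
            for url, seen in self._last_healthy.items()
            if seen is not None and current - seen <= self._stale_after
        ]
```

The optional `now` argument exists so the staleness logic can be tested without sleeping.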
Configuration
Adapter Configuration
config = AdapterConfig(
    framework_id="my_framework",
    adapter_name="My Framework Adapter",
    version="1.0.0",
    host="0.0.0.0",
    port=8080,
    max_concurrent_jobs=5,
    job_timeout_seconds=3600,
    log_level="INFO",
    framework_config={
        # Framework-specific settings
        "model_cache_dir": "/models",
        "device": "cuda",
        "batch_size": 8,
    },
)
Configuration File
# adapter_config.yaml
framework_id: "my_framework"
adapter_name: "My Framework Adapter"
version: "1.0.0"
host: "0.0.0.0"
port: 8080
max_concurrent_jobs: 10
job_timeout_seconds: 7200
log_level: "DEBUG"
framework_config:
  model_cache_dir: "/data/models"
  device: "cuda:0"
  batch_size: 16
  enable_caching: true
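A configuration file like this is typically parsed into a mapping and then validated against the known fields. How the SDK loads the file is not shown here, so the sketch below covers only the validation step with a standard-library dataclass. `AdapterSettings` and `settings_from_mapping` are hypothetical, and the default values are copied from the AdapterConfig example, not confirmed SDK defaults.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class AdapterSettings:
    """Stand-in for AdapterConfig with the fields shown in the examples."""

    framework_id: str
    adapter_name: str
    version: str = "1.0.0"
    host: str = "0.0.0.0"
    port: int = 8080
    max_concurrent_jobs: int = 5
    job_timeout_seconds: int = 3600
    log_level: str = "INFO"
    framework_config: Dict[str, Any] = field(default_factory=dict)


def settings_from_mapping(raw: Dict[str, Any]) -> AdapterSettings:
    """Build settings from a parsed config mapping, rejecting unknown keys."""
    known = {f.name for f in fields(AdapterSettings)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unknown configuration keys: {unknown}")
    return AdapterSettings(**raw)
```

Rejecting unknown keys catches typos such as `max_concurent_jobs` early instead of silently falling back to a default.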
Deployment
Podman with Red Hat UBI
# Framework Adapter Container
FROM registry.access.redhat.com/ubi9/python-311:latest
# Set environment variables for Python optimization
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
WORKDIR /app
# Copy source code
COPY . ./
# Install dependencies
RUN pip install -e .
EXPOSE 8080
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8080/api/v1/health || exit 1
CMD ["evalhub-adapter", "run", "my_adapter:MyAdapter", "--port", "8080"]
Building and Running with Podman
# Build the image
podman build -t your-adapter:latest .
# Run the container
podman run -d \
--name your-adapter \
-p 8080:8080 \
--health-cmd='curl -f http://localhost:8080/api/v1/health || exit 1' \
--health-interval=30s \
--health-timeout=10s \
--health-start-period=30s \
--health-retries=3 \
your-adapter:latest
# Check container health
podman ps
# View logs
podman logs your-adapter
# Stop and clean up
podman stop your-adapter
podman rm your-adapter
Note: For frameworks requiring additional build tools, see examples/lighteval_adapter/ for a production deployment example with UBI minimal and custom dependencies.
Development
Project Structure
The SDK uses a modern Python project structure with clear separation of concerns:
evalhub-sdk/
├── src/evalhub/ # Source code (src layout)
│ ├── models/ # 🏗️ Core: Shared data models
│ │ ├── api.py # Request/response models
│ │ └── __init__.py
│ ├── adapter/ # 🔧 Adapter SDK: Framework adapter components
│ │ ├── models/ # Adapter-specific models (FrameworkAdapter, AdapterConfig)
│ │ ├── server/ # FastAPI server for hosting adapters
│ │ ├── api/ # API endpoints and routing
│ │ ├── client/ # 📡 Client SDK: Communication with adapters
│ │ ├── cli.py # Command-line interface for adapters
│ │ └── __init__.py
│ ├── utils/ # 🛠️ Utilities and helpers
│ ├── cli.py # Main CLI interface
│ └── __init__.py # Public API exports
├── tests/ # Test suite
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── examples/ # Example adapters
│ ├── custom_framework_adapter.py
│ └── lighteval_adapter/
└── pyproject.toml # Project configuration
Package Usage Patterns
🏗️ Building an Adapter:
from evalhub.adapter import FrameworkAdapter, AdapterConfig, AdapterServer
from evalhub.models import EvaluationRequest, EvaluationJob
📡 Connecting to Adapters:
from evalhub.adapter.client import AdapterClient, AdapterDiscovery
from evalhub.models import EvaluationRequest, ModelConfig
🛠️ Framework Development:
# Access everything through the main package
from evalhub.adapter import * # All adapter components
from evalhub.models import * # All data models
Development Setup
# Clone the repository
git clone https://github.com/trustyai-explainability/evalhub-sdk.git
cd evalhub-sdk
# Install in development mode with all dependencies
pip install -e .[dev]
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Run tests with coverage
pytest --cov=src/evalhub --cov-report=html
# Run type checking
mypy src/evalhub
# Run linting
ruff check src/ tests/
ruff format src/ tests/
Testing Your Adapter
import pytest
from evalhub.adapter.client import AdapterClient


@pytest.mark.asyncio
async def test_adapter_health():
    async with AdapterClient("http://localhost:8080") as client:
        health = await client.health_check()
        assert health.status == "healthy"


@pytest.mark.asyncio
async def test_list_benchmarks():
    async with AdapterClient("http://localhost:8080") as client:
        benchmarks = await client.list_benchmarks()
        assert len(benchmarks) > 0
        assert all(b.benchmark_id for b in benchmarks)
Development Server
# Run with auto-reload for development
evalhub-adapter run my_adapter:MyAdapter --reload --log-level DEBUG
Quality Assurance
Run all quality checks:
# Format code
ruff format .
# Lint and fix issues
ruff check --fix .
# Type check
mypy src/evalhub
# Run full test suite
pytest -v --cov=src/evalhub
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for your changes
- Run the test suite
- Submit a pull request
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.