Python SDK for the Epsilab model evaluation and improvement platform.

Maintainers

epsilab

Epsilab Python SDK

Official Python client for the Epsilab model evaluation and improvement platform.

What is Epsilab?

Epsilab runs model and harness evaluations on workflow-level tasks, detects recurring capability gaps, and exports evals, trajectories, preference data, SFT examples, and regression tests. Training-data exports anonymize model identities by default using labels such as target_model and reference_A.

Installation

pip install epsilab

Or install from source:

git clone https://github.com/EpsilabAI/epsilab-python.git
cd epsilab-python
pip install -e .

Quick Start

from epsilab import Epsilab

client = Epsilab(api_key="sk-...")

# Compare multiple models in one evaluation (use any OpenRouter model slug)
eval_result = client.create_evaluation(
    ["provider/model-a", "provider/model-b", "provider/model-c"],
    name="Frontier comparison",
    max_tasks=25,
)
print(f"Evaluation started: {eval_result.evaluation_id}")

# Wait for completion
run = client.wait_for_completion(eval_result.runs[0].run_id)
print(f"Completed: {run.task_count} tasks, {run.gap_count} gaps found")

# View capability gaps
for gap in client.get_gaps(run.run_id):
    print(f"  {gap.capability}: alpha={gap.alpha_score:.3f}")

# Export targeted training data (model identities are anonymized by default)
client.export_run(run.run_id, format="dpo", path="output/dpo_pairs.jsonl")
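The wait step in the Quick Start blocks until the run reaches a final state. As a rough illustration of what such a wait involves, here is a minimal polling sketch that does not use the SDK. The `poll_until_done` helper, its parameters, and the `"failed"` status string are assumptions made for this example; `wait_for_completion` may work differently.

```python
import time
from typing import Callable

# "completed" appears in this README as a run status; "failed" is an
# assumed name for the failure state.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    get_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds}s")
        sleep(interval_seconds)


# Stand-in for a real status lookup; the sequence below is made up.
fake_statuses = iter(["queued", "running", "running", "completed"])
final = poll_until_done(lambda: next(fake_statuses), sleep=lambda _seconds: None)
print(final)
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can swap in a no-op instead of waiting.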

Configuration

Environment Variable	Constructor Param	Description
`EPSILAB_API_KEY`	`api_key`	Your API key
`EPSILAB_API_BASE`	`api_base`	API base URL (default: production)
`EPSILAB_HTTP_TIMEOUT`	`timeout_seconds`	Request timeout in seconds (default: 120)
—	`max_retries`	Auto-retry count for 429/5xx (default: 3)
—	`backoff_base`	Initial retry backoff in seconds (default: 1.0)
—	`load_dotenv`	Also read a local `.env` file (default: false)

The SDK reads process environment variables automatically. To also read a local .env file, opt in explicitly:

client = Epsilab(load_dotenv=True)
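To make the relationship between a constructor parameter and its environment variable concrete, here is a small sketch of how a timeout could be resolved. The precedence shown (explicit argument, then `EPSILAB_HTTP_TIMEOUT`, then the default of 120) is an assumption for illustration and is not confirmed by this README; `resolve_timeout` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 120.0  # default listed in the configuration table


def resolve_timeout(
    explicit: Optional[float] = None,
    env: Mapping[str, str] = os.environ,
) -> float:
    """Pick a timeout: explicit value, then environment variable, then default.

    The precedence order is an assumption made for this sketch.
    """
    if explicit is not None:
        return float(explicit)
    raw = env.get("EPSILAB_HTTP_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    value = float(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"EPSILAB_HTTP_TIMEOUT must be positive, got {raw!r}")
    return value


print(resolve_timeout(env={}))
print(resolve_timeout(env={"EPSILAB_HTTP_TIMEOUT": "30"}))
print(resolve_timeout(5, env={"EPSILAB_HTTP_TIMEOUT": "30"}))
```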

Multi-Model Evaluations

Compare multiple models side-by-side on the same task set:

# Simple: just pass model IDs (any OpenRouter-compatible slug)
eval_result = client.create_evaluation(
    ["provider/model-a", "provider/model-b", "provider/model-c"],
    name="Three-way comparison",
)

# Advanced: per-model harness overrides
eval_result = client.create_evaluation(
    [
        {"model_id": "provider/model-a", "harness": "codex"},
        {"model_id": "provider/model-b", "harness": "openhands"},
        "provider/model-c",  # uses default_harness
    ],
    default_harness="codex",
    max_tasks=50,
    domains=["coding", "math"],
)

# Check cost before running
estimate = client.estimate_evaluation_cost(
    ["provider/model-a", "provider/model-b"],
    max_tasks=25,
)
print(f"Cost: {estimate.total_credits} credits (balance: {estimate.balance})")
print(f"Sufficient: {estimate.sufficient}")
for m in estimate.per_model:
    print(f"  {m.model_id}: {m.credits} credits, {m.task_count} tasks")
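The advanced form mixes plain model IDs with dictionaries that override the harness. The sketch below shows one way such a mixed list could be normalized so every entry carries an explicit harness. `normalize_models` is a hypothetical helper written for this example; it only mirrors the behaviour described in the comments above and is not the SDK's own code.

```python
from typing import Dict, List, Union

ModelSpec = Union[str, Dict[str, str]]


def normalize_models(models: List[ModelSpec], default_harness: str) -> List[Dict[str, str]]:
    """Expand each entry to {"model_id": ..., "harness": ...}."""
    normalized = []
    for spec in models:
        if isinstance(spec, str):
            # A bare model ID falls back to the default harness.
            normalized.append({"model_id": spec, "harness": default_harness})
        else:
            normalized.append(
                {
                    "model_id": spec["model_id"],
                    "harness": spec.get("harness", default_harness),
                }
            )
    return normalized


specs = normalize_models(
    [
        {"model_id": "provider/model-a", "harness": "codex"},
        {"model_id": "provider/model-b", "harness": "openhands"},
        "provider/model-c",
    ],
    default_harness="codex",
)
for spec in specs:
    print(spec["model_id"], spec["harness"])
```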

Bring Your Own Model

Evaluate any OpenAI-compatible endpoint:

run = client.create_run(
    "internal-llm-v3",
    base_url="https://my-company.example.com/v1",
    api_key="sk-model-key",
)

Your model credentials are used only during the evaluation and are never stored. Training-data exports anonymize model identities by default using labels such as target_model and reference_A.

Client Methods

Models

Method	Description
`list_models(search, provider, limit)`	Browse available models with live pricing

Evaluations

Method	Description
`create_evaluation(models, ...)`	Compare multiple models in one evaluation
`estimate_evaluation_cost(models, ...)`	Estimate credit cost before running
`suggest_scope(instructions)`	AI-generated scope suggestions from a description

Runs

Method	Description
`create_run(model_name, ...)`	Submit a single model for evaluation
`get_run(run_id)`	Get run status and summary
`list_runs(status, limit, offset)`	List your evaluation runs (single page)
`iter_runs(status, page_size)`	Auto-paginating iterator over all runs
`wait_for_completion(run_id, ...)`	Block until run completes or fails
`cancel_run(run_id)`	Cancel a queued or running evaluation
`retry_run(run_id)`	Retry a failed run, reusing completed results
`resume_run(run_id, ...)`	Resume a failed run with optional new credentials
`delete_run(run_id)`	Delete a run

Results & Insights

Method	Description
`get_gaps(run_id)`	Get capability gaps from a completed run
`get_artifacts(run_id, ...)`	Get generated artifacts (single page)
`iter_artifacts(run_id, ...)`	Auto-paginating iterator over all artifacts
`get_insights(run_id)`	Get model rankings, J1/J2/J3 metrics, and analytics
`request_review(run_id, gap_ids)`	Request human review for specific gaps
`forge(run_id)`	Generate new tasks targeting run gaps
`export_run(run_id, format, path)`	Export training data or reports

Cross-Run Analytics

Method	Description
`get_leaderboard()`	Cross-run model leaderboard
`get_domain_leaderboard()`	Per-domain model scores across runs
`get_cost_analysis()`	Cost-efficiency rankings with live pricing
`get_precomputed_insights()`	Per-domain best-model recommendations

Tasks

Method	Description
`get_task(task_id)`	Get details for a specific task
`create_task(task)`	Create a single custom evaluation task
`upload_custom_tasks(tasks)`	Batch upload custom evaluation tasks
`get_task_upload_limits()`	Get max file size and task count per batch
`classify_tasks(tasks)`	Auto-classify tasks by domain and capability
`list_tasks(...)`	List available tasks (single page)
`iter_tasks(...)`	Auto-paginating iterator over all tasks
`delete_task(task_id)`	Delete a custom task

API Keys

Method	Description
`list_api_keys()`	List your API keys
`create_api_key(label)`	Create a new API key
`revoke_api_key(key_id)`	Revoke an API key

Billing

Method	Description
`get_credit_balance()`	Get current credit balance
`get_credit_ledger(...)`	Get credit transaction history
`get_usage(period)`	Get monthly usage summary

Export Formats

Format	Use Case
`dpo`	Direct Preference Optimization (chosen/rejected pairs)
`quality_dpo`	DPO pairs enriched with quality scores and feedback
`sft`	Supervised Fine-Tuning (prompt/completion pairs)
`kto`	Kahneman-Tversky Optimization (binary desirability)
`grpo`	Group Relative Policy Optimization (grouped completions)
`sharegpt`	Multi-turn conversation format
`jsonl`	Raw artifacts as NDJSON
`report`	Human-readable evaluation report
`yaml`	YAML configuration for reproduction
`pytest`	Pytest test cases from capability gaps

Training-data exports use anonymized model labels (e.g. target_model, reference_A) rather than real model identifiers. Chosen/reference answers are verified gold answers, not raw model outputs. Evaluation prompts are included for enterprise accounts; standard accounts receive task ID references instead.
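Once a `dpo` export is on disk, it can be read like any newline-delimited JSON file. The sketch below parses records and keeps those with both answers present. The field names `chosen`, `rejected`, and `task_id` are assumptions based on the descriptions above; check a real export before relying on them.

```python
import io
import json
from typing import Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def usable_pairs(records: List[Dict]) -> List[Dict]:
    """Keep records that carry both a chosen and a rejected answer."""
    return [r for r in records if r.get("chosen") and r.get("rejected")]


# Hypothetical sample data; real exports may use different field names.
sample = io.StringIO(
    '{"task_id": "t1", "chosen": "4", "rejected": "5"}\n'
    "\n"
    '{"task_id": "t2", "chosen": "ok"}\n'
)
pairs = usable_pairs(parse_jsonl(sample))
print(len(pairs))
```

Because `parse_jsonl` accepts any iterable of lines, an open file handle for the exported path can be passed in place of the in-memory sample.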

Automatic Retries

The SDK automatically retries rate-limit responses (429), transient server errors (500, 502, 503, 504), and transient network failures, using exponential backoff with jitter. For 429 responses, the Retry-After header is respected when it holds a valid value.

# Default: 3 retries with 1s base backoff
client = Epsilab(api_key="sk-...")

# Customize retry behaviour
client = Epsilab(api_key="sk-...", max_retries=5, backoff_base=2.0)

# Disable retries entirely
client = Epsilab(api_key="sk-...", max_retries=0)
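For intuition about how `max_retries` and `backoff_base` interact, here is a sketch of a common "full jitter" backoff schedule. The exact formula the SDK uses is not documented in this README, so the doubling rule, the `max_delay` cap, and the `retry_delay` helper are all assumptions. The sketch also handles only the seconds form of Retry-After, not the HTTP-date form.

```python
import math
import random
from typing import Optional


def retry_delay(
    attempt: int,
    backoff_base: float = 1.0,
    retry_after: Optional[str] = None,
    max_delay: float = 60.0,
) -> float:
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except (TypeError, ValueError):
            seconds = float("nan")
        if math.isfinite(seconds) and seconds >= 0:
            # A valid Retry-After value wins over the computed backoff.
            return seconds
    # Exponential growth, capped, then a uniformly random point below the cap.
    ceiling = min(max_delay, backoff_base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


for attempt in range(3):
    print(attempt, round(retry_delay(attempt, backoff_base=1.0), 2))
```

With `backoff_base=1.0`, the upper bounds for the first three retries are 1, 2, and 4 seconds. The jitter spreads clients out so they do not all retry at the same instant.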

Pagination

List endpoints return a single page by default. Use the iter_* methods to auto-paginate:

# Iterate over all runs without manual offset management
for run in client.iter_runs(status="completed"):
    print(run.run_id, run.gap_count)

# Same for artifacts and tasks (run_id is the ID of an existing run)
for artifact in client.iter_artifacts(run_id):
    print(artifact.artifact_type)

for task in client.iter_tasks(domain="coding"):
    print(task["task_id"])
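Under the hood, an auto-paginating iterator usually just repeats a single-page call with a growing offset. The sketch below shows that pattern against an in-memory list. `iter_pages` and `fake_list_runs` are hypothetical names for this example, and the SDK's `iter_*` methods may be implemented differently.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], List[T]], page_size: int = 50) -> Iterator[T]:
    """Yield items from fetch_page(limit, offset) until a short page arrives."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += len(page)


# Stand-in for a single-page list call backed by seven made-up run IDs.
RUN_IDS = [f"run-{i}" for i in range(7)]


def fake_list_runs(limit: int, offset: int) -> List[str]:
    return RUN_IDS[offset:offset + limit]


all_runs = list(iter_pages(fake_list_runs, page_size=3))
print(all_runs)
```

Since the helper is a generator, pages are fetched lazily: breaking out of the loop early avoids requesting the remaining pages.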

Error Handling

from epsilab import Epsilab, AuthError, InsufficientCreditsError, RateLimitError, ApiError

client = Epsilab(api_key="sk-...")

try:
    eval_result = client.create_evaluation(["provider/model-a", "provider/model-b"])
except AuthError:
    print("Invalid API key")
except InsufficientCreditsError as e:
    print(f"Not enough credits: {e}")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except ApiError as e:
    print(f"API error: {e.status_code}")
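The order of the `except` clauses matters if the specific errors derive from a common base, because Python runs the first matching handler. The sketch below uses local stand-in classes to show the effect. The `Fake*` classes are hypothetical, and the assumption that the SDK's specific errors subclass `ApiError` is not confirmed by this README.

```python
class FakeApiError(Exception):
    """Stand-in base error carrying an HTTP status code."""

    def __init__(self, message: str, status_code: int = 0):
        super().__init__(message)
        self.status_code = status_code


class FakeRateLimitError(FakeApiError):
    """Stand-in for a 429 error carrying a retry hint."""

    def __init__(self, message: str, retry_after: float = 0.0):
        super().__init__(message, status_code=429)
        self.retry_after = retry_after


def describe(exc: Exception) -> str:
    """Handle the most specific error first, the general one last."""
    try:
        raise exc
    except FakeRateLimitError as e:
        return f"rate limited, retry after {e.retry_after}s"
    except FakeApiError as e:
        return f"api error {e.status_code}"


print(describe(FakeRateLimitError("slow down", retry_after=2)))
print(describe(FakeApiError("boom", status_code=500)))
```

If the base-class handler came first, it would also catch the rate-limit error and the retry hint would never be read.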

Examples

See examples/example.py for a complete workflow.

License

Apache 2.0 — see LICENSE.

Release history

Version	Release date
0.4.0 (this version)	May 28, 2026
0.3.0	May 19, 2026
0.1.3	Sep 10, 2025
0.1.2	Sep 10, 2025
0.1.1	Sep 10, 2025
0.1.0	Sep 10, 2025

Download files

Source Distribution

epsilab-0.4.0.tar.gz (21.9 kB)

Uploaded May 28, 2026 Source

Built Distribution

epsilab-0.4.0-py3-none-any.whl (34.8 kB)

Uploaded May 28, 2026 Python 3

File details

Details for the file epsilab-0.4.0.tar.gz.

File metadata

Download URL: epsilab-0.4.0.tar.gz
Upload date: May 28, 2026
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for epsilab-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`3bc47a84fd371cee2d9743d9ab7bef4a4f06918d7fa52b98937a95f12cefe72f`
MD5	`dc4b2c3d58d781dca9fd663024b6ad94`
BLAKE2b-256	`01b35cb4adbff8a2d5e0a9e908f6fd7eb0d411cd48ebc783dfcfd8eea1adc234`

Provenance

The following attestation bundles were made for epsilab-0.4.0.tar.gz:

Publisher: workflow.yml on EpsilabAI/epsilab-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: epsilab-0.4.0.tar.gz
- Subject digest: 3bc47a84fd371cee2d9743d9ab7bef4a4f06918d7fa52b98937a95f12cefe72f
- Sigstore transparency entry: 1652597568
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: EpsilabAI/epsilab-python@65fe67d4e939ecdd29cabe5262859c119f2ab388
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/EpsilabAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@65fe67d4e939ecdd29cabe5262859c119f2ab388
- Trigger Event: push

File details

Details for the file epsilab-0.4.0-py3-none-any.whl.

File metadata

Download URL: epsilab-0.4.0-py3-none-any.whl
Upload date: May 28, 2026
Size: 34.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for epsilab-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c94f884cb2e4d2c5579c30a021c610cffe17da08fcd3e81aeb7166eda72c3e34`
MD5	`a4801e19145ea8c8a3de7810b83375ec`
BLAKE2b-256	`246aa4644f61f6e8c14ca245cdc55587c61262cff8ab74489f58d07a52a23dd6`

Provenance

The following attestation bundles were made for epsilab-0.4.0-py3-none-any.whl:

Publisher: workflow.yml on EpsilabAI/epsilab-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: epsilab-0.4.0-py3-none-any.whl
- Subject digest: c94f884cb2e4d2c5579c30a021c610cffe17da08fcd3e81aeb7166eda72c3e34
- Sigstore transparency entry: 1652597573
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: EpsilabAI/epsilab-python@65fe67d4e939ecdd29cabe5262859c119f2ab388
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/EpsilabAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yml@65fe67d4e939ecdd29cabe5262859c119f2ab388
- Trigger Event: push
