Skip to main content

Serverless Posttraining for Agents - Core AI functionality and tracing

Project description

Synth-AI SDK

Python License PyPI Coverage Tests Blacksmith CI

Synth-AI — Serverless Posttraining for Agents.
Docs: Get Started →


🚀 Install latest version (0.2.25.dev1)

pip install synth-ai
# or
uv add synth-ai

Import:

import synth_ai

CLI (with uvx):

uvx synth-ai setup
uvx synth-ai demo
uvx synth-ai deploy
uvx synth-ai run
uvx synth-ai baseline  # For coding agents: get baseline scores

Full quickstart: https://docs.usesynth.ai/sdk/get-started


When you run uvx synth-ai setup, the SDK opens your browser to the Synth dashboard for a one‑time pairing (handshake) with your signed‑in session. The SDK will automatically:

Fast and effective serverless posttraining for agents, via an API.
Easily scale GPU topologies, train multi-node, and integrate with existing agent software.

Highlights

  • Scale GPU topologies (A10Gs, H100s, multi-node available on request)
  • Thin FastAPI wrapper integration
  • Supports OSS models like Qwen3 (GPT-OSS GA soon)
  • Own your trained models

🧭 Examples & Cookbooks

The old examples/ directory now lives in the Synth Cookbooks repo.


⚙️ Getting Started

Synth-AI ships with a built-in RL example: training Qwen3-0.6B on math reasoning.

  1. Create accounts at Synth and Modal

  2. Then run:

    uvx synth-ai demo
    uvx synth-ai setup
    uvx synth-ai deploy
    uvx synth-ai run
    
  3. To walk through your first RL run, see
    👉 Synth-AI SDK Docs


🤖 For Coding Agents: Get Started with Baselines

Baselines are the fastest way for coding agents to evaluate changes and measure improvement on Synth tasks.

Why Use Baselines?

Baselines provide a self-contained evaluation system that:

  • No infrastructure required — runs locally, no deployed task app needed
  • Quick feedback loop — get task-by-task results in seconds
  • Compare changes — establish a baseline score before making modifications
  • Auto-discoverable — finds baseline files automatically in your codebase

Quick Start for Coding Agents

# 1. List available baselines
uvx synth-ai baseline list

# 2. Run a quick 3-task baseline to get started
uvx synth-ai baseline banking77 --split train --seeds 0,1,2

# 3. Get your baseline score (full train split)
uvx synth-ai baseline banking77 --split train

# 4. Make your changes to the code...

# 5. Re-run to compare performance
uvx synth-ai baseline banking77 --split train --output results_after.json

Available Baselines

# Filter by task type
uvx synth-ai baseline list --tag rl          # RL tasks
uvx synth-ai baseline list --tag nlp         # NLP tasks
uvx synth-ai baseline list --tag vision      # Vision tasks

# Run specific baselines
uvx synth-ai baseline warming_up_to_rl       # Crafter survival game
uvx synth-ai baseline pokemon_vl             # Pokemon Red (vision)
uvx synth-ai baseline gepa                   # Banking77 classification

Baseline Results

Each baseline run provides:

  • Task-by-task results — see exactly which seeds succeed/fail
  • Aggregate metrics — success rate, mean/std rewards, total tasks
  • Serializable output — save to JSON with --output results.json
  • Model comparison — test different models with --model

Example output:

============================================================
Baseline Evaluation: Banking77 Intent Classification
============================================================
Split(s): train
Tasks: 10
Success: 8/10
Execution time: 12.34s

Aggregate Metrics:
  mean_outcome_reward: 0.8000
  success_rate: 0.8000
  total_tasks: 10

Creating Custom Baselines

Coding agents can create new baseline files to test custom tasks:

# my_task_baseline.py
from synth_ai.baseline import BaselineConfig, BaselineTaskRunner, DataSplit, TaskResult

class MyTaskRunner(BaselineTaskRunner):
    async def run_task(self, seed: int) -> TaskResult:
        # Your task logic here
        return TaskResult(...)

my_baseline = BaselineConfig(
    baseline_id="my_task",
    name="My Custom Task",
    description="Evaluate my custom task",
    task_runner=MyTaskRunner,
    splits={
        "train": DataSplit(name="train", seeds=list(range(10))),
    },
)

Place this file in your project (for example under cookbooks/dev/baseline/) or name it *_baseline.py for auto-discovery. Official baseline examples now live in the Synth Cookbooks repo.


🔐 SDK → Dashboard Pairing

When you run uvx synth-ai setup (or legacy uvx synth-ai rl_demo setup):

  • The SDK opens your browser to the Synth dashboard to pair your SDK with your signed-in session.

  • Automatically detects your user + organization

  • Ensures both API keys exist

  • Writes them to your project’s .env as:

    SYNTH_API_KEY=
    ENVIRONMENT_API_KEY=
    

✅ No keys printed or requested interactively — all handled via browser pairing.

Environment overrides


🌍 Language-Agnostic: Build Task Apps in Any Language

Synth works with any programming language. You don't need Python to build Task Apps or run prompt optimization. Implement the OpenAPI contract in your preferred language and start optimizing.

Supported Languages

We provide complete, tested examples in the Synth Cookbooks repo (cookbooks/dev/polyglot/):

  • Rust - Fast, type-safe implementation with Axum
  • Go - Zero dependencies, single static binary
  • TypeScript - Works with Node.js, Deno, Bun, and Cloudflare Workers
  • Zig - Minimal binaries, trivial cross-compilation

👉 See all examples: Synth Cookbooks (polyglot) — see dev/polyglot/

How It Works

Task Apps implement a simple HTTP contract:

  • GET /health - Health check
  • POST /rollout - Evaluate prompts and return rewards
  • GET /task_info - (Optional) Dataset metadata

The optimizer calls your endpoints with candidate prompts, and you return rewards. That's it—no Python required!

📖 Full guide: Polyglot Task Apps Documentation
📋 OpenAPI Contract: synth_ai/contracts/task_app.yaml
🔧 CLI Access: synth contracts show task-app or synth contracts path task-app


🎯 Prompt Optimization

Automatically optimize prompts for classification, reasoning, and instruction-following tasks using evolutionary algorithms. Synth supports two state-of-the-art algorithms: GEPA (Genetic Evolution of Prompt Architectures) and MIPRO (Meta-Instruction PROposer).

References:

  • GEPA: Agrawal et al. (2025). "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning." arXiv:2507.19457
  • MIPRO: Opsahl-Ong et al. (2024). "Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs." arXiv:2406.11695

How It Works

Prompt optimization uses an interceptor pattern that ensures optimized prompts never reach task apps. All prompt modifications happen in the backend via an inference interceptor that substitutes prompts before they reach the LLM.

✅ CORRECT FLOW:
Backend → register_prompt → Interceptor → substitutes → LLM

❌ WRONG FLOW:
Backend → prompt_template in payload → Task App (NEVER DO THIS)

Algorithms

GEPA (Genetic Evolution of Prompt Architectures)

  • Population-based evolutionary search
  • LLM-guided mutations for intelligent prompt modifications
  • Pareto optimization balancing performance and prompt length
  • Best for: Broad exploration, diverse prompt variants, classification tasks
  • Results: Improves accuracy from 60-75% (baseline) to 85-90%+ over 15 generations

MIPRO (Meta-Instruction PROposer)

  • Meta-LLM (e.g., GPT-4o-mini) generates instruction variants
  • TPE (Tree-structured Parzen Estimator) guides Bayesian search
  • Bootstrap phase collects few-shot examples from high-scoring seeds
  • Best for: Efficient optimization, task-specific improvements, faster convergence
  • Results: Achieves similar accuracy gains with fewer evaluations (~96 rollouts vs ~1000 for GEPA)

Quick Start

  1. Build a prompt evaluation task app

    # Task app evaluates prompt performance (classification accuracy, QA correctness, etc.)
    
  2. Create a prompt learning config

    [prompt_learning]
    algorithm = "gepa"  # or "mipro"
    task_app_url = "https://my-task-app.modal.run"
    
    [prompt_learning.initial_prompt]
    messages = [
      { role = "system", content = "You are a banking assistant..." },
      { role = "user", pattern = "Customer Query: {query}..." }
    ]
    
    [prompt_learning.gepa]
    initial_population_size = 20
    num_generations = 15
    
  3. Launch optimization

    uvx synth-ai train --type prompt_learning --config config.toml
    
  4. Query results

    from synth_ai.learning import get_prompt_text
    best_prompt = get_prompt_text(job_id="pl_abc123", rank=1)
    

Full documentation: Prompt Learning Guide →


📚 Documentation


🧠 Meta

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

synth_ai-0.3.1.dev1.tar.gz (620.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

synth_ai-0.3.1.dev1-py3-none-any.whl (753.7 kB view details)

Uploaded Python 3

File details

Details for the file synth_ai-0.3.1.dev1.tar.gz.

File metadata

  • Download URL: synth_ai-0.3.1.dev1.tar.gz
  • Upload date:
  • Size: 620.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.15

File hashes

Hashes for synth_ai-0.3.1.dev1.tar.gz
Algorithm Hash digest
SHA256 f8f3cc547f1dddb6131972e02003c05f1f91da019c071ecd0fd8518368acf6a1
MD5 84422447b4d898afc86c444e51233a93
BLAKE2b-256 8cfd56a44f564d37bdee0af120b66545f1c87f3e59e198b98fbbecf43426b1b5

See more details on using hashes here.

File details

Details for the file synth_ai-0.3.1.dev1-py3-none-any.whl.

File metadata

File hashes

Hashes for synth_ai-0.3.1.dev1-py3-none-any.whl
Algorithm Hash digest
SHA256 32ecf2715970f07c9262e9294c2552eb6bf95047ca60c4bef70282cbb6d19cba
MD5 a816c11c4af14b0e040e755f6aeb3157
BLAKE2b-256 7eb4da5b82baadd44eb63d2d44328f6f7bc9492101960f86acedee0227820e9e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page