
retune

Retune your AI agents -- self-improving evaluation & optimization framework.

Make any LLM agent, RAG pipeline, or workflow self-improving in production.

PyPI · Python 3.10+ · MIT License

Website | Documentation | GitHub


What is retune?

retune is a framework-agnostic SDK that wraps any LLM-based system and makes it self-improving through automated observation, evaluation, and optimization. It works with LangChain, LangGraph, or any custom pipeline.

Installation

pip install retune
| Extra | Command | What you get |
|---|---|---|
| LangChain | `pip install retune[langchain]` | LangChain adapter |
| LangGraph | `pip install retune[langgraph]` | LangGraph adapter |
| LLM Judge | `pip install retune[llm]` | OpenAI-powered evaluator |
| Anthropic | `pip install retune[anthropic]` | Claude models |
| Google | `pip install retune[google]` | Gemini models |
| Ollama | `pip install retune[ollama]` | Local models |
| Server | `pip install retune[server]` | FastAPI dashboard |
| Everything | `pip install retune[all]` | All of the above |
| Dev | `pip install retune[dev]` | pytest, ruff, mypy |

Quickstart

from retune import Retuner, Mode

# Your existing agent (any callable); call_llm stands in for your own LLM call
def my_agent(query: str) -> str:
    return call_llm(query)

# Wrap it
retuner = Retuner(
    agent=my_agent,
    adapter="custom",
    mode=Mode.OBSERVE,
)

# Use it -- same interface, now with traces
response = retuner.run("What is machine learning?")
print(response.output)

Framework Adapters

Custom (any callable)

from retune import Retuner, Mode

retuner = Retuner(agent=my_fn, adapter="custom", mode=Mode.OBSERVE)
response = retuner.run("Hello")

LangChain

from retune import Retuner, Mode

chain = prompt | llm | parser  # your LangChain chain

retuner = Retuner(
    agent=chain,
    adapter="langchain",
    mode=Mode.EVALUATE,
    evaluators=["llm_judge", "retrieval", "latency"],
)

response = retuner.run("What is RAG?")
print(response.eval_results)  # quality scores

LangGraph

from retune import Retuner, Mode

compiled_graph = graph.compile()

retuner = Retuner(
    agent=compiled_graph,
    adapter="langgraph",
    mode=Mode.OBSERVE,
)

response = retuner.run("Plan a trip to Paris")
for step in response.trace.steps:
    print(f"[{step.step_type}] {step.name}")

The Fan Regulator Model

Control how much post-processing happens after each run:

| Mode | What it does | Overhead |
|---|---|---|
| OFF | Pure passthrough | None |
| OBSERVE | Capture execution traces | Low |
| EVALUATE | + Run evaluators, score quality | Medium |
| IMPROVE | + Generate optimization suggestions | High |

retuner.set_mode(Mode.IMPROVE)  # Turn up when tuning
# ... run evaluation dataset ...
retuner.set_mode(Mode.OFF)      # Turn off for production

Accept/Reject Optimization Flow

When in IMPROVE mode, retune generates suggestions that you can review:

response = retuner.run("test query")
for suggestion in response.suggestions:
    print(f"{suggestion.param_name}: {suggestion.old_value} -> {suggestion.new_value}")
    # Accept or reject each suggestion
    retuner.accept_suggestion(suggestion)
    # or: retuner.reject_suggestion(suggestion)

Beam Search APO (Automatic Prompt Optimization)

retune includes a beam-search-based prompt optimizer that iteratively rewrites your system prompt:

from retune import Retuner, BeamSearchConfig

config = BeamSearchConfig(
    beam_width=2,         # candidates kept per round
    branch_factor=2,      # rewrites per candidate
    rounds=3,             # search depth
    max_rollout_queries=5,
)

retuner = Retuner(agent=my_agent, adapter="custom", mode=Mode.IMPROVE)
best_prompt = retuner.optimize_prompt(
    initial_prompt="You are a helpful assistant.",
    eval_dataset=dataset,
    beam_config=config,
)
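The beam-search procedure itself is easy to picture. Here is a minimal stand-alone sketch of the algorithm with a toy scorer and rewriter — an illustration of how `beam_width`, `branch_factor`, and `rounds` interact, not retune's actual internals:

```python
def beam_search_apo(initial_prompt, score, rewrite,
                    beam_width=2, branch_factor=2, rounds=3):
    """Toy beam-search prompt optimization: each round, branch
    `branch_factor` rewrites from every candidate, then keep the
    `beam_width` highest-scoring prompts."""
    beam = [initial_prompt]
    for _ in range(rounds):
        candidates = list(beam)
        for prompt in beam:
            candidates += [rewrite(prompt) for _ in range(branch_factor)]
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]

# Toy scorer: prefer longer prompts. Toy rewriter: append a clause.
best = beam_search_apo(
    "You are a helpful assistant.",
    score=len,
    rewrite=lambda p: p + " Be concise.",
)
```

In retune, the scorer is the evaluation run over `eval_dataset` and the rewriter is an LLM-driven prompt mutation; the search shape is the same.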

Multi-Provider LLM Support

retune works with any LLM provider through LangChain integrations:

from retune import set_default_llm

# OpenAI (default)
set_default_llm("openai", model="gpt-4o")

# Anthropic
set_default_llm("anthropic", model="claude-sonnet-4-20250514")

# Google
set_default_llm("google", model="gemini-pro")

# Local via Ollama
set_default_llm("ollama", model="llama3")

Evaluators

| Evaluator | What it scores | Install |
|---|---|---|
| llm_judge | Correctness, completeness, relevance | `retune[llm]` |
| retrieval | Document retrieval quality | Built-in |
| latency | Execution speed | Built-in |
| cost | Token usage efficiency | Built-in |
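Conceptually, an evaluator maps one run to a score. For instance, a latency evaluator might time the agent call and score it against a budget — a stand-alone sketch of the idea, not retune's implementation:

```python
import time

def latency_score(agent, query, budget_s=2.0):
    """Time one agent call and map elapsed time to a 0..1 score
    (1.0 = instant, 0.0 = at or over the latency budget)."""
    start = time.perf_counter()
    output = agent(query)
    elapsed = time.perf_counter() - start
    score = max(0.0, 1.0 - elapsed / budget_s)
    return output, score

output, score = latency_score(lambda q: q.upper(), "hello")
```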

Self-Improvement Loop

# 1. Collect traces
retuner.set_mode(Mode.OBSERVE)
for query in queries:
    retuner.run(query)

# 2. Evaluate
retuner.set_mode(Mode.EVALUATE)
summary = retuner.get_eval_summary()

# 3. Get improvement suggestions
retuner.set_mode(Mode.IMPROVE)
response = retuner.run("test query")
for s in response.suggestions:
    print(f"{s.param_name}: {s.old_value} -> {s.new_value}")

# 4. Apply and run in production
retuner.set_mode(Mode.OFF)

Architecture

Your Agent/RAG (LangGraph, LangChain, custom)
        |
   Adapter Layer  -- framework-specific -> universal trace
        |
   Execution Trace  -- standard format for all frameworks
        |
   Evaluation Engine  -- modular scorers (LLM judge, retrieval, latency, cost)
        |
   Optimization Engine  -- beam search APO, rule-based suggestions
        |
   Accept/Reject  -- human-in-the-loop or auto-apply
        |
   Improved Config  -- better prompts, parameters, retrieval
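The "universal trace" in the middle of this stack can be pictured as a plain record of steps that every adapter emits in the same shape. A hypothetical sketch (retune's actual trace classes may differ — this just mirrors the `step.step_type` / `step.name` fields used in the LangGraph example above):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    step_type: str   # e.g. "llm", "retriever", "tool"
    name: str
    duration_ms: float = 0.0

@dataclass
class Trace:
    steps: list[Step] = field(default_factory=list)

    def add(self, step_type: str, name: str, duration_ms: float = 0.0) -> None:
        self.steps.append(Step(step_type, name, duration_ms))

# An adapter would populate this while the wrapped agent runs:
trace = Trace()
trace.add("retriever", "vector_search", 12.5)
trace.add("llm", "answer_generation", 480.0)
```

Because evaluators only ever see this one format, a scorer written once works across custom callables, LangChain chains, and LangGraph graphs alike.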

Environment Variables

# LLM provider keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# retune config (all optional, sensible defaults)
RETUNE_STORAGE_PATH=./retune.db
RETUNE_DEFAULT_MODE=observe
RETUNE_LOG_LEVEL=INFO
RETUNE_EVAL_LLM_MODEL=gpt-4o-mini
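The same settings can be applied from Python before importing retune, e.g. via `os.environ` (equivalent to exporting them in the shell; key names as listed above):

```python
import os

# Set retune config programmatically; setdefault keeps any value
# already exported in the shell.
os.environ.setdefault("RETUNE_STORAGE_PATH", "./retune.db")
os.environ.setdefault("RETUNE_DEFAULT_MODE", "observe")
os.environ.setdefault("RETUNE_LOG_LEVEL", "INFO")
```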

License

MIT -- see LICENSE.
