# Autoresearcher Shonku
Autonomous prompt optimization agents built on shonku.
Implements Karpathy's autoresearch pattern for prompts: propose an improvement, shadow-test it, measure, keep or discard, repeat.
## Install
pip install autoresearcher-shonku
## How it works
1. ANALYZE -- read prompt metrics and sample interactions
2. PROPOSE -- LLM generates an improved prompt version
3. VALIDATE -- safety rails check (similarity, length, template vars)
4. DEPLOY -- create experiment at low traffic weight
5. EVALUATE -- collect metrics on the new version
6. DECIDE -- keep if improved, discard if not
7. REPEAT
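The steps above can be sketched as a plain keep-or-discard loop. This is a minimal illustration and not the package's implementation: `optimize`, `propose`, `is_safe`, and `score` are hypothetical names standing in for the orchestrator, the LLM call, the safety rails, and the metric collected during an experiment.

```python
from typing import Callable


def optimize(
    prompt: str,
    propose: Callable[[str], str],        # hypothetical: LLM proposes a new version
    is_safe: Callable[[str, str], bool],  # hypothetical: safety rails
    score: Callable[[str], float],        # hypothetical: metric from test traffic
    max_iterations: int = 10,
) -> tuple[str, float]:
    """Keep-or-discard loop: return the best prompt found and its score."""
    best, best_score = prompt, score(prompt)      # 1. ANALYZE
    for _ in range(max_iterations):               # 7. REPEAT
        candidate = propose(best)                 # 2. PROPOSE
        if not is_safe(best, candidate):          # 3. VALIDATE
            continue
        candidate_score = score(candidate)        # 4-5. DEPLOY and EVALUATE
        if candidate_score > best_score:          # 6. DECIDE
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the sketch runs without an LLM or live traffic.
def toy_propose(p: str) -> str:
    return p + "!"


def toy_is_safe(old: str, new: str) -> bool:
    return len(new) <= 3 * len(old)


def toy_score(p: str) -> float:
    # Pretend quality peaks at exactly two exclamation marks.
    return -abs(p.count("!") - 2)


best, best_score = optimize("Welcome", toy_propose, toy_is_safe, toy_score, max_iterations=5)
print(best, best_score)
```

In the real loop the score comes from metrics gathered while the candidate serves a small share of traffic, so each iteration takes far longer than a function call.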
## Agents
| Agent | Role |
|---|---|
| `PromptAnalyzerAgent` | Analyzes metrics to identify weaknesses |
| `PromptOptimizerAgent` | Proposes improved prompt versions |
| `ExperimentManagerAgent` | Manages A/B experiment lifecycle |
| `AutoResearcherAgent` | Orchestrates the full loop |
## Usage
The autoresearcher does NOT own your data. You pass tools that wrap your storage. This works with any backend, not just autoresearch-prompt-manager.
### Example: optimize email subject lines stored in a CSV
import asyncio
import json

from autoresearcher_shonku import AutoResearcherAgent
from shonku import LLMConfig
from shonku.types import ToolSpec

# Your data lives wherever you want. Wrap access as tools.
# For brevity, an in-memory dict and list stand in for the CSV rows here.
subjects = {"welcome": {"body": "Welcome to our service", "version": 1}}
metrics = [{"quality": 5.2}, {"quality": 4.8}, {"quality": 6.0}]

def get_prompt(slug: str) -> str:
    s = subjects.get(slug, {})
    return json.dumps({"slug": slug, **s})

def get_metrics(prompt_id: str, version_id: str, metric_name: str = "quality") -> str:
    vals = [m.get(metric_name, 0) for m in metrics]
    mean = sum(vals) / len(vals) if vals else 0.0  # guard against an empty metrics list
    return json.dumps({"count": len(vals), "mean": mean})

def get_sample_interactions(prompt_id: str, limit: str = "3") -> str:
    # Stubbed feedback; a real tool would read logged interactions.
    return '[{"feedback": "too generic"}, {"feedback": "boring"}]'

def create_version(slug: str, content: str) -> str:
    subjects[slug] = {"body": content, "version": subjects.get(slug, {}).get("version", 0) + 1}
    return json.dumps({"version": subjects[slug]["version"]})

def create_experiment(prompt_id: str, baseline_version_id: str, new_version_id: str, weight: str = "10") -> str:
    # Stubbed; a real tool would register the experiment with your backend.
    return '{"experiment_id": "exp-1", "status": "running"}'

def conclude_experiment(experiment_id: str) -> str:
    return '{"status": "concluded"}'

tools = [
    ToolSpec(name="get_prompt", description="Get prompt by slug", callable=get_prompt),
    ToolSpec(name="get_metrics", description="Get metrics", callable=get_metrics),
    ToolSpec(name="get_sample_interactions", description="Get samples", callable=get_sample_interactions),
    ToolSpec(name="create_version", description="Create new version", callable=create_version),
    ToolSpec(name="create_experiment", description="Create experiment", callable=create_experiment),
    ToolSpec(name="conclude_experiment", description="Conclude experiment", callable=conclude_experiment),
]

async def main() -> None:
    agent = AutoResearcherAgent()
    result = await agent.run(
        input="Optimize 'welcome' subject line. Quality is 5.3/10, target 7.0+.",
        llm_config=LLMConfig(provider="groq", model="openai/gpt-oss-120b", api_key="..."),
        tools=tools,
    )
    print(subjects["welcome"]["body"])  # improved version

# `await` is only valid inside a coroutine, so run the agent through asyncio.
asyncio.run(main())
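If the subject lines really do live in a CSV file, the same kind of tool function can read and write it. The sketch below assumes a hypothetical file layout with `slug`, `body`, and `version` columns and uses only the standard library; `load_rows`, `save_rows`, and the `path` parameter are illustrative and not part of the package.

```python
import csv
import json
import tempfile
from pathlib import Path

FIELDS = ["slug", "body", "version"]  # assumed column layout


def load_rows(path: Path) -> dict[str, dict]:
    """Read the CSV into a dict keyed by slug."""
    with path.open(newline="", encoding="utf-8") as f:
        return {row["slug"]: row for row in csv.DictReader(f)}


def save_rows(path: Path, rows: dict[str, dict]) -> None:
    """Rewrite the whole CSV from the dict."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows.values())


def get_prompt(path: Path, slug: str) -> str:
    """Return the row for `slug` as JSON (values are strings, as read from CSV)."""
    return json.dumps(load_rows(path).get(slug, {"slug": slug}))


def create_version(path: Path, slug: str, content: str) -> str:
    """Overwrite the body for `slug` and bump its version number."""
    rows = load_rows(path)
    version = int(rows.get(slug, {}).get("version", 0)) + 1
    rows[slug] = {"slug": slug, "body": content, "version": version}
    save_rows(path, rows)
    return json.dumps({"version": version})


# Demo against a throwaway file.
csv_path = Path(tempfile.mkdtemp()) / "subjects.csv"
save_rows(csv_path, {"welcome": {"slug": "welcome", "body": "Welcome to our service", "version": 1}})
print(create_version(csv_path, "welcome", "Welcome aboard"))
print(get_prompt(csv_path, "welcome"))
```

These functions take an extra `path` argument, so it would need to be bound (for example in a small wrapper function) before they are registered as tools with the `slug`-only signatures used above.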
### With autoresearch-prompt-manager
When used with the full prompt-manager stack, the tools wrap the API instead of local data:
arpm-api up && arpm-api start # start the API
arpm-example loop # run the optimization loop
## Safety rails
The `AutoResearcherAgent` includes a built-in `check_safety_rails` tool that validates:
- Similarity to original (>= 30%)
- Non-empty content (> 10 chars)
- Within iteration budget
- Reasonable length (30%-300% of original)
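The four checks above could look roughly like the following. This is a sketch under assumptions, not the built-in `check_safety_rails` tool: the function name `check_rails`, its parameters, the zero-based iteration counter, and the use of `difflib.SequenceMatcher` as the similarity measure are all illustrative choices, and the package may measure similarity differently.

```python
from difflib import SequenceMatcher


def check_rails(original: str, candidate: str, iteration: int, max_iterations: int = 10) -> list[str]:
    """Return the list of violated rails; an empty list means the candidate passes."""
    problems = []
    # Non-empty content: more than 10 characters after trimming whitespace.
    if len(candidate.strip()) <= 10:
        problems.append("content too short")
    # Similarity to the original must be at least 30%.
    similarity = SequenceMatcher(None, original, candidate).ratio()
    if similarity < 0.30:
        problems.append("too dissimilar")
    # Length must stay within 30%-300% of the original.
    length_ratio = len(candidate) / max(len(original), 1)
    if not 0.30 <= length_ratio <= 3.00:
        problems.append("length out of range")
    # Iteration budget, assumed here to be a zero-based counter.
    if iteration >= max_iterations:
        problems.append("iteration budget exhausted")
    return problems


original = "Welcome to our service"
print(check_rails(original, "Welcome aboard, glad you joined our service", iteration=1))
print(check_rails(original, "Hi", iteration=10))
```

Returning the list of failures rather than a single boolean lets the orchestrator feed the reason back to the optimizer on the next proposal.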
## Configuration
### LLM settings
The autoresearcher receives LLM config at runtime. When used with autoresearch-prompt-manager, set:
```bash
export PM_LLM_PROVIDER=groq # or: anthropic, openai, gemini, openrouter
export PM_LLM_MODEL=openai/gpt-oss-120b # model ID
export PM_LLM_API_KEY=your-api-key # provider API key
```
### Optimization settings
from autoresearcher_shonku import AutoResearcherConfig

config = AutoResearcherConfig(
    max_iterations=10,
    improvement_threshold=0.01,
    max_edit_distance=0.5,
    canary_weight=5.0,
    rollback_on_regression=True,
)
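One plausible reading of two of these settings is sketched below. The semantics are assumptions, since the fields are not documented here: `improvement_threshold` is treated as the minimum relative gain over the baseline needed to keep a candidate, and `canary_weight` as the percentage of traffic sent to the candidate. The `decide` and `route` functions are hypothetical and not part of the package.

```python
import hashlib


def decide(baseline_mean: float, candidate_mean: float, improvement_threshold: float = 0.01) -> str:
    """Return 'keep' if the relative gain meets the threshold, else 'discard'."""
    if baseline_mean <= 0:
        raise ValueError("baseline_mean must be positive for a relative comparison")
    relative_gain = (candidate_mean - baseline_mean) / baseline_mean
    return "keep" if relative_gain >= improvement_threshold else "discard"


def route(user_id: str, canary_weight: float = 5.0) -> str:
    """Send roughly `canary_weight` percent of users to the candidate version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # stable bucket in 0..9999
    return "candidate" if bucket < canary_weight * 100 else "baseline"


print(decide(5.3, 5.9))   # gain of about 11%
print(decide(5.3, 5.0))   # regression
print(decide(5.3, 5.32))  # gain below the 1% threshold

share = sum(route(f"user-{i}") == "candidate" for i in range(10_000)) / 10_000
print(f"candidate share: {share:.3f}")
```

Hashing the user ID keeps each user on the same version for the whole experiment, which avoids mixing both prompts into one user's metrics.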
## Acknowledgements
- Optimization loop inspired by Karpathy's autoresearch
- Agent execution powered by agno and AgentOS
## Part of autoresearch-prompt-manager
autoresearch-prompt-manager (prompt CRUD, experiments, metrics)
-> autoresearcher-shonku (this package -- optimization agents)
-> shonku (agent framework)
-> agno (runtime -- https://agno.com)
Install via the parent package: `pip install 'autoresearch-prompt-manager[autoresearcher]'` (the quotes keep shells such as zsh from expanding the brackets).
## Contributing
- Fork autoresearch-prompt-manager
- `cd packages/autoresearcher_shonku && pip install -e '.[dev]'`
- Make changes, then run `pytest` and `ruff check src/`
- Submit a PR
To add new optimization strategies, create a new agent in agents/ following the ShonkuAgent pattern.
## License
MIT