Overmind — autonomous agent optimization through structured experimentation

Overmind

An open-source optimizer for LLM agents. Point Overmind at your existing Python agent, give it a policy and a few test cases, and it iteratively rewrites prompts, tool descriptions, model choices, and pipeline logic to improve measured performance.

Documentation: Overmind guide

Overmind: Overmind Console

What it does

Overmind runs your agent against a test dataset, traces every LLM call and tool invocation, scores the outputs, and uses a strong reasoning model to generate concrete improvements. Changes that raise the score are kept; the rest are reverted. After several iterations you get a measurably better agent without manual prompt tweaking.

What makes Overmind different is policy-driven optimization. Beyond tracing, it builds deep context about your codebase and the behavior your agent is expected to exhibit. You define the decision rules, constraints, and expectations your agent must follow, and those policies guide every stage: evaluation criteria, test data synthesis, optimization diagnosis, and scoring.

What gets optimized

System prompts — more precise instructions, output format enforcement
Tool descriptions — clearer parameters, better usage guidance
Model selection — find the right quality/cost tradeoff
Agent logic — tool-call ordering, iteration limits, output parsing
Policy compliance — alignment with your domain rules and constraints

Skills Quickstart

Use these from Cursor, Codex, or Claude Code.

The skills are the recommended way to use Overmind because they keep the workflow inside your normal coding environment: they scaffold the entrypoint, check the repo, bootstrap .overmind/.env, generate the policy/spec/dataset in the right order, and run optimization.

Requirements: Python 3.10+, uv or pipx, and API keys for at least one LLM provider.

uv tool install overmind
# or
pipx install overmind

cd your-agent-project/
overmind init

This creates .overmind/ in your project root and prompts for API keys and default models. Safe to re-run anytime.

Available skills

Skill	Path	What it does
`/overmind-register-agent`	`overmind/skills/overmind-register-agent/SKILL.md`	Creates or verifies the entrypoint harness, registers the agent, smoke-tests invocation, and bootstraps `.overmind/.env`.
`/overmind-generate-spec-and-dataset`	`overmind/skills/overmind-generate-spec-and-dataset/SKILL.md`	Generates `policies.md`, `eval_spec.json`, and `dataset.json` in one ordered pass so schemas stay aligned.
`/overmind-optimize-agent`	`overmind/skills/overmind-optimize-agent/SKILL.md`	Runs the optimization loop from your coding environment, either via the CLI or host-driven `optimize-step`.

How to use the skills

Run the skills from your coding-agent chat in this order:

/overmind-register-agent path/to/your/agent.py
/overmind-generate-spec-and-dataset <agent-name>
/overmind-optimize-agent <agent-name>

overmind init is the only terminal step. It creates .overmind/.env, configures provider keys, and sets default models.

/overmind-register-agent inspects your repo, creates a thin entrypoint harness if needed, registers the agent, and runs smoke tests to confirm Overmind can invoke it reliably.

/overmind-generate-spec-and-dataset generates the behavioral policy, evaluation spec, and dataset together. This keeps the policy, scoring fields, and expected outputs aligned.

/overmind-optimize-agent runs the full optimization loop end to end from your coding environment.

The optimization loop

How it Works

Register → Generate policy → Build dataset → Optimize → Review report

1. Initialize (`overmind init`)

Configure API keys and default models. Writes .overmind/.env in the current directory. Safe to re-run. This is the only terminal step — everything else runs through Agent Skills in your coding environment.

2. Register your agent (`/overmind-register-agent`)

Run /overmind-register-agent path/to/your/agent.py in your Cursor or Claude Code chat. The skill inspects your repo, creates a thin entrypoint harness if needed, registers the agent in .overmind/agents.toml, and runs smoke tests to confirm Overmind can invoke it reliably.

Your entrypoint function receives an input dict and must return a dict:

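def run(input_data: dict) -> dict:
    # input_data is one test case's "input" object; return your output fields.
    result = my_agent(input_data)  # hypothetical: call your agent here
    return {"response": result}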

For framework-based agents, create a small wrapper that exposes this dict → dict contract.
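For example, a wrapper around a framework agent might look like the sketch below. `LeadQualifierAgent` and its `invoke` method are hypothetical stand-ins for whatever object your framework builds, and the toy keyword logic only exists so the example runs. The output keys mirror the lead-qualification dataset example later in this description; only the `run(input_data: dict) -> dict` shape comes from the contract above.

```python
from dataclasses import dataclass


@dataclass
class LeadQualifierAgent:
    """Hypothetical stand-in for an agent object built with some framework."""

    hot_keywords: tuple = ("enterprise", "custom pricing")

    def invoke(self, company_name: str, inquiry: str) -> tuple[str, int]:
        # Toy logic so the example runs; a real agent would call an LLM here.
        text = inquiry.lower()
        if any(keyword in text for keyword in self.hot_keywords):
            return "hot", 85
        return "cold", 20


# Build the framework object once at import time, not on every call.
_agent = LeadQualifierAgent()


def run(input_data: dict) -> dict:
    """Entrypoint: one dataset "input" dict in, a dict of output fields out."""
    category, score = _agent.invoke(
        company_name=input_data.get("company_name", ""),
        inquiry=input_data.get("inquiry", ""),
    )
    return {"category": category, "lead_score": score}
```

Keeping the wrapper this thin means the optimizer's edits land in your real agent code rather than in glue.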

3. Generate policy, spec, and dataset (`/overmind-generate-spec-and-dataset`)

Run /overmind-generate-spec-and-dataset <agent-name> in chat. The skill generates the behavioral policy, evaluation spec, and dataset in one ordered pass so their schemas stay aligned:

Phase	What happens
Agent analysis	An LLM reads your agent code to detect the input/output schema, tools, and decision logic.
Policy generation	If you have an existing policy, the skill analyzes it against the code and suggests improvements. Otherwise a policy is inferred automatically. You can refine it in a conversational loop before confirming.
Dataset	Overmind uses your existing test data or generates diverse synthetic cases from the policy and agent description.
Evaluation criteria	Scoring rules are proposed for each output field. Policy constraints inform stricter scoring where relevant.

This produces three artifacts in .overmind/agents/<name>/setup_spec/:

eval_spec.json: machine-readable evaluation spec used at runtime
policies.md: human-readable policy document you maintain
dataset.json: test dataset used for optimization

All three are editable after generation. A preview is shown in chat before anything is saved.

4. Optimize (`/overmind-optimize-agent`)

Run /overmind-optimize-agent <agent-name> in chat. The skill drives the full optimization loop end to end. You can adjust settings before it starts or accept the defaults.

Setting	Description
Analyzer model	The strong model that diagnoses failures and generates code fixes.
LLM-as-Judge	Optional semantic scoring alongside mechanical matching.
Iterations	Number of optimize, evaluate, accept/revert rounds. Default: 5.
Candidates per iteration	How many variant fixes to generate per round. Each biases edits toward a different area, such as tool descriptions, core logic, input handling, or system prompt.
Parallel execution	Run agent evaluations across multiple workers.

What happens each iteration

Run the agent on every test case and collect traces and outputs.
Score outputs against the eval spec across weighted output fields.
Diagnose — the analyzer receives traces, scores, policy, and code. It identifies failure patterns and root causes.
Generate N candidate fixes, each targeting a different area of the code. If N≥3, the last candidate uses a separate diagnosis for diversity.
Validate — syntax checks, interface checks, and a smoke test on a small case subset.
Evaluate — surviving candidates are scored on the full dataset.
Accept or revert — the best candidate is kept only if it improves the score without regressing too many individual cases.
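The scoring and accept/revert steps can be sketched as follows. This is a simplified model, not Overmind's implementation: `score_case` uses plain exact matching per weighted field (the real eval spec can also use LLM-as-Judge), and `max_regressions` is a hypothetical stand-in for the regression threshold listed under advanced settings.

```python
from dataclasses import dataclass


def score_case(output: dict, expected: dict, weights: dict[str, float]) -> float:
    """Weighted exact-match score in [0, 1] for one test case."""
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    earned = sum(
        w for field, w in weights.items()
        if field in output and output[field] == expected.get(field)
    )
    return earned / total


@dataclass
class Decision:
    accept: bool
    mean_before: float
    mean_after: float
    regressions: int


def accept_or_revert(before: list[float], after: list[float], max_regressions: int = 1) -> Decision:
    """Keep a candidate only if the mean rises and few individual cases get worse."""
    if not before or len(before) != len(after):
        raise ValueError("need per-case scores for the same non-empty dataset")
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    # A regression is any case whose score dropped under the candidate.
    regressions = sum(1 for b, a in zip(before, after) if a < b)
    accept = mean_after > mean_before and regressions <= max_regressions
    return Decision(accept, mean_before, mean_after, regressions)
```

Requiring both a higher mean and a bounded number of per-case regressions stops a candidate from winning by fixing a few cases while quietly breaking others.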

Advanced settings include regression thresholds, train/holdout splits to detect overfitting, early stopping patience, and diagnosis visibility controls.
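To illustrate what these settings control, here is a generic sketch of a train/holdout split, an overfitting check, and patience-based early stopping. The function names, the 20% holdout fraction, the `max_gap` of 0.1, and the patience of 2 are all assumptions for illustration; Overmind's actual defaults and logic may differ.

```python
import random


def train_holdout_split(cases: list, holdout_fraction: float = 0.2, seed: int = 0) -> tuple[list, list]:
    """Shuffle deterministically, then split into (train, holdout)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction)) if shuffled else 0
    return shuffled[n_holdout:], shuffled[:n_holdout]


def is_overfitting(train_score: float, holdout_score: float, max_gap: float = 0.1) -> bool:
    """Flag a candidate whose train score runs well ahead of its holdout score."""
    return train_score - holdout_score > max_gap


def should_stop(best_scores: list[float], patience: int = 2) -> bool:
    """Early stopping: True once `patience` iterations pass without a new best."""
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(best_scores) <= patience:
        return False
    return max(best_scores[-patience:]) <= max(best_scores[:-patience])
```

Optimizing against the train cases while watching the holdout score is what makes overfitting visible: prompt edits that only memorize the training examples raise the first number but not the second.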

Multi-file agents

By default Overmind optimizes the single registered entry file. For agents split across multiple modules, it automatically resolves local imports, extracts individual functions and classes, and applies targeted edits back to the original files so your project structure stays intact.
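Overmind's exact mechanism is not documented here, but the idea of extracting individual definitions and writing targeted edits back can be sketched with the standard-library `ast` module. `extract_definitions` and `replace_definition` are illustrative names; this version only handles top-level functions and classes.

```python
import ast

_DEF_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def extract_definitions(source: str) -> dict[str, str]:
    """Map each top-level function/class name to its exact source text."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, _DEF_TYPES)
    }


def replace_definition(source: str, name: str, new_code: str) -> str:
    """Swap one top-level definition, leaving the rest of the file untouched."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in tree.body:
        if isinstance(node, _DEF_TYPES) and node.name == name:
            # Include decorators, which sit above the def/class line.
            first = node.decorator_list[0] if node.decorator_list else node
            start = first.lineno - 1
            end = node.end_lineno  # 1-based inclusive, so usable as a slice end
            replacement = new_code if new_code.endswith("\n") else new_code + "\n"
            return "".join(lines[:start]) + replacement + "".join(lines[end:])
    raise KeyError(name)
```

Editing at the definition level keeps imports, comments, and unrelated code in each original file exactly as they were.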

Agent policies

Policies are the foundation of meaningful optimization. They tell the optimizer what the agent should do, not just how it currently scores, which guards against changes that raise the score but violate business rules.

A policies.md looks like this:

# Agent Policy: Lead Qualification

## Purpose
Qualifies inbound sales leads by analyzing company data and inquiry content.

## Decision Rules
1. If the inquiry mentions "enterprise" or "custom pricing", classify as hot
2. Companies with 500+ employees get a minimum lead score of 60

## Constraints
- Never disqualify without checking company size
- Score and category must be consistent (hot = 70+, warm = 40-69, cold = <40)

## Priority Order
1. Accuracy of category classification
2. Score calibration
3. Reasoning quality

## Edge Cases
| Scenario | Expected Behaviour |
|---|---|
| Missing company name | Default to cold, note in reasoning |
| Competitor inquiry | Classify as cold, recommend nurture |

## Quality Expectations
- Reasoning should reference specific data points from the input
- Scores should be calibrated: hot leads 70-100, warm 40-69, cold 0-39

Policies feed into diagnosis prompts, code generation constraints, synthetic data generation, and LLM-as-Judge scoring so every stage of the pipeline respects your domain rules.
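To make the example policy concrete, here is a hypothetical check that encodes its two decision rules and the score/category consistency constraint as code. The description says policies feed into prompts and LLM-as-Judge scoring; it does not say Overmind compiles them into rule checks like this. The `employees` input field is an assumption, since the sample dataset below only has `company_name` and `inquiry`.

```python
def check_lead_policy(input_data: dict, output: dict) -> list[str]:
    """Return violations of the example Lead Qualification policy (empty = compliant)."""
    violations = []
    category = output.get("category")
    score = output.get("lead_score")
    inquiry = str(input_data.get("inquiry", "")).lower()

    # Decision rule 1: "enterprise" or "custom pricing" means hot.
    if ("enterprise" in inquiry or "custom pricing" in inquiry) and category != "hot":
        violations.append("rule 1: enterprise/custom pricing inquiry not classified hot")

    # Decision rule 2: companies with 500+ employees score at least 60.
    employees = input_data.get("employees")  # hypothetical input field
    if isinstance(employees, int) and employees >= 500:
        if isinstance(score, (int, float)) and score < 60:
            violations.append("rule 2: 500+ employees scored below 60")

    # Constraint: score and category must agree (hot 70+, warm 40-69, cold <40).
    if isinstance(score, (int, float)):
        expected = "hot" if score >= 70 else "warm" if score >= 40 else "cold"
        if category != expected:
            violations.append(f"consistency: score {score} implies {expected}, got {category}")
    return violations
```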

Using your own data

Data files are JSON arrays where each element has an input and expected_output:

[
  {
    "input": { "company_name": "Acme Corp", "inquiry": "Need enterprise pricing" },
    "expected_output": { "category": "hot", "lead_score": 85 }
  }
]

Place data files in your agent directory under data/ and Overmind will detect them during setup. If you do not have data, Overmind generates realistic synthetic test cases using the policy and agent description.

Artifacts, traces, reports, and CLI reference

Output

After optimization, results are saved under .overmind/agents/<name>/:

Path	Description
`setup_spec/policies.md`	Agent policy document
`setup_spec/eval_spec.json`	Evaluation criteria with embedded policy
`setup_spec/dataset.json`	Test dataset used for optimization
`experiments/best_agent.py`	The highest-scoring agent version for single-file agents
`experiments/best_agent/`	All optimized files for multi-file agents
`experiments/results.tsv`	Score history for every iteration
`experiments/traces/`	Detailed JSON traces of every agent run
`experiments/report.md`	Summary report with scores and diffs

The other paths under .overmind/ do not all exist at first; several are created only as you run the skills.

Path	Required?	Notes
`agents.toml`	Yes	Registry of agent names and `module:fn` entrypoints. Written by `/overmind-register-agent`.
`.env`	Optional	API keys and model defaults from `overmind init`.
`agents/<name>/instrumented/`	Regenerated	Full mirror of the project root, excluding skipped paths such as `.git` and `venv`. Keep the project root that contains `.overmind` small so this tree stays small.
`agents/<name>/run_state.json`	Written by optimize	Regression cases and run history across sessions.
`logs/overmind.log`	Auto	Rotating CLI log from `setup_logging`.
`agents/<name>/instrumented/.overmind_runners/`	Ephemeral	Generated subprocess wrappers such as `_run_agent.py`; removed when the runner calls `cleanup()`; safe to delete manually.
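The `module:fn` entrypoint strings in `agents.toml` use the same object-reference convention as Python packaging entry points. A sketch of turning such a string into a callable with the standard library's `importlib` follows; `resolve_entrypoint` is an illustrative name, and this is not necessarily how Overmind loads agents (the table above notes it runs them through generated subprocess wrappers).

```python
import importlib
from typing import Any, Callable


def resolve_entrypoint(spec: str) -> Callable[..., Any]:
    """Resolve a 'package.module:function' string to the callable it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:fn', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Support dotted attributes after the colon, e.g. 'pkg.mod:Agent.run'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj
```

For a registered agent, the module must be importable from the project root, for example by running from that directory.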

All provider keys and model defaults live in .overmind/.env. Per-agent .env files are not supported.

Bundle scope and caps

For large repositories, the optimizer resolves a bounded import closure, defaulting to 24 files and 60k characters, and skips common paths such as tests/, docs/, and .overmind/ using built-in rules plus optional .overmindignore and .gitignore.
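A bounded import closure can be sketched as a breadth-first walk over local imports that stops at a file cap and a character budget. This simplified version works on already-loaded sources keyed by module name, ignores relative imports, and uses hypothetical skip prefixes; the 24-file and 60,000-character defaults mirror the numbers above, but the traversal order and skip handling are assumptions, not Overmind's algorithm.

```python
import ast
from collections import deque

# Hypothetical skip rule, loosely mirroring the built-in tests/ and docs/ skips.
SKIP_PREFIXES = ("tests", "docs")


def local_imports(source: str, known: set[str]) -> list[str]:
    """Absolute imports in `source` that name modules inside the project."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports are ignored in this sketch
        found.extend(name for name in names if name in known)
    return found


def import_closure(sources: dict[str, str], entry: str,
                   max_files: int = 24, max_chars: int = 60_000) -> list[str]:
    """Breadth-first walk of local imports from `entry`, bounded by both caps."""
    bundle: list[str] = []
    seen: set[str] = set()
    chars = 0
    queue = deque([entry])
    while queue and len(bundle) < max_files:
        name = queue.popleft()
        if name in seen or name.split(".")[0] in SKIP_PREFIXES:
            continue
        seen.add(name)
        source = sources[name]
        if chars + len(source) > max_chars:
            continue  # adding this module would exceed the character budget
        bundle.append(name)
        chars += len(source)
        queue.extend(local_imports(source, set(sources)))
    return bundle
```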

After /overmind-generate-spec-and-dataset runs, eval_spec.json includes a scope block with two path lists, both relative to the project root:

optimizable_paths: files the optimizer may edit.
read_only_paths: files loaded into the bundle for context; edits to them are rejected at accept time.
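The accept-time scope rule can be sketched as a simple set check over a candidate's changed files. The exact JSON shape of the scope block is not shown here, so this assumes both lists are plain project-relative path strings; `check_scope` is an illustrative name, not an Overmind API.

```python
from pathlib import PurePosixPath


def _norm(path: str) -> str:
    # Collapse spellings like './agent.py' to 'agent.py'.
    return str(PurePosixPath(path))


def check_scope(changed: list[str], optimizable: list[str], read_only: list[str]) -> list[str]:
    """Return scope violations for a candidate's changed files (empty = OK)."""
    allowed = {_norm(p) for p in optimizable}
    frozen = {_norm(p) for p in read_only}
    problems = []
    for path in map(_norm, changed):
        if path in frozen:
            problems.append(f"{path}: read-only")
        elif path not in allowed:
            problems.append(f"{path}: not in optimizable_paths")
    return problems
```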

Put project-wide exclusions in .overmindignore, not in the spec. To inspect what will be loaded without running an LLM:

overmind doctor my-agent

CLI reference

The only terminal command in the normal workflow is overmind init. The rest of the workflow runs through Agent Skills in Cursor or Claude Code.

overmind init                                        Configure API keys and models
overmind doctor <name>                               Diagnose bundle scope and eval spec (read-only)

Run overmind <command> --help for full flag documentation.

Release history

Version	Release date
0.1.50 (this version)	May 20, 2026
0.1.49	May 18, 2026
0.1.48	May 14, 2026
0.1.47	May 14, 2026
0.1.46	May 13, 2026
0.1.45	May 6, 2026
0.1.40	Apr 30, 2026
0.1.39	Apr 28, 2026
0.1.31	Mar 6, 2026
0.1.27	Feb 26, 2026
0.1.26	Feb 23, 2026
0.1.25	Feb 17, 2026
0.1.24	Feb 15, 2026
0.1.23	Feb 14, 2026
0.1.22	Feb 11, 2026
0.1.21	Feb 4, 2026
0.1.20	Feb 4, 2026
0.1.19	Jan 27, 2026
0.1.18	Dec 18, 2025
0.1.17	Dec 14, 2025
0.1.16	Dec 14, 2025
0.1.15	Oct 7, 2025
0.1.14	Aug 27, 2025
0.1.13	Aug 27, 2025
0.1.12	Aug 25, 2025
0.1.11	Aug 21, 2025
0.1.10	Aug 14, 2025
0.1.9	Aug 12, 2025
0.1.8	Aug 5, 2025
0.1.7	Jul 11, 2025
0.1.6	Jul 10, 2025
0.1.5	Jul 10, 2025

Download files

Source Distribution

overmind-0.1.50.tar.gz (764.4 kB)

Uploaded May 20, 2026 Source

Built Distribution

overmind-0.1.50-py3-none-any.whl (581.5 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file overmind-0.1.50.tar.gz.

File metadata

Download URL: overmind-0.1.50.tar.gz
Upload date: May 20, 2026
Size: 764.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for overmind-0.1.50.tar.gz
Algorithm	Hash digest
SHA256	`97ae01bbd37483c95a0b496c045b4ec5650139de1876aa7439333476b043a5b3`
MD5	`496b3d2a23a3dfb985391cadb7de4189`
BLAKE2b-256	`010fc9ba1d756a5ea9687c0cc6548c59a41013374c6639bb87c5fb88a4f9b358`

Provenance

The following attestation bundles were made for overmind-0.1.50.tar.gz:

Publisher: publish.yml on overmind-core/overmind

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: overmind-0.1.50.tar.gz
- Subject digest: 97ae01bbd37483c95a0b496c045b4ec5650139de1876aa7439333476b043a5b3
- Sigstore transparency entry: 1581516239
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: overmind-core/overmind@6ec94295bbb5fa532f2d455721f761d6a9d46d1b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/overmind-core
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6ec94295bbb5fa532f2d455721f761d6a9d46d1b
- Trigger Event: push

File details

Details for the file overmind-0.1.50-py3-none-any.whl.

File metadata

Download URL: overmind-0.1.50-py3-none-any.whl
Upload date: May 20, 2026
Size: 581.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for overmind-0.1.50-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c0043c90cdbeaf6a626fe10115e4f90dec601b47a160a2e26eb86284f38390a`
MD5	`010e8d24a54845f47cd7f91e57385aff`
BLAKE2b-256	`48d082b5ce1cfbf2976caeecb3688fa821efd8e4fd3d45d2161c333551e4c97f`

Provenance

The following attestation bundles were made for overmind-0.1.50-py3-none-any.whl:

Publisher: publish.yml on overmind-core/overmind

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: overmind-0.1.50-py3-none-any.whl
- Subject digest: 7c0043c90cdbeaf6a626fe10115e4f90dec601b47a160a2e26eb86284f38390a
- Sigstore transparency entry: 1581516412
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: overmind-core/overmind@6ec94295bbb5fa532f2d455721f761d6a9d46d1b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/overmind-core
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6ec94295bbb5fa532f2d455721f761d6a9d46d1b
- Trigger Event: push
