Local Codex plugin for iterative Agent tuning with guided Skills, reusable runner templates, versioned results, and static validation.

Agent Tune Kit

Agent Tune Kit is a local Codex plugin for evaluating and tuning your own local Agent.

If you already have a working Agent but do not know where it fails, why it fails, or what to change next, Agent Tune Kit helps you run the full loop: batch test the Agent, find failure cases, generate a report, let Codex tune the Agent, and verify the next run.

Architecture

[Figure: Agent Tune Kit architecture]

Who It Is For

Use it if you have:

A local Agent, chatbot, tool-using Agent, or RAG Agent.
A small evaluation dataset, preferably CSV; 5 to 20 rows are enough to start.
Inputs, expected answers, or human-checkable results.
A desire to let Codex help locate weak spots and tune prompts, code, parameters, or tool configuration.

Prerequisites

You need:

A local Agent project that Codex can inspect and edit.
An evaluation dataset, preferably in CSV format. Column names do not need to follow a strict schema; Codex will infer inputs and expected results where possible.
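
As a starting point, a small evaluation CSV can be produced with Python's standard `csv` module. This is only a sketch: the column names `input` and `expected` and the sample rows are assumptions for illustration, since no schema is required.

```python
import csv
import io

# Hypothetical rows and column names; Agent Tune Kit does not prescribe a schema.
ROWS = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "Opposite of hot?", "expected": "cold"},
]


def build_eval_csv_text(rows):
    """Return the rows serialized as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["input", "expected"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Print the CSV so it can be reviewed before saving it to a file.
print(build_eval_csv_text(ROWS))
```

Saving the printed text as data/eval.csv (or any other path you pass to $atk-init) gives you a dataset within the suggested 5 to 20 rows once you add a few more cases.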

Install

One-command install:

uvx --from agent-tune-kit atk install

To keep the atk command available:

uv tool install agent-tune-kit
atk install

Or use pipx:

pipx install agent-tune-kit
atk install

After installation, open the plugin list in Codex:

/plugins

Select and enable Agent Tune Kit. If $atk-status and other completions do not appear immediately after enabling, restart Codex or reopen the current project session.

Minimal Tuning Loop

Run these commands in your Agent project, not in this repository.

1. Initialize

Tell Codex where your Agent starts and where the evaluation data lives:

$atk-init My Agent entrypoint is scripts/agent.py and the evaluation dataset is data/eval.csv

Codex generates:

.atk/runner/eval_runner.py
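
To give a rough idea of what a batch evaluation runner does, here is a minimal sketch. It is not the file Codex generates, which depends on your project. The `run_agent` placeholder and the `input` column are assumptions; `expected` and `agent_output` follow the column names used in the failure-rule example in this README.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for your Agent's entrypoint."""
    return prompt.strip().upper()


def evaluate(
    rows: Iterable[Dict[str, str]], agent: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Call the agent once per row and attach its output to the row."""
    results = []
    for row in rows:
        try:
            output = agent(row["input"])
        except Exception as exc:
            # Record the error so one crashing row does not stop the batch.
            output = f"ERROR: {exc}"
        results.append({**row, "agent_output": output})
    return results


# In-memory sample standing in for an evaluation CSV file.
SAMPLE = "input,expected\nhello,HELLO\nbye,GOODBYE\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
results = evaluate(rows, run_agent)
for result in results:
    print(result)
```

Catching exceptions per row is a design choice of this sketch: a crash then shows up as a reviewable result instead of aborting the run.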

2. Run Evaluation

$atk-run

Results are written to:

.atk/results/v1/eval_results.csv

3. Find Failures

Let Codex judge which rows failed:

$atk-find-failures

If you already have a clear rule, create the rule script first and then apply it:

$atk-init-failure-rule rule: mark a row as failed when expected differs from agent_output
$atk-find-failures-by-rule

Failure cases are written to:

.atk/results/v1/failure_cases.csv
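
The example rule above ("mark a row as failed when expected differs from agent_output") could look roughly like this in Python. This is a sketch, not the script that $atk-init-failure-rule produces; the function names are hypothetical, and ignoring surrounding whitespace is an added assumption.

```python
from typing import Dict, List


def is_failure(row: Dict[str, str]) -> bool:
    """Return True when expected differs from agent_output.

    Assumption: leading and trailing whitespace is ignored.
    """
    expected = row.get("expected", "").strip()
    actual = row.get("agent_output", "").strip()
    return expected != actual


def find_failures(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the rows that the rule marks as failed."""
    return [row for row in rows if is_failure(row)]


# Hypothetical result rows for illustration.
sample_rows = [
    {"input": "2 + 2", "expected": "4", "agent_output": "4 "},
    {"input": "Capital of France?", "expected": "Paris", "agent_output": "London"},
]
failures = find_failures(sample_rows)
print(failures)
```

An exact string match is strict; for free-form answers you may prefer to let Codex judge the rows with $atk-find-failures instead.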

4. Generate Report

$atk-report

The report is written to:

.atk/results/v1/report.md

It summarizes results, failure cases, likely causes, and recommended tuning priorities.

5. Optional: Browse Failures

$atk-visualize-failures

This creates a local HTML page:

.atk/results/v1/failure_cases.html

Use it to search, filter, and manually review failure cases.

6. Let Codex Tune the Agent

$atk-tune

Codex edits your Agent based on the report and records the tuning plan:

.atk/results/v1/tuning_plan.md

Verify Improvement

After tuning, run another loop:

$atk-run
$atk-find-failures
$atk-report

New results are written to .atk/results/v2/. Starting with the second loop, the report compares against the previous tuning_plan.md and tells you whether the target issues were resolved, partially resolved, unresolved, or impossible to judge.
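
For a quick manual cross-check between two loops, the failure cases of consecutive versions can be compared by row. The sketch below is not how $atk-report performs its comparison; it assumes each row can be identified by a hypothetical `input` column.

```python
from typing import Dict, List


def compare_failures(
    previous: List[Dict[str, str]],
    current: List[Dict[str, str]],
    key: str = "input",
) -> Dict[str, List[str]]:
    """Group row keys by how their failure status changed between two runs."""
    prev_keys = {row[key] for row in previous}
    curr_keys = {row[key] for row in current}
    return {
        "resolved": sorted(prev_keys - curr_keys),
        "still_failing": sorted(prev_keys & curr_keys),
        "new": sorted(curr_keys - prev_keys),
    }


# Hypothetical failure rows from two consecutive result versions.
v1_failures = [{"input": "q1"}, {"input": "q2"}, {"input": "q3"}]
v2_failures = [{"input": "q2"}, {"input": "q4"}]
summary = compare_failures(v1_failures, v2_failures)
print(summary)
```

In practice the two lists would be loaded with `csv.DictReader` from .atk/results/v1/failure_cases.csv and .atk/results/v2/failure_cases.csv. A non-empty "new" list points to regressions introduced by the tuning step.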

Output Structure

.atk/
├── datasets/
│   └── original.csv
├── runner/
│   ├── eval_runner.py
│   └── failure_rule.py
└── results/
    ├── v1/
    │   ├── eval_results.csv
    │   ├── failure_cases.csv
    │   ├── failure_cases.html
    │   ├── report.md
    │   └── tuning_plan.md
    └── v2/
        └── ...

Common output files:

eval_results.csv: actual Agent output for each row.
failure_cases.csv: rows selected as failures.
failure_cases.html: optional failure review page.
report.md: analysis and tuning recommendations.
tuning_plan.md: what Codex changed and why.
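
If you want to script against this layout, the newest result version can be located by its `vN` directory name. The helper names below are hypothetical and the sketch assumes the directory naming shown above.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

VERSION_PATTERN = re.compile(r"^v(\d+)$")


def pick_latest(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest version number, or None if none match."""
    versions = []
    for name in names:
        match = VERSION_PATTERN.match(name)
        if match:
            # Compare numerically so that v10 sorts after v2.
            versions.append((int(match.group(1)), name))
    return max(versions)[1] if versions else None


def latest_version_dir(results_root: str = ".atk/results") -> Optional[Path]:
    """Return the newest vN directory under results_root, if any exists."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    name = pick_latest(p.name for p in root.iterdir() if p.is_dir())
    return root / name if name else None


print(pick_latest(["v1", "v2", "v10", "notes"]))  # prints v10
```

Parsing the number instead of sorting the names as strings avoids picking v2 over v10 once a project has gone through many loops.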

Common Skills

$atk-status: inspect progress and suggest the next step.
$atk-init: generate the test runner.
$atk-run: run evaluation and create a new result version.
$atk-find-failures: let Codex identify failure cases.
$atk-init-failure-rule: create or update the failure rule.
$atk-find-failures-by-rule: apply the rule to identify failures.
$atk-report: generate analysis and cross-loop validation.
$atk-visualize-failures: generate the failure review HTML page.
$atk-tune: tune the Agent based on the report.

Release history

0.4.2: Jun 3, 2026
0.4.1: Jun 3, 2026
0.4.0: Jun 1, 2026
0.3.9: May 29, 2026
0.3.8: May 27, 2026 (this version)
0.3.7: May 26, 2026
0.3.6: May 26, 2026

Download files

Source Distribution

agent_tune_kit-0.3.8.tar.gz (2.9 MB)

Uploaded May 27, 2026 Source

Built Distribution

agent_tune_kit-0.3.8-py3-none-any.whl (2.9 MB)

Uploaded May 27, 2026 Python 3

File details

Details for the file agent_tune_kit-0.3.8.tar.gz.

File metadata

Download URL: agent_tune_kit-0.3.8.tar.gz
Upload date: May 27, 2026
Size: 2.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 (uv publish, macOS)

File hashes

Hashes for agent_tune_kit-0.3.8.tar.gz
Algorithm	Hash digest
SHA256	`bf2512a7d96c76fa3300b5c3da171f75a2c3b57cbfe4a235fe896911980740a9`
MD5	`a046364c81dfae6199782bd42389ce2f`
BLAKE2b-256	`752533135bb3a6929a14a5aa6ddde88bf151eaccae949da847dacecb645a695c`

File details

Details for the file agent_tune_kit-0.3.8-py3-none-any.whl.

File metadata

Download URL: agent_tune_kit-0.3.8-py3-none-any.whl
Upload date: May 27, 2026
Size: 2.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 (uv publish, macOS)

File hashes

Hashes for agent_tune_kit-0.3.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`39f3aeb3c4ba81c7ca9c2584dd28384fa652002087d8ce7c7056a55e25326c2b`
MD5	`884949146bbb704bfc3e9ef2f6265ae4`
BLAKE2b-256	`016152d52130038380577a453dc63953d866838e30ead1f852b1745d5daebdb7`
