Optimize a software metric with Codex, git worktrees, and Docker

codex-optimize

https://github.com/user-attachments/assets/7646dab7-d12a-4574-a493-9d130e9042e9

Optimize any software with the Codex SDK.

codopt clones your repository into a run directory, fans out candidate branches with git worktrees, runs one Codex agent per branch in its own Docker container, and evaluates each branch with a benchmark command plus a correctness test command. Surviving branches fork again in later rounds.

By default, codopt snapshots your current working tree into a disposable internal repo first, so local tracked edits are part of the optimization baseline even if they are not committed yet.

Why?

One approach to AI-assisted software optimization is to point an agent at some code and tell it to optimize it. This has several problems:

Agents tend to cheat benchmarks, even unintentionally. When told to maximize a value without constraints, an agent will often hack the benchmarks and tests to produce a result that looks great but, on closer inspection, is not a substantive optimization.
Agents are non-deterministic, so the same prompt can fail at the optimization one time and succeed the next.
Agents can get lazy! This is unintuitive, but once an agent thinks it has provided an answer, prompting it to "optimize" again often results in it concluding that it is done. Once that statement is in its context, it keeps believing it. In a sense, it has poisoned its own context.

codex-optimize attempts to solve these problems:

codopt explicitly partitions the source code, optimization tests, and correctness tests. Because these parts are partitioned and tracked in git, they can be reset to evaluate whether the source code changes were substantive, which prevents benchmark hacking.
By running a beam search strategy, we can see a diverse variety of attempts and keep exploring the ones that work. The example run below illustrates this: some of the Codex agents actually degraded the metric, but the top candidates significantly optimized the code.
By pruning nodes that are failing or stagnating, we can avoid context poisoning and keep improving over more iterations. The example below also demonstrates this: after some iterations, some nodes fail while others keep improving.
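The beam search with pruning described above can be sketched in a few lines of Python. This is a toy illustration, not codopt's implementation: the `Node` class, `beam_search`, and `toy_expand` are hypothetical names, and the rule that derives the survivor count from `max_agents // branch` is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One candidate branch in the search tree (hypothetical structure)."""
    name: str
    score: float
    passed_tests: bool = True


def beam_search(root, expand, branch, max_agents, rounds):
    """Fan out `branch` children per survivor, prune, and repeat for `rounds`."""
    # Assumption: survivors per round are derived from the agent cap.
    # codopt's real rule may differ.
    survivors = max(1, max_agents // branch)
    frontier = [root]
    best = root
    for _ in range(rounds):
        candidates = []
        for parent in frontier:
            for index in range(branch):
                child = expand(parent, index)
                # Prune children that fail the tests or do not beat their parent.
                if child.passed_tests and child.score > parent.score:
                    candidates.append(child)
        if not candidates:
            break  # every branch failed or stagnated
        candidates.sort(key=lambda node: node.score, reverse=True)
        frontier = candidates[:survivors]
        best = max(best, frontier[0], key=lambda node: node.score)
    return best


def toy_expand(parent, index):
    """Stand-in for an agent run: child 0 regresses, child 1 breaks tests, child 2 improves."""
    deltas = [-1.0, 5.0, 2.0]
    return Node(
        name=f"{parent.name}.{index}",
        score=parent.score + deltas[index],
        passed_tests=(index != 1),
    )


if __name__ == "__main__":
    winner = beam_search(Node("root", 10.0), toy_expand, branch=3, max_agents=6, rounds=2)
    print(winner.name, winner.score)
```

In the toy run, the child with the highest raw score is discarded because it fails the correctness tests, which mirrors the point about benchmark hacking: a good metric alone is not enough to survive a round.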

The core idea is to use the Codex SDK to optimize more deterministically than using Skills or prompting.

Quick Start

example/life contains a Conway's Game of Life challenge chosen to be optimizable but not one-shottable.

Install the CLI locally for testing:

uv tool install /path/to/codex-optimize

View the result of my run in the UI:

codopt ui --run-root example/life_result/run

Alternatively you can run it yourself.

Run:

codopt run \
  --edit example/life/life.py --metric example/life/metric.json --metric-key score --command "python3 example/life/benchmark.py" \
  --branch 3 --time 120 --info example/life/INFO.md --max-agents 6 --test "python3 example/life/tests.py" --docker-image codopt-life:latest --rounds 2

Read more about this run in the result's README.MD.

Alternatively, you can ask your agent to use codopt for you. To do that, copy the optimize skill folder into ~/.codex/skills/optimize and restart Codex.

Here is a demo video of Codex using the codopt skill to achieve a 33% improvement in tokens per second for LLM inference.

https://github.com/user-attachments/assets/f34ac402-c19c-4ced-9215-5ff9f2a0e889

Read more about that here or view the repo codopt created here.

CLI Flags

--edit: repeatable file or directory the agent may edit
--metric: metric file written by the benchmark command
--metric-key: JSON key to read when the metric file is JSON, default score
--lower-is-better: invert the parsed metric value for ranking
--command: benchmark command
--command-file: path to a shell snippet file executed with sh -eu; repo-local files run from the cloned repo path, external files are copied into the run root
--branch: children per surviving node
--time: per-node Codex time budget in seconds
--info / --info-file: background context file given to the agent, may be outside the repo
--info-text: inline background context for the agent
--max-agents: active-node cap used to decide survivor count
--test: correctness test command
--test-file: path to a shell snippet file executed with sh -eu; repo-local files run from the cloned repo path, external files are copied into the run root
--docker-image: optional prebuilt container image for agent and evaluation runs
--dockerfile: optional Dockerfile to build and use for agent and evaluation runs
--source-mode: working-tree (default) snapshots the current repo state; head uses Git HEAD only
--rounds: tournament depth
--allow-path: repeatable extra writable path
--keep-worktrees: keep worktree directories after completion

Metric Key

Your benchmark command does not need to match the Life example, but it does need to produce one metric file that codopt can parse:

if the metric file is plain text, it must contain a single numeric value
if the metric file is JSON, codopt reads one numeric field from it
by default that JSON field is score unless --metric-key is passed
by default higher values are treated as better unless --lower-is-better is passed
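The parsing rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not codopt's actual parser: `parse_metric` and `read_metric` are hypothetical helpers, and negating the value for the lower-is-better case is only one plausible reading of "invert".

```python
import json
from pathlib import Path


def parse_metric(text, metric_key="score", lower_is_better=False):
    """Return a ranking value where higher is always better."""
    text = text.strip()
    try:
        data = json.loads(text)  # handles JSON objects and bare numbers
    except json.JSONDecodeError:
        data = float(text)  # plain text must be a single numeric value
    if isinstance(data, dict):
        data = data[metric_key]  # read one numeric field from the JSON object
    if isinstance(data, bool) or not isinstance(data, (int, float)):
        raise ValueError(f"metric is not numeric: {data!r}")
    value = float(data)
    # Assumption: "invert" means negate, so a smaller raw value ranks higher.
    return -value if lower_is_better else value


def read_metric(path, metric_key="score", lower_is_better=False):
    """Read the metric file written by the benchmark command and parse it."""
    return parse_metric(Path(path).read_text(), metric_key, lower_is_better)


if __name__ == "__main__":
    print(parse_metric("42\n"))
    print(parse_metric('{"score": 3.5, "other": 1}'))
    print(parse_metric('{"tps": 10}', metric_key="tps", lower_is_better=True))
```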

Requirements

Before running codopt, you need:

git
docker
uv
Python 3 on the host
an existing Codex login on the host in ~/.codex

Important setup notes:

run codopt from the root of the Git repo you want to optimize
Docker must be running
codopt seeds a run-local CODEX_HOME from your host ~/.codex, so you need to already be authenticated before starting
by default codopt auto-generates and builds a runtime image for the repo, with special handling for common project types like Python, Node, Rust, Go, Java, and Haskell
if you override with --docker-image or --dockerfile, the resulting image must contain python3, git, and uv
codopt removes the ephemeral images it builds itself after validate and run, so repeated runs do not keep piling up codopt-auto-* images
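A quick preflight check for the host requirements listed above might look like the sketch below. The helper name `missing_requirements` is hypothetical and is not part of codopt. It only checks that the tools are on PATH and that the ~/.codex directory exists; it does not verify that the Docker daemon is running or that the Codex login is valid.

```python
import shutil
from pathlib import Path

# Host tools named in the requirements list.
REQUIRED_TOOLS = ("git", "docker", "uv", "python3")


def missing_requirements(which=shutil.which, codex_home=None):
    """Return the names of required tools or paths that could not be found."""
    home = Path(codex_home) if codex_home is not None else Path.home() / ".codex"
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    # The directory existing does not prove the login is still valid.
    if not home.is_dir():
        missing.append(str(home))
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All host requirements found.")
```

The `which` parameter is injectable only so the check can be exercised without the real tools installed.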

First-Run Pattern

For a new repo, prefer this sequence:

Wire a benchmark command, test command, and info text or info file.
Run codopt validate ....
If validation fails in the auto-generated image, only then add --dockerfile or --docker-image.
Once validation succeeds, run the full bounded tournament with codopt run ....

Starter scaffolding:

codopt scaffold --output-dir codopt_scaffold

This writes starter benchmark.sh, test.sh, Dockerfile, and INFO.md files you can adapt for a new repo.
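For illustration, a benchmark script for a new repo could time a workload and write the metric file in the JSON form codopt reads. This is a sketch, not the scaffolded benchmark.sh: `run_workload`, `measure`, `build_payload`, and the `metric.json` filename are all placeholders chosen for this example.

```python
import json
import time
from pathlib import Path


def run_workload():
    """Placeholder for the code under optimization."""
    return sum(i * i for i in range(100_000))


def measure(workload, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def build_payload(seconds):
    """Report runs per second so that higher is better, the default ranking."""
    seconds = max(seconds, 1e-9)  # guard against division by zero
    return {"score": 1.0 / seconds, "seconds": seconds}


if __name__ == "__main__":
    payload = build_payload(measure(run_workload))
    Path("metric.json").write_text(json.dumps(payload))
    print(payload)
```

Reporting a rate instead of raw seconds avoids needing --lower-is-better. Writing raw seconds and passing that flag would be an equally reasonable choice.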

Release history

0.1.1: Mar 26, 2026

0.1.0: Mar 25, 2026 (this version)

Download files

Source Distribution

codex_optimize-0.1.0.tar.gz (144.3 kB)

Uploaded Mar 25, 2026 Source

Built Distribution

codex_optimize-0.1.0-py3-none-any.whl (149.2 kB)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file codex_optimize-0.1.0.tar.gz.

File metadata

Download URL: codex_optimize-0.1.0.tar.gz
Upload date: Mar 25, 2026
Size: 144.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.3

File hashes

Hashes for codex_optimize-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`56cb6809d3e744d96256b8d726086375b0b0e811a820532eca25997628cdc8e2`
MD5	`bf878535e5f043907e5bb568098dadf9`
BLAKE2b-256	`facc53d109af1fe535a8392adbcb57cc9e1612ba82b5e0a428e000bc308a0162`

File details

Details for the file codex_optimize-0.1.0-py3-none-any.whl.

File metadata

Download URL: codex_optimize-0.1.0-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 149.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.3

File hashes

Hashes for codex_optimize-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`696f6cf0d7cac2d3a90a1e6baf01dd9605af1e97c5e74eb016f646afe5d3bf1a`
MD5	`5c1c6a12d9d12af90ec4891a9caf2422`
BLAKE2b-256	`816ed7b45b9f5b72890ca7576f5664336dda76ea82abe3b3f4469b8941cee02c`
