Self-evolution loop for OpenAI Codex: Stop-hook reviewer + scheduled per-repo memory/recall/skill compilation, zero runtime dependencies.

Codex Self-Evolution Plugin

Language: English | 中文

First time here? Jump to the step-by-step quickstart: docs/getting-started.md (Chinese only, for now).

Already set up? One-shot install of the Codex Stop hook: ./scripts/install-codex-hook.sh (and uninstall-codex-hook.sh to remove).

A local Codex plugin that runs a staged self-evolution loop:

SessionStart — creates runtime state and injects stable background from USER.md + MEMORY.md, plus the recall policy and session-recall skill.
Stop — builds a normalized review snapshot, runs a provider-backed reviewer, and persists a structured SuggestionEnvelope (memory updates, recall candidates, skill actions).
compile-preflight — cheap scheduler wake/check step. Returns skip_empty, skip_locked, or run.
compile — writer-owned batch promotion step. Reads existing memory / recall as context, then runs a pluggable backend (script or agent:opencode) and writes final artifacts atomically.
recall / recall-trigger — focused recall during a live turn.

The compiler is the only component that writes final assets (memory, recall, managed skills, receipt); backends only produce structured artifacts.
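
The compile step above is described as writing final artifacts atomically. The README does not say how the plugin does this, so the following is only a minimal sketch of one common technique (write to a temp file in the same directory, then rename over the target). The helper name atomic_write_json is hypothetical and not part of the plugin.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # rename over the target in one step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this pattern a crash mid-write leaves the previous file untouched, which is the property a single-writer compiler wants for memory, recall, and receipt files.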

Install

Codex CLI currently does not read plugin-manifest hooks (gap analysis), so the plugin is installed by writing directly to ~/.codex/hooks.json via the provided scripts. Hooks and the scheduler both invoke the CLI via uvx --from codex-self-evolution-plugin ..., so you don't need a long-lived venv or a repo clone. Full step-by-step guide: docs/getting-started.md (Chinese).

End-to-end happy-path install on macOS. The only prerequisite is uv (brew install uv):

# 1. grab the installer scripts (they're small; no pip/venv/clone needed)
curl -fsSL https://raw.githubusercontent.com/T0UGH/codex-self-evolution-plugin/main/scripts/install-codex-hook.sh -o /tmp/install-codex-hook.sh
curl -fsSL https://raw.githubusercontent.com/T0UGH/codex-self-evolution-plugin/main/scripts/install-scheduler.sh -o /tmp/install-scheduler.sh
chmod +x /tmp/install-*.sh

# 2. provider credentials (lives under ~/.codex-self-evolution/)
mkdir -p ~/.codex-self-evolution
curl -fsSL https://raw.githubusercontent.com/T0UGH/codex-self-evolution-plugin/main/.env.provider.example \
  -o ~/.codex-self-evolution/.env.provider
# edit the file and set MINIMAX_API_KEY (or OPENAI_API_KEY / ANTHROPIC_API_KEY)

# 3. Stop + SessionStart hooks in ~/.codex/hooks.json
/tmp/install-codex-hook.sh

# 4. launchd scheduler running `scan --backend agent:opencode` every 5 min
/tmp/install-scheduler.sh

# 5. sanity check
uvx --from codex-self-evolution-plugin codex-self-evolution status | python3 -m json.tool

Every invocation after that (hooks, scheduler, manual status) runs out of uvx's cached wheel (~100ms warm). Bumping the PyPI release auto-upgrades next time a hook fires.

Developer install (editable, for contributing): clone, pip install -e ., and use .venv/bin/codex-self-evolution ... directly. See Stage 2 (阶段 2) in the guide for a walkthrough that drives reviewer → compile → memory entirely from the CLI without touching Codex / launchd.

Removing everything: uninstall-scheduler.sh and uninstall-codex-hook.sh sit next to their install counterparts in the same repo's scripts/ directory. Both are idempotent and won't touch other tools' hooks or launchd jobs.

Commands

Every subcommand is invokable via uvx --from codex-self-evolution-plugin codex-self-evolution <subcommand> once uv is installed. For readability the examples below drop the uvx --from codex-self-evolution-plugin prefix.

codex-self-evolution session-start --cwd /path/to/repo
codex-self-evolution stop-review --hook-payload /path/to/stop_payload.json
codex-self-evolution compile-preflight --state-dir data
codex-self-evolution compile --once --state-dir data --backend agent:opencode
codex-self-evolution scan --backend agent:opencode        # preflight+compile across all per-project buckets
codex-self-evolution status                               # read-only diagnostic snapshot
codex-self-evolution recall --query "context" --cwd /path/to/repo
codex-self-evolution recall-trigger --query "remember previous flow" --cwd /path/to/repo

The module form is equivalent:

python -m codex_self_evolution.cli session-start --cwd /path/to/repo

Configuration

This section lists everything you can configure. All variables are optional by default; if you only run the deterministic dummy / script path you need zero configuration. The "Required" column tells you what becomes mandatory in which scenario.

1. Runtime paths

Flag / arg	Required	Default	Purpose
`--cwd`	Required for `session-start`, `recall`, `recall-trigger`	—	Repo the session is operating on.
`--state-dir`	Optional	`~/.codex-self-evolution/projects/<mangled-cwd>/`	Root of persistent runtime state (suggestions, memory, recall, skills, compiler receipts, review snapshots, scheduler). Each repo gets an isolated bucket named after its absolute path with `/` → `-`, so user source trees stay clean. Override with `CODEX_SELF_EVOLUTION_HOME` to relocate the whole root.
`--repo-root`	Optional for `compile`, `compile-preflight`	CWD of process	Repo root used to resolve `state-dir` when `--state-dir` is omitted.
`--once`	Optional for `compile`	off	Run a single compile pass instead of looping.
`--backend`	Optional for `compile`	`script`	`script` or `agent:opencode`. The default scheduler plist uses `agent:opencode`.
`--explicit`	Optional for `recall-trigger`	off	Mark the recall trigger as user-explicit.

State layout under --state-dir (default ~/.codex-self-evolution/projects/<mangled-cwd>/):

~/.codex-self-evolution/
├── .env.provider                 # API keys (created by install-codex-hook.sh)
└── projects/
    └── -Users-alice-code-myrepo/ # one bucket per repo; / → -
        ├── suggestions/{pending,processing,done,failed,discarded}/
        ├── memory/               # USER.md, MEMORY.md, memory.json
        ├── recall/               # index.json, compiled.md
        ├── skills/managed/       # managed skill markdown + manifest.json
        ├── compiler/             # compile.lock, last_receipt.json
        ├── review/snapshots/     # normalized Stop-time snapshots
        ├── review/failed/        # raw reviewer response when parse fails
        └── scheduler/
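
The bucket naming rule above (absolute path with / replaced by -) can be sketched as follows. This is an illustration under assumptions, not the plugin's code: the function name project_bucket is hypothetical, and it assumes CODEX_SELF_EVOLUTION_HOME replaces ~/.codex-self-evolution as the root.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_HOME = Path.home() / ".codex-self-evolution"


def project_bucket(cwd: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Map a repo path to its per-project state bucket."""
    env = os.environ if env is None else env
    # Assumption: the env var relocates the whole root, as the table describes.
    home = Path(env.get("CODEX_SELF_EVOLUTION_HOME") or DEFAULT_HOME)
    # "/Users/alice/code/myrepo" becomes "-Users-alice-code-myrepo".
    mangled = os.path.abspath(cwd).replace("/", "-")
    return home / "projects" / mangled
```

Because the bucket name is derived from the absolute path, moving or renaming a repo gives it a fresh, empty bucket.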

2. Hook environment variables (Codex-provided)

These are injected by the Codex host when it invokes the hook commands defined in .codex-plugin/plugin.json. You do not set them manually.

Variable	Used by	Purpose
`CODEX_CWD`	`session-start`, `recall`, `recall-trigger`	Current repo working directory.
`CODEX_STATE_DIR`	all hooks	Points at runtime state dir.
`CODEX_HOOK_PAYLOAD`	`stop-review`	Path to the Stop payload JSON.
`CODEX_RECALL_QUERY`	`recall`, `recall-trigger`	Query string for focused recall.

3. Reviewer providers (`Stop` step)

The reviewer is provider-backed. Selection priority:

The reviewer_provider field in the Stop payload.
Otherwise: dummy.

Provider	Purpose	Required when used
`dummy`	Deterministic stub for tests / dry runs	nothing (optionally honors `provider_stub_response` in the Stop payload)
`openai-compatible`	OpenAI chat-completions dialect	`OPENAI_API_KEY` (or explicit `api_key` option)
`anthropic-style`	Anthropic messages dialect	`ANTHROPIC_API_KEY`
`minimax`	MiniMax (Anthropic-dialect endpoint)	`MINIMAX_API_KEY`
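
The selection rule and the credential column above can be sketched like this. The function names are hypothetical, and rejecting unknown provider names is an assumption; the README only states the payload field and the dummy default.

```python
from typing import Mapping, Optional

# Provider name -> env var that must be set (None means no credential needed).
REQUIRED_KEY = {
    "dummy": None,
    "openai-compatible": "OPENAI_API_KEY",
    "anthropic-style": "ANTHROPIC_API_KEY",
    "minimax": "MINIMAX_API_KEY",
}


def select_provider(stop_payload: Mapping[str, object]) -> str:
    """Use the payload's reviewer_provider field, otherwise fall back to dummy."""
    name = stop_payload.get("reviewer_provider") or "dummy"
    if name not in REQUIRED_KEY:
        raise ValueError(f"unknown reviewer provider: {name!r}")
    return str(name)


def missing_credential(provider: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the env var that is required but unset, if any."""
    key = REQUIRED_KEY[provider]
    if key is None or env.get(key):
        return None
    return key
```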

Reviewer env vars

Variable	Required	Default	Purpose
`OPENAI_API_KEY`	Required for `openai-compatible`	—	Bearer token.
`OPENAI_BASE_URL`	Optional	`https://api.openai.com/v1/chat/completions`	Override endpoint.
`OPENAI_REVIEW_MODEL`	Optional	`gpt-4.1-mini`	Model id sent in request body.
`ANTHROPIC_API_KEY`	Required for `anthropic-style`	—	`x-api-key` header.
`ANTHROPIC_BASE_URL`	Optional	`https://api.anthropic.com/v1/messages`	Override endpoint.
`ANTHROPIC_REVIEW_MODEL`	Optional	`claude-3-5-haiku-latest`	Model id.
`MINIMAX_API_KEY`	Required for `minimax`	—	Bearer token.
`MINIMAX_REGION`	Optional	`global`	`global` → `https://api.minimax.io/anthropic/v1/messages`. `cn` → `https://api.minimaxi.com/anthropic/v1/messages`.
`MINIMAX_BASE_URL`	Optional	derived from region	Full URL override. Takes precedence over `MINIMAX_REGION`.
`MINIMAX_REVIEW_MODEL`	Optional	`MiniMax-M2.7`	Model id.
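
As a quick illustration of the MiniMax rows, the endpoint resolution could look like the sketch below: MINIMAX_BASE_URL wins, otherwise the region picks the URL. The function name is hypothetical, and raising on an unknown region is an assumption.

```python
from typing import Mapping

MINIMAX_ENDPOINTS = {
    "global": "https://api.minimax.io/anthropic/v1/messages",
    "cn": "https://api.minimaxi.com/anthropic/v1/messages",
}


def resolve_minimax_url(env: Mapping[str, str]) -> str:
    """Full-URL override first, then the region-derived default."""
    override = env.get("MINIMAX_BASE_URL")
    if override:
        return override
    region = env.get("MINIMAX_REGION", "global")
    if region not in MINIMAX_ENDPOINTS:
        # Assumption: unknown regions are an error rather than a silent fallback.
        raise ValueError(f"unsupported MINIMAX_REGION: {region!r}")
    return MINIMAX_ENDPOINTS[region]
```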

Reviewer provider options (programmatic)

Passed in the provider_options dict when calling run_reviewer(...) directly. Each option overrides the corresponding env var.

Option	Default	Notes
`api_key`	from env	Overrides the provider's env-sourced key.
`api_base`	provider default	Full URL.
`model`	provider default	Model id.
`max_tokens`	`4096`	Output budget (not context — that's 200k). Safely within every supported model's 8k output ceiling; room for 10+ suggestions without truncation.
`timeout_seconds`	`30`	HTTP timeout.
`anthropic_version`	`2023-06-01`	`anthropic-version` header (Anthropic dialect only).
`stub_response`	—	Dummy provider only: canned reviewer JSON.
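
The precedence described above (provider option, then env var, then default) is sketched here for the openai-compatible provider. The function resolve_openai_settings is hypothetical; only the option names, env var names, and defaults come from the tables.

```python
from typing import Any, Dict, Mapping


def resolve_openai_settings(provider_options: Mapping[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Each explicit option overrides the matching env var, which overrides the default."""

    def pick(option_key: str, env_key: str, default: Any) -> Any:
        if provider_options.get(option_key) is not None:
            return provider_options[option_key]
        return env.get(env_key) or default

    return {
        "api_key": pick("api_key", "OPENAI_API_KEY", None),
        "api_base": pick("api_base", "OPENAI_BASE_URL", "https://api.openai.com/v1/chat/completions"),
        "model": pick("model", "OPENAI_REVIEW_MODEL", "gpt-4.1-mini"),
        "max_tokens": provider_options.get("max_tokens", 4096),
        "timeout_seconds": provider_options.get("timeout_seconds", 30),
    }
```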

4. Compile backends

Selected via --backend:

Backend	Required	Notes
`script`	nothing	Deterministic Python merge. Safe default. Reads existing memory / recall and does conservative incremental merge (does not wipe stable entries).
`agent:opencode`	`opencode` binary on `PATH` or an explicit `opencode_command`	Writes the `{batch, existing_assets, repo, contract}` payload to a temp JSON file, invokes `opencode run --format json --file <payload> --dangerously-skip-permissions -- <prompt>`, parses the event stream, strips code fences, and extracts the first balanced JSON object. Any failure (binary missing / non-zero exit / timeout / empty output / schema mismatch) falls back to `script`, unless `allow_fallback=False`. Validated against opencode 1.4.0.
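
The "strip code fences, extract the first balanced JSON object" step can be sketched as below. This is one way to do it, assuming the model's text has already been pulled out of the opencode event stream; the function name is hypothetical and the plugin's own parser may differ.

```python
import json
import re

FENCE = "`" * 3  # built this way so the sketch can itself live inside a fenced block
FENCE_LINE = re.compile(r"^" + re.escape(FENCE) + r"[\w-]*[ \t]*$", re.MULTILINE)


def extract_first_json_object(text: str) -> dict:
    """Drop fence lines, then return the first balanced {...} that parses as JSON."""
    cleaned = FENCE_LINE.sub("", text)
    decoder = json.JSONDecoder()
    start = cleaned.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores
            # whatever follows, so trailing chatter does not break parsing.
            value, _end = decoder.raw_decode(cleaned, start)
            return value
        except json.JSONDecodeError:
            start = cleaned.find("{", start + 1)
    raise ValueError("no JSON object found in agent output")
```

Using the JSON decoder rather than counting braces means a } inside a string value does not end the object early.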

Agent compiler configuration

Channel	Variable / option	Default	Purpose
Env var	`CODEX_SELF_EVOLUTION_OPENCODE_COMMAND`	—	Space-separated argv used instead of the default `opencode run --format json --file <payload> --dangerously-skip-permissions -- <prompt>`. Use this if your opencode install has a different CLI shape or needs extra flags.
Env var	`CODEX_SELF_EVOLUTION_OPENCODE_MODEL`	—	`--model <provider/name>` appended to the default command. Useful when the default build model produces truncated or non-JSON output.
Env var	`CODEX_SELF_EVOLUTION_OPENCODE_AGENT`	—	`--agent <name>` appended to the default command. Pick a narrow opencode agent profile if you want to lock down tool access.
Option	`options["opencode_command"]`	env var, else built via `_build_default_opencode_command`	Explicit argv list. Takes precedence over env var.
Option	`options["opencode_model"]` / `options["opencode_agent"]`	env var fallback	Override model / agent per invocation.
Option	`options["opencode_skip_permissions"]`	`True`	Pass `--dangerously-skip-permissions` so the agent can call file-read tools without a TUI prompt (required for headless subprocess use). Turn off only if you've vetted the agent profile.
Option	`options["opencode_timeout_seconds"]`	`900` (15 min)	Subprocess timeout. Kept strictly below `DEFAULT_LOCK_STALE_SECONDS` so a hung agent times out, the backend falls back, and `finally` releases the lock before preflight evicts it.
Option	`options["allow_fallback"]`	`True`	If `False`, the agent backend raises `RuntimeError` instead of falling back to `script` on failure.
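
The precedence in this table (explicit option, then env var, then the default command) might be resolved roughly as follows. This is a sketch, not the real `_build_default_opencode_command`: the function name is hypothetical, the position of --model / --agent before the -- separator is an assumption, and a custom command is returned as-is because the README does not say how the payload and prompt are attached to it.

```python
from typing import Any, List, Mapping, Optional


def build_opencode_command(
    payload_path: str,
    prompt: str,
    options: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    options = dict(options or {})
    env = dict(env or {})
    explicit = options.get("opencode_command")
    if explicit:
        return list(explicit)  # explicit argv list beats everything
    from_env = env.get("CODEX_SELF_EVOLUTION_OPENCODE_COMMAND")
    if from_env:
        return from_env.split()  # space-separated argv
    command = ["opencode", "run", "--format", "json", "--file", payload_path]
    if options.get("opencode_skip_permissions", True):
        command.append("--dangerously-skip-permissions")
    model = options.get("opencode_model") or env.get("CODEX_SELF_EVOLUTION_OPENCODE_MODEL")
    agent = options.get("opencode_agent") or env.get("CODEX_SELF_EVOLUTION_OPENCODE_AGENT")
    if model:
        command += ["--model", model]
    if agent:
        command += ["--agent", agent]
    return command + ["--", prompt]
```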

Discard reasons appended to CompileArtifacts.discarded_items when the agent path fails:

opencode_unavailable — binary not on PATH and no custom invoker.
agent_invoke_failed — subprocess raised (non-zero exit, timeout, etc.); detail has the truncated error.
agent_output_invalid — stdout was not valid JSON, or did not match the response schema; detail has the parse error.

The agent response schema (src/codex_self_evolution/compiler/agent_io.py::COMPILE_CONTRACT) is:

{
  "memory_records": {"user": [...], "global": [...]},
  "recall_records": [...],
  "compiled_skills": [...],
  "manifest_entries": [...],
  "discarded_items": [...]
}
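
A shape check for a response like the one above could be sketched as follows. Treating every key as required is an assumption, and check_agent_response is a hypothetical name; the authoritative contract is COMPILE_CONTRACT in agent_io.py.

```python
from typing import Any, Dict

LIST_KEYS = ("recall_records", "compiled_skills", "manifest_entries", "discarded_items")


def check_agent_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError unless the response has the top-level shape shown above."""
    memory = response.get("memory_records")
    if not isinstance(memory, dict):
        raise ValueError("memory_records must be an object")
    for scope in ("user", "global"):
        if not isinstance(memory.get(scope), list):
            raise ValueError(f"memory_records.{scope} must be a list")
    for key in LIST_KEYS:
        if not isinstance(response.get(key), list):
            raise ValueError(f"{key} must be a list")
    return response
```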

5. Compile runtime

Defined in src/codex_self_evolution/config.py:

Constant	Default	Purpose
`DEFAULT_BATCH_SIZE`	`100`	Max suggestions claimed per compile pass. Override by calling `run_compile(batch_size=...)` from your own scheduler.
`DEFAULT_LOCK_STALE_SECONDS`	`1800` (30 min)	Hard upper bound for a `compile.lock`. A normal compile is expected to finish well under this (target 5-10 min). See Compile lock protection for how stale locks are detected.
`PLUGIN_OWNER`	`codex-self-evolution-plugin`	Only managed skills owned by this string can be modified by the compiler. Used to reject writes to unmanaged skills.

6. Compile lock protection

A single lock file at <state-dir>/compiler/compile.lock serializes compile runs. It is JSON: {created_at, pid}. The next preflight/file_lock call treats a lock as stale and reclaimable if any of the following holds:

Condition	Detected by	Why
Owning `pid` is no longer a running process	`os.kill(pid, 0)` → `ProcessLookupError`	SIGKILL, crash, or machine reboot orphaned the lock. Cleared immediately.
Lock `created_at` is in the future (`age_seconds < 0`)	`utc_now() - created_at`	Clock skew / NTP rollback. Never trust a lock from the future.
Lock `created_at` older than `DEFAULT_LOCK_STALE_SECONDS` (30 min)	age threshold	Process is still alive but has been running past the tolerance.

Design contract: since there is no heartbeat, opencode_timeout_seconds (default 15 min) must stay strictly below the lock stale window (30 min). If the agent hangs, the subprocess times out → AgentCompilerBackend._fallback runs → finally releases the lock — all well before the next preflight would steal it. Changing one of these constants must preserve that invariant.

lock_status(paths) returns {locked, stale, stale_reason, pid_alive, age_seconds, owner_pid} for diagnostics.
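
The three staleness conditions and the timeout invariant can be sketched as below. This is not the plugin's lock_status: the function names and reason strings are hypothetical, created_at is assumed to be an ISO 8601 timestamp with a UTC offset, and the pid probe is POSIX-only.

```python
import os
from datetime import datetime
from typing import Callable, Optional

DEFAULT_LOCK_STALE_SECONDS = 1800  # 30 min, from section 5
OPENCODE_TIMEOUT_SECONDS = 900     # default opencode_timeout_seconds

# Design contract: the agent must time out before its lock can be reclaimed.
assert OPENCODE_TIMEOUT_SECONDS < DEFAULT_LOCK_STALE_SECONDS


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_reason(
    lock: dict, now: datetime, is_alive: Callable[[int], bool] = pid_alive
) -> Optional[str]:
    """Return why the lock is reclaimable, or None if it is still valid."""
    if not is_alive(lock["pid"]):
        return "owner_dead"
    age_seconds = (now - datetime.fromisoformat(lock["created_at"])).total_seconds()
    if age_seconds < 0:
        return "created_in_future"
    if age_seconds > DEFAULT_LOCK_STALE_SECONDS:
        return "too_old"
    return None
```

Passing now and is_alive in as arguments keeps the check deterministic in tests, which matters for the clock-skew case.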

7. Scheduler (launchd)

Template plist: docs/launchd/com.codex-self-evolution.preflight.plist.

You must edit:

Interpreter path (e.g. /Users/haha/hermes-agent/venv/bin/python3.11) to match your Python venv.
Working directory to your repo root.
--state-dir to the absolute path of your runtime state.

The job should wake cheaply, run compile-preflight, and only invoke compile when preflight returns run:

codex-self-evolution compile-preflight --state-dir data
# if status == run:
codex-self-evolution compile --once --state-dir data --backend agent:opencode
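
If you would rather drive the two steps from Python than from a shell wrapper, a sketch could look like this. It assumes compile-preflight prints a JSON object with a status field on stdout, which the README implies but does not spell out; the function names are hypothetical.

```python
import json
import subprocess
from typing import Callable, List


def run_cli(args: List[str]) -> str:
    """Run the installed CLI and return its stdout; raises on non-zero exit."""
    result = subprocess.run(
        ["codex-self-evolution", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def preflight_then_compile(
    state_dir: str,
    backend: str = "agent:opencode",
    run: Callable[[List[str]], str] = run_cli,
) -> str:
    """Wake cheaply, and only pay for a compile when preflight says run."""
    report = json.loads(run(["compile-preflight", "--state-dir", state_dir]))
    status = report.get("status")  # assumed field name
    if status == "run":
        run(["compile", "--once", "--state-dir", state_dir, "--backend", backend])
    return status
```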

8. Docker / smoke tests

Variable	Used by	Default	Purpose
`PYTHON`	Makefile targets	`/Users/haha/hermes-agent/venv/bin/python3.11`	Interpreter for `make test`, `make preflight`, `make provider-smoke-*`. Override if your venv lives elsewhere.
`IMAGE`	`make docker-*`	`codex-self-evolution-e2e`	Docker image tag.
`ENV_FILE`	`make provider-smoke-*`	`~/.codex-self-evolution/.env.provider`	Sourced before running real-provider smoke tests. Lives under the plugin home dir so it's shared with the installed Stop hook. Set `ENV_FILE=.env.provider` if you still keep a repo-root copy.

.env.provider is auto-sourced by the Makefile if present. Copy from the template into the plugin home dir:

mkdir -p ~/.codex-self-evolution
cp .env.provider.example ~/.codex-self-evolution/.env.provider
# fill the keys you need — both make provider-smoke-* and the installed
# Stop hook read from this single location.

scripts/install-codex-hook.sh will auto-migrate a legacy <repo>/.env.provider into ~/.codex-self-evolution/.env.provider on its first run.

Reviewer runtime

Reviewer invocation lives in src/codex_self_evolution/review/runner.py. It:

Loads the baked prompt at src/codex_self_evolution/review/prompt.md.
Resolves a provider (dummy, openai-compatible, anthropic-style, minimax).
Sends the normalized review snapshot.
Parses the JSON response via parse_reviewer_output(...) and validates it against ReviewerOutput schema. Malformed output raises SchemaError and is rejected.

The main Stop path no longer trusts pre-baked reviewer_output in the payload: fixtures are test-only.

Compile pipeline

pending suggestion batch
  + existing memory / recall / manifest (loaded by build_compile_context)
  -> backend.compile(batch, context, options)
  -> apply_compiler_outputs(...)  # atomic writes to memory / recall / skills
  -> write_receipt(...)

build_compile_context reads memory/USER.md, memory/MEMORY.md, memory/memory.json, recall/index.json, recall/compiled.md, and the skill manifest. Missing or corrupt files fall back to empty values without raising.
ScriptCompilerBackend uses compile_memory(existing_index=...) and compile_recall(existing_records=...) — existing entries are preserved by default; new suggestions only append on new (scope, content) pairs (memory) or new sha1(content) (recall).
AgentCompilerBackend sends the full payload (batch + existing_assets + repo + contract) to opencode, parses strict JSON back, and falls back to script on any failure.
Final writes (apply_compiler_outputs) are owned by the compiler engine, not by a separate writer module.
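
The append-only merge rules of the script backend can be illustrated with a small sketch. The record shape (dicts with scope and content keys) and the function names are assumptions for illustration; the real compile_memory and compile_recall signatures are only partially shown above.

```python
import hashlib
from typing import Dict, List


def merge_memory(existing: List[Dict[str, str]], incoming: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep every existing entry; append only unseen (scope, content) pairs."""
    seen = {(r["scope"], r["content"]) for r in existing}
    merged = list(existing)
    for record in incoming:
        key = (record["scope"], record["content"])
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged


def merge_recall(existing: List[Dict[str, str]], incoming: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep every existing record; append only content with an unseen sha1."""
    def digest(record: Dict[str, str]) -> str:
        return hashlib.sha1(record["content"].encode("utf-8")).hexdigest()

    seen = {digest(r) for r in existing}
    merged = list(existing)
    for record in incoming:
        key = digest(record)
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged
```

Note the difference between the two keys: the same text under two scopes yields two memory entries, but only one recall record.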

Each suggestion in suggestions/ carries:

stable suggestion_id
idempotency_key
explicit state
attempt_count
optional failure_reason
transition_log
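
To make the fields above concrete, here is a hypothetical in-memory model of a suggestion. The state names are taken from the suggestions/ subdirectories; the class itself, the log entry shape, and the rule that attempt_count increments on entering processing are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STATES = ("pending", "processing", "done", "failed", "discarded")


@dataclass
class Suggestion:
    suggestion_id: str
    idempotency_key: str
    state: str = "pending"
    attempt_count: int = 0
    failure_reason: Optional[str] = None
    transition_log: List[Dict[str, str]] = field(default_factory=list)

    def transition(self, new_state: str, reason: Optional[str] = None) -> None:
        """Move to new_state and record the hop in transition_log."""
        if new_state not in STATES:
            raise ValueError(f"unknown state: {new_state!r}")
        self.transition_log.append({"from": self.state, "to": new_state})
        if new_state == "processing":
            self.attempt_count += 1  # assumption: one attempt per claim
        self.failure_reason = reason if new_state == "failed" else None
        self.state = new_state
```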

Docker E2E

A containerized smoke/e2e flow is included.

docker build -t codex-self-evolution-e2e .
docker run --rm codex-self-evolution-e2e
# or via compose:
docker compose run --rm e2e
# or one command:
make docker-e2e

The container runs scripts/docker-e2e.sh, which:

Runs pytest.
Runs session-start.
Generates a Stop payload and runs stop-review.
Runs compile-preflight.
Runs compile --backend agent:opencode (falls back to script in the container because opencode is not installed).
Runs recall-trigger.
Verifies final memory / skill / receipt artifacts.

Real provider smoke tests

make provider-smoke-minimax
make provider-smoke-openai
make provider-smoke-anthropic

Recommended first path: make provider-smoke-minimax.

Required env (per provider): MINIMAX_API_KEY, OPENAI_API_KEY, or ANTHROPIC_API_KEY. See Reviewer providers for optional overrides.

These run scripts/provider-smoke-test.py against the real provider APIs and print the structured reviewer output plus request payload metadata.

Local testing

make test           # pytest
make e2e-local      # scripts/docker-e2e.sh without Docker
make preflight      # one compile-preflight call against data/

Development Notes

Hook wiring lives only in .codex-plugin/plugin.json.
Final writes are owned by src/codex_self_evolution/compiler/engine.py (not a separate writer.py).
Managed skills are isolated under skills/managed/ and require plugin-owned manifest entries (owner = codex-self-evolution-plugin). The compiler refuses to modify skills owned by anything else.
Review snapshots are normalized and persisted under review/snapshots/ for debugging / auditability.
Recall uses repo/cwd-first ranking and exposes a trigger helper instead of preloading large recall material at session start.
When touching compile behaviour, read docs/2026-04-20-compiler-existing-assets-handoff.md for the rationale behind the current existing-assets pipeline.
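
The managed-skill ownership rule in the notes above could be enforced with a guard like the following sketch. The manifest shape (skill id mapped to an entry with an owner field) and the function name are assumptions, as is treating an unknown id as a new skill.

```python
from typing import Dict

PLUGIN_OWNER = "codex-self-evolution-plugin"


def assert_skill_writable(skill_id: str, manifest: Dict[str, dict]) -> None:
    """Refuse to touch skills whose manifest entry is owned by something else."""
    entry = manifest.get(skill_id)
    if entry is None:
        return  # assumption: no entry means the compiler is creating a new skill
    if entry.get("owner") != PLUGIN_OWNER:
        raise PermissionError(f"skill {skill_id!r} is not managed by {PLUGIN_OWNER}")
```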
