ACE — Agentic Context Engineering: evolving, self-improving context playbooks for LLM agents. A faithful, framework-style implementation of the ICLR 2026 paper, with first-class OpenAI Agents SDK support.


Maintainers

rrahimi


🎮 ACE — Agentic Context Engineering

Evolving, self-improving context playbooks for LLM agents — a clean, tested, framework-style implementation of the ICLR 2026 paper, with first-class OpenAI Agents SDK support.

Stop re-prompting. Let your agent write its own playbook from experience.

📖 Documentation site · 📐 Architecture

Quickstart · Why ACE · Use on your own task · OpenAI Agents SDK · How it works · Results · Architecture

What is this?

LLM agents and domain experts increasingly improve through context adaptation — editing the inputs (instructions, strategies, evidence) instead of the weights. But the two dominant approaches break down:

Brevity bias — prompt optimizers collapse toward short, generic instructions and throw away hard-won domain detail.
Context collapse — letting an LLM rewrite the whole context every step compresses it into a lossy summary and craters accuracy (see below).

ACE fixes both. It treats context as an evolving playbook of small, itemized bullets that accumulate, refine, and organize strategies over time, through a modular Generator → Reflector → Curator loop with incremental delta updates and a grow-and-refine mechanism. The result: comprehensive, scalable, self-improving context — with low overhead.

This repository is a faithful, dependency-light, fully tested implementation you can use in a couple of commands and a few lines of code.

✨ Why ACE

	Prompt optimizers (GEPA, MIPRO)	Monolithic memory (full rewrite)	ACE
Keeps domain detail	❌ brevity bias	⚠️ erodes over time	✅ accumulates
Survives long horizons	⚠️	❌ context collapse	✅ incremental deltas
Update cost	🐢 full re-optimization	🐢 full re-ingest each step	⚡ tiny deltas, non-LLM merge
Works without labels	⚠️	✅	✅ execution feedback
Interpretable / editable	⚠️	⚠️	✅ inspectable bullets

🚀 Quickstart

git clone https://github.com/rrahimi-uci/agentic-context-engineering && cd agentic-context-engineering
pip install -e .            # core library (numpy + rich only)

Run the headline comparison — no API key required (uses a deterministic, offline teaching environment):

ace demo --html report.html

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Method                      ┃ Accuracy ┃ Playbook ┃ Note        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Base LLM (no context)       │ 44.4%    │ 0        │ —           │
│ ACE (offline → eval)        │ 83.3%    │ 5        │ +38.9 pts   │
│ Monolithic rewrite (online) │ 72.2%    │ 4        │ 2 collapses │
│ ACE (online)                │ 83.3%    │ 6        │ no collapse │
└─────────────────────────────┴──────────┴──────────┴─────────────┘

Watch a run adapt live in your terminal:

ace run            # animated dashboard: playbook growth, accuracy, deltas

…or in ~10 lines of Python

from ace import ACE, SimulatedLLM, TeachingEnvironment, build_teaching_task
from ace.baselines import StaticAgent

env  = TeachingEnvironment()
task = build_teaching_task()
train, test = task.split()

base = StaticAgent(SimulatedLLM(env)).run(test)        # no learning
ace  = ACE(SimulatedLLM(env))
ace.adapt_offline(train)                               # build a playbook from feedback
result = ace.evaluate(test)                            # measure on held-out data

print(f"Base {base.accuracy:.0f}%  →  ACE {result.accuracy:.0f}%")
print(ace.playbook.render())                           # human-readable playbook

🔌 Use it with the OpenAI Agents SDK

ACE plugs into the OpenAI Agents SDK as a self-improving memory. The playbook is injected into your agent's instructions on every run; after each task you hand back feedback (a label, or just a natural execution signal) and ACE grows the playbook.

pip install "ace-playbook[all]"     # adds openai + openai-agents
export OPENAI_API_KEY=sk-...

from agents import Agent
from ace import ACE, OpenAILLM
from ace.integrations.openai_agents import ACEAgent

base = Agent(name="Support", instructions="You are a concise support agent.")
agent = ACEAgent(base, ace=ACE(OpenAILLM(model="gpt-4o-mini")))

# Run + learn from execution feedback — no ground-truth labels needed:
out = agent.run_and_learn(
    "Cancel order #C99",
    signal="Policy: cancellation requires identity verification first.",
)
print(out.output)
print(agent.ace.playbook.render())   # the agent just wrote itself a rule

Inside an existing event loop (FastAPI, notebooks, other async agents) use the async entry points — same semantics, no run_sync:

out = await agent.arun_and_learn("Cancel order #C99", signal="...")

ACEAgent accepts string or dynamic (callable) base instructions and composes the playbook beneath them; pass base_instructions=... to override. For a runnable end-to-end example, run python examples/04_openai_agents.py.
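To make "composes the playbook beneath them" concrete, here is a minimal sketch of how such composition could work. It is not the ACEAgent implementation: `compose_instructions`, the `## Playbook` header, and the zero-argument callable are all assumptions for illustration (the Agents SDK's own dynamic instructions receive run context and the agent as arguments).

```python
from typing import Callable, Sequence, Union

# Hypothetical type: a plain string or a zero-argument callable returning one.
Instructions = Union[str, Callable[[], str]]


def compose_instructions(base: Instructions, bullets: Sequence[str]) -> str:
    """Return the base instructions with playbook bullets appended beneath."""
    # Dynamic instructions are resolved at call time, so each run sees fresh text.
    base_text = base() if callable(base) else base
    if not bullets:
        return base_text  # empty playbook: leave the base untouched
    playbook = "\n".join(f"- {b}" for b in bullets)
    return f"{base_text}\n\n## Playbook\n{playbook}"


static = compose_instructions(
    "You are a concise support agent.",
    ["Verify identity before cancelling an order."],
)
dynamic = compose_instructions(lambda: "You are a concise support agent.", [])
print(static)
```

Keeping the base text first and the bullets last means the developer's instructions are never rewritten; only the appended section changes between runs.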

🧩 Use it on your own task

Two extension points make ACE general-purpose — bring your own Task and your own feedback (no ground-truth labels required):

from ace import ACE, Feedback, Sample, Task, OpenAILLM

my_task = Task(name="my-domain", samples=[Sample(id="1", question="...")],
               evaluate=lambda pred, s: my_score(pred, s))

def my_feedback(sample, generation) -> Feedback:
    # plug in execution signals, a reward fn, or an LLM judge — your call
    ok = run_my_checks(generation.answer)
    return Feedback(correct=ok, signal="tests passed" if ok else "tests FAILED")

ace = ACE(OpenAILLM(model="gpt-4o-mini"))
ace.adapt_online(my_task, feedback_fn=my_feedback)   # learns from YOUR signals

See examples/05_custom_task.py (runs offline). By default, the Curator calls the LLM to propose ADD/UPDATE/REMOVE edits, with a deterministic fallback that never drops a lesson. To force deterministic curation, use ACEConfig(curator_use_llm=False).
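As an illustration of label-free feedback, the sketch below derives a correct/incorrect flag and a textual signal purely from executable checks on the answer. `ToyFeedback` and `feedback_from_checks` are hypothetical stand-ins written for this example; they only mirror the `correct` and `signal` fields used above and are not part of the library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyFeedback:  # hypothetical stand-in, not ace.Feedback
    correct: bool
    signal: str


# A named check: (description, predicate over the generated answer).
Check = Tuple[str, Callable[[str], bool]]


def feedback_from_checks(answer: str, checks: List[Check]) -> ToyFeedback:
    """Run every check and report which ones failed; no gold label is needed."""
    failed = [name for name, check in checks if not check(answer)]
    if failed:
        return ToyFeedback(False, "checks FAILED: " + ", ".join(failed))
    return ToyFeedback(True, "all checks passed")


checks: List[Check] = [
    ("mentions verification", lambda a: "verify" in a.lower()),
    ("under 200 chars", lambda a: len(a) < 200),
]
good = feedback_from_checks("I will verify your identity first.", checks)
bad = feedback_from_checks("Order cancelled.", checks)
print(good)
print(bad)
```

Naming the failed checks in the signal gives a reflection step something specific to learn from, rather than a bare pass/fail bit.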

🧠 How it works

flowchart LR
    Q([Query]) --> G[Generator]
    PB[(Context Playbook)] -. injected .-> G
    G -->|trajectory + bullet usage| R[Reflector]
    FB([Feedback: labels or execution signal]) --> R
    R -->|insights, iterative refinement| C[Curator]
    C -->|delta items| M{{Deterministic Merge - non-LLM}}
    M --> PB
    M --> GR[Grow & Refine: dedupe / prune]
    GR --> PB
    classDef role fill:#1e293b,color:#fff;
    classDef store fill:#2563eb,color:#fff;
    classDef det fill:#16a34a,color:#fff;
    class G,R,C role;
    class PB store;
    class M,GR det;

Generator solves the query using the current playbook, flagging which bullets helped or misled.
Reflector critiques the trajectory against feedback and distills concrete, reusable insights (optionally over several refinement rounds).
Curator turns insights into a few delta operations (ADD / UPDATE / REMOVE).
Deterministic merge applies those edits to the playbook — no LLM, no rewrite, no collapse.
Grow-and-refine de-duplicates (semantic or lexical) and prunes consistently harmful bullets.
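The merge step is the easiest part to picture in code. The following is a minimal sketch of a deterministic, non-LLM merge over ADD / UPDATE / REMOVE operations, assuming the playbook is a plain mapping from bullet id to text. The `Delta` class and `merge` function are hypothetical names for this example and are not taken from ace/delta.py.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Delta:  # hypothetical shape of one delta operation
    op: str  # "ADD", "UPDATE", or "REMOVE"
    bullet_id: str
    text: Optional[str] = None


def merge(playbook: Dict[str, str], deltas: List[Delta]) -> Dict[str, str]:
    """Apply deltas in order and return a new playbook; the input is not mutated."""
    merged = dict(playbook)
    for d in deltas:
        if d.op == "ADD":
            # Never overwrite an existing bullet on ADD.
            merged.setdefault(d.bullet_id, d.text or "")
        elif d.op == "UPDATE":
            # Only touch the targeted bullet; everything else stays verbatim.
            if d.bullet_id in merged and d.text is not None:
                merged[d.bullet_id] = d.text
        elif d.op == "REMOVE":
            merged.pop(d.bullet_id, None)
        else:
            raise ValueError(f"unknown op: {d.op}")
    return merged


playbook = {"b1": "Verify identity before cancelling.", "b2": "Be polite."}
deltas = [
    Delta("ADD", "b3", "Quote the order id back to the user."),
    Delta("UPDATE", "b1", "Verify identity before cancelling or refunding."),
    Delta("REMOVE", "b2"),
]
result = merge(playbook, deltas)
print(result)
```

Because untouched bullets are copied verbatim, no amount of repeated merging can paraphrase or shorten them, which is the property the "no rewrite, no collapse" claim rests on.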

ACE runs in two regimes — multi-epoch offline optimization and sequential online test-time adaptation (which can be warm-started from an offline playbook):

flowchart LR
    subgraph Offline["Offline — system-prompt optimization"]
        TR[(Train split)] --> EP{Multi-epoch}
        EP --> ST[ACE.step] --> EP
        EP --> PBO[(Playbook)]
    end
    subgraph Online["Online — test-time memory"]
        S[Next sample] --> PR[predict] --> LE[learn] --> S
    end
    PBO -. optional warm start .-> Online
    classDef store fill:#2563eb,color:#fff;
    class PBO store;
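The control flow of the two regimes can be sketched with a deliberately trivial learner. In the toy below the "playbook" just memorizes question/answer pairs, which stands in for the Generator, Reflector, and Curator; every function name here is hypothetical and only the loop structure (multi-epoch offline, predict-then-learn online, optional warm start) is the point.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, str]  # (question, expected answer)


def predict(playbook: Dict[str, str], question: str) -> str:
    return playbook.get(question, "unknown")


def step(playbook: Dict[str, str], sample: Sample) -> bool:
    """Predict with the current playbook, then learn from the feedback."""
    question, expected = sample
    correct = predict(playbook, question) == expected
    if not correct:
        playbook[question] = expected  # toy "lesson" distilled from feedback
    return correct


def adapt_offline(train: List[Sample], epochs: int = 2) -> Dict[str, str]:
    """Multi-epoch pass over a train split; returns the resulting playbook."""
    playbook: Dict[str, str] = {}
    for _ in range(epochs):
        for sample in train:
            step(playbook, sample)
    return playbook


def adapt_online(stream: List[Sample],
                 playbook: Optional[Dict[str, str]] = None) -> float:
    """Sequential test-time adaptation; returns accuracy over the stream."""
    playbook = dict(playbook or {})  # copy so a warm start is not mutated
    hits = 0
    for sample in stream:
        hits += step(playbook, sample)  # scored before the lesson is learned
    return hits / len(stream)


train = [("a", "1"), ("b", "2")]
stream = [("a", "1"), ("c", "3"), ("c", "3")]
cold = adapt_online(stream)
warm = adapt_online(stream, adapt_offline(train))
print(f"cold start: {cold:.2f}, warm start: {warm:.2f}")
```

In the online loop each sample is scored before its lesson is learned, so the reported accuracy reflects what the playbook knew at prediction time.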

Full diagrams (roles, bullet lifecycle, grow-and-refine, feedback regimes, data model — 14 in total) live in ARCHITECTURE.md and on the docs site.

📊 Results

Reproducible, in this repo (offline teaching environment, no API key)

These come straight from the bundled examples (examples/*.py) and are fully deterministic:

Demo	Base LLM	ACE	Δ
Quickstart (offline → held-out eval)	44.4%	83.3%	+38.9 pts
Context-collapse benchmark (online)	41.7%	88.3%	+46.6 pts
Offline warmup + online	34.5%	96.6%	+62.1 pts

In the context-collapse demo, the monolithic-rewrite baseline collapses its context 7 times and stalls at 60.0% accuracy, while ACE never collapses. ACE also ingests 94.9% fewer adaptation tokens than full re-ingestion, because each delta is tiny. To generate the visual report, run ace demo --html report.html.
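A back-of-the-envelope model shows why delta updates ingest far fewer tokens than re-reading the whole context at every step. The sketch below uses made-up sizes (ten steps, 20 tokens per new bullet) and assumes a full rewrite re-reads everything accumulated so far; it does not reproduce the 94.9% figure, which comes from the bundled demo.

```python
from typing import List


def full_reingest_tokens(bullet_tokens: List[int]) -> int:
    """Tokens read if the whole accumulated context is re-read at every step."""
    total, context = 0, 0
    for t in bullet_tokens:
        context += t       # context grows by the new material
        total += context   # and is then ingested in full
    return total


def delta_tokens(bullet_tokens: List[int]) -> int:
    """Tokens read if only the new delta is ingested at every step."""
    return sum(bullet_tokens)


steps = [20] * 10  # hypothetical: 10 steps, 20 new tokens each
full = full_reingest_tokens(steps)
delta = delta_tokens(steps)
saving = 1 - delta / full
print(f"full={full} delta={delta} saving={saving:.1%}")
```

Under these assumptions full re-ingestion grows quadratically with the number of steps while delta ingestion grows linearly, so the relative saving widens as the run gets longer.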

Reported in the paper (real benchmarks, DeepSeek-V3.1)

Benchmark	Baseline	+ ACE
AppWorld (agent, avg)	42.4% (ReAct)	59.5% (+17.1)
FiNER (financial NER)	70.7%	78.3%
Formula (financial reasoning)	67.5%	85.5%
Adaptation latency (offline AppWorld)	—	−86.9%
Token cost (online FiNER)	—	−83.6%

On the AppWorld leaderboard, ReAct+ACE with an open-source model matches the top-ranked production GPT-4.1 agent and surpasses it on the harder test-challenge split. (Numbers above are from the paper; this repo reproduces the mechanism and its qualitative behavior offline.)

🗂️ What's in the box

ace/
├── playbook.py      # Bullet + Playbook: the evolving, sectioned context
├── delta.py         # incremental ADD/UPDATE/REMOVE + deterministic merge
├── roles.py         # Generator · Reflector · Curator (+ prompts)
├── refine.py        # grow-and-refine: semantic dedupe + harmful pruning
├── engine.py        # ACE orchestrator: offline / online adaptation
├── llm.py           # LLM protocol · OpenAILLM · deterministic SimulatedLLM
├── feedback.py      # labeled or label-free execution feedback
├── tasks.py         # Sample/Task + offline TeachingEnvironment
├── baselines.py     # StaticAgent + MonolithicRewriteAgent (context collapse)
├── visualize.py     # live terminal dashboard + self-contained HTML report
├── integrations/
│   └── openai_agents.py   # ACEAgent: drop-in self-improving memory
└── cli.py           # `ace demo | run | playbook | version`
examples/            # 5 runnable demos (4 need no API key)
tests/               # 112 tests, run in <1s, zero network

🧪 Develop & test

pip install -e ".[dev]"
pytest                       # 112 tests, fully offline, ~1s
python examples/01_quickstart.py
python examples/02_context_collapse.py   # writes ace_report.html

The bundled SimulatedLLM + TeachingEnvironment make every demo and test deterministic and key-free, so the ACE control loop is exercised end-to-end in CI. Swap in OpenAILLM for real models and benchmarks — the algorithm and prompts are unchanged.

🔍 Key concepts (glossary)

Playbook — the evolving context, a set of itemized bullets grouped into sections.
Bullet — one atomic lesson with a stable id and helpful/harmful counters.
Delta update — a small, localized batch of ADD/UPDATE/REMOVE edits (vs. a full rewrite).
Grow-and-refine — append new bullets, update existing in place, semantically de-duplicate, prune harmful.
Generator / Reflector / Curator — the three specialized roles of the ACE loop.
Offline vs. online — multi-epoch optimization on a train split vs. sequential test-time adaptation.
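The bullet and grow-and-refine entries above can be illustrated together. This is a minimal sketch assuming lexical (Jaccard) similarity for de-duplication and a simple harmful-minus-helpful margin for pruning; `ToyBullet`, `refine`, and both thresholds are hypothetical choices for this example, not the behavior of ace/refine.py.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class ToyBullet:  # hypothetical stand-in for a playbook bullet
    id: str
    text: str
    helpful: int = 0
    harmful: int = 0


def jaccard(a: str, b: str) -> float:
    """Lexical similarity: shared words divided by all words."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0


def refine(bullets: List[ToyBullet],
           dedupe_threshold: float = 0.8,
           harmful_margin: int = 2) -> List[ToyBullet]:
    kept: List[ToyBullet] = []
    for b in bullets:
        # Prune bullets that have misled clearly more often than they helped.
        if b.harmful - b.helpful >= harmful_margin:
            continue
        dup = next((k for k in kept
                    if jaccard(k.text, b.text) >= dedupe_threshold), None)
        if dup is not None:
            # Fold the duplicate's counters into the bullet we already keep.
            dup.helpful += b.helpful
            dup.harmful += b.harmful
            continue
        kept.append(replace(b))  # copy so the input list is not mutated
    return kept


bullets = [
    ToyBullet("b1", "verify identity before cancelling an order", helpful=3),
    ToyBullet("b2", "Verify identity before cancelling an order", helpful=1),
    ToyBullet("b3", "always refund immediately", harmful=4),
    ToyBullet("b4", "quote the order id back to the user", helpful=2),
]
refined = refine(bullets)
print([(b.id, b.helpful, b.harmful) for b in refined])
```

A semantic variant would swap `jaccard` for an embedding similarity while leaving the rest of the loop unchanged.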

📚 Citation

@inproceedings{zhang2026ace,
  title     = {Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models},
  author    = {Zhang, Qizheng and Hu, Changran and Upasani, Shubhangi and others},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2026},
  url       = {https://arxiv.org/abs/2510.04618}
}

This implementation is an independent, open-source reproduction for research and educational use. All credit for the ACE method belongs to the original authors.

📝 License

MIT. Contributions welcome — see CONTRIBUTING.md.

Built to make self-improving LLM agents easy: pip install → a few lines → a playbook that gets better with every task.

Release history

0.3.0: Jun 29, 2026
0.2.0: Jun 29, 2026
0.1.0: Jun 27, 2026 (this version)

Download files


Source Distribution

ace_playbook-0.1.0.tar.gz (54.4 kB)

Uploaded Jun 27, 2026 Source

Built Distribution


ace_playbook-0.1.0-py3-none-any.whl (44.0 kB)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file ace_playbook-0.1.0.tar.gz.

File metadata

Download URL: ace_playbook-0.1.0.tar.gz
Upload date: Jun 27, 2026
Size: 54.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ace_playbook-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2463f218c88f8c9109a9a4374fe40039dc4c06da042d334d166b05b8494dabb5`
MD5	`7c4705e1b1e602372229bc936cadcc2c`
BLAKE2b-256	`834f60b2626af759f0c6bc100625f2b7ab67180bea5e4c29563d444f9a5fa642`
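One way to use the digests above is to hash the downloaded file locally and compare. The sketch below relies only on the standard library; the helper names `sha256_of` and `verify` are made up for this example, and the file is assumed to have been downloaded already.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest against a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify(Path("ace_playbook-0.1.0.tar.gz"), "<SHA256 from the table above>")` should return True for an intact download.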


Provenance

The following attestation bundles were made for ace_playbook-0.1.0.tar.gz:

Publisher: publish.yml on rrahimi-uci/agentic-context-engineering

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ace_playbook-0.1.0.tar.gz
- Subject digest: 2463f218c88f8c9109a9a4374fe40039dc4c06da042d334d166b05b8494dabb5
- Sigstore transparency entry: 1984246562
- Sigstore integration time: Jun 27, 2026
Source repository:
- Permalink: rrahimi-uci/agentic-context-engineering@36207114949fe133269e7c12f91435eed3a21648
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/rrahimi-uci
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@36207114949fe133269e7c12f91435eed3a21648
- Trigger Event: release

File details

Details for the file ace_playbook-0.1.0-py3-none-any.whl.

File metadata

Download URL: ace_playbook-0.1.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 44.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ace_playbook-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e2f914043ee634009c73886c15614b75ea6f050199831090e55a1ce806784f34`
MD5	`10da257aba805cfd2625b3c1e9945cdc`
BLAKE2b-256	`00d3ca01a4851a5ff895aabbe4c79c6c429e59c80d097e62ec3b168e76384ad9`


Provenance

The following attestation bundles were made for ace_playbook-0.1.0-py3-none-any.whl:

Publisher: publish.yml on rrahimi-uci/agentic-context-engineering

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ace_playbook-0.1.0-py3-none-any.whl
- Subject digest: e2f914043ee634009c73886c15614b75ea6f050199831090e55a1ce806784f34
- Sigstore transparency entry: 1984246682
- Sigstore integration time: Jun 27, 2026
Source repository:
- Permalink: rrahimi-uci/agentic-context-engineering@36207114949fe133269e7c12f91435eed3a21648
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/rrahimi-uci
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@36207114949fe133269e7c12f91435eed3a21648
- Trigger Event: release

ace-playbook 0.1.0
