Open benchmark for AI conversation compaction methods
CompactBench
CompactBench measures whether language models still behave correctly after long conversation history is replaced with a compacted representation. It runs adversarial, deterministic, multi-cycle benchmarks and publishes ranked results on a public leaderboard.
- Deterministic generation — same template + seed + version always yields the same case
- Hidden ranked set — public practice cases for development, hidden templates for ranked scoring
- Multi-cycle drift — methods are evaluated across repeated compact → continue → compact loops
- State-fidelity scoring — correctness of retained decisions, constraints, and entities, not output style
- Versioned everywhere — benchmark suite, template, scorer, model, and method versions are recorded with every result
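As a rough illustration of the deterministic-generation property listed above, the sketch below derives a random stream from the (template, seed, version) triple, so identical inputs always produce an identical case. This is not CompactBench's actual generator: `derive_rng`, `generate_case`, and the case fields are hypothetical names chosen for this example.

```python
import hashlib
import random


def derive_rng(template: str, seed: int, version: str) -> random.Random:
    """Build an RNG whose stream depends only on (template, seed, version)."""
    key = f"{template}|{seed}|{version}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


def generate_case(template: str, seed: int, version: str) -> dict:
    """Produce a toy case; the fields here are illustrative assumptions."""
    rng = derive_rng(template, seed, version)
    return {
        "template": template,
        "seed": seed,
        "version": version,
        "turn_count": rng.randint(20, 40),       # length of the toy conversation
        "constraint_turn": rng.randint(0, 19),   # where a constraint is buried
    }


case = generate_case("buried_constraint_v1", 42, "0.1.0")
same = generate_case("buried_constraint_v1", 42, "0.1.0")
print(case == same)  # True
```

Hashing the full triple, rather than seeding from the integer alone, means a change to the template or version string also changes the generated case, which matches the "template + seed + version" wording above.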
Install
pip install compactbench
Or with uv (recommended for development):
uv pip install compactbench
Quickstart
Run a built-in compactor against the starter suite using a local Ollama model:
compactbench run \
  --method built-in:hybrid-ledger \
  --suite starter \
  --provider ollama \
  --model llama3.2
Generate a single case deterministically for inspection:
compactbench generate --template buried_constraint_v1 --seed 42
Score an existing results file:
compactbench score --results results.jsonl
Writing your own compactor
Implement the Compactor interface and register it.
from compactbench.compactors import Compactor
from compactbench.contracts import CompactionArtifact, Transcript


class MyCompactor(Compactor):
    name = "my-method"
    version = "0.1.0"

    def compact(self, transcript: Transcript, config: dict) -> CompactionArtifact:
        ...
Then run:
compactbench run --method path/to/my_compactor.py:MyCompactor --suite elite_practice
See docs/writing-a-compactor.md for full details.
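To make the compact → continue → compact loop and state-fidelity scoring more concrete, here is a minimal, self-contained sketch. It deliberately does not import compactbench: `ToyArtifact`, `toy_compact`, `fidelity`, and `run_drift_cycles` are hypothetical stand-ins, and the real `Transcript` and `CompactionArtifact` contracts may look quite different. The toy compactor is lossy on purpose (it keeps at most `max_items` constraints), so fidelity can drop across cycles.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToyArtifact:
    """Stand-in for a compaction artifact: the constraints that survived."""
    constraints: List[str] = field(default_factory=list)


def toy_compact(turns: List[str], previous: Optional[ToyArtifact],
                max_items: int) -> ToyArtifact:
    """Carry forward old constraints, add new ones, keep only the newest max_items."""
    kept = list(previous.constraints) if previous else []
    for turn in turns:
        if turn.startswith("CONSTRAINT:") and turn not in kept:
            kept.append(turn)
    start = max(0, len(kept) - max_items)
    return ToyArtifact(constraints=kept[start:])


def fidelity(artifact: ToyArtifact, expected: Set[str]) -> float:
    """Fraction of expected constraints still present in the artifact."""
    if not expected:
        return 1.0
    return len(expected & set(artifact.constraints)) / len(expected)


def run_drift_cycles(cycles: List[List[str]], max_items: int) -> List[float]:
    """Compact after each batch of turns and score against all constraints seen so far."""
    artifact: Optional[ToyArtifact] = None
    seen: Set[str] = set()
    scores = []
    for turns in cycles:
        seen.update(t for t in turns if t.startswith("CONSTRAINT:"))
        artifact = toy_compact(turns, artifact, max_items)
        scores.append(fidelity(artifact, seen))
    return scores


cycles = [
    ["CONSTRAINT: never email the client", "user: draft a plan"],
    ["user: add a timeline", "CONSTRAINT: budget is fixed"],
    ["user: summarize"],
]
print(run_drift_cycles(cycles, max_items=1))  # [1.0, 0.5, 0.5]
print(run_drift_cycles(cycles, max_items=2))  # [1.0, 1.0, 1.0]
```

The score depends only on which constraints survive, not on how the artifact is worded, which is the distinction the state-fidelity bullet draws between retained state and output style.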
Leaderboard
The public leaderboard is at https://compactbench.github.io/compactbench/leaderboard.
Submissions are evaluated against hidden ranked benchmark cases by a maintainer-operated runner. To submit:
- Write and test your compactor locally against elite_practice.
- Open a PR to submissions/ with your method source and config.
- A maintainer runs it against the hidden set and merges if it qualifies.
See docs/submitting.md for the full submission protocol.
Project status
v0.1.0 is launch-ready. All ten work orders from the implementation roadmap have landed:
- Core: DSL parser, case generation, scoring engine, mock + real providers (Groq / Google AI Studio / Ollama)
- Methods: four built-in compactors (naive-summary, structured-state, hierarchical-summary, hybrid-ledger)
- Runtime: end-to-end compactbench run with drift cycles, JSONL event log, --resume
- Leaderboard: PR-based submission workflow on GitHub-hosted runners, static site fed by a qualification + ranking core
- Content: 15 public Elite practice templates + 15 hidden ranked templates across three launch families
- Release: PyPI trusted-publishing workflow wired up; tag v0.1.0 to ship
See CHANGELOG.md for the full breakdown. Post-launch work (hidden-set content expansion, additional template families, shadow evaluation automation, custom domain) is tracked via GitHub issues.
Contributing
Bug reports, template proposals, and new compactors are welcome. See CONTRIBUTING.md.
Please also read our Code of Conduct.
License
Apache License 2.0 — see LICENSE.