A local memory + orchestration framework for token-efficient LLM agents: planner/executor split, retrieved tool descriptions, batch tool calls, Reflexion-style success-pattern mining, tactical/strategic memory over execution traces, polyglot subprocess tools, agent-as-tool composition, Anthropic prompt caching, per-run usage events, and TOON I/O.
Project description
fabri
A local, open-source memory + orchestration layer for a custom LLM agent. Built around two ideas:
- SCOPE (arXiv:2512.15374): treat the agent's prompt context as something that evolves automatically from execution traces, via a tactical/strategic memory split with conflict resolution before promotion.
- Context engineering over prompt engineering: keep retrieved context compact and just-in-time, give each tool a single clear job, and let tools be polyglot processes behind a uniform JSON contract.
Quickstart
pip install fabri # the `fabri` console command lands on your PATH
docker run -p 6333:6333 qdrant/qdrant # vector store for the agent's memory
export ANTHROPIC_API_KEY=...
fabri init demo && cd demo # scaffold a runnable starter project
fabri --config agent.yaml run "greet Ada with the hello tool"
fabri init writes an agent.yaml, an example tool under tools/agent_tools/,
and a docker-compose.yml — edit those, not the library. For OpenAI models,
pip install "fabri[openai]" and set llm.provider: openai.
Setup (from a checkout)
docker compose up -d # starts Qdrant on :6333
python3 -m venv .venv && .venv/bin/pip install -e ".[dev]"
export ANTHROPIC_API_KEY=... # only needed for AnthropicLLMBackend
Installing puts a fabri console command on your PATH — that's the entry point
everything below uses. Embeddings run fully locally via
sentence-transformers/all-MiniLM-L6-v2 — no embedding API calls, no cloud
dependency beyond the LLM itself.
Usage
fabri run "some task description"
fabri --config agent.yaml run "some task description" # config-driven agent, see docs/creating-an-agent.md
fabri --verbose run "some task description" # DEBUG logging to console too
fabri inspect-memory "a query to test retrieval"
fabri ingest-traces <session-id>
(python cli.py ... still works from a source checkout — it's a thin shim over
the same fabri.cli:main the console script points at.)
See docs/creating-an-agent.md for using this as a library/framework in
another project: installing it, writing an agent.yaml, adding tools, and a
worked example.
run executes the agent loop, then immediately mines the resulting trace for
failures and synthesizes any new guidelines. ingest-traces re-runs that
synthesis step against a past session's trace on its own. run and
ingest-traces both require ANTHROPIC_API_KEY (they fail fast with a clear
message if it's unset, rather than a raw stack trace) — inspect-memory does
not, since it only talks to Qdrant.
Every run produces two records, both keyed by session_id: a structured
JSONL trace (.fabri/traces/<session_id>.jsonl, the machine-readable
record used by the pipeline) and a human-readable log
(.fabri/logs/<session_id>.log, always
DEBUG-level regardless of --verbose — that flag only controls what's also
echoed to the console) with LLM call latency/token usage, tool dispatch
latency, and every promotion/dedup decision the pipeline makes.
Both land under .fabri/ in the directory you run from (override with
$FABRI_HOME), so each consuming project keeps its own traces and logs
rather than scattering them into wherever the package happens to be installed.
Add .fabri/ to your project's .gitignore.
Each run returns an outcome: success (clean), success_with_recovery
(finished, but at least one tool call failed along the way), or incomplete
(hit the step limit with no final answer).
Architecture
src/fabri/ # src/ layout: repo is "fabri", package is "fabri"
memory/ embeddings, schema, Qdrant store, compression, pruning/promotion
orchestrator/ retrieval, trace logging, the trace -> guideline pipeline
tools/ manifest schema, subprocess runner, example tools, agent-as-tool adapter
core/ the ReAct agent loop, decompose, and the pluggable LLM backend
runtime.py build_llm/build_tools/build_tool_defs -- shared by cli and agent_runner_tool
admin.py admin-only config inspection + dashboard (stub auth seam, see below)
config.py YAML agent config loader (DEFAULT_CONFIG + load_config)
toon.py TOON codec: compact JSON-shaped encoding for tool results (token savings)
paths.py project-local .fabri/ resolution for traces + logs
cli.py the `fabri` console command (composition over the public API)
__init__.py public API: run_agent, load_config, ToolRegistry, QdrantMemoryStore, ...
cli.py thin shim: `python cli.py ...` -> fabri.cli:main
Agents as tools
A tools.agents entry in a config exposes a different agent.yaml as a tool
on this one (fabri/tools/agent_tool.py builds the manifest,
agent_runner_tool.py runs the sub-agent as an ordinary subprocess tool —
stdin {"task": ...}, stdout {"final_text", "outcome"}). This is
composition, not a new orchestrator: each sub-agent is just another tool call
in the parent's normal ReAct loop. See ludexel/.agent/game_content_agent.yaml
for a worked example wiring four domain agents (character/tiles/map/story)
into one top-level agent.
A tools.agents entry may carry optional model / max_tokens keys that
override the sub-agent's llm.model / llm.max_tokens at spawn time, so the
parent can run on Sonnet while a cheap classifier sub-agent runs on Haiku
without duplicating the full agent.yaml:
tools:
agents:
- name: classify
description: Classify a snippet into one of N labels.
config: tools/agent_tools/classifier.yaml
model: claude-haiku-4-5 # cheap override; sub-agent's own yaml keeps the default
max_tokens: 256
Admin CLI / dashboard
cli.py admin config --config agent.yaml prints the merged config plus the
resolved tool registry as JSON; cli.py admin dashboard --config agent.yaml
prints a human-readable summary (agent/llm/tools/memory counts). Both are
gated by require_admin() (fabri/admin.py): if FABRI_ADMIN_TOKEN
is unset the gate is open (no auth backend exists yet), but once it's set,
--admin-token must match it. This is a placeholder seam for real auth
(SSO/API gateway/etc.), not a security boundary by itself.
Tool contract
A tool is a JSON manifest (name, description, command, schemas, timeout)
next to an executable in any language. runner.py invokes command as a
subprocess, writes the call's args as JSON to stdin, and parses stdout as
JSON. The result is always normalized to {ok, error?, result?, stderr?}
before the agent sees it — malformed output, timeouts, and tool error exits
are distinct, well-defined failure modes rather than crashes. See
tools/examples/ for a Python tool (echo), a Go tool (sum), and a
deliberately misbehaving tool (broken) used only by the test suite.
Token efficiency (TOON)
Tool results are encoded into the model's context as TOON
(Token-Oriented Object Notation, toon.py) rather than JSON by default — a
compact indentation-based encoding that drops braces and collapses uniform
arrays to a single header row plus data rows, typically ~30–40% fewer
characters on tabular results. The framework encodes this itself (no model
reliability risk); the trace/logs keep raw JSON. Pipeline: json → toon → llm
inbound, and JSON outbound by default (tools.result_format,
agent.output_format — see docs/creating-an-agent.md). Native tool-call
arguments stay provider JSON. The codec round-trips any JSON-shaped value and
fabri.toon.encode/.decode are public.
Memory lifecycle
- Every agent run logs a JSONL trace (
traces/<session_id>.jsonl). orchestrator/pipeline.pyscans a trace for tool-call failures and asks the LLM to compress each into a short, generalized guideline (memory/compress.pyenforces a hard token cap regardless of what the LLM returns).memory/pruning.pychecks the new guideline against existing tactical entries by cosine similarity. A near-duplicate increments that entry's recurrence count instead of inserting a copy; once it has recurred across 3 distinct sessions (PROMOTION_THRESHOLD_SESSIONS), it's promoted to strategic.orchestrator/retrieval.pyembeds the next task, pulls the top-k most relevant guidelines (tactical + strategic) from Qdrant, and formats them as a compact bullet list injected into the agent's system prompt. If any available tool's name appears in the task text, a second query filters by that tool's tag (MemoryEntry.tools, populated in step 2 from whichever tool actually failed) and those hits are guaranteed inclusion — this surfaces tool-specific guidelines even when their wording is too dissimilar from the query for cosine similarity alone to rank them in the top-k. (A graph-DB-backed version of this was considered and rejected: the only capability needed was a single-hop "guidelines for this tool" lookup, which a payload filter gives you for free — a second database would have added a dependency, a migration risk, and a way for the graph and the vector store to drift out of sync, for no real capability gain.)
This is what closes the loop: a failure in session N becomes retrievable context in session N+1, without a human re-writing the prompt by hand.
Known v1 limitations (intentional, not bugs)
- No embedding-model migration path. Each memory point stores a
model_versionpayload field, but swapping the embedding model means recreating the Qdrant collection from scratch — old vectors are not re-embedded automatically. - Single-writer assumption. Concurrent distinct agent processes ingesting guidelines at the same time are not locked against each other. This is made safe-by-construction for the common case (same guideline text) via deterministic point IDs (hash of the compressed text), but two different writers racing on genuinely different new guidelines is untested.
- No automatic eviction.
memory/pruning.py:evict_staleexists but is not wired into a scheduled job — run it manually if the strategic store grows large with low-value entries.
Tests
.venv/bin/python -m pytest tests/ -q
Covers store round-trip/idempotency, the dedup + promotion rule, the tool
runner's normalized failure contract (Python + Go + malformed output +
timeout), the token-cap enforcement in compress.py, the TOON codec
(round-trip + fuzz), and the fabri init scaffold.
Releasing
Tag a version (git tag v0.1.0 && git push origin v0.1.0) and GitHub Actions
publishes to PyPI via Trusted Publishing — see RELEASING.md for the one-time
setup.
License
Apache-2.0 © Rushikesh Patade.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fabri-0.4.2.tar.gz.
File metadata
- Download URL: fabri-0.4.2.tar.gz
- Upload date:
- Size: 106.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fe0a1c2cd130e040a62f305aee88647e9d6949d1db00798db6333f7a2916056e
|
|
| MD5 |
004003c4e37b166cb1996d9939658cc8
|
|
| BLAKE2b-256 |
46a14a591105cda04c8a2265907d6e90ae3a3a5d9babaf84eaa22dbcc2085751
|
Provenance
The following attestation bundles were made for fabri-0.4.2.tar.gz:
Publisher:
release.yml on Rushour0/fabri
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
fabri-0.4.2.tar.gz -
Subject digest:
fe0a1c2cd130e040a62f305aee88647e9d6949d1db00798db6333f7a2916056e - Sigstore transparency entry: 1888043073
- Sigstore integration time:
-
Permalink:
Rushour0/fabri@b6398503c56c165f9222e8ed04189901bb549f8d -
Branch / Tag:
refs/tags/v0.4.2 - Owner: https://github.com/Rushour0
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@b6398503c56c165f9222e8ed04189901bb549f8d -
Trigger Event:
push
-
Statement type:
File details
Details for the file fabri-0.4.2-py3-none-any.whl.
File metadata
- Download URL: fabri-0.4.2-py3-none-any.whl
- Upload date:
- Size: 95.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bbd3f72d9fcd90f2cbded4638e566685e1f431556f6bc81e4f7a6f234d1884f9
|
|
| MD5 |
10806d399dda64d841aaefe1f8b0b2b9
|
|
| BLAKE2b-256 |
d5c27fcc073c9c1caba7aa08e31c33fc7975d63e39a7e7074e6ae2c01f1f4945
|
Provenance
The following attestation bundles were made for fabri-0.4.2-py3-none-any.whl:
Publisher:
release.yml on Rushour0/fabri
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
fabri-0.4.2-py3-none-any.whl -
Subject digest:
bbd3f72d9fcd90f2cbded4638e566685e1f431556f6bc81e4f7a6f234d1884f9 - Sigstore transparency entry: 1888043216
- Sigstore integration time:
-
Permalink:
Rushour0/fabri@b6398503c56c165f9222e8ed04189901bb549f8d -
Branch / Tag:
refs/tags/v0.4.2 - Owner: https://github.com/Rushour0
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@b6398503c56c165f9222e8ed04189901bb549f8d -
Trigger Event:
push
-
Statement type: