Open source knowledge management pipeline — append-only intake, tiered compilation, configurable schemas

Athenaeum

Open source knowledge management pipeline for AI agents — append-only intake, tiered compilation, configurable schemas.

Using Claude Code? Athenaeum ships a transparent memory sidecar — a SessionStart + UserPromptSubmit hook pair that auto-recalls wiki pages relevant to each prompt and lets Claude save observations without explicit /remember calls. Jump to Transparent sidecar (hooks).

Architecture

Athenaeum implements a novel approach to persistent AI agent memory:

  • Append-only intake — safety through write constraints, not trust scores
  • Wikipedia-style footnote trust — source entities build an emergent trust graph
  • Configurable observation filter — a self-improving "what to remember" prompt
  • Three types of contradiction — factual (fix), contextual (keep both), principled (revise axiom)
  • Four-tier compilation — programmatic → fast LLM → capable LLM → human escalation
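The four-tier escalation in the last bullet can be pictured as a chain of handlers tried cheapest first. The sketch below is illustrative only: `Resolution`, `compile_observation`, and the three stand-in handlers are hypothetical names, not Athenaeum's API, and the real Tier 2 and Tier 3 steps would call an LLM rather than inspect strings.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration; not Athenaeum's real API.
@dataclass
class Resolution:
    tier: str
    result: str

# A handler returns a result string, or None to escalate to the next tier.
Handler = Callable[[str], Optional[str]]

def compile_observation(observation: str, tiers: list[tuple[str, Handler]]) -> Resolution:
    """Try each tier in order, cheapest first; fall through to human review."""
    for name, handler in tiers:
        result = handler(observation)
        if result is not None:
            return Resolution(tier=name, result=result)
    # No automated tier resolved it, so it is queued for a person to decide.
    return Resolution(tier="human", result=f"pending question: {observation}")

# Stand-in handlers (assumptions): each declines inputs it cannot settle.
def programmatic(obs: str) -> Optional[str]:
    return "merged by exact alias match" if obs.startswith("alias:") else None

def fast_llm(obs: str) -> Optional[str]:
    return "classified" if "conflict" not in obs else None

def capable_llm(obs: str) -> Optional[str]:
    return "rewritten" if "axiom" not in obs else None

TIERS = [
    ("programmatic", programmatic),
    ("fast_llm", fast_llm),
    ("capable_llm", capable_llm),
]
```

With these stand-ins, `compile_observation("conflict with axiom 3", TIERS)` falls through every automated tier and comes back with `tier="human"`.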

Installation

pip install athenaeum

Quick start

# Initialize a knowledge directory
athenaeum init

# Or specify a custom path
athenaeum init --path ~/my-knowledge

Usage

Running the pipeline

# Run the librarian pipeline (processes raw files into wiki entities)
athenaeum run

# Dry run — inspect what would happen without writing files
athenaeum run --dry-run

# Custom paths and limits
athenaeum run \
  --raw-root ~/knowledge/raw \
  --wiki-root ~/knowledge/wiki \
  --knowledge-root ~/knowledge \
  --max-files 50 \
  --max-api-calls 200 \
  --verbose

Checking status

# Show knowledge base status (entity counts, pending files, last run)
athenaeum status
athenaeum status --path ~/my-knowledge

MCP memory server

Athenaeum includes an MCP-compatible server that gives AI agents remember and recall tools for persistent knowledge management.

# Install with MCP support
pip install athenaeum[mcp]

# Start the server
athenaeum serve --path ~/knowledge

Smoke-test the round-trip without a live session:

athenaeum test-mcp
#   PASS  remember_write
#   PASS  recall_search (keyword)
#   PASS  create_server (FastMCP)
#
# 3 passed, 0 failed

When wired to Claude Code, the agent can save facts mid-conversation:

User: Tristan's partner is Amanda; they met at Stanford GSB.

(Claude calls remember(content="Tristan's partner is Amanda; they met at Stanford GSB.", source="claude-session"))

A raw observation lands in raw/claude-session/20260417T…-…md. On the next athenaeum run, the pipeline compiles it into Tristan's wiki entity (under "Key Contacts") and Amanda's own entity if she doesn't exist yet. Later sessions can ask "who is Amanda?" and recall returns the compiled page.

Vector search (optional)

Athenaeum supports a vector search backend (chromadb + all-MiniLM-L6-v2) for semantic recall alongside the default FTS5 keyword backend.

pip install athenaeum[vector]

Enable it in athenaeum.yaml:

search_backend: vector

The recall hook runs a hybrid FTS5 + vector merge when vector is configured. Each backend rescues a failure class of the other:

  • FTS5 rescues proper-noun collisions in embedding space. Short queries like Return Path embed closer to generic pages containing the word "path" than to a sparse entity page about the company. FTS5 phrase matching surfaces the entity. Vector-only recall misses it.
  • Vector rescues semantic queries with no lexical overlap. A query like "iterative feedback loops" has no literal token overlap with Innovation Accounting, but the vector index places them as neighbours. FTS5-only recall misses it.

Removing either backend collapses recall for its rescue class. See docs/recall-architecture.md for the full walkthrough and the four invariants a future simplification must not remove.
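To make the idea of merging two ranked result lists concrete, here is a minimal sketch using reciprocal rank fusion. This is one common merge technique, swapped in for illustration; the README does not say which merge Athenaeum uses, and the function name, the constant `k = 60`, and the page names are all assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists; a page scores 1 / (k + rank) in each list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            scores[page] = scores.get(page, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by page name so the order is deterministic.
    return sorted(scores, key=lambda page: (-scores[page], page))

# Hypothetical hits for the query "Return Path".
fts_hits = ["return-path", "email-deliverability"]
vector_hits = ["career-path", "email-deliverability", "return-path"]

merged = reciprocal_rank_fusion([fts_hits, vector_hits])
```

In this example the entity page ranks first in the merged list because FTS5 put it at the top, even though the vector list placed it last, and the page found only by the vector backend is still kept.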

Query-topic extraction (optional)

athenaeum query-topics "your prompt" runs a Haiku classifier that returns substantive topics and ignores meta-instructions:

$ athenaeum query-topics "Without calling any tools, quote the block about Return Path verbatim"
Return Path

Compare to the naive regex+stopword fallback, which returns block,calling,quote,return,tools,verbatim,without — burying "Return Path" behind meta-instruction tokens and dropping the phrase boundary entirely. The example recall hook uses query-topics to rescue named-entity recall on instruction-heavy prompts; it falls back silently to the regex extractor if the API key or CLI is unavailable.
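A regex+stopword extractor of the kind described above can be sketched in a few lines. The stopword list and the five-character minimum are assumptions chosen so the sketch reproduces the output quoted above; the fallback shipped with the example hook may use different rules.

```python
import re

# Assumed stopword list and length cutoff; the real fallback may differ.
STOPWORDS = {"about", "their", "there", "these", "those", "which", "would"}
MIN_LENGTH = 5

def naive_topics(prompt: str) -> list[str]:
    """Lowercase, split into words, drop short words and stopwords, sort."""
    tokens = re.findall(r"[a-z]+", prompt.lower())
    kept = {t for t in tokens if len(t) >= MIN_LENGTH and t not in STOPWORDS}
    return sorted(kept)

prompt = "Without calling any tools, quote the block about Return Path verbatim"
print(",".join(naive_topics(prompt)))
# block,calling,quote,return,tools,verbatim,without
```

Note that "path" is dropped by the length cutoff and "return" survives only as an isolated token, which is exactly the loss of phrase boundary the classifier avoids.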

Claude Code integration: register the MCP server with Claude Code so that it auto-starts with every session:

claude mcp add --scope user athenaeum -- athenaeum serve --path ~/knowledge

The server exposes two tools:

  • remember — save observations to raw intake (append-only, never overwrites)
  • recall — search the compiled wiki by keyword (frontmatter-weighted scoring)

Raw files written by remember are compiled into wiki entities on the next athenaeum run.

Transparent sidecar (hooks)

For a fully transparent experience where Claude automatically recalls context and saves observations without explicit commands, configure Claude Code hooks:

  1. Copy the example hooks from examples/claude-code/ to your scripts directory
  2. Add hook entries to ~/.claude/settings.json (see examples/claude-code/settings-snippet.json)
  3. Add CLAUDE.md instructions for proactive memory (see examples/claude-code/CLAUDE.md.example)

This gives you:

  • Auto-recall — a SQLite FTS5 index is built at session start (~300ms); each user message triggers a <50ms search that injects relevant wiki pages into context
  • Auto-remember — Claude proactively saves important facts without being asked
  • Context checkpointing — observations are saved before context window compaction

See examples/claude-code/README.md for complete setup instructions, a smoke test, and the full environment-variable reference.
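The auto-recall flow above (build an FTS5 index once, then run a phrase search per message) can be sketched with Python's built-in sqlite3 module. This assumes the local SQLite build includes FTS5, which most CPython distributions do. The table layout, function names, and sample pages are hypothetical and not taken from the example hooks.

```python
import sqlite3

def build_index(pages: dict[str, str]) -> sqlite3.Connection:
    """Build an in-memory FTS5 index mapping page name to page body."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE wiki USING fts5(name, body)")
    conn.executemany("INSERT INTO wiki (name, body) VALUES (?, ?)", pages.items())
    return conn

def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[str]:
    """Return page names matching the query as an exact phrase, best match first."""
    # Double-quote the query so FTS5 treats it as one phrase, not separate terms.
    phrase = '"' + query.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT name FROM wiki WHERE wiki MATCH ? ORDER BY rank LIMIT ?",
        (phrase, limit),
    ).fetchall()
    return [name for (name,) in rows]

# Hypothetical wiki pages.
pages = {
    "Return Path": "Return Path is an email deliverability company.",
    "Career paths": "Notes on choosing a career path and the return on investment.",
}
conn = build_index(pages)
```

Here `recall(conn, "Return Path")` returns only the entity page, because the second page contains both words but never as an adjacent phrase.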

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | Yes (unless --dry-run) | API key for Tier 2/3 LLM calls |
| ATHENAEUM_CLASSIFY_MODEL | No | Override Tier 2 model (default: claude-haiku-4-5-20251001) |
| ATHENAEUM_WRITE_MODEL | No | Override Tier 3 model (default: claude-sonnet-4-6) |
| ATHENAEUM_TOPIC_MODEL | No | Override query-topic model (default: claude-haiku-4-5-20251001) |
| ATHENAEUM_OP_KEY_PATH | No | 1Password path for the session-start ANTHROPIC_API_KEY bootstrap (default: op://Agent Tools/Anthropic API Key/credential) |
| AUTO_RECALL | No | Per-turn recall on/off (hook shell env; overrides athenaeum.yaml's auto_recall). Default: true |
| SEARCH_BACKEND | No | fts5 or vector (hook shell env; overrides athenaeum.yaml's search_backend). Default: fts5 |
| ATHENAEUM_HOOK_DEBUG | No | Set to 1 to log vector-backend errors from user-prompt-recall.sh to stderr |

Note on shell-env overrides. AUTO_RECALL and SEARCH_BACKEND are read from the shell environment after the hook sources ~/.cache/athenaeum/config.env, so anything exported in your shell profile beats the cached config. That's intentional (it lets an adopter A/B-test a backend without editing athenaeum.yaml), but it's also the first thing to check when the hook "ignores" a config change.
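The precedence described in this note (shell environment first, then the cached config, then the built-in default) can be modelled as below. This is a sketch of the lookup order only; the hook itself is a shell script, and `resolve_setting` and `DEFAULTS` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Defaults as documented in the table above.
DEFAULTS = {"AUTO_RECALL": "true", "SEARCH_BACKEND": "fts5"}

def resolve_setting(
    name: str,
    cached: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Shell environment beats the cached config, which beats the default."""
    env = os.environ if env is None else env
    if name in env:
        return env[name]
    if name in cached:
        return cached[name]
    return DEFAULTS[name]

# Cached config says "vector", but an exported shell variable wins.
cached_config = {"SEARCH_BACKEND": "vector"}
shell_env = {"SEARCH_BACKEND": "fts5"}
backend = resolve_setting("SEARCH_BACKEND", cached_config, shell_env)
```

If a change to athenaeum.yaml seems to have no effect, running the equivalent check in your shell (is the variable exported?) is the quickest way to find out why.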

Note on Claude Code auth. Claude Code's own CLAUDE_CODE_OAUTH_TOKEN is scoped to its inference endpoint, and the general Anthropic Messages API rejects it with a 401 error ("OAuth authentication is currently not supported"). The pipeline and the example hooks need a separate console API key; see docs/recall-architecture.md for the 1Password bootstrap pattern.

Raw file format

Raw intake files live in raw/{source}/*.md and follow the naming convention:

{timestamp}-{uuid8}.md

Example: 20240406T120000Z-aabb0011.md

Each file is a plain markdown document containing observations, notes, or session transcripts. The {source} directory name (e.g., sessions, imports) identifies the origin of the data.
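If you want to drop files into raw intake from your own scripts, the naming convention above is easy to follow. The helper below is a hypothetical sketch, not part of the Athenaeum package; it assumes a UTC timestamp in the format of the example filename and uses exclusive-create mode so that an existing file is never overwritten.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path

def write_raw_observation(raw_root: Path, source: str, content: str) -> Path:
    """Write content to raw/{source}/{timestamp}-{uuid8}.md and return the path."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    uuid8 = uuid.uuid4().hex[:8]  # first 8 hex characters of a random UUID
    target_dir = raw_root / source
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{timestamp}-{uuid8}.md"
    # Mode "x" raises FileExistsError if the file exists, so nothing is overwritten.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(content)
    return path
```

For example, `write_raw_observation(Path("~/knowledge/raw").expanduser(), "imports", "Alice joined in 2024.")` would create a new file under raw/imports/ for the next pipeline run to pick up.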

Output

The pipeline produces wiki entity pages in wiki/ with YAML frontmatter:

---
uid: a1b2c3d4
type: person
name: Alice Zhang
aliases: [Alice]
access: internal
tags: [active]
created: '2024-04-06'
updated: '2024-04-06'
---

Entity pages are indexed in wiki/_index.md, grouped by type. Conflicts requiring human review are appended to wiki/_pending_questions.md.
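For scripts that read the compiled wiki, the frontmatter can be pulled out with a few lines of Python. The parser below is a deliberately minimal sketch that handles only the flat `key: value` and inline-list forms shown in the example above; it is not how Athenaeum parses pages, and a real tool should use a proper YAML library.

```python
def parse_frontmatter(page: str) -> dict[str, object]:
    """Parse flat 'key: value' frontmatter between the leading '---' lines."""
    lines = page.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block
    meta: dict[str, object] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [Alice, Al]
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value.strip("'\"")
    return meta

SAMPLE_PAGE = """---
uid: a1b2c3d4
type: person
name: Alice Zhang
aliases: [Alice]
access: internal
tags: [active]
created: '2024-04-06'
updated: '2024-04-06'
---

Body text of the entity page.
"""

meta = parse_frontmatter(SAMPLE_PAGE)
```

The `type` value is the field an index grouped by type, like wiki/_index.md, would key on.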

At the end of each run, token usage and estimated costs are logged.

Known limitations (v0.2.x)

Athenaeum is pre-1.0. The following trade-offs are intentional for this release and are slated to be revisited in v0.3:

  • No retrieval benchmarks yet. The hybrid-search claim rests on the concrete failure modes above (proper-noun collision, no-overlap semantic queries) and production use — not on a published eval against mem0 / Letta / Zep / Cognee. If you need benchmarked recall@k on a closed corpus, pick a tool that publishes numbers. If you want a knowledge base that survives your tool choices, this is for you. PRs adding an eval harness are very welcome.
  • FTS5 index rebuilds are non-atomic and unlocked. A shell hook and the librarian run rebuilding simultaneously can race; the window is small and single-user wikis do not hit it in practice, but multi-writer safety is v0.3 work. Workaround: don't invoke athenaeum rebuild-index and athenaeum run concurrently on the same $KNOWLEDGE_ROOT.
  • The keyword search backend is a scan-on-query fallback. It reads every wiki page on every query; fine under ~1,000 entities, painful past that. Use search_backend: fts5 (default in the CLI and hooks) for any non-trivial wiki. The keyword backend exists as a zero-dependency baseline for tests and bootstrap.
  • Tier 4 (human escalation) is a file, not a workflow. Conflicts land in wiki/_pending_questions.md; you read it and decide. No PR-opening, no Slack integration, no UI — on purpose, for now.

Development

# Clone and install in development mode
git clone https://github.com/Kromatic-Innovation/athenaeum.git
cd athenaeum
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint
ruff check src/ tests/

License

Apache 2.0 — see LICENSE for details.
