Skip to main content

Open source knowledge management pipeline — append-only intake, tiered compilation, configurable schemas

Project description

Athenaeum

Open source knowledge management pipeline for AI agents — append-only intake, tiered compilation, configurable schemas.

Using Claude Code? Athenaeum ships a transparent memory sidecar — a SessionStart + UserPromptSubmit hook pair that auto-recalls wiki pages relevant to each prompt and lets Claude save observations without explicit /remember calls. Jump to Transparent sidecar (hooks).

Architecture

Athenaeum implements a novel approach to persistent AI agent memory:

  • Append-only intake — safety through write constraints, not trust scores
  • Wikipedia-style footnote trust — source entities build an emergent trust graph
  • Configurable observation filter — a self-improving "what to remember" prompt
  • Three types of contradiction — factual (fix), contextual (keep both), principled (revise axiom)
  • Four-tier compilation — programmatic → fast LLM → capable LLM → human escalation

Installation

pip install athenaeum

Quick start

# Initialize a knowledge directory
athenaeum init

# Or specify a custom path
athenaeum init --path ~/my-knowledge

Usage

Running the pipeline

# Run the librarian pipeline (processes raw files into wiki entities)
athenaeum run

# Dry run — inspect what would happen without writing files
athenaeum run --dry-run

# Custom paths and limits
athenaeum run \
  --raw-root ~/knowledge/raw \
  --wiki-root ~/knowledge/wiki \
  --knowledge-root ~/knowledge \
  --max-files 50 \
  --max-api-calls 200 \
  --verbose

Checking status

# Show knowledge base status (entity counts, pending files, last run)
athenaeum status
athenaeum status --path ~/my-knowledge

MCP memory server

Athenaeum includes an MCP-compatible server that gives AI agents remember and recall tools for persistent knowledge management.

# Install with MCP support
pip install athenaeum[mcp]

# Start the server
athenaeum serve --path ~/knowledge

Smoke-test the round-trip without a live session:

athenaeum test-mcp
#   PASS  remember_write
#   PASS  recall_search (keyword)
#   PASS  create_server (FastMCP)
#
# 3 passed, 0 failed

When wired to Claude Code, the agent can save facts mid-conversation:

User: Tristan's partner is Amanda; they met at Stanford GSB.

(Claude calls remember(content="Tristan's partner is Amanda; they met at Stanford GSB.", source="claude-session"))

A raw observation lands in raw/claude-session/20260417T…-…md. On the next athenaeum run, the pipeline compiles it into Tristan's wiki entity (under "Key Contacts") and Amanda's own entity if she doesn't exist yet. Later sessions can ask "who is Amanda?" and recall returns the compiled page.

Vector search (optional)

Athenaeum supports a vector search backend (chromadb + all-MiniLM-L6-v2) for semantic recall alongside the default FTS5 keyword backend.

pip install athenaeum[vector]

Enable it in athenaeum.yaml:

search_backend: vector

The recall hook runs a hybrid FTS5 + vector merge when vector is configured — FTS5 rescues short proper-noun queries that collide in vector space, vector discovers semantic neighbours with no lexical overlap. See docs/recall-architecture.md for why the hybrid is load-bearing and what invariants must not be removed.

Query-topic extraction (optional)

athenaeum query-topics "your prompt" runs a Haiku classifier that returns substantive topics and ignores meta-instructions:

$ athenaeum query-topics "Without calling any tools, quote the block about Return Path verbatim"
Return Path

Compare to the naive regex+stopword fallback, which returns block,calling,quote,return,tools,verbatim,without — burying "Return Path" behind meta-instruction tokens and dropping the phrase boundary entirely. The example recall hook uses query-topics to rescue named-entity recall on instruction-heavy prompts; it falls back silently to the regex extractor if the API key or CLI is unavailable.

Claude Code integration — add to your MCP config and it auto-starts with every session:

claude mcp add --scope user athenaeum -- athenaeum serve --path ~/knowledge

The server exposes two tools:

  • remember — save observations to raw intake (append-only, never overwrites)
  • recall — search the compiled wiki by keyword (frontmatter-weighted scoring)

Raw files written by remember are compiled into wiki entities on the next athenaeum run.

Transparent sidecar (hooks)

For a fully transparent experience where Claude automatically recalls context and saves observations without explicit commands, configure Claude Code hooks:

  1. Copy the example hooks from examples/claude-code/ to your scripts directory
  2. Add hook entries to ~/.claude/settings.json (see examples/claude-code/settings-snippet.json)
  3. Add CLAUDE.md instructions for proactive memory (see examples/claude-code/CLAUDE.md.example)

This gives you:

  • Auto-recall — a SQLite FTS5 index is built at session start (~300ms); each user message triggers a <50ms search that injects relevant wiki pages into context
  • Auto-remember — Claude proactively saves important facts without being asked
  • Context checkpointing — observations are saved before context window compaction

See examples/claude-code/README.md for complete setup instructions, a smoke test, and the full environment-variable reference.

Environment variables

Variable Required Description
ANTHROPIC_API_KEY Yes (unless --dry-run) API key for Tier 2/3 LLM calls
ATHENAEUM_CLASSIFY_MODEL No Override Tier 2 model (default: claude-haiku-4-5-20251001)
ATHENAEUM_WRITE_MODEL No Override Tier 3 model (default: claude-sonnet-4-6)
ATHENAEUM_TOPIC_MODEL No Override query-topic model (default: claude-haiku-4-5-20251001)
ATHENAEUM_OP_KEY_PATH No 1Password path for the session-start ANTHROPIC_API_KEY bootstrap (default: op://Agent Tools/Anthropic API Key/credential)
AUTO_RECALL No Per-turn recall on/off (hook shell env; overrides athenaeum.yaml's auto_recall). Default: true
SEARCH_BACKEND No fts5 or vector (hook shell env; overrides athenaeum.yaml's search_backend). Default: fts5
ATHENAEUM_HOOK_DEBUG No Set to 1 to log vector-backend errors from user-prompt-recall.sh to stderr

Note on shell-env overrides. AUTO_RECALL and SEARCH_BACKEND are read from the shell environment after the hook sources ~/.cache/athenaeum/config.env, so anything exported in your shell profile beats the cached config. That's intentional (it lets an adopter A/B-test a backend without editing athenaeum.yaml), but it's also the first thing to check when the hook "ignores" a config change.

Note on Claude Code auth. Claude Code's own CLAUDE_CODE_OAUTH_TOKEN is scoped to its inference endpoint and the general Anthropic Messages API rejects it with 401 OAuth authentication is currently not supported. The pipeline and the example hooks need a separate console API key — see docs/recall-architecture.md for the 1Password bootstrap pattern.

Raw file format

Raw intake files live in raw/{source}/*.md and follow the naming convention:

{timestamp}-{uuid8}.md

Example: 20240406T120000Z-aabb0011.md

Each file is a plain markdown document containing observations, notes, or session transcripts. The {source} directory name (e.g., sessions, imports) identifies the origin of the data.

Output

The pipeline produces wiki entity pages in wiki/ with YAML frontmatter:

---
uid: a1b2c3d4
type: person
name: Alice Zhang
aliases: [Alice]
access: internal
tags: [active]
created: '2024-04-06'
updated: '2024-04-06'
---

Entity pages are indexed in wiki/_index.md, grouped by type. Conflicts requiring human review are appended to wiki/_pending_questions.md.

At the end of each run, token usage and estimated costs are logged.

Development

# Clone and install in development mode
git clone https://github.com/Kromatic-Innovation/athenaeum.git
cd athenaeum
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint
ruff check src/ tests/

License

Apache 2.0 — see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

athenaeum-0.2.1.tar.gz (80.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

athenaeum-0.2.1-py3-none-any.whl (47.7 kB view details)

Uploaded Python 3

File details

Details for the file athenaeum-0.2.1.tar.gz.

File metadata

  • Download URL: athenaeum-0.2.1.tar.gz
  • Upload date:
  • Size: 80.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for athenaeum-0.2.1.tar.gz
Algorithm Hash digest
SHA256 d830154b23f939fc4caf46ac85e8d14a02ec2256f112005395d852cc0cf78c14
MD5 19115264dbe6877cafc0e2ec0a8e8c8f
BLAKE2b-256 9896f8885118f29cab1b3ec779100dce8ee29d1dd6c21fffc68643570b32877a

See more details on using hashes here.

Provenance

The following attestation bundles were made for athenaeum-0.2.1.tar.gz:

Publisher: release.yml on Kromatic-Innovation/athenaeum

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file athenaeum-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: athenaeum-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 47.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for athenaeum-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 eac2257a4b102b8f47d76776138d4beac0b94e27565f7d1fa3ce7abaa7b09eb9
MD5 984f7d414010cba9d83953761dba6903
BLAKE2b-256 01effd48ac47118a1208435020589ac0f47f61a5bde2e8101ef23d29064240a6

See more details on using hashes here.

Provenance

The following attestation bundles were made for athenaeum-0.2.1-py3-none-any.whl:

Publisher: release.yml on Kromatic-Innovation/athenaeum

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page