Anchor — agent-first knowledge canvas with PDF ingest, FMU simulation, and source-grounded provenance
Project description
Anchor · agent-first knowledge canvas
A canvas your AI agent can actually drive. Drop a PDF datasheet onto a workspace, ask the agent for the operating limits, get a grounded spec table where every value points back to its source page+bbox. Wire those values into a simulation. No managed cloud, no vendor lock-in, your data stays on your laptop.
Anchor is two things: a generic canvas primitive (workspace, nodes, edges, real-time multi-client sync over MCP/HTTP/SSE) and a PDF-ingestion extension (medallion bronze→silver→gold pipeline producing structured, page-and-bbox-anchored regions). Today PDFs are the canonical use case; the canvas itself is domain-agnostic and other extensions (transcription, code, web pages) will land alongside.
New here? Walk through docs/TUTORIAL.md — five minutes from uv tool install to "agent fills my engineering specs while I watch".
Install
You need: Python ≥ 3.12, Node ≥ 20 (only for the frontend). macOS or Linux today; Windows on the roadmap.
From a built wheel (recommended)
# build the frontend first so it ships inside the wheel
pnpm --dir web install
pnpm --dir web build
# build the Python wheel
uv build --wheel
# → dist/anchor_kb-0.2.0-py3-none-any.whl
# install once, run forever
uv tool install ./dist/anchor_kb-0.2.0-py3-none-any.whl
anchor version # → 0.2.0
anchor and anchor-mcp are now on your PATH globally.
From source (for development)
git clone https://github.com/Novia-RDI-Seafaring/anchor-kb-ui-RAG
cd anchor-kb-ui-RAG
uv sync --extra dev # `--extra dev` adds pytest, ruff, import-linter
pnpm --dir web install
# dev mode: two processes
uv run anchor serve &
pnpm --dir web dev
# → backend on :8002, Vite HMR on :5173
Default --data-dir is ~/anchor-data; override per-command or via
ANCHOR_DATA_DIR.
Optional extras:
| Extra | Install | Adds |
|---|---|---|
fmus |
uv sync --extra fmus |
FMU simulation runtime (fmpy). Without it, FMU tools fail at call time. |
Quick start
# 0. One-shot: ingest the bundled LKH-5 sample, seed a `demo` workspace
# with six placeholder spec slots, and start the server.
anchor demo
# Or step by step:
# 1. Pick a folder for your data; create your first canvas
anchor canvas create my-first-canvas --data-dir ~/anchor-data
# 2. Start the server (serves the canvas + the API + MCP-SSE on one port)
anchor serve --data-dir ~/anchor-data
# 3. (in another terminal) Ingest a PDF
anchor ingest /path/to/datasheet.pdf --data-dir ~/anchor-data
# 4. Open http://localhost:8002/c/my-first-canvas in your browser
See docs/TUTORIAL.md for a walked-through anchor demo → "agent fills the placeholders" tour.
That's the whole loop. Every PDF you ingest becomes a structured set of regions on disk; every canvas you create is a folder you can zip and email.
Using Anchor with an AI agent (Claude Code, Cursor, your own)
Anchor exposes its tools over MCP (Model Context Protocol). One command registers it with Claude Code:
anchor install claude-code --data-dir ~/anchor-data
This writes:
~/.claude/mcp.json— the MCP server entry pointing at youranchor-mcpbinary~/.claude/skills/anchor/SKILL.md— a skill description so Claude knows when to invoke Anchor
Restart Claude Code (Cmd+Q, reopen). In any conversation, run /mcp and you should see anchor listed with 14 tools (5 ingest + 9 canvas). Then talk normally:
"Ingest the PDF at ~/Downloads/lkh-pump.pdf and create a canvas called pump-analysis with a document node for it."
"What does the document say about max inlet pressure for the LKH-5 at 50 Hz? Place the answer as a fact card on the pump-analysis canvas, with an evidence edge back to the source page."
Claude calls the MCP tools directly. Your browser tab on localhost:8002/c/pump-analysis, if open, sees nodes appear live via SSE. Multi-client real-time sync between agents and humans is the default.
For Cursor or any other MCP-speaking harness:
anchor install cursor --data-dir ~/anchor-data
# or print the install plan without writing anything:
anchor install print
Where data lives
Every canvas is a folder. Every document is a folder. Both shareable as zips, both diffable in git.
~/anchor-data/
├─ bronze/ # raw PDFs (your originals)
│ └─ datasheet.pdf
├─ silver/<doc-slug>/ # docling extraction + per-page markdown + page PNGs
│ ├─ index.json # outline, tables, figures
│ ├─ pages.meta.json
│ └─ pages/{1.md, 1.png, …}
├─ gold/<doc-slug>/ # structured regions with page + bbox provenance
│ └─ pages/{1.regions.json, 1/r1-spec-block.png, …}
└─ canvases/<canvas-slug>/
├─ meta.json
├─ state.json # latest snapshot
└─ events.jsonl # append-only log; every action ever taken
This layout is the contract. You can hand-edit JSON files, copy a canvas folder to another machine, or version-control the whole thing.
Configuration
Anchor reads its config from environment variables prefixed ANCHOR_:
| Variable | Default | Purpose |
|---|---|---|
ANCHOR_DATA_DIR |
~/anchor-data |
Where canvases + documents live |
ANCHOR_HTTP_PORT |
8002 |
Backend HTTP/SSE port |
ANCHOR_HTTP_HOST |
0.0.0.0 |
Backend listen host |
ANCHOR_OPENAI_API_KEY |
(unset) | Optional — enables LLM polish + region extraction in the gold layer |
ANCHOR_OPENAI_BASE_URL |
(unset) | Override the OpenAI endpoint. Point at Azure OpenAI, Ollama (http://localhost:11434/v1), vLLM, LM Studio, ... |
ANCHOR_POLISH_MODEL |
gpt-5.4 |
Model name for page-MD polishing |
ANCHOR_REGION_MODEL |
gpt-5.4 |
Model name for region extraction |
ANCHOR_EMBED_MODEL |
BAAI/bge-small-en-v1.5 |
Local sentence-transformer model used for semantic search. Recorded in every embeddings.json so cross-model search refuses to mix vectors. |
ANCHOR_DPI |
150 |
Render DPI for page images |
ANCHOR_LOG_LEVEL |
INFO |
Logging |
If you don't set ANCHOR_OPENAI_API_KEY, ingest still produces silver (deterministic Docling extraction + per-page markdown). Gold extraction (LLM-driven structured regions) is skipped. The system stays useful without an API key — silver is the workable substrate; gold is the polish.
For Azure / Ollama / local-LLM recipes, see docs/ADOPTION.md.
Commands
anchor serve [--data-dir DIR] [--host HOST] [--port PORT]
anchor ingest PDF_PATH [--data-dir DIR] [--skip-polish] [--skip-regions]
anchor list [--data-dir DIR]
anchor index SLUG [--data-dir DIR]
anchor regions SLUG [--page N] [--data-dir DIR]
anchor page-text SLUG PAGE [--data-dir DIR]
anchor embed [SLUG] [--overwrite] [--data-dir DIR]
anchor search "<query>" [--k N] [--data-dir DIR]
anchor canvas list [--data-dir DIR]
anchor canvas create SLUG [--title TITLE] [--data-dir DIR]
anchor install claude-code [--data-dir DIR] [--dry-run]
anchor install cursor [--data-dir DIR] [--dry-run]
anchor install print [--data-dir DIR]
anchor extensions list [--data-dir DIR] [--verbose]
anchor extensions info NAME [--data-dir DIR]
anchor extensions add MANIFEST_PATH [--scope system|project] [--data-dir DIR] [--force]
anchor extensions remove NAME [--scope system|project] [--data-dir DIR]
anchor extensions discover [--data-dir DIR]
anchor extensions schema
anchor version
anchor-mcp [--data-dir DIR] runs the MCP server over stdio (used by Claude Code's MCP harness; you don't normally invoke it yourself).
Architecture (one paragraph)
Anchor is a hexagonal modular monolith. Pure domain code in core/ (no I/O, no framework imports — enforced by lint-imports). Concrete protocol implementations in infra/. Transport adapters in adapters/ (HTTP, MCP, CLI, SSE). The Python wheel ships the React frontend bundle inside it (anchor/_web_dist/) so one process serves both the API and the UI. State changes are events, persisted to events.jsonl per canvas, broadcast to every subscriber (agents on MCP, browsers on SSE) within ~50 ms. The full architecture diagram is at paperx/figures/anchor_v2_architecture.png.
Extensions and the Open Ingestion Protocol
Anchor's canvas is one OIP consumer. PDF ingestion is one OIP producer, bundled with this build. The protocol — specified at github.com/Novia-RDI-Seafaring/OIP — is governance-neutral: any tool that produces ingested knowledge in OIP shape can plug in, and any OIP-aware consumer can read its output. A transcription tool, a code-region extractor, a web crawler, your own ingestion logic — none of them need to import Anchor; they just ship an OIP manifest at a known location and Anchor picks them up.
The CLI surfaces this:
anchor extensions list # what producers can this Anchor see?
anchor extensions discover # where does it look for manifests?
anchor extensions add <path-to-manifest.json> # register a new producer (system-wide)
anchor extensions schema # print a starter manifest to edit
anchor extensions info anchor-pdfs # full manifest for one producer
Discovery, in priority order:
- Per-data-dir —
<data-dir>/.oip/producers.d/*.json(highest priority; bound to a specific workspace tree) - System-wide —
~/.config/oip/producers.d/*.json(any installer can drop a manifest here; visible to every OIP consumer on the machine) - Bundled — compiled into this Anchor wheel (currently just
anchor-pdfs)
For implementation status: today, an OIP-registered producer is visible in extensions list but Anchor doesn't yet spawn external producer MCP servers and proxy their tools. That's the next engineering lift — see the OIP repo for the spec and EXTENSIONS.md for Anchor's host-side roadmap.
Tests
uv sync --extra dev # one-time: install pytest/ruff/import-linter
uv run pytest # ~315 backend tests
uv run lint-imports # 6 dependency-rule contracts
pnpm --dir web test # ~180 web tests (Vitest)
pnpm --dir web exec tsc --noEmit # web typecheck
The test seam is function-based pytest (matches the legacy backend/tests/ style) with in-memory implementations of every port. Real I/O tests use tmp_path. The frontend tests cover canvas primitives, the SSE event store, and the inline-edit hooks.
Status & roadmap
v0.2 (current): canvas primitive + PDF ingestion in one package, real-time SSE sync, MCP integration, skill installer for Claude Code/Cursor, ~315 Python + ~180 web tests, hexagonal contracts enforced.
Near-term: real pnpm build integration in the wheel build, port the remaining 14 v1 node renderers (image, FMU, plot, model, …), assets system (SVG/PNG upload + serve), screenshot mechanism (browser-as-screenshotter via the EventBus), viewport/visibility math, lock state on nodes.
Mid-term: split the canvas primitive (anchor-canvas) and PDF extension (anchor-canvas-pdfs) into separately-publishable packages, formal extension contract for third-party authors, per-project anchor.toml for declarative extension sets, PyPI publication.
Longer term: other ingestion extensions (audio/video transcription, code, web), shared org docs / personal canvases topology, optional Postgres event store for very large workspaces.
Security model — read before exposing
Anchor's HTTP server is unauthenticated by design. It edits local engineering data (workspaces, documents, FMU files) and is meant to run on your own machine.
- Default bind is
127.0.0.1(loopback). Nothing else on the LAN can reach it unless you pass--host 0.0.0.0. - CORS is restricted to the dev Vite origin (
localhost:5173); setANCHOR_CORS_ORIGINS=https://your-hostfor explicit overrides. - Workspace slugs and upload filenames are policy-checked and containment-asserted before they hit disk — the v2 codebase does not trust client-supplied paths.
If you want to share an Anchor instance on a network, add your own reverse proxy with auth in front of it (Tailscale, OAuth proxy, basic-auth nginx, ...). Don't expose the unauthenticated port directly.
Limitations (v0.2)
These extensions are intentionally rough; we ship them so you can see the shape and contribute, not as finished features:
anchor_cad— parametric-CAD producer (jscad/openSCAD) ships as a proof of concept; full feature parity with STEP/STL viewing is on the roadmap. SVG export still has a known font-handling bug.anchor_sysml— SysML import (BSD-3-Clause fixtures from the OMG reference) and export to SVG/markdown are experimental; we'll swap the hand-rolled IR for the official Pydantic model when that lands.anchor_fmus— FMU simulation requiresfmpy(install viauv pip install 'anchor[fmus]'). Without it the extension fails closed; setANCHOR_FMU_DEMO=1to use the synthetic-output runtime (every result is stampedsynthetic=trueso the UI can warn you).
License
MIT — see LICENSE.
Contributing
PRs welcome on the feat/architecture branch. Run uv run pytest && uv run lint-imports before pushing. The extensions/ folder convention for new extensions lands once the canvas/extension split is finalised — see EXTENSIONS.md for the proposed contract.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file anchor_kb-0.2.0.tar.gz.
File metadata
- Download URL: anchor_kb-0.2.0.tar.gz
- Upload date:
- Size: 905.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5b0f59241ec35d8ec4cbbc86c3bc56ddd77b336a7cc2b6ac59a13485cbcac93e
|
|
| MD5 |
731b738bd354b8ad498bc666d1297aea
|
|
| BLAKE2b-256 |
747d4536ec0d40dd49adf5cfeb09e63af066c9aea3f4790c339b7d57bc5fc276
|
File details
Details for the file anchor_kb-0.2.0-py3-none-any.whl.
File metadata
- Download URL: anchor_kb-0.2.0-py3-none-any.whl
- Upload date:
- Size: 693.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
31fe089c204de16869f8646a5e6c774c84846527beea4e2c868dcee5a9c7ec10
|
|
| MD5 |
e780149aed3a05a84919d088c6c3bbae
|
|
| BLAKE2b-256 |
5e4a9a5b62ccef385a53bbad2025f1574f5e067ca8611ce983c82ef8f4bb6f69
|