Sheaf — Your personal knowledge layer. Paste a link, AI does the rest.
Project description
English | 中文
Sheaf
Harvest your knowledge. Bundle it. Share it.
Sheaf turns the links you save every day into a searchable knowledge base your AI agents can actually use. Paste a link — it fetches, classifies, summarizes. Crystallize many into portable knowledge cards. Local-first, open-source, no cloud.
A sheaf is a bundle of grain a farmer brings to market. Sheaf does the same for knowledge — gather, bundle, trade.
Quick Start
1 · Install (works everywhere):
pip install sheaf-ai
sheaf config setup # one-time: pick a provider, paste an OpenAI-compatible API key
2 · Connect your agent:
sheaf setup # auto-detects Claude Code / Codex / Cursor / Windsurf / WorkBuddy,
# writes the MCP config + deploys the bundled skill
3 · Use it:
sheaf collect https://arxiv.org/abs/2401.00000 # save a link
sheaf search "transformer architecture" # search your collection
sheaf crystallize AI # distill knowledge cards
No accounts, no cloud. Your data lives locally — ./data/ inside a project, else ~/.sheaf/data — as Markdown + JSON. Override with SHEAF_DATA_DIR.
Even faster on Claude Code — no install at all:
claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp
uvxis the npm-style runner (brew install uv/winget install astral-sh.uv). Then set a key viauvx --from sheaf-ai sheaf config setup.
Why
You save links every day — articles, papers, repos, tutorials. 90% never get opened again.
Not because you're lazy. Bookmarks serve human reading, not agent workflows. Ask your coding agent "what did I read about MCP last week?" — it has no idea.
Sheaf fixes this. Every link becomes a structured entry. Crystallize enough of them and you get knowledge cards — portable, searchable, agent-ready.
Features
| What it does | |
|---|---|
| 🌾 Harvest | Paste a link (or --text for a note). Sheaf fetches + classifies + summarizes — web articles, arXiv papers, WeChat / Zhihu, ChatGPT shares, pasted insights. |
| ✨ Crystallize | Distill 3+ entries into knowledge cards with confidence scores and evidence tracing. |
| 🤖 Agent-ready | Built-in MCP server — any agent searches, cites, and reasons over your knowledge base. |
| 🔒 Local-first | No cloud, no telemetry, no accounts. Your data stays on your machine. |
Crystallize — your second brain
Sheaf's killer feature. Instead of letting bookmarks rot, sheaf crystallize synthesizes insights across entries:
$ sheaf crystallize AI
✨ 5 knowledge cards crystallized:
📌 RAG faces retrieval relevance challenges (90%)
RAG systems heavily depend on retrieval quality; errors degrade output reliability.
📌 CRAG framework improves RAG robustness (95%)
CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.
Each card carries a confidence score, evidence tracing (which sources contributed), topic provenance, and tags. Semantic-search across all of them with sheaf crystallize --semantic "query".
Connect Your Agent
sheaf setup writes the right MCP config for each tool and deploys a bundled skill / agents-note so the agent knows how to use Sheaf:
sheaf setup --target claude # ~/.claude.json + ~/.claude/skills/sheaf-guide.md
sheaf setup --target codex # ~/.codex/config.toml + ~/.codex/AGENTS.sheaf.md
sheaf setup --target cursor # .cursor/mcp.json (also: windsurf, workbuddy)
sheaf setup # auto-detect from CWD + installed agents
sheaf setup --target codex --dry-run # preview without writing
MCP + skill are one install.
sheaf setupdeploys the MCP server and the skill together — the skill is what tells your agent when to proactively capture a note or recall from the KB, so it's not optional decoration. Prefer it over the bareuvxone-liner (which wires MCP only). Do it all at once — key + MCP + skill + health check — withsheaf init --auto.
The MCP server exposes 4 core tools — sheaf_collect, sheaf_search, sheaf_crystallize, sheaf_get_card — covering ~90% of automated agent workflows and kept lean by design (~1.5k vs ~5k tokens). 7 more stay reachable via the sheaf CLI (--json) or MCP tools/call; re-expose all with SHEAF_MCP_TOOLS=all. Full tool matrix + rationale: Issue #91. Setup details: docs/mcp-setup.md.
Agents can also browse the knowledge base read-only via MCP Resources — sheaf://entries/recent, sheaf://entries/{id}, sheaf://stats, sheaf://tags (resources/list / resources/read). Spec: docs/agent-query-spec.md.
Commands
sheaf help # grouped command overview
sheaf collect <url> | --text "…" # save a link, or a pasted note (tagged 'note')
sheaf search <query> # full-text search (results show the entry id)
sheaf list [--page N] # browse entries, paginated
sheaf get <id> # full detail of one entry
sheaf crystallize <topic> # crystallize knowledge cards from a topic
sheaf stats | tags | weekly | insights | urgent
sheaf mcp # start the MCP server (stdio)
Run sheaf <cmd> --help for per-command options.
Agent-native: semantic exit codes
Sheaf returns typed exit codes so agents can branch on error type instead of parsing stderr:
| Code | Name | Meaning |
|---|---|---|
| 0 | SUCCESS |
completed successfully |
| 1 | PARTIAL |
partial success (e.g. batch with skips) |
| 2 | DUPLICATE |
entry already exists (dedup skip) |
| 3 | QUALITY |
quality gate rejected the input |
| 4 | NETWORK |
network / API connectivity failure |
| 5 | CONFIG |
missing key, invalid URL, bad config |
| 6 | LLM |
LLM API failure (rate limit, bad response) |
| 7 | STORAGE |
file I/O / storage failure |
In --json mode, error payloads carry exit_code, exit_code_name, error_type, and hint for programmatic introspection.
Privacy & Local-First
Your data never leaves your machine unless you choose to.
- All content stored locally —
./data/inside a project, otherwise~/.sheaf/data(override:SHEAF_DATA_DIR) - LLM calls go to your chosen API provider — nothing routed through Sheaf
- No telemetry, no analytics, no accounts
- Markdown + JSONL format — fully portable, zero lock-in
Configuration
sheaf config setup is the recommended path — interactive, OS-agnostic, stores keys in ~/.sheaf/config.json with restricted permissions. Any OpenAI-compatible endpoint works:
| Provider | Key env var | Default model |
|---|---|---|
| OpenAI | OPENAI_API_KEY |
gpt-4o |
| DeepSeek | DEEPSEEK_API_KEY |
deepseek-chat |
| SiliconFlow | SILICONFLOW_API_KEY |
deepseek-ai/DeepSeek-V3.2 |
| Together AI | TOGETHER_API_KEY |
meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Groq | GROQ_API_KEY |
llama-3.3-70b-versatile |
sheaf config use deepseek # switch default provider
sheaf config list # show configured providers
Advanced: env vars / .env (CI or temporary use)
Sheaf auto-reads a .env file in your working directory (see .env.example). Or set per-shell — the only OS difference is the syntax:
# macOS / Linux
export OPENAI_API_KEY=sk-...
# Windows PowerShell: $env:OPENAI_API_KEY="sk-..."
# Windows CMD: set OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.openai.com/v1 # optional — for non-OpenAI endpoints
Architecture
URL → fetch → classify → summarize → store → query
(3-strategy (LLM tags (summary (JSONL + MD
fallback) + topics) + deadline) index)
↓
crystallize → KnowledgeCard → EmbeddingEngine → semantic search
| Module | Purpose |
|---|---|
sheaf_ai/ |
Core — pipeline, storage, search, CLI, MCP server, crystallize engine |
sheaf_cards/ |
Knowledge card engine — base types, embeddings, generation |
prompts/ |
LLM prompt templates (classify, summarize, crystallize) |
data/ |
Local knowledge base (JSONL + Markdown, gitignored) |
Requirements
- Python 3.10+
- An OpenAI-compatible API key
- Playwright Chromium (optional, JS-heavy sites):
pip install -e ".[browser]" && playwright install chromium
Development
git clone https://github.com/zhelunSun/sheaf-ai.git && cd sheaf-ai
python -m pip install -e ".[dev]"
python -m pytest tests/ -q # 1024 passed, 19 skipped
python -m ruff check sheaf_ai/ tests/ sheaf_cards/
Extras: .[dev] for local dev, .[server] for the HTTP API, .[browser] for Playwright fetching.
Status & Chrome Extension
Sheaf is early alpha. The collect → search → crystallize → MCP pipeline works and is covered by 1024 passing tests. We're validating with real users before beta.
A Chrome extension (extension/) adds one-click collect + search from any page: start the local API with sheaf serve, load extension/ unpacked (Chrome → Manage Extensions → Developer mode), then Alt+Shift+S or right-click any page → "🌾 Collect with Sheaf".
Try it: save 20+ links, run sheaf crystallize <topic>, then ask your agent to find them. If it clicks for you, open an issue or discussion and tell us what you'd change.
⭐ If Sheaf saves you time, a star on GitHub helps others find it.
License
A sheaf is a bundle of harvested grain — the unit a farmer brings to market. In mathematics, a sheaf attaches local data to open sets and glues them into a global picture. Sheaf the tool does both: gather scattered knowledge into coherent bundles, ready for your agents to consume or for you to share.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sheaf_ai-0.7.0.tar.gz.
File metadata
- Download URL: sheaf_ai-0.7.0.tar.gz
- Upload date:
- Size: 277.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
206969d026dce2f4b244e747c39c223b9f9088aff8dcb6c1b61a70638070eaae
|
|
| MD5 |
237cfe160c3fbe3bc59b23af3bdeb423
|
|
| BLAKE2b-256 |
fa925a4f0db9b7807a01c5150f38b386103f3fba0f8d498311f788d693b37ee1
|
Provenance
The following attestation bundles were made for sheaf_ai-0.7.0.tar.gz:
Publisher:
publish.yml on zhelunSun/sheaf-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sheaf_ai-0.7.0.tar.gz -
Subject digest:
206969d026dce2f4b244e747c39c223b9f9088aff8dcb6c1b61a70638070eaae - Sigstore transparency entry: 1899046991
- Sigstore integration time:
-
Permalink:
zhelunSun/sheaf-ai@1eb549716aac9ae588c41218d78a9d6496742389 -
Branch / Tag:
refs/tags/v0.7.0 - Owner: https://github.com/zhelunSun
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@1eb549716aac9ae588c41218d78a9d6496742389 -
Trigger Event:
push
-
Statement type:
File details
Details for the file sheaf_ai-0.7.0-py3-none-any.whl.
File metadata
- Download URL: sheaf_ai-0.7.0-py3-none-any.whl
- Upload date:
- Size: 208.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6f9f66a3fa8303fdba0750940954a423a0916a4b2391a22bdc143d588b55d41c
|
|
| MD5 |
d4f16f5c529e92720d258307c34c5847
|
|
| BLAKE2b-256 |
3b4f0a8d5edbfcde09a4827c0e02b7c67859f2025892cbba17271cfae6084463
|
Provenance
The following attestation bundles were made for sheaf_ai-0.7.0-py3-none-any.whl:
Publisher:
publish.yml on zhelunSun/sheaf-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sheaf_ai-0.7.0-py3-none-any.whl -
Subject digest:
6f9f66a3fa8303fdba0750940954a423a0916a4b2391a22bdc143d588b55d41c - Sigstore transparency entry: 1899047150
- Sigstore integration time:
-
Permalink:
zhelunSun/sheaf-ai@1eb549716aac9ae588c41218d78a9d6496742389 -
Branch / Tag:
refs/tags/v0.7.0 - Owner: https://github.com/zhelunSun
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@1eb549716aac9ae588c41218d78a9d6496742389 -
Trigger Event:
push
-
Statement type: