Sheaf — Your personal knowledge layer. Paste a link, AI does the rest.
Project description
English | 中文
Sheaf
Harvest your knowledge. Bundle it. Share it.
A sheaf is a bundle of grain — the basic unit a farmer brings to market. Sheaf does the same for knowledge: gather what you read, crystallize it into structured bundles, and make it tradable. Your AI agents can search, cite, and reason over everything you've collected.
One-liner: A global bookmark manager + agent memory layer. Local-first, open-source.
Quick Start
Add Sheaf to Claude Code in one line — no Python install needed:
claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp
Then ask your agent to collect a URL, search your knowledge base, or crystallize a topic — that's it. uvx is the npm-style runner (brew install uv / winget install astral-sh.uv); an OpenAI-compatible API key is needed — set SHEAF_API_KEY, or run uvx --from sheaf-ai sheaf config setup once.
OpenAI Codex CLI — add to ~/.codex/config.toml
[mcp_servers.sheaf]
command = "uvx"
args = ["--from", "sheaf-ai", "sheaf-mcp"]
Prefer the CLI or Python library?
pip install sheaf-ai
sheaf config setup # interactive API-key wizard (recommended)
Already have a key? Set it via environment variable instead.
macOS / Linux
export OPENAI_API_KEY=sk-...
Windows PowerShell
$env:OPENAI_API_KEY="sk-..."
Windows CMD
set OPENAI_API_KEY=sk-...
# First-time onboarding (collects 3 sample articles)
sheaf init
# Collect a link
sheaf collect https://arxiv.org/abs/2401.00000
# Search your collection
sheaf search "transformer architecture"
# Crystallize knowledge cards from collected articles
sheaf crystallize AI
No accounts. No cloud. Your data lives locally — ./data/ when run inside a project dir, otherwise ~/.sheaf/data — as Markdown + JSON. Override with SHEAF_DATA_DIR.
The Problem
You save links every day — articles, repos, papers, tutorials. 90% never get opened again.
Not because you're lazy. Because bookmarks serve human reading, not agent workflows. When you ask your coding agent "what did I read about MCP last week?", it has no idea.
Sheaf fixes this. Every link you save becomes a structured entry — a single stalk of grain. Crystallize enough of them, and you get a bundle: a portable, searchable knowledge pack any agent can consume.
What It Does
- Harvest — paste a link, Sheaf fetches, classifies, and summarizes it
- Crystallize — distill 3+ related entries into structured knowledge cards with evidence tracing
- Bundle — package cards into a portable
.sheafunit (coming soon) - Agent-ready — built-in MCP server lets any LLM agent query your knowledge base
Core Commands
sheaf collect <url> # Collect an article, paper, or webpage
sheaf search <query> # Full-text search across your collection
sheaf stats # Collection statistics with topic trends
sheaf crystallize <topic> # Crystallize knowledge cards from a topic
sheaf crystallize --list # List all crystallized cards
sheaf crystallize --semantic <q> # Semantic vector search across cards
sheaf tags # Tag statistics
sheaf weekly # Weekly summary report
sheaf insights # Cross-topic association discovery
sheaf urgent # Show entries with upcoming deadlines
sheaf mcp # Start MCP server (stdio transport)
sheaf setup --target <platform> # One-command MCP config (cursor/claude/workbuddy/windsurf)
sheaf init # First-time onboarding with demo
# API Key & Provider management
sheaf config setup # Interactive setup wizard (recommended)
sheaf config list # Show configured providers
sheaf config set-key --provider <id> # Add/update a provider key
sheaf config use <provider> # Switch default provider
Crystallize: Your Second Brain
This is Sheaf's killer feature. Instead of leaving your bookmarks to rot, sheaf crystallize synthesizes insights across multiple entries:
$ sheaf crystallize AI
Crystallizing 'AI'...
✨ 5 knowledge cards crystallized:
📌 RAG faces retrieval relevance challenges (90%)
RAG systems heavily depend on retrieval quality; errors degrade LLM output reliability.
📌 CRAG framework improves RAG robustness (95%)
CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.
📌 Retrieval granularity significantly impacts performance (90%)
Finer-grained units like propositions outperform traditional passage-level retrieval.
Each card includes:
- Confidence score (0-100%)
- Evidence tracing — which source entries contributed
- Topic provenance — what topic this card belongs to
- Tags — for filtering and cross-referencing
Use sheaf crystallize --semantic "query" for vector-based semantic search across all your cards.
MCP Server
Sheaf ships with a built-in Model Context Protocol server. Any MCP-compatible agent can query your knowledge base:
sheaf mcp
Available tools (4 core + 7 CLI-only, see Issue #91):
| Tool | Type | Description |
|---|---|---|
sheaf_collect |
Core MCP | Add a new URL to your collection |
sheaf_search |
Core MCP | Full-text + semantic search (keyword/hybrid/quick) |
sheaf_crystallize |
Core MCP | Crystallize knowledge cards from a topic |
sheaf_get_card |
Core MCP | Get full card details by ID |
sheaf_collect_batch |
CLI / on-demand | Batch-collect via sheaf collect URL1 URL2 ... |
sheaf_list |
CLI / on-demand | sheaf list [--topic T] [--tag T] [--json] |
sheaf_get |
CLI / on-demand | Full entry detail (MCP tools/call fallback) |
sheaf_correct |
CLI / on-demand | Correct classification (MCP tools/call fallback) |
sheaf_insights |
CLI / on-demand | sheaf insights — cross-topic associations |
sheaf_crosscheck |
CLI / on-demand | Fact matrix (MCP tools/call; see also sheaf matrix) |
sheaf_list_cards |
CLI / on-demand | sheaf crystallize --list |
The 4 core tools cover ~90% of automated agent workflows and keep the MCP schema
lean (~1.5k tokens vs ~5k before). The 7 demoted tools keep their handlers
registered for backward compat — call them via tools/call, or re-expose the
full set with SHEAF_MCP_TOOLS=all sheaf mcp (power users / migration). For
Claude Code & Codex, sheaf setup also deploys a bundled skill / AGENTS note
(sheaf-guide.md / AGENTS.sheaf.md) so the agent knows when to use the
sheaf CLI for the demoted tools.
Agent Integration (One Command)
Connect Sheaf to your AI coding agent in a single step — this writes the MCP config and deploys the bundled skill / agents note (Claude Code & Codex):
# Cursor / Windsurf / WorkBuddy
sheaf setup --target cursor # writes .cursor/mcp.json
sheaf setup --target windsurf # writes .windsurf/mcp.json
sheaf setup --target workbuddy # writes ~/.workbuddy/mcp.json
# Claude Code — writes ~/.claude.json + deploys ~/.claude/skills/sheaf-guide.md
sheaf setup --target claude
# OpenAI Codex — writes ~/.codex/config.toml + deploys ~/.codex/AGENTS.sheaf.md
sheaf setup --target codex
# Auto-detect from CWD + installed agents
sheaf setup # detects claude/codex/cursor/windsurf/workbuddy
Zero-install alternative (no pip install needed):
# Claude Code
claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp
# Codex — add to ~/.codex/config.toml:
# [mcp_servers.sheaf]
# command = "uvx"
# args = ["--from", "sheaf-ai", "sheaf-mcp"]
Preview without writing:
sheaf setup --target codex --dry-run
See docs/mcp-setup.md for detailed platform guides and troubleshooting.
What You Can Collect
Sheaf handles more than just web articles:
| Input | Example | What Sheaf does |
|---|---|---|
| Web articles | sheaf collect https://arxiv.org/abs/2401.00000 |
Fetches full text, extracts title/author/abstract, classifies topic |
| AI chat shares | sheaf collect https://chatgpt.com/share/... |
Extracts the Q&A conversation, structures it as reusable knowledge |
| WeChat / Zhihu posts | sheaf collect https://mp.weixin.qq.com/s/... |
Handles paywalls and dynamic rendering via Playwright fallback |
| Pasted text | sheaf collect --text "Key insight..." |
Wraps freeform text into a structured entry with auto-classification |
Under the hood, every input goes through the same pipeline: fetch → classify → summarize → store. The output is always a structured entry your agents can search and cite.
Exit Codes (Agent-Native)
Sheaf returns semantic exit codes so agents can programmatically branch on error type instead of parsing stderr:
| Code | Name | Meaning |
|---|---|---|
| 0 | SUCCESS |
Operation completed successfully |
| 1 | PARTIAL |
Partial success (e.g. batch ops with skips) |
| 2 | DUPLICATE |
Entry already exists (dedup skip) |
| 3 | QUALITY |
Content quality gate rejected the input |
| 4 | NETWORK |
Network connectivity or API call failure |
| 5 | CONFIG |
Missing API key, invalid URL, or bad config |
| 6 | LLM |
LLM API failure (rate limit, bad response) |
| 7 | STORAGE |
File I/O or data storage failure |
In --json mode, error payloads include exit_code, exit_code_name, and error_type fields for programmatic introspection:
{"success": false, "error": "Config error: missing API key", "exit_code": 5, "exit_code_name": "CONFIG", "error_type": "ConfigError", "hint": "Set SHEAF_API_KEY"}
Architecture
URL → fetch → classify → summarize → store → query
↓ ↓ ↓ ↓
3-strategy LLM tags summary JSONL + MD
fallback + topics + deadline index
↓
crystallize → KnowledgeCard → EmbeddingEngine
↓ ↓
CLI/MCP semantic search
| Module | Purpose |
|---|---|
sheaf_ai/ |
Core — pipeline, storage, search, CLI, MCP server, crystallize engine |
sheaf_cards/ |
Knowledge card engine — base types, embeddings, generation |
prompts/ |
LLM prompt templates (classify, summarize, crystallize) |
data/ |
Local knowledge base (JSONL + Markdown, gitignored) |
Privacy & Local-First
Your data never leaves your machine unless you choose to.
- All content stored locally —
./data/inside a project dir, otherwise~/.sheaf/data(override viaSHEAF_DATA_DIR) - LLM calls go to your chosen API provider
- No telemetry, no analytics, no accounts
- Markdown + JSONL format — fully portable, zero lock-in
Configuration
Sheaf supports any OpenAI-compatible API and offers three ways to configure your keys:
Option 1: Interactive Wizard (Recommended)
sheaf config setup
Guides you through selecting a provider, entering your API key (hidden input), and setting defaults. Keys are stored securely in ~/.sheaf/config.json with restricted file permissions.
Option 2: Environment Variables
Best for CI/CD or temporary use. Sheaf reads standard .env files automatically.
macOS / Linux
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.openai.com/v1 # optional
Windows PowerShell
$env:OPENAI_API_KEY="sk-..."
$env:OPENAI_BASE_URL="https://api.openai.com/v1" # optional
Windows CMD
set OPENAI_API_KEY=sk-...
set OPENAI_BASE_URL=https://api.openai.com/v1 # optional
Or create a .env file in your working directory:
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1
Option 3: CLI Config Commands
# Add/update a provider
sheaf config set-key --provider deepseek
sheaf config set-key --provider siliconflow
# Switch default provider
sheaf config use deepseek
# List configured providers
sheaf config list
# Remove a provider
sheaf config remove groq
Supported Providers
| Provider | Key Env Var | Default Model |
|---|---|---|
| OpenAI | OPENAI_API_KEY |
gpt-4o |
| DeepSeek | DEEPSEEK_API_KEY |
deepseek-chat |
| SiliconFlow | SILICONFLOW_API_KEY |
deepseek-ai/DeepSeek-V3.2 |
| Together AI | TOGETHER_API_KEY |
meta-llama/Llama-3.3-70B-Instruct-Turbo |
| Groq | GROQ_API_KEY |
llama-3.3-70b-versatile |
Set the default provider via sheaf config use <id> or the DEFAULT_PROVIDER environment variable.
Requirements
- Python 3.10+
- An LLM API key — any OpenAI-compatible endpoint
- Playwright Chromium (optional, for JS-heavy sites):
pip install -e ".[browser]" && playwright install chromium
Development
git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
python -m pip install -e ".[dev]"
python -m pytest tests/ -q # 988 passed, 19 skipped
python -m ruff check sheaf_ai/ tests/ sheaf_cards/
Dependencies are managed through pyproject.toml extras. Use .[dev] for local development, .[server] for the HTTP API, and .[browser] for Playwright-based fetching.
Alpha Status
Sheaf is in early alpha. The core collect → search → crystallize → MCP pipeline works and is tested with 988 passing tests. We're validating with real users before beta.
Chrome Extension
The browser extension provides one-click collection and search from any webpage:
- Start the local API:
sheaf serve - Load the extension from
extension/(Chrome → Manage Extensions → Developer mode → Load unpacked) - Click the Sheaf icon to collect the current page or search your knowledge base
- Right-click any page or link → "🌾 Collect with Sheaf"
- Keyboard shortcut:
Alt+Shift+S
The extension connects to your local sheaf serve instance. Its manifest version is independent from the Python package.
Try it: save 20+ links, run sheaf crystallize <topic>, then ask your agent to find them. If it works for you, open an issue or discussion to tell us what you'd change.
License
A sheaf is a bundle of harvested grain — the unit a farmer brings to market. In mathematics, a sheaf attaches local data to open sets and glues them into a global picture. Sheaf the tool does both: gather scattered knowledge into coherent bundles, ready for your agents to consume or for you to share.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sheaf_ai-0.6.1.tar.gz.
File metadata
- Download URL: sheaf_ai-0.6.1.tar.gz
- Upload date:
- Size: 267.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
32fce1c1d0a3b5486011a61d5c7eaff807cb5cce904b009fb451e4c5d118611c
|
|
| MD5 |
4372ded428918ec36d29bbb5f0f6c434
|
|
| BLAKE2b-256 |
749541ece8ff317e76a847747209a9e0e62a6aa0b7e4b942ec034e6f243d609f
|
Provenance
The following attestation bundles were made for sheaf_ai-0.6.1.tar.gz:
Publisher:
publish.yml on zhelunSun/sheaf-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sheaf_ai-0.6.1.tar.gz -
Subject digest:
32fce1c1d0a3b5486011a61d5c7eaff807cb5cce904b009fb451e4c5d118611c - Sigstore transparency entry: 1871145342
- Sigstore integration time:
-
Permalink:
zhelunSun/sheaf-ai@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81 -
Branch / Tag:
refs/tags/v0.6.1 - Owner: https://github.com/zhelunSun
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81 -
Trigger Event:
push
-
Statement type:
File details
Details for the file sheaf_ai-0.6.1-py3-none-any.whl.
File metadata
- Download URL: sheaf_ai-0.6.1-py3-none-any.whl
- Upload date:
- Size: 202.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cf82699d19dd810e0b6eb59b69e28ae0934ae9fd8d6d80298b1aeca0c4854b5f
|
|
| MD5 |
9eff18fafd46459aeb1196f4abe8fef6
|
|
| BLAKE2b-256 |
12fdc7c9128366a1fefb3490c64e78b1399feb5873cf17d9b0e0fc4835bbb341
|
Provenance
The following attestation bundles were made for sheaf_ai-0.6.1-py3-none-any.whl:
Publisher:
publish.yml on zhelunSun/sheaf-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sheaf_ai-0.6.1-py3-none-any.whl -
Subject digest:
cf82699d19dd810e0b6eb59b69e28ae0934ae9fd8d6d80298b1aeca0c4854b5f - Sigstore transparency entry: 1871145433
- Sigstore integration time:
-
Permalink:
zhelunSun/sheaf-ai@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81 -
Branch / Tag:
refs/tags/v0.6.1 - Owner: https://github.com/zhelunSun
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81 -
Trigger Event:
push
-
Statement type: