Skip to main content

Sheaf — Your personal knowledge layer. Paste a link, AI does the rest.

Project description

English | 中文

Sheaf Logo

Sheaf

Harvest your knowledge. Bundle it. Share it.

Python 3.10+ License: Apache 2.0 Tests PyPI Python Version


A sheaf is a bundle of grain — the basic unit a farmer brings to market. Sheaf does the same for knowledge: gather what you read, crystallize it into structured bundles, and make it tradable. Your AI agents can search, cite, and reason over everything you've collected.

One-liner: A global bookmark manager + agent memory layer. Local-first, open-source.

Quick Start

Add Sheaf to Claude Code in one line — no Python install needed:

claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp

Then ask your agent to collect a URL, search your knowledge base, or crystallize a topic — that's it. uvx is the npm-style runner (brew install uv / winget install astral-sh.uv); an OpenAI-compatible API key is needed — set SHEAF_API_KEY, or run uvx --from sheaf-ai sheaf config setup once.

OpenAI Codex CLI — add to ~/.codex/config.toml
[mcp_servers.sheaf]
command = "uvx"
args = ["--from", "sheaf-ai", "sheaf-mcp"]

Prefer the CLI or Python library?

pip install sheaf-ai
sheaf config setup          # interactive API-key wizard (recommended)

Already have a key? Set it via environment variable instead.

macOS / Linux
export OPENAI_API_KEY=sk-...
Windows PowerShell
$env:OPENAI_API_KEY="sk-..."
Windows CMD
set OPENAI_API_KEY=sk-...
# First-time onboarding (collects 3 sample articles)
sheaf init

# Collect a link
sheaf collect https://arxiv.org/abs/2401.00000

# Search your collection
sheaf search "transformer architecture"

# Crystallize knowledge cards from collected articles
sheaf crystallize AI

No accounts. No cloud. Your data lives locally — ./data/ when run inside a project dir, otherwise ~/.sheaf/data — as Markdown + JSON. Override with SHEAF_DATA_DIR.

The Problem

You save links every day — articles, repos, papers, tutorials. 90% never get opened again.

Not because you're lazy. Because bookmarks serve human reading, not agent workflows. When you ask your coding agent "what did I read about MCP last week?", it has no idea.

Sheaf fixes this. Every link you save becomes a structured entry — a single stalk of grain. Crystallize enough of them, and you get a bundle: a portable, searchable knowledge pack any agent can consume.

What It Does

  1. Harvest — paste a link, Sheaf fetches, classifies, and summarizes it
  2. Crystallize — distill 3+ related entries into structured knowledge cards with evidence tracing
  3. Bundle — package cards into a portable .sheaf unit (coming soon)
  4. Agent-ready — built-in MCP server lets any LLM agent query your knowledge base

Core Commands

sheaf collect <url>         # Collect an article, paper, or webpage
sheaf search <query>        # Full-text search across your collection
sheaf stats                 # Collection statistics with topic trends
sheaf crystallize <topic>   # Crystallize knowledge cards from a topic
sheaf crystallize --list    # List all crystallized cards
sheaf crystallize --semantic <q>  # Semantic vector search across cards
sheaf tags                  # Tag statistics
sheaf weekly                # Weekly summary report
sheaf insights              # Cross-topic association discovery
sheaf urgent                # Show entries with upcoming deadlines
sheaf mcp                   # Start MCP server (stdio transport)
sheaf setup --target <platform>  # One-command MCP config (cursor/claude/workbuddy/windsurf)
sheaf init                  # First-time onboarding with demo

# API Key & Provider management
sheaf config setup          # Interactive setup wizard (recommended)
sheaf config list           # Show configured providers
sheaf config set-key --provider <id>   # Add/update a provider key
sheaf config use <provider> # Switch default provider

Crystallize: Your Second Brain

This is Sheaf's killer feature. Instead of leaving your bookmarks to rot, sheaf crystallize synthesizes insights across multiple entries:

$ sheaf crystallize AI
Crystallizing 'AI'...
✨ 5 knowledge cards crystallized:
  📌 RAG faces retrieval relevance challenges (90%)
     RAG systems heavily depend on retrieval quality; errors degrade LLM output reliability.
  📌 CRAG framework improves RAG robustness (95%)
     CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.
  📌 Retrieval granularity significantly impacts performance (90%)
     Finer-grained units like propositions outperform traditional passage-level retrieval.

Each card includes:

  • Confidence score (0-100%)
  • Evidence tracing — which source entries contributed
  • Topic provenance — what topic this card belongs to
  • Tags — for filtering and cross-referencing

Use sheaf crystallize --semantic "query" for vector-based semantic search across all your cards.

MCP Server

Sheaf ships with a built-in Model Context Protocol server. Any MCP-compatible agent can query your knowledge base:

sheaf mcp

Available tools (4 core + 7 CLI-only, see Issue #91):

Tool Type Description
sheaf_collect Core MCP Add a new URL to your collection
sheaf_search Core MCP Full-text + semantic search (keyword/hybrid/quick)
sheaf_crystallize Core MCP Crystallize knowledge cards from a topic
sheaf_get_card Core MCP Get full card details by ID
sheaf_collect_batch CLI / on-demand Batch-collect via sheaf collect URL1 URL2 ...
sheaf_list CLI / on-demand sheaf list [--topic T] [--tag T] [--json]
sheaf_get CLI / on-demand Full entry detail (MCP tools/call fallback)
sheaf_correct CLI / on-demand Correct classification (MCP tools/call fallback)
sheaf_insights CLI / on-demand sheaf insights — cross-topic associations
sheaf_crosscheck CLI / on-demand Fact matrix (MCP tools/call; see also sheaf matrix)
sheaf_list_cards CLI / on-demand sheaf crystallize --list

The 4 core tools cover ~90% of automated agent workflows and keep the MCP schema lean (~1.5k tokens vs ~5k before). The 7 demoted tools keep their handlers registered for backward compat — call them via tools/call, or re-expose the full set with SHEAF_MCP_TOOLS=all sheaf mcp (power users / migration). For Claude Code & Codex, sheaf setup also deploys a bundled skill / AGENTS note (sheaf-guide.md / AGENTS.sheaf.md) so the agent knows when to use the sheaf CLI for the demoted tools.

Agent Integration (One Command)

Connect Sheaf to your AI coding agent in a single step — this writes the MCP config and deploys the bundled skill / agents note (Claude Code & Codex):

# Cursor / Windsurf / WorkBuddy
sheaf setup --target cursor      # writes .cursor/mcp.json
sheaf setup --target windsurf    # writes .windsurf/mcp.json
sheaf setup --target workbuddy   # writes ~/.workbuddy/mcp.json

# Claude Code — writes ~/.claude.json + deploys ~/.claude/skills/sheaf-guide.md
sheaf setup --target claude

# OpenAI Codex — writes ~/.codex/config.toml + deploys ~/.codex/AGENTS.sheaf.md
sheaf setup --target codex

# Auto-detect from CWD + installed agents
sheaf setup                      # detects claude/codex/cursor/windsurf/workbuddy

Zero-install alternative (no pip install needed):

# Claude Code
claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp
# Codex — add to ~/.codex/config.toml:
#   [mcp_servers.sheaf]
#   command = "uvx"
#   args = ["--from", "sheaf-ai", "sheaf-mcp"]

Preview without writing:

sheaf setup --target codex --dry-run

See docs/mcp-setup.md for detailed platform guides and troubleshooting.

What You Can Collect

Sheaf handles more than just web articles:

Input Example What Sheaf does
Web articles sheaf collect https://arxiv.org/abs/2401.00000 Fetches full text, extracts title/author/abstract, classifies topic
AI chat shares sheaf collect https://chatgpt.com/share/... Extracts the Q&A conversation, structures it as reusable knowledge
WeChat / Zhihu posts sheaf collect https://mp.weixin.qq.com/s/... Handles paywalls and dynamic rendering via Playwright fallback
Pasted text sheaf collect --text "Key insight..." Wraps freeform text into a structured entry with auto-classification

Under the hood, every input goes through the same pipeline: fetch → classify → summarize → store. The output is always a structured entry your agents can search and cite.

Exit Codes (Agent-Native)

Sheaf returns semantic exit codes so agents can programmatically branch on error type instead of parsing stderr:

Code Name Meaning
0 SUCCESS Operation completed successfully
1 PARTIAL Partial success (e.g. batch ops with skips)
2 DUPLICATE Entry already exists (dedup skip)
3 QUALITY Content quality gate rejected the input
4 NETWORK Network connectivity or API call failure
5 CONFIG Missing API key, invalid URL, or bad config
6 LLM LLM API failure (rate limit, bad response)
7 STORAGE File I/O or data storage failure

In --json mode, error payloads include exit_code, exit_code_name, and error_type fields for programmatic introspection:

{"success": false, "error": "Config error: missing API key", "exit_code": 5, "exit_code_name": "CONFIG", "error_type": "ConfigError", "hint": "Set SHEAF_API_KEY"}

Architecture

URL → fetch → classify → summarize → store → query
         ↓          ↓          ↓         ↓
    3-strategy   LLM tags   summary   JSONL + MD
    fallback     + topics   + deadline  index

              ↓
         crystallize → KnowledgeCard → EmbeddingEngine
              ↓              ↓
          CLI/MCP       semantic search
Module Purpose
sheaf_ai/ Core — pipeline, storage, search, CLI, MCP server, crystallize engine
sheaf_cards/ Knowledge card engine — base types, embeddings, generation
prompts/ LLM prompt templates (classify, summarize, crystallize)
data/ Local knowledge base (JSONL + Markdown, gitignored)

Privacy & Local-First

Your data never leaves your machine unless you choose to.

  • All content stored locally — ./data/ inside a project dir, otherwise ~/.sheaf/data (override via SHEAF_DATA_DIR)
  • LLM calls go to your chosen API provider
  • No telemetry, no analytics, no accounts
  • Markdown + JSONL format — fully portable, zero lock-in

Configuration

Sheaf supports any OpenAI-compatible API and offers three ways to configure your keys:

Option 1: Interactive Wizard (Recommended)

sheaf config setup

Guides you through selecting a provider, entering your API key (hidden input), and setting defaults. Keys are stored securely in ~/.sheaf/config.json with restricted file permissions.

Option 2: Environment Variables

Best for CI/CD or temporary use. Sheaf reads standard .env files automatically.

macOS / Linux
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.openai.com/v1   # optional
Windows PowerShell
$env:OPENAI_API_KEY="sk-..."
$env:OPENAI_BASE_URL="https://api.openai.com/v1"   # optional
Windows CMD
set OPENAI_API_KEY=sk-...
set OPENAI_BASE_URL=https://api.openai.com/v1       # optional

Or create a .env file in your working directory:

OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1

Option 3: CLI Config Commands

# Add/update a provider
sheaf config set-key --provider deepseek
sheaf config set-key --provider siliconflow

# Switch default provider
sheaf config use deepseek

# List configured providers
sheaf config list

# Remove a provider
sheaf config remove groq

Supported Providers

Provider Key Env Var Default Model
OpenAI OPENAI_API_KEY gpt-4o
DeepSeek DEEPSEEK_API_KEY deepseek-chat
SiliconFlow SILICONFLOW_API_KEY deepseek-ai/DeepSeek-V3.2
Together AI TOGETHER_API_KEY meta-llama/Llama-3.3-70B-Instruct-Turbo
Groq GROQ_API_KEY llama-3.3-70b-versatile

Set the default provider via sheaf config use <id> or the DEFAULT_PROVIDER environment variable.

Requirements

  • Python 3.10+
  • An LLM API key — any OpenAI-compatible endpoint
  • Playwright Chromium (optional, for JS-heavy sites): pip install -e ".[browser]" && playwright install chromium

Development

git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
python -m pip install -e ".[dev]"
python -m pytest tests/ -q     # 988 passed, 19 skipped
python -m ruff check sheaf_ai/ tests/ sheaf_cards/

Dependencies are managed through pyproject.toml extras. Use .[dev] for local development, .[server] for the HTTP API, and .[browser] for Playwright-based fetching.

Alpha Status

Sheaf is in early alpha. The core collect → search → crystallize → MCP pipeline works and is tested with 988 passing tests. We're validating with real users before beta.

Chrome Extension

The browser extension provides one-click collection and search from any webpage:

  1. Start the local API: sheaf serve
  2. Load the extension from extension/ (Chrome → Manage Extensions → Developer mode → Load unpacked)
  3. Click the Sheaf icon to collect the current page or search your knowledge base
  4. Right-click any page or link → "🌾 Collect with Sheaf"
  5. Keyboard shortcut: Alt+Shift+S

The extension connects to your local sheaf serve instance. Its manifest version is independent from the Python package.

Try it: save 20+ links, run sheaf crystallize <topic>, then ask your agent to find them. If it works for you, open an issue or discussion to tell us what you'd change.

License

Apache 2.0


A sheaf is a bundle of harvested grain — the unit a farmer brings to market. In mathematics, a sheaf attaches local data to open sets and glues them into a global picture. Sheaf the tool does both: gather scattered knowledge into coherent bundles, ready for your agents to consume or for you to share.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sheaf_ai-0.6.1.tar.gz (267.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sheaf_ai-0.6.1-py3-none-any.whl (202.5 kB view details)

Uploaded Python 3

File details

Details for the file sheaf_ai-0.6.1.tar.gz.

File metadata

  • Download URL: sheaf_ai-0.6.1.tar.gz
  • Upload date:
  • Size: 267.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sheaf_ai-0.6.1.tar.gz
Algorithm Hash digest
SHA256 32fce1c1d0a3b5486011a61d5c7eaff807cb5cce904b009fb451e4c5d118611c
MD5 4372ded428918ec36d29bbb5f0f6c434
BLAKE2b-256 749541ece8ff317e76a847747209a9e0e62a6aa0b7e4b942ec034e6f243d609f

See more details on using hashes here.

Provenance

The following attestation bundles were made for sheaf_ai-0.6.1.tar.gz:

Publisher: publish.yml on zhelunSun/sheaf-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sheaf_ai-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: sheaf_ai-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 202.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sheaf_ai-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cf82699d19dd810e0b6eb59b69e28ae0934ae9fd8d6d80298b1aeca0c4854b5f
MD5 9eff18fafd46459aeb1196f4abe8fef6
BLAKE2b-256 12fdc7c9128366a1fefb3490c64e78b1399feb5873cf17d9b0e0fc4835bbb341

See more details on using hashes here.

Provenance

The following attestation bundles were made for sheaf_ai-0.6.1-py3-none-any.whl:

Publisher: publish.yml on zhelunSun/sheaf-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page