Sheaf — Your personal knowledge layer. Paste a link, AI does the rest.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

Zhelun

These details have not been verified by PyPI

Project description

English | 中文

Sheaf Logo

Sheaf

Harvest your knowledge. Bundle it. Share it.

A sheaf is a bundle of grain — the basic unit a farmer brings to market. Sheaf does the same for knowledge: gather what you read, crystallize it into structured bundles, and make it tradable. Your AI agents can search, cite, and reason over everything you've collected.

One-liner: A global bookmark manager + agent memory layer. Local-first, open-source.

Quick Start

Add Sheaf to Claude Code in one line — no Python install needed:

claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp

Then ask your agent to collect a URL, search your knowledge base, or crystallize a topic — that's it. uvx is the npm-style runner (brew install uv / winget install astral-sh.uv); an OpenAI-compatible API key is needed — set SHEAF_API_KEY, or run uvx --from sheaf-ai sheaf config setup once.

OpenAI Codex CLI — add to ~/.codex/config.toml

[mcp_servers.sheaf]
command = "uvx"
args = ["--from", "sheaf-ai", "sheaf-mcp"]

Prefer the CLI or Python library?

pip install sheaf-ai
sheaf config setup          # interactive API-key wizard (recommended)

Already have a key? Set it via environment variable instead.

macOS / Linux

export OPENAI_API_KEY=sk-...

Windows PowerShell

$env:OPENAI_API_KEY="sk-..."

Windows CMD

set OPENAI_API_KEY=sk-...

# First-time onboarding (collects 3 sample articles)
sheaf init

# Collect a link
sheaf collect https://arxiv.org/abs/2401.00000

# Search your collection
sheaf search "transformer architecture"

# Crystallize knowledge cards from collected articles
sheaf crystallize AI

No accounts. No cloud. Your data lives locally — ./data/ when run inside a project dir, otherwise ~/.sheaf/data — as Markdown + JSON. Override with SHEAF_DATA_DIR.

The Problem

You save links every day — articles, repos, papers, tutorials. 90% never get opened again.

Not because you're lazy. Because bookmarks serve human reading, not agent workflows. When you ask your coding agent "what did I read about MCP last week?", it has no idea.

Sheaf fixes this. Every link you save becomes a structured entry — a single stalk of grain. Crystallize enough of them, and you get a bundle: a portable, searchable knowledge pack any agent can consume.

What It Does

Harvest — paste a link, Sheaf fetches, classifies, and summarizes it
Crystallize — distill 3+ related entries into structured knowledge cards with evidence tracing
Bundle — package cards into a portable .sheaf unit (coming soon)
Agent-ready — built-in MCP server lets any LLM agent query your knowledge base

Core Commands

sheaf collect <url>         # Collect an article, paper, or webpage
sheaf search <query>        # Full-text search across your collection
sheaf stats                 # Collection statistics with topic trends
sheaf crystallize <topic>   # Crystallize knowledge cards from a topic
sheaf crystallize --list    # List all crystallized cards
sheaf crystallize --semantic <q>  # Semantic vector search across cards
sheaf tags                  # Tag statistics
sheaf weekly                # Weekly summary report
sheaf insights              # Cross-topic association discovery
sheaf urgent                # Show entries with upcoming deadlines
sheaf mcp                   # Start MCP server (stdio transport)
sheaf setup --target <platform>  # One-command MCP config (cursor/claude/workbuddy/windsurf)
sheaf init                  # First-time onboarding with demo

# API Key & Provider management
sheaf config setup          # Interactive setup wizard (recommended)
sheaf config list           # Show configured providers
sheaf config set-key --provider <id>   # Add/update a provider key
sheaf config use <provider> # Switch default provider

Crystallize: Your Second Brain

This is Sheaf's killer feature. Instead of leaving your bookmarks to rot, sheaf crystallize synthesizes insights across multiple entries:

$ sheaf crystallize AI
Crystallizing 'AI'...
✨ 5 knowledge cards crystallized:
  📌 RAG faces retrieval relevance challenges (90%)
     RAG systems heavily depend on retrieval quality; errors degrade LLM output reliability.
  📌 CRAG framework improves RAG robustness (95%)
     CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.
  📌 Retrieval granularity significantly impacts performance (90%)
     Finer-grained units like propositions outperform traditional passage-level retrieval.

Each card includes:

Confidence score (0-100%)
Evidence tracing — which source entries contributed
Topic provenance — what topic this card belongs to
Tags — for filtering and cross-referencing

Use sheaf crystallize --semantic "query" for vector-based semantic search across all your cards.

MCP Server

Sheaf ships with a built-in Model Context Protocol server. Any MCP-compatible agent can query your knowledge base:

sheaf mcp

Available tools (4 core + 7 CLI-only, see Issue #91):

Tool	Type	Description
`sheaf_collect`	Core MCP	Add a new URL to your collection
`sheaf_search`	Core MCP	Full-text + semantic search (keyword/hybrid/quick)
`sheaf_crystallize`	Core MCP	Crystallize knowledge cards from a topic
`sheaf_get_card`	Core MCP	Get full card details by ID
`sheaf_collect_batch`	CLI / on-demand	Batch-collect via `sheaf collect URL1 URL2 ...`
`sheaf_list`	CLI / on-demand	`sheaf list [--topic T] [--tag T] [--json]`
`sheaf_get`	CLI / on-demand	Full entry detail (MCP `tools/call` fallback)
`sheaf_correct`	CLI / on-demand	Correct classification (MCP `tools/call` fallback)
`sheaf_insights`	CLI / on-demand	`sheaf insights` — cross-topic associations
`sheaf_crosscheck`	CLI / on-demand	Fact matrix (MCP `tools/call`; see also `sheaf matrix`)
`sheaf_list_cards`	CLI / on-demand	`sheaf crystallize --list`

The 4 core tools cover ~90% of automated agent workflows and keep the MCP schema lean (~1.5k tokens vs ~5k before). The 7 demoted tools keep their handlers registered for backward compat — call them via tools/call, or re-expose the full set with SHEAF_MCP_TOOLS=all sheaf mcp (power users / migration). For Claude Code & Codex, sheaf setup also deploys a bundled skill / AGENTS note (sheaf-guide.md / AGENTS.sheaf.md) so the agent knows when to use the sheaf CLI for the demoted tools.

Agent Integration (One Command)

Connect Sheaf to your AI coding agent in a single step — this writes the MCP config and deploys the bundled skill / agents note (Claude Code & Codex):

# Cursor / Windsurf / WorkBuddy
sheaf setup --target cursor      # writes .cursor/mcp.json
sheaf setup --target windsurf    # writes .windsurf/mcp.json
sheaf setup --target workbuddy   # writes ~/.workbuddy/mcp.json

# Claude Code — writes ~/.claude.json + deploys ~/.claude/skills/sheaf-guide.md
sheaf setup --target claude

# OpenAI Codex — writes ~/.codex/config.toml + deploys ~/.codex/AGENTS.sheaf.md
sheaf setup --target codex

# Auto-detect from CWD + installed agents
sheaf setup                      # detects claude/codex/cursor/windsurf/workbuddy

Zero-install alternative (no pip install needed):

# Claude Code
claude mcp add sheaf -- uvx --from sheaf-ai sheaf-mcp
# Codex — add to ~/.codex/config.toml:
#   [mcp_servers.sheaf]
#   command = "uvx"
#   args = ["--from", "sheaf-ai", "sheaf-mcp"]

Preview without writing:

sheaf setup --target codex --dry-run

See docs/mcp-setup.md for detailed platform guides and troubleshooting.

What You Can Collect

Sheaf handles more than just web articles:

Input	Example	What Sheaf does
Web articles	`sheaf collect https://arxiv.org/abs/2401.00000`	Fetches full text, extracts title/author/abstract, classifies topic
AI chat shares	`sheaf collect https://chatgpt.com/share/...`	Extracts the Q&A conversation, structures it as reusable knowledge
WeChat / Zhihu posts	`sheaf collect https://mp.weixin.qq.com/s/...`	Handles paywalls and dynamic rendering via Playwright fallback
Pasted text	`sheaf collect --text "Key insight..."`	Wraps freeform text into a structured entry with auto-classification

Under the hood, every input goes through the same pipeline: fetch → classify → summarize → store. The output is always a structured entry your agents can search and cite.

Exit Codes (Agent-Native)

Sheaf returns semantic exit codes so agents can programmatically branch on error type instead of parsing stderr:

Code	Name	Meaning
0	`SUCCESS`	Operation completed successfully
1	`PARTIAL`	Partial success (e.g. batch ops with skips)
2	`DUPLICATE`	Entry already exists (dedup skip)
3	`QUALITY`	Content quality gate rejected the input
4	`NETWORK`	Network connectivity or API call failure
5	`CONFIG`	Missing API key, invalid URL, or bad config
6	`LLM`	LLM API failure (rate limit, bad response)
7	`STORAGE`	File I/O or data storage failure

In --json mode, error payloads include exit_code, exit_code_name, and error_type fields for programmatic introspection:

{"success": false, "error": "Config error: missing API key", "exit_code": 5, "exit_code_name": "CONFIG", "error_type": "ConfigError", "hint": "Set SHEAF_API_KEY"}

Architecture

URL → fetch → classify → summarize → store → query
         ↓          ↓          ↓         ↓
    3-strategy   LLM tags   summary   JSONL + MD
    fallback     + topics   + deadline  index

              ↓
         crystallize → KnowledgeCard → EmbeddingEngine
              ↓              ↓
          CLI/MCP       semantic search

Module	Purpose
`sheaf_ai/`	Core — pipeline, storage, search, CLI, MCP server, crystallize engine
`sheaf_cards/`	Knowledge card engine — base types, embeddings, generation
`prompts/`	LLM prompt templates (classify, summarize, crystallize)
`data/`	Local knowledge base (JSONL + Markdown, gitignored)

Privacy & Local-First

Your data never leaves your machine unless you choose to.

All content stored locally — ./data/ inside a project dir, otherwise ~/.sheaf/data (override via SHEAF_DATA_DIR)
LLM calls go to your chosen API provider
No telemetry, no analytics, no accounts
Markdown + JSONL format — fully portable, zero lock-in

Configuration

Sheaf supports any OpenAI-compatible API and offers three ways to configure your keys:

Option 1: Interactive Wizard (Recommended)

sheaf config setup

Guides you through selecting a provider, entering your API key (hidden input), and setting defaults. Keys are stored securely in ~/.sheaf/config.json with restricted file permissions.

Option 2: Environment Variables

Best for CI/CD or temporary use. Sheaf reads standard .env files automatically.

macOS / Linux

export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.openai.com/v1   # optional

Windows PowerShell

$env:OPENAI_API_KEY="sk-..."
$env:OPENAI_BASE_URL="https://api.openai.com/v1"   # optional

Windows CMD

set OPENAI_API_KEY=sk-...
set OPENAI_BASE_URL=https://api.openai.com/v1       # optional

Or create a .env file in your working directory:

OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1

Option 3: CLI Config Commands

# Add/update a provider
sheaf config set-key --provider deepseek
sheaf config set-key --provider siliconflow

# Switch default provider
sheaf config use deepseek

# List configured providers
sheaf config list

# Remove a provider
sheaf config remove groq

Supported Providers

Provider	Key Env Var	Default Model
OpenAI	`OPENAI_API_KEY`	`gpt-4o`
DeepSeek	`DEEPSEEK_API_KEY`	`deepseek-chat`
SiliconFlow	`SILICONFLOW_API_KEY`	`deepseek-ai/DeepSeek-V3.2`
Together AI	`TOGETHER_API_KEY`	`meta-llama/Llama-3.3-70B-Instruct-Turbo`
Groq	`GROQ_API_KEY`	`llama-3.3-70b-versatile`

Set the default provider via sheaf config use <id> or the DEFAULT_PROVIDER environment variable.

Requirements

Python 3.10+
An LLM API key — any OpenAI-compatible endpoint
Playwright Chromium (optional, for JS-heavy sites): pip install -e ".[browser]" && playwright install chromium

Development

git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
python -m pip install -e ".[dev]"
python -m pytest tests/ -q     # 988 passed, 19 skipped
python -m ruff check sheaf_ai/ tests/ sheaf_cards/

Dependencies are managed through pyproject.toml extras. Use .[dev] for local development, .[server] for the HTTP API, and .[browser] for Playwright-based fetching.

Alpha Status

Sheaf is in early alpha. The core collect → search → crystallize → MCP pipeline works and is tested with 988 passing tests. We're validating with real users before beta.

Chrome Extension

The browser extension provides one-click collection and search from any webpage:

Start the local API: sheaf serve
Load the extension from extension/ (Chrome → Manage Extensions → Developer mode → Load unpacked)
Click the Sheaf icon to collect the current page or search your knowledge base
Right-click any page or link → "🌾 Collect with Sheaf"
Keyboard shortcut: Alt+Shift+S

The extension connects to your local sheaf serve instance. Its manifest version is independent from the Python package.

Try it: save 20+ links, run sheaf crystallize <topic>, then ask your agent to find them. If it works for you, open an issue or discussion to tell us what you'd change.

License

Apache 2.0

A sheaf is a bundle of harvested grain — the unit a farmer brings to market. In mathematics, a sheaf attaches local data to open sets and glues them into a global picture. Sheaf the tool does both: gather scattered knowledge into coherent bundles, ready for your agents to consume or for you to share.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

Zhelun

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.7.0

Jun 21, 2026

0.6.3

Jun 20, 2026

0.6.2

Jun 20, 2026

This version

0.6.1

Jun 19, 2026

0.4.0a0 pre-release

May 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sheaf_ai-0.6.1.tar.gz (267.4 kB view details)

Uploaded Jun 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sheaf_ai-0.6.1-py3-none-any.whl (202.5 kB view details)

Uploaded Jun 19, 2026 Python 3

File details

Details for the file sheaf_ai-0.6.1.tar.gz.

File metadata

Download URL: sheaf_ai-0.6.1.tar.gz
Upload date: Jun 19, 2026
Size: 267.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sheaf_ai-0.6.1.tar.gz
Algorithm	Hash digest
SHA256	`32fce1c1d0a3b5486011a61d5c7eaff807cb5cce904b009fb451e4c5d118611c`
MD5	`4372ded428918ec36d29bbb5f0f6c434`
BLAKE2b-256	`749541ece8ff317e76a847747209a9e0e62a6aa0b7e4b942ec034e6f243d609f`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sheaf_ai-0.6.1.tar.gz:

Publisher: publish.yml on zhelunSun/sheaf-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sheaf_ai-0.6.1.tar.gz
- Subject digest: 32fce1c1d0a3b5486011a61d5c7eaff807cb5cce904b009fb451e4c5d118611c
- Sigstore transparency entry: 1871145342
- Sigstore integration time: Jun 19, 2026
Source repository:
- Permalink: zhelunSun/sheaf-ai@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81
- Branch / Tag: refs/tags/v0.6.1
- Owner: https://github.com/zhelunSun
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81
- Trigger Event: push

File details

Details for the file sheaf_ai-0.6.1-py3-none-any.whl.

File metadata

Download URL: sheaf_ai-0.6.1-py3-none-any.whl
Upload date: Jun 19, 2026
Size: 202.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sheaf_ai-0.6.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cf82699d19dd810e0b6eb59b69e28ae0934ae9fd8d6d80298b1aeca0c4854b5f`
MD5	`9eff18fafd46459aeb1196f4abe8fef6`
BLAKE2b-256	`12fdc7c9128366a1fefb3490c64e78b1399feb5873cf17d9b0e0fc4835bbb341`

See more details on using hashes here.

Provenance

The following attestation bundles were made for sheaf_ai-0.6.1-py3-none-any.whl:

Publisher: publish.yml on zhelunSun/sheaf-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sheaf_ai-0.6.1-py3-none-any.whl
- Subject digest: cf82699d19dd810e0b6eb59b69e28ae0934ae9fd8d6d80298b1aeca0c4854b5f
- Sigstore transparency entry: 1871145433
- Sigstore integration time: Jun 19, 2026
Source repository:
- Permalink: zhelunSun/sheaf-ai@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81
- Branch / Tag: refs/tags/v0.6.1
- Owner: https://github.com/zhelunSun
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f8da76b35f7bbae7661e07c5b9ec59c9ef581f81
- Trigger Event: push

sheaf-ai 0.6.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Sheaf

Quick Start

The Problem

What It Does

Core Commands

Crystallize: Your Second Brain

MCP Server

Agent Integration (One Command)

What You Can Collect

Exit Codes (Agent-Native)

Architecture

Privacy & Local-First

Configuration

Option 1: Interactive Wizard (Recommended)

Option 2: Environment Variables

Option 3: CLI Config Commands

Supported Providers

Requirements

Development

Alpha Status

Chrome Extension

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance