Sheaf — Your personal knowledge layer. Paste a link, AI does the rest.
Sheaf
Harvest your knowledge. Bundle it. Share it.
A sheaf is a bundle of grain — the basic unit a farmer brings to market. Sheaf does the same for knowledge: gather what you read, crystallize it into structured bundles, and make it tradable. Your AI agents can search, cite, and reason over everything you've collected.
Quick Start
# Install from PyPI
pip install sheaf-ai
# Or install from source
git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
pip install -e .
# Set your LLM API key (any OpenAI-compatible endpoint)
export OPENAI_API_KEY=sk-...
# First-time onboarding (collects 3 sample articles)
sheaf init
# Collect a link
sheaf collect https://arxiv.org/abs/2401.00000
# Search your collection
sheaf search "transformer architecture"
# Crystallize knowledge cards from collected articles
sheaf crystallize AI
No accounts. No cloud storage. Your data lives in `./data/` as Markdown + JSONL.
The Problem
You save links every day — articles, repos, papers, tutorials. 95% never get opened again.
Not because you're lazy. Because bookmarks serve human reading, not agent workflows. When you ask your coding agent "what did I read about MCP last week?", it has no idea.
Sheaf fixes this. Every link you save becomes a structured entry — a single stalk of grain. Crystallize enough of them, and you get a bundle: a portable, searchable knowledge pack any agent can consume.
What It Does
- Harvest: paste a link, and Sheaf fetches, classifies, and summarizes it
- Crystallize: distill 3+ related entries into structured knowledge cards with evidence tracing
- Bundle: package cards into a portable `.sheaf` unit (coming soon)
- Agent-ready: a built-in MCP server lets any LLM agent query your knowledge base
Core Commands
sheaf collect <url> # Collect an article, paper, or webpage
sheaf search <query> # Full-text search across your collection
sheaf stats # Collection statistics with topic trends
sheaf crystallize <topic> # Crystallize knowledge cards from a topic
sheaf crystallize --list # List all crystallized cards
sheaf crystallize --semantic <q> # Semantic vector search across cards
sheaf tags # Tag statistics
sheaf weekly # Weekly summary report
sheaf insights # Cross-topic association discovery
sheaf urgent # Show entries with upcoming deadlines
sheaf mcp # Start MCP server (stdio transport)
sheaf init # First-time onboarding with demo
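`sheaf urgent` surfaces entries with upcoming deadlines. The sketch below shows one plausible way such a filter could work. It assumes each entry is a dict with an optional ISO-8601 `deadline` string and a fixed look-ahead window; the field name, the 14-day default, and `upcoming_deadlines` are all hypothetical, not Sheaf's actual code.

```python
from datetime import date, timedelta


# Hypothetical entry shape: each entry is a dict with an optional ISO-8601
# "deadline" string (YYYY-MM-DD). Sheaf's real schema may differ.
def upcoming_deadlines(entries: list[dict], today: date, window_days: int = 14) -> list[dict]:
    """Return entries due between today and today + window_days, soonest first."""
    horizon = today + timedelta(days=window_days)
    due = []
    for entry in entries:
        raw = entry.get("deadline")
        if not raw:
            continue  # most saved links carry no deadline
        deadline = date.fromisoformat(raw)
        if today <= deadline <= horizon:
            due.append(entry)
    # ISO dates sort correctly as plain strings
    return sorted(due, key=lambda e: e["deadline"])


entries = [
    {"title": "Workshop CFP", "deadline": "2025-03-10"},
    {"title": "Expired CFP", "deadline": "2025-01-01"},
    {"title": "Survey paper"},
]
print([e["title"] for e in upcoming_deadlines(entries, today=date(2025, 3, 1))])  # ['Workshop CFP']
```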
Crystallize: Your Second Brain
This is Sheaf's killer feature. Instead of leaving your bookmarks to rot, sheaf crystallize synthesizes insights across multiple entries:
$ sheaf crystallize AI
Crystallizing 'AI'...
✨ 5 knowledge cards crystallized:
📌 RAG faces retrieval relevance challenges (90%)
RAG systems heavily depend on retrieval quality; errors degrade LLM output reliability.
📌 CRAG framework improves RAG robustness (95%)
CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.
📌 Retrieval granularity significantly impacts performance (90%)
Finer-grained units like propositions outperform traditional passage-level retrieval.
Each card includes:
- Confidence score (0-100%)
- Evidence tracing — which source entries contributed
- Topic provenance — what topic this card belongs to
- Tags — for filtering and cross-referencing
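To make the card fields concrete, here is a minimal sketch of a card record, assuming one field per item in the list above. The real `KnowledgeCard` type lives in `sheaf_cards/` and may be shaped differently; the field names, the 0 to 1 confidence scale, and the `label()` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field


# Hypothetical field names mirroring the list above; the real
# KnowledgeCard type in sheaf_cards/ may differ.
@dataclass
class KnowledgeCard:
    title: str
    summary: str
    confidence: float  # stored as 0.0-1.0, displayed as 0-100%
    topic: str  # topic provenance
    evidence: list[str] = field(default_factory=list)  # IDs of source entries
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def label(self) -> str:
        """One-line heading in the style of the CLI output, e.g. 'Title (90%)'."""
        return f"{self.title} ({round(self.confidence * 100)}%)"


card = KnowledgeCard(
    title="CRAG framework improves RAG robustness",
    summary="CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.",
    confidence=0.95,
    topic="AI",
    evidence=["entry-0012", "entry-0031", "entry-0047"],  # hypothetical entry IDs
    tags=["rag", "retrieval"],
)
print(card.label())  # CRAG framework improves RAG robustness (95%)
```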
Use sheaf crystallize --semantic "query" for vector-based semantic search across all your cards.
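Semantic search ranks cards by how close their embeddings sit to the query's embedding. A common choice for that ranking is cosine similarity; whether Sheaf's `EmbeddingEngine` uses exactly this is an assumption. The sketch uses hand-written 3-dimensional toy vectors in place of a real embedding model, and `semantic_search` is a hypothetical name.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec: list[float], card_vecs: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Rank card IDs by cosine similarity to the query embedding, best first."""
    scored = [(cosine(query_vec, vec), card_id) for card_id, vec in card_vecs.items()]
    scored.sort(reverse=True)
    return [card_id for _, card_id in scored[:top_k]]


# Toy vectors standing in for real embeddings (illustrative only)
cards = {
    "rag-retrieval": [0.9, 0.1, 0.0],
    "crag": [0.8, 0.3, 0.1],
    "granularity": [0.1, 0.9, 0.2],
}
print(semantic_search([1.0, 0.0, 0.0], cards, top_k=2))  # ['rag-retrieval', 'crag']
```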
MCP Server
Sheaf ships with a built-in Model Context Protocol server. Any MCP-compatible agent can query your knowledge base:
sheaf mcp
Available tools (9 total):
| Tool | Description |
|---|---|
| `sheaf_search` | Full-text search across all entries |
| `sheaf_list` | List recent entries with filtering |
| `sheaf_get` | Get full entry details by ID |
| `sheaf_urgent` | Find time-sensitive entries (deadlines, CFPs) |
| `sheaf_collect` | Add a new URL to your collection |
| `sheaf_correct` | Correct a classification error |
| `sheaf_crystallize` | Crystallize knowledge cards from a topic |
| `sheaf_list_cards` | List crystallized cards (optional topic filter) |
| `sheaf_get_card` | Get full card details by ID |
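Under the MCP stdio transport, the agent and `sheaf mcp` exchange JSON-RPC 2.0 messages, one JSON object per line. The sketch below builds the `tools/call` request an agent would send to invoke `sheaf_search`. The `"query"` argument name is an assumption (an agent would read the real input schema from a `tools/list` response first), and in practice an MCP client library handles this framing, including the `initialize` handshake that must precede any tool call.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single JSON-RPC 2.0 line for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport: one JSON object per line, with no embedded newlines
    return json.dumps(message, ensure_ascii=False) + "\n"


# "query" is an assumed argument name; check the tool's inputSchema via tools/list.
line = tools_call(2, "sheaf_search", {"query": "transformer architecture"})
```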
What You Can Collect
Sheaf handles more than just web articles:
| Input | Example | What Sheaf does |
|---|---|---|
| Web articles | `sheaf collect https://arxiv.org/abs/2401.00000` | Fetches full text, extracts title/author/abstract, classifies topic |
| AI chat shares | `sheaf collect https://chatgpt.com/share/...` | Extracts the Q&A conversation, structures it as reusable knowledge |
| WeChat / Zhihu posts | `sheaf collect https://mp.weixin.qq.com/s/...` | Handles paywalls and dynamic rendering via Playwright fallback |
| Pasted text | `sheaf collect --text "Key insight..."` | Wraps freeform text into a structured entry with auto-classification |
Under the hood, every input goes through the same pipeline: fetch → classify → summarize → store. The output is always a structured entry your agents can search and cite.
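The shared pipeline can be sketched as four functions applied in order, with the result appended to a JSONL file. Everything here is a hypothetical stand-in: in Sheaf, fetching downloads the page and classification and summarization call your configured LLM, and the `entries.jsonl` file name and entry fields are assumptions.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical stand-ins: in Sheaf, fetch() downloads the page and
# classify()/summarize() call the configured LLM. Names are illustrative.
def fetch(url: str) -> str:
    return f"Full text of {url}"


def classify(text: str) -> dict:
    return {"topic": "AI", "tags": ["rag"]}


def summarize(text: str) -> str:
    return text[:80]  # placeholder for an LLM-written summary


def store(entry: dict, path: Path) -> None:
    # Append-only JSONL: one entry per line, easy to grep, diff, and back up
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def collect(url: str, path: Path) -> dict:
    """Run one input through fetch -> classify -> summarize -> store."""
    text = fetch(url)
    entry = {"url": url, **classify(text), "summary": summarize(text)}
    store(entry, path)
    return entry


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "entries.jsonl"  # hypothetical file name
    collect("https://arxiv.org/abs/2401.00000", db)
    stored = [json.loads(line) for line in db.read_text(encoding="utf-8").splitlines()]
```

Keeping every stage a plain function of the previous stage's output is what lets URLs, chat shares, and pasted text share one code path: only the first stage differs.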
Architecture
URL → fetch → classify → summarize → store → query
- fetch: 3-strategy fallback
- classify: LLM tags + topics
- summarize: summary + deadline
- store: JSONL + MD index
- crystallize → KnowledgeCard → EmbeddingEngine, feeding the CLI/MCP interfaces and semantic search
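The fetch stage uses a 3-strategy fallback, and the collection table mentions a Playwright fallback for dynamic pages. The sketch below shows the general try-in-order pattern. The three strategy functions are hypothetical placeholders, not Sheaf's actual strategies; only the ordering and error-collection logic is the point.

```python
from typing import Callable, Optional

Strategy = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, strategies: list[tuple[str, Strategy]]) -> tuple[str, str]:
    """Try each strategy in order; return (strategy name, text) from the first that yields text."""
    errors = []
    for name, strategy in strategies:
        try:
            text = strategy(url)
        except Exception as exc:  # a failing strategy should not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all fetch strategies failed: " + "; ".join(errors))


# Hypothetical placeholder strategies: each returns text, returns "", or raises.
def plain_http(url: str) -> Optional[str]:
    raise TimeoutError("request timed out")


def static_extractor(url: str) -> Optional[str]:
    return ""  # e.g. a JS-rendered page whose static HTML has no body text


def headless_browser(url: str) -> Optional[str]:
    return f"rendered text of {url}"  # where a Playwright-based fetch would go


used, text = fetch_with_fallback(
    "https://example.com/post",
    [("plain_http", plain_http), ("static_extractor", static_extractor), ("headless_browser", headless_browser)],
)
```

Ordering the strategies from cheapest to most expensive means the headless browser only launches when lighter fetches come back empty or fail.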
| Module | Purpose |
|---|---|
| `sheaf_ai/` | Core: pipeline, storage, search, CLI, MCP server, crystallize engine |
| `sheaf_cards/` | Knowledge card engine: base types, embeddings, generation |
| `prompts/` | LLM prompt templates (classify, summarize, crystallize) |
| `data/` | Local knowledge base (JSONL + Markdown, gitignored) |
Privacy & Local-First
Your data never leaves your machine unless you choose to.
- All content stored locally in `./data/` (configurable via `SHEAF_DATA_DIR`)
- LLM calls go to your chosen API provider
- No telemetry, no analytics, no accounts
- Markdown + JSONL format — fully portable, zero lock-in
Configuration
Sheaf works with any OpenAI-compatible API:
# OpenAI
export OPENAI_API_KEY=sk-...
# Or any compatible endpoint (Together, Groq, DeepSeek, etc.)
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.together.xyz/v1
Optional: create a .env file in your working directory. See .env.example for all options.
Requirements
- Python 3.10+
- An LLM API key — any OpenAI-compatible endpoint
- Playwright Chromium (optional, for JS-heavy sites):
pip install -e ".[browser]" && playwright install chromium
Development
git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
pip install -e ".[dev]"
pytest tests/ -v # 104 tests
ruff check sheaf_ai/ tests/ sheaf_cards/
Alpha Status
Sheaf is in early alpha. The core collect → search → crystallize → MCP pipeline works and is covered by 104 tests. We're validating with real users before beta.
Try it: save 20+ links, run sheaf crystallize <topic>, then ask your agent to find them. If it works for you, open an issue or discussion to tell us what you'd change.
License
A sheaf is a bundle of harvested grain — the unit a farmer brings to market. In mathematics, a sheaf attaches local data to open sets and glues them into a global picture. Sheaf the tool does both: gather scattered knowledge into coherent bundles, ready for your agents to consume or for you to share.
File details
Details for the file sheaf_ai-0.4.0a0.tar.gz.
File metadata
- Download URL: sheaf_ai-0.4.0a0.tar.gz
- Upload date:
- Size: 98.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91b77d883312a2e0c5a6089b37325db3e7226825ed8564524c8cc15097e49b6a |
| MD5 | 67abd04cc07bae1a3eaab07cf682bce2 |
| BLAKE2b-256 | 5c424e1db0ba8826a64dabed8063178b6c4da90f6d2b3ffed213afc7e31b035f |
File details
Details for the file sheaf_ai-0.4.0a0-py3-none-any.whl.
File metadata
- Download URL: sheaf_ai-0.4.0a0-py3-none-any.whl
- Upload date:
- Size: 86.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9e4bb4e703561e0604bb72b79c882c9c7ebf8c104c8902bfe929ab003a51893 |
| MD5 | 188cbb48f6568414ed0ceafbc74904a2 |
| BLAKE2b-256 | 74eae29e13a78b675cadecc6a140ea9b1308023d5b1a0967edcbf3b6d060b9f5 |