Sheaf — Your personal knowledge layer. Paste a link, AI does the rest.

Sheaf

Harvest your knowledge. Bundle it. Share it.

A sheaf is a bundle of grain — the basic unit a farmer brings to market. Sheaf does the same for knowledge: gather what you read, crystallize it into structured bundles, and make it tradable. Your AI agents can search, cite, and reason over everything you've collected.

Quick Start

# Install from PyPI
pip install sheaf-ai

# Or install from source
git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
pip install -e .

# Set your LLM API key (any OpenAI-compatible endpoint)
export OPENAI_API_KEY=sk-...

# First-time onboarding (collects 3 sample articles)
sheaf init

# Collect a link
sheaf collect https://arxiv.org/abs/2401.00000

# Search your collection
sheaf search "transformer architecture"

# Crystallize knowledge cards from collected articles
sheaf crystallize AI

No accounts and no hosted backend. Your data lives in ./data/ as Markdown + JSONL; LLM calls go to the API provider you configure.

The Problem

You save links every day: articles, repos, papers, tutorials. Most never get opened again.

Not because you're lazy. Because bookmarks serve human reading, not agent workflows. When you ask your coding agent "what did I read about MCP last week?", it has no idea.

Sheaf fixes this. Every link you save becomes a structured entry — a single stalk of grain. Crystallize enough of them, and you get a bundle: a portable, searchable knowledge pack any agent can consume.

What It Does

  1. Harvest — paste a link, Sheaf fetches, classifies, and summarizes it
  2. Crystallize — distill 3+ related entries into structured knowledge cards with evidence tracing
  3. Bundle — package cards into a portable .sheaf unit (coming soon)
  4. Agent-ready — built-in MCP server lets any LLM agent query your knowledge base
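
As a rough illustration of the "3+ related entries" rule in step 2, the sketch below groups entries by topic and reports which topics have enough material to crystallize. The `topic` and `id` field names and the helper name are assumptions made for this example, not Sheaf's actual schema or API.

```python
from collections import defaultdict

MIN_ENTRIES = 3  # threshold taken from the "3+ related entries" wording above


def topics_ready_to_crystallize(entries, min_entries=MIN_ENTRIES):
    """Return {topic: [entry ids]} for topics with at least `min_entries` entries."""
    by_topic = defaultdict(list)
    for entry in entries:
        # "topic" and "id" are hypothetical field names used for illustration.
        by_topic[entry["topic"]].append(entry["id"])
    return {topic: ids for topic, ids in by_topic.items() if len(ids) >= min_entries}


entries = [
    {"id": "e1", "topic": "AI"},
    {"id": "e2", "topic": "AI"},
    {"id": "e3", "topic": "AI"},
    {"id": "e4", "topic": "Databases"},
]
ready = topics_ready_to_crystallize(entries)
print(ready)
```

Here only "AI" qualifies, because "Databases" has a single entry.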

Core Commands

sheaf collect <url>         # Collect an article, paper, or webpage
sheaf search <query>        # Full-text search across your collection
sheaf stats                 # Collection statistics with topic trends
sheaf crystallize <topic>   # Crystallize knowledge cards from a topic
sheaf crystallize --list    # List all crystallized cards
sheaf crystallize --semantic <q>  # Semantic vector search across cards
sheaf tags                  # Tag statistics
sheaf weekly                # Weekly summary report
sheaf insights              # Cross-topic association discovery
sheaf urgent                # Show entries with upcoming deadlines
sheaf mcp                   # Start MCP server (stdio transport)
sheaf init                  # First-time onboarding with demo

Crystallize: Your Second Brain

This is Sheaf's killer feature. Instead of leaving your bookmarks to rot, sheaf crystallize synthesizes insights across multiple entries (sample output, abridged to three of the five cards):

$ sheaf crystallize AI
Crystallizing 'AI'...
✨ 5 knowledge cards crystallized:
  📌 RAG faces retrieval relevance challenges (90%)
     RAG systems heavily depend on retrieval quality; errors degrade LLM output reliability.
  📌 CRAG framework improves RAG robustness (95%)
     CRAG introduces a retrieval evaluator, web search augmentation, and document decomposition.
  📌 Retrieval granularity significantly impacts performance (90%)
     Finer-grained units like propositions outperform traditional passage-level retrieval.

Each card includes:

  • Confidence score (0-100%)
  • Evidence tracing — which source entries contributed
  • Topic provenance — what topic this card belongs to
  • Tags — for filtering and cross-referencing
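
To make the four card attributes concrete, here is a minimal sketch of how such a card could be modeled and filtered. The `Card` class, its field names, and `filter_cards` are hypothetical stand-ins for illustration; the real types live in sheaf_cards/ and may look different. The source IDs and tags in the sample data are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """Hypothetical stand-in for a crystallized knowledge card."""
    title: str
    confidence: int  # 0-100, matching the percentage in the CLI output
    topic: str       # topic provenance
    source_ids: list[str] = field(default_factory=list)  # evidence tracing
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")


def filter_cards(cards, min_confidence=0, tag=None):
    """Keep cards at or above a confidence floor, optionally matching a tag."""
    return [
        card for card in cards
        if card.confidence >= min_confidence and (tag is None or tag in card.tags)
    ]


cards = [
    Card("RAG faces retrieval relevance challenges", 90, "AI", ["e1", "e2"], ["rag"]),
    Card("CRAG framework improves RAG robustness", 95, "AI", ["e2", "e3"], ["rag", "crag"]),
]
strong = filter_cards(cards, min_confidence=95)
print([card.title for card in strong])
```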

Use sheaf crystallize --semantic "query" for vector-based semantic search across all your cards.
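
For readers unfamiliar with vector search, the sketch below shows the general idea: each card and the query are represented as embedding vectors, and cards are ranked by similarity to the query. Cosine similarity is a common choice and is assumed here; this README does not state which metric or embedding model Sheaf uses. The card IDs and three-dimensional vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_cards(query_vec, card_vecs, top_k=3):
    """Return the IDs of the `top_k` cards most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), card_id)
              for card_id, vec in card_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [card_id for _, card_id in scored[:top_k]]


# Toy embeddings; real ones would come from an embedding model.
card_vecs = {
    "rag-relevance": [0.9, 0.1, 0.0],
    "crag": [0.7, 0.3, 0.1],
    "sql-indexes": [0.0, 0.1, 0.9],
}
top = rank_cards([1.0, 0.0, 0.0], card_vecs, top_k=2)
print(top)
```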

MCP Server

Sheaf ships with a built-in Model Context Protocol server. Any MCP-compatible agent can query your knowledge base:

sheaf mcp

Available tools (9 total):

| Tool | Description |
| --- | --- |
| sheaf_search | Full-text search across all entries |
| sheaf_list | List recent entries with filtering |
| sheaf_get | Get full entry details by ID |
| sheaf_urgent | Find time-sensitive entries (deadlines, CFPs) |
| sheaf_collect | Add a new URL to your collection |
| sheaf_correct | Correct a classification error |
| sheaf_crystallize | Crystallize knowledge cards from a topic |
| sheaf_list_cards | List crystallized cards (optional topic filter) |
| sheaf_get_card | Get full card details by ID |
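
MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests, and over the stdio transport each message is sent as a single line of JSON. The sketch below builds such a request for sheaf_search. The `query` argument name is an assumption, since the tool's input schema is not shown here, and a real session must complete the MCP initialization handshake before calling tools. In practice an MCP client library or your agent handles all of this for you.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so the message stays on a single line.
    return json.dumps(message) + "\n"


# "query" is a hypothetical argument name used for illustration.
line = build_tool_call(1, "sheaf_search", {"query": "transformer architecture"})
print(line, end="")
```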

What You Can Collect

Sheaf handles more than just web articles:

| Input | Example | What Sheaf does |
| --- | --- | --- |
| Web articles | sheaf collect https://arxiv.org/abs/2401.00000 | Fetches full text, extracts title/author/abstract, classifies topic |
| AI chat shares | sheaf collect https://chatgpt.com/share/... | Extracts the Q&A conversation, structures it as reusable knowledge |
| WeChat / Zhihu posts | sheaf collect https://mp.weixin.qq.com/s/... | Handles paywalls and dynamic rendering via Playwright fallback |
| Pasted text | sheaf collect --text "Key insight..." | Wraps freeform text into a structured entry with auto-classification |

Under the hood, every input goes through the same pipeline: fetch → classify → summarize → store. The output is always a structured entry your agents can search and cite.
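
The four stages can be pictured as a chain of functions that progressively fill in one entry. The sketch below is only a shape illustration with stubbed stages: `Entry`, `run_pipeline`, and the stage callables are hypothetical names, and the real stages call the network and an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical structured entry produced by the pipeline."""
    url: str
    text: str = ""
    topic: str = ""
    tags: list[str] = field(default_factory=list)
    summary: str = ""


def run_pipeline(url, fetch, classify, summarize, store):
    """Run fetch -> classify -> summarize -> store and return the stored entry."""
    entry = Entry(url=url)
    entry.text = fetch(url)
    entry.topic, entry.tags = classify(entry.text)
    entry.summary = summarize(entry.text)
    store(entry)
    return entry


# Stub stages so the sketch runs offline.
saved = []
entry = run_pipeline(
    "https://arxiv.org/abs/2401.00000",
    fetch=lambda url: "Transformers use self-attention.",
    classify=lambda text: ("AI", ["transformer"]),
    summarize=lambda text: text[:40],
    store=saved.append,
)
print(entry.topic, entry.tags, entry.summary)
```

Passing the stages in as callables keeps the flow testable without network access or an API key.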

Architecture

URL → fetch → classify → summarize → store → query
         ↓          ↓          ↓         ↓
    3-strategy   LLM tags   summary   JSONL + MD
    fallback     + topics   + deadline  index

              ↓
         crystallize → KnowledgeCard → EmbeddingEngine
              ↓              ↓
          CLI/MCP       semantic search

| Module | Purpose |
| --- | --- |
| sheaf_ai/ | Core: pipeline, storage, search, CLI, MCP server, crystallize engine |
| sheaf_cards/ | Knowledge card engine: base types, embeddings, generation |
| prompts/ | LLM prompt templates (classify, summarize, crystallize) |
| data/ | Local knowledge base (JSONL + Markdown, gitignored) |
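
The "3-strategy fallback" in the diagram suggests that fetching tries several methods in order until one succeeds. A generic version of that pattern could look like the sketch below. The strategy names and stub functions are invented for illustration; this README only confirms that a Playwright-based fallback exists, not what the other strategies are or how Sheaf orders them.

```python
def fetch_with_fallback(url, strategies):
    """Try each (name, callable) in order; return (name, text) from the first that works."""
    errors = []
    for name, strategy in strategies:
        try:
            text = strategy(url)
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty response")
    raise RuntimeError("all fetch strategies failed: " + "; ".join(errors))


# Hypothetical stub strategies so the sketch runs offline.
def plain_http(url):
    raise ConnectionError("blocked")


def reader_mode(url):
    return ""  # simulates a page that needs JavaScript to render


def browser(url):
    return "rendered page text"


used, text = fetch_with_fallback(
    "https://example.com/post",
    [("plain_http", plain_http), ("reader_mode", reader_mode), ("browser", browser)],
)
print(used, text)
```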

Privacy & Local-First

Your data never leaves your machine unless you choose to.

  • All content stored locally in ./data/ (configurable via SHEAF_DATA_DIR)
  • LLM calls go to your chosen API provider
  • No telemetry, no analytics, no accounts
  • Markdown + JSONL format — fully portable, zero lock-in
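
Because the store is plain JSONL, you can read it with a few lines of standard-library Python. The sketch below resolves the data directory the way the first bullet describes (SHEAF_DATA_DIR, falling back to ./data) and parses a JSONL file one record per line. The file name `entries.jsonl` and the record contents are assumptions for the demo; check your own data directory for the actual layout.

```python
import json
import os
import tempfile
from pathlib import Path


def resolve_data_dir(environ=os.environ):
    """Use SHEAF_DATA_DIR when set, otherwise the default ./data directory."""
    return Path(environ.get("SHEAF_DATA_DIR") or "./data")


def read_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo against a temporary directory so nothing in ./data is touched.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = resolve_data_dir({"SHEAF_DATA_DIR": tmp})
    index = data_dir / "entries.jsonl"  # hypothetical file name
    index.write_text('{"id": "e1"}\n\n{"id": "e2"}\n', encoding="utf-8")
    records = read_jsonl(index)
print(records)
```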

Configuration

Sheaf works with any OpenAI-compatible API:

# OpenAI
export OPENAI_API_KEY=sk-...

# Or any compatible endpoint (Together, Groq, DeepSeek, etc.)
export OPENAI_API_KEY=sk-...
export OPENAI_BASE_URL=https://api.together.xyz/v1

Optional: create a .env file in your working directory. See .env.example for all options.
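
If you script around Sheaf, the same two variables can be read as in the sketch below. This is not Sheaf's own loader. The function name is hypothetical, and the fallback to https://api.openai.com/v1 when OPENAI_BASE_URL is unset is an assumption about typical OpenAI-compatible clients.

```python
import os

DEFAULT_BASE_URL = "https://api.openai.com/v1"  # assumed default, see note above


def load_llm_config(environ=os.environ):
    """Read the API key and optional base URL from environment variables."""
    api_key = environ.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    base_url = (environ.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL).rstrip("/")
    return {"api_key": api_key, "base_url": base_url}


# Placeholder values; pass os.environ (the default) in real use.
config = load_llm_config({
    "OPENAI_API_KEY": "sk-test",
    "OPENAI_BASE_URL": "https://api.together.xyz/v1/",
})
print(config["base_url"])
```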

Requirements

  • Python 3.10+
  • An LLM API key — any OpenAI-compatible endpoint
  • Playwright Chromium (optional, for JS-heavy sites): pip install -e ".[browser]" && playwright install chromium

Development

git clone https://github.com/zhelunSun/sheaf-ai.git
cd sheaf-ai
pip install -e ".[dev]"
pytest tests/ -v     # 104 tests
ruff check sheaf_ai/ tests/ sheaf_cards/

Alpha Status

Sheaf is in early alpha. The core collect → search → crystallize → MCP pipeline works and is covered by 104 tests. We're validating with real users before beta.

Try it: save 20+ links, run sheaf crystallize <topic>, then ask your agent to find them. Whether or not it works for you, open an issue or discussion and tell us what you'd change.

License

MIT


A sheaf is a bundle of harvested grain — the unit a farmer brings to market. In mathematics, a sheaf attaches local data to open sets and glues them into a global picture. Sheaf the tool does both: gather scattered knowledge into coherent bundles, ready for your agents to consume or for you to share.
