Full Spectrum Graph Sieve - Automated Technical Term Extraction and Relationship Mapping
Graph-Sieve 🕸️📊
graph-sieve is a knowledge management utility and service that extracts relationship-aware domain knowledge from unstructured documents (.docx, .pptx, .msg, .pdf, .one). Its multi-gate, verifiable pipeline builds a structured knowledge graph that preserves technical context and organizational links.
✨ Core Capabilities
- 🔍 Multi-Gate Pipeline: A 5-gate extraction flow (Strategic Sieve -> Batch Extraction -> Multi-Source Validation -> Alias Resolution -> Global Synthesis) ensuring high-fidelity term capture with minimal hallucinations.
- 📄 Multi-Format Support: Native handling of PDF, PPTX, DOCX, MSG, and OneNote (.one) files. Leverages Microsoft MarkItDown for deep document parsing and OCR.
- 🗺️ Relationship Mapping: Goes beyond simple term lookup by automatically mapping how terms relate (e.g., SUPERSEDES, DEPENDS_ON, HAS_EXPERT).
- 🌐 Global Synthesis: Automatically clusters the graph into communities and generates executive summaries and a global project narrative.
- 🇮🇱 Hebrew & Mixed-Language Support: Specialized Bi-Directional (BIDI) support for Hebrew-English technical documents, ensuring technical terms are correctly extracted from mixed-language contexts.
- ⚙️ Flexible LLM Backend: Run locally with Ollama/vLLM for privacy, or use OpenAI for scale.
- 📈 Interactive Visualization: Generate dynamic, relationship-aware graph visualizations via PyVis.
- 🤖 MCP Server: Integrated Model Context Protocol (MCP) server for seamless integration with AI agents like Claude Desktop or Gemini CLI.
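To make the relationship-mapping idea concrete, here is a minimal sketch of a term graph with typed edges. It is not Graph-Sieve's internal data model: the `TermGraph` class, its methods, and the sample terms are all hypothetical. Only the relation names (SUPERSEDES, DEPENDS_ON, HAS_EXPERT) come from the list above.

```python
from collections import defaultdict


class TermGraph:
    """Hypothetical in-memory stand-in for a relationship-aware term graph."""

    def __init__(self):
        # Maps a source term to a list of (relation, target) pairs.
        self._edges = defaultdict(list)

    def add(self, source, relation, target):
        self._edges[source].append((relation, target))

    def related(self, source, relation):
        """Return targets linked to `source` by `relation`, in insertion order."""
        return [t for r, t in self._edges.get(source, []) if r == relation]


# Sample data is invented purely for illustration.
graph = TermGraph()
graph.add("Auth Service v2", "SUPERSEDES", "Auth Service v1")
graph.add("Auth Service v2", "DEPENDS_ON", "Token Store")
graph.add("Auth Service v2", "HAS_EXPERT", "Dana")

print(graph.related("Auth Service v2", "DEPENDS_ON"))
```

Because every edge carries a relation type, a lookup can answer "what does this term depend on?" or "who is the expert?" rather than returning only a definition.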
🚀 Quick Start
- Configure Your LLM: Create a .env file in your working directory:

      LLM_PROVIDER=openai
      OPENAI_API_KEY=your_key_here
      MODEL_NAME=gpt-4o-mini

  Or use local Ollama (default):

      LLM_PROVIDER=ollama
      OLLAMA_BASE_URL=http://localhost:11434
      MODEL_NAME=llama3

- Scan a Directory:

      graph-sieve-scan ./path/to/documents --db my_knowledge.db

- Visualize the Results:

      graph-sieve-visualize --db my_knowledge.db
🛠️ CLI Command Reference
- graph-sieve-scan <path>: Extract terms from a directory or file.
  - --db <path>: Path to the SQLite database (default: platform-standard data dir).
  - --seed <path>: High-authority documents to process first.
  - --whitelist <path>: Text file with terms to always include.
  - --retry-failed: Retry processing chunks from the Dead Letter Queue (DLQ).
- graph-sieve-lookup <term>: Query a term, its definition, and its graph context.
- graph-sieve-visualize: Generate an interactive HTML graph.
- graph-sieve-mcp: Launch the MCP server.
- graph-sieve-whois <term>: Identify experts, owners, and organizations responsible for a term.
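When scans are driven from a script, the command line can be assembled in one place. The sketch below uses only the flags listed above. The helper `build_scan_command` is hypothetical and not part of the package. The path is optional here because this README does not say whether --retry-failed needs one.

```python
import shlex


def build_scan_command(path=None, db=None, seed=None, whitelist=None,
                       retry_failed=False):
    """Assemble an argument list for graph-sieve-scan from the documented flags."""
    cmd = ["graph-sieve-scan"]
    if path:
        cmd.append(path)
    if db:
        cmd += ["--db", db]
    if seed:
        cmd += ["--seed", seed]
    if whitelist:
        cmd += ["--whitelist", whitelist]
    if retry_failed:
        cmd.append("--retry-failed")
    return cmd


cmd = build_scan_command("./docs", db="my_knowledge.db", seed="./docs/specs")
# Print a shell-safe rendering of the command for logging or review.
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids quoting problems when paths contain spaces.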
📖 Advanced Workflow
💎 Seed Documents
Use the --seed flag to process "Golden" documents (specs, architecture docs) before general notes. This sets the ground truth for term definitions and relationships.
🔗 Alias Resolution & Canonicalization
Graph-Sieve automatically performs LLM-verified canonicalization. If it finds "AIP" and "AI Platform" in the same context, it will attempt to merge them into a single canonical entry with appropriate aliases.
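The merge step can be illustrated with a toy example. In the sketch below, a crude capital-letter check stands in for the LLM verification that Graph-Sieve performs, so this shows only the shape of the result (one canonical entry with its aliases), not the real matching logic. All function names are hypothetical.

```python
def capitals(text):
    """Return the uppercase letters of `text`, in order."""
    return "".join(ch for ch in text if ch.isupper())


def looks_like_alias(short, long):
    # Crude stand-in for LLM verification: an all-caps short form whose
    # letters equal the capital letters of the longer form.
    return len(short) < len(long) and short.isupper() and capitals(long) == short


def merge_aliases(terms):
    """Group terms into {canonical: [aliases]}, preferring longer forms as canonical."""
    entries = {}
    # Visit longer terms first so canonical names exist before their short forms.
    for term in sorted(set(terms), key=lambda t: (-len(t), t)):
        match = next((name for name in entries if looks_like_alias(term, name)), None)
        if match is None:
            entries[term] = []
        else:
            entries[match].append(term)
    return entries


merged = merge_aliases(["AIP", "AI Platform", "Token Store"])
print(merged)
```

Here "AIP" ends up as an alias of "AI Platform", while "Token Store" stays a separate entry.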
🆘 Dead Letter Queue (DLQ)
If an LLM call fails or a chunk is too complex, it's pushed to the DLQ. Use graph-sieve-scan --retry-failed to re-process these chunks after updating your configuration or models.
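The dead-letter pattern itself is simple to sketch. The code below is an illustration under assumptions, not Graph-Sieve's implementation: the queue is an in-memory deque (the real tool may persist it elsewhere), and `flaky_extract` is a made-up extractor that fails on long chunks.

```python
from collections import deque


def process_chunks(chunks, extract, dlq):
    """Run `extract` on each chunk; failures go to `dlq` instead of aborting the run."""
    results = []
    for chunk in chunks:
        try:
            results.append(extract(chunk))
        except Exception as exc:
            # Keep the chunk and the reason so it can be retried later.
            dlq.append((chunk, str(exc)))
    return results


def retry_failed(extract, dlq):
    """Drain the queue once; chunks that fail again are re-queued."""
    pending = [chunk for chunk, _reason in dlq]
    dlq.clear()
    return process_chunks(pending, extract, dlq)


def flaky_extract(chunk, max_len=20):
    """Hypothetical extractor that rejects chunks longer than `max_len`."""
    if len(chunk) > max_len:
        raise ValueError("chunk too complex")
    return chunk.upper()


dlq = deque()
done = process_chunks(["short chunk", "x" * 50], flaky_extract, dlq)
failed_first_pass = len(dlq)

# Simulate "updating your configuration": retry with a more permissive limit.
retried = retry_failed(lambda c: flaky_extract(c, max_len=100), dlq)
print(done, failed_first_pass, len(retried), len(dlq))
```

The point of the pattern is that one bad chunk does not stop the scan, and nothing is silently dropped.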
⚙️ Configuration (Environment Variables)
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | openai, ollama, or vllm | openai |
| OPENAI_API_KEY | Required if using OpenAI | None |
| OLLAMA_BASE_URL | URL for Ollama API | http://localhost:11434 |
| MODEL_NAME | Model to use for extraction | gpt-4o-mini |
| STORAGE_DIR | Directory for graph data | Platform-specific |
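For scripts that need to validate these variables before launching a scan, a small loader can mirror the table. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical, the defaults are copied from the table above, and the validation rules are a guess at sensible behavior rather than what Graph-Sieve itself enforces.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    llm_provider: str
    openai_api_key: Optional[str]
    ollama_base_url: str
    model_name: str
    storage_dir: Optional[str]  # None means "use the platform-specific location"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables from `env`, applying the table's defaults."""
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in {"openai", "ollama", "vllm"}:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    api_key = env.get("OPENAI_API_KEY")
    if provider == "openai" and not api_key:
        raise ValueError("OPENAI_API_KEY is required when LLM_PROVIDER=openai")
    return Settings(
        llm_provider=provider,
        openai_api_key=api_key,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        model_name=env.get("MODEL_NAME", "gpt-4o-mini"),
        storage_dir=env.get("STORAGE_DIR"),
    )


# Pass a plain dict instead of os.environ to keep the example deterministic.
settings = load_settings({"LLM_PROVIDER": "ollama", "MODEL_NAME": "llama3"})
print(settings)
```

Failing early on a missing key gives a clearer error than a failed LLM call partway through a scan.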
🧩 AI Agent Integration
Add Graph-Sieve to your MCP-compatible agent's configuration:
    {
      "mcpServers": {
        "graph-sieve": {
          "command": "graph-sieve-mcp",
          "args": []
        }
      }
    }
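If the agent already has a configuration file with other servers, the entry can be merged in rather than pasted by hand. The helper below is a hypothetical sketch. The location of the configuration file depends on the agent and is not specified here, so the example works on a temporary file.

```python
import json
import tempfile
from pathlib import Path


def register_mcp_server(config_path, name="graph-sieve", command="graph-sieve-mcp"):
    """Add or update one entry under "mcpServers", keeping existing entries."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": []}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demonstrate on a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "agent_config.json"
    cfg_file.write_text(
        '{"mcpServers": {"other": {"command": "other-mcp", "args": []}}}',
        encoding="utf-8",
    )
    result = register_mcp_server(cfg_file)
    written = json.loads(cfg_file.read_text(encoding="utf-8"))

print(json.dumps(result, indent=2))
```

Back up the real configuration file before running anything like this against it.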
License
MIT License. See LICENSE for details.