Full Spectrum Graph Sieve - Automated Technical Term Extraction and Relationship Mapping
Graph-Sieve 🕸️📊
graph-sieve is a standalone utility and service designed to extract relationship-aware domain knowledge from internal documents (.docx, .pptx, .msg, .pdf). It uses a multi-gate verifiable pipeline with local or remote models (OpenAI, Ollama, vLLM) to build a structured knowledge graph of technical terms and their relationships.
Features
- Multi-Gate Extraction: A robust pipeline (Detection -> Extraction -> Validation) ensuring high-fidelity term capture.
- Relationship Mapping: Goes beyond simple term lookup by building a property graph of how terms relate.
- Multi-Format Support: Handles PDF, PPTX, DOCX, MSG, and images (via OCR) using Microsoft MarkItDown.
- Hebrew & Mixed-Language Handling: Specialized BIDI (Bi-Directional) support for Hebrew-English technical documents, ensuring technical terms are correctly extracted from mixed-language contexts.
- Flexible LLM Backend: Run locally with Ollama/vLLM for privacy, or use OpenAI for scale.
- Interactive Visualization: Generate dynamic, relationship-aware graph visualizations.
- MCP Server: Integrated Model Context Protocol (MCP) server for seamless AI agent integration.
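The Detection -> Extraction -> Validation flow can be pictured as three functions chained together. The sketch below is illustrative only: the `Candidate` type, the gate functions, and the regex heuristic are assumptions made for this example, not graph-sieve's actual implementation (which delegates the work to an LLM backend).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A term candidate plus the sentence it came from (hypothetical type)."""
    term: str
    context: str


# Hypothetical heuristic: acronyms (MCP, PDF) or CamelCase words (MarkItDown).
_TERM_RE = re.compile(r"\b(?:[A-Z]{2,}|[A-Z][a-z]+(?:[A-Z][a-z]+)+)\b")


def detect(text: str) -> list[str]:
    """Gate 1: keep only sentences that appear to contain a technical term."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if _TERM_RE.search(s)]


def extract(sentences: list[str]) -> list[Candidate]:
    """Gate 2: pull candidate terms out of the detected sentences."""
    return [Candidate(m.group(0), s) for s in sentences for m in _TERM_RE.finditer(s)]


def validate(candidates: list[Candidate]) -> list[Candidate]:
    """Gate 3: check each term occurs verbatim in its context, then deduplicate."""
    seen: set[str] = set()
    kept: list[Candidate] = []
    for candidate in candidates:
        if candidate.term in candidate.context and candidate.term not in seen:
            seen.add(candidate.term)
            kept.append(candidate)
    return kept


def run_pipeline(text: str) -> list[str]:
    """Chain the three gates and return the surviving terms in order."""
    return [c.term for c in validate(extract(detect(text)))]
```

Splitting the work into gates means a cheap early check can discard irrelevant text before the more expensive extraction step runs, and the final gate can reject anything that is not grounded in the source sentence.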
Installation
pip install graph-sieve
Quick Start
- Configure Your LLM: Create a .env file in your working directory:
    LLM_PROVIDER=openai
    OPENAI_API_KEY=your_key_here
  Or use Ollama (default):
    LLM_PROVIDER=ollama
    OLLAMA_BASE_URL=http://localhost:11434
    MODEL_NAME=llama3
- Scan a Directory:
    graph-sieve-scan ./path/to/documents --dict my_dictionary.json
- Visualize the Results:
    graph-sieve-visualize --dict my_dictionary.json
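If you want to run the scan and visualize steps from a script, one option is to call the two commands through `subprocess`. This is a minimal sketch that assumes graph-sieve is installed and on your PATH; the helper names are hypothetical, and only the `--dict` flag shown above is used.

```python
import subprocess
from pathlib import Path


def scan_command(documents: Path, dictionary: Path) -> list[str]:
    """Build the argv for graph-sieve-scan as shown in the Quick Start."""
    return ["graph-sieve-scan", str(documents), "--dict", str(dictionary)]


def visualize_command(dictionary: Path) -> list[str]:
    """Build the argv for graph-sieve-visualize as shown in the Quick Start."""
    return ["graph-sieve-visualize", "--dict", str(dictionary)]


def scan_then_visualize(documents: Path, dictionary: Path) -> None:
    """Run both steps in order; check=True stops at the first failing command."""
    for argv in (scan_command(documents, dictionary), visualize_command(dictionary)):
        subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a document path contains spaces.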
CLI Commands
- graph-sieve-scan: Extract terms from a directory or file.
- graph-sieve-lookup: Query terms and their graph context.
- graph-sieve-visualize: Generate an interactive HTML graph.
- graph-sieve-mcp: Launch the MCP server.
- graph-sieve-whois: Find the source document for a specific term.
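To make "graph context" and "source document" concrete, here is a small sketch of a property graph that supports a lookup-style and a whois-style query. The `PropertyGraph` class, its method names, and the `source` property are assumptions for illustration; graph-sieve's actual storage format and query logic are not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PropertyGraph:
    """Terms are nodes with properties; relationships are labelled edges."""
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_term(self, term: str, **props: str) -> None:
        """Create the node if needed and merge in any new properties."""
        self.nodes.setdefault(term, {}).update(props)

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add a directed, labelled edge, creating missing nodes on the way."""
        self.add_term(source)
        self.add_term(target)
        self.edges[source].append((relation, target))

    def lookup(self, term: str) -> dict:
        """Return a term's properties together with its outgoing relations."""
        return {
            "properties": self.nodes[term],
            "relations": list(self.edges.get(term, [])),
        }

    def whois(self, term: str) -> Optional[str]:
        """Return the document a term was extracted from, if recorded."""
        return self.nodes[term].get("source")
```

For example, after `graph.add_term("MCP", source="design.docx")` and `graph.relate("MCP", "uses", "JSON-RPC")`, calling `graph.whois("MCP")` returns the document name, while `graph.lookup("MCP")` also returns the labelled edge.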
Configuration (Environment Variables)
| Variable | Description | Default |
|---|---|---|
| LLM_PROVIDER | openai, ollama, or vllm | ollama |
| OPENAI_API_KEY | Required if using OpenAI | None |
| OLLAMA_BASE_URL | URL for Ollama API | http://localhost:11434 |
| MODEL_NAME | Model to use for extraction | gpt-4o-mini / llama3 |
| STORAGE_DIR | Directory for graph data | Platform-specific |
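The table above can be read as a set of defaulting rules. The sketch below resolves them from a mapping of environment variables. It is not graph-sieve's own loader: the `load_settings` function is hypothetical, and pairing gpt-4o-mini with OpenAI and llama3 with Ollama is an assumption drawn from the "gpt-4o-mini / llama3" default. No default model is assumed for vllm.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumed pairing of provider to default model (see the MODEL_NAME row).
_DEFAULT_MODELS = {"openai": "gpt-4o-mini", "ollama": "llama3"}
_PROVIDERS = {"openai", "ollama", "vllm"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Resolve settings from environment variables, applying the table's defaults."""
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    if provider not in _PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(_PROVIDERS)}")

    api_key = env.get("OPENAI_API_KEY")
    if provider == "openai" and not api_key:
        raise ValueError("OPENAI_API_KEY is required when LLM_PROVIDER=openai")

    return {
        "provider": provider,
        "openai_api_key": api_key,
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "model_name": env.get("MODEL_NAME", _DEFAULT_MODELS.get(provider)),
        # None here stands in for the platform-specific default directory.
        "storage_dir": env.get("STORAGE_DIR"),
    }
```

Taking the environment as a parameter keeps the function easy to test: pass a plain dict instead of modifying `os.environ`.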
AI Agent Integration
Claude Desktop / Gemini CLI
To use Graph-Sieve as a tool, add it to your agent's config:
{
  "mcpServers": {
    "graph-sieve": {
      "command": "graph-sieve-mcp",
      "args": []
    }
  }
}
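If your agent already has a config file with other MCP servers, you will want to merge the entry rather than overwrite the file. The following sketch shows one way to do that. The function names are hypothetical, and the location of the config file depends on the agent, so it is left as a parameter.

```python
import copy
import json
from pathlib import Path

# The server entry from the snippet above.
GRAPH_SIEVE_ENTRY = {"command": "graph-sieve-mcp", "args": []}


def with_graph_sieve(config: dict) -> dict:
    """Return a copy of config with the graph-sieve server added or replaced."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})["graph-sieve"] = copy.deepcopy(GRAPH_SIEVE_ENTRY)
    return merged


def register(config_path: Path) -> None:
    """Merge the entry into the JSON file at config_path, creating it if absent."""
    existing = {}
    if config_path.exists():
        existing = json.loads(config_path.read_text(encoding="utf-8"))
    updated = with_graph_sieve(existing)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Because `with_graph_sieve` works on a copy, other servers already present under `mcpServers` are preserved and the caller's dictionary is left unchanged.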
License
MIT License. See LICENSE for details.