Full Spectrum Graph Sieve - Automated Technical Term Extraction and Relationship Mapping

Graph-Sieve 🕸️📊

graph-sieve is a knowledge management utility and service that extracts relationship-aware domain knowledge from unstructured documents (.docx, .pptx, .msg, .pdf, .one). Its multi-gate, verifiable pipeline builds a structured knowledge graph that preserves technical context and organizational links.

✨ Core Capabilities

  • 🔍 Multi-Gate Pipeline: A 5-gate extraction flow (Strategic Sieve -> Batch Extraction -> Multi-Source Validation -> Alias Resolution -> Global Synthesis) designed to capture terms with high fidelity and to limit hallucinations.
  • 📄 Multi-Format Support: Native handling of PDF, PPTX, DOCX, MSG, and OneNote (.one) files. Leverages Microsoft MarkItDown for deep document parsing and OCR.
  • 🗺️ Relationship Mapping: Goes beyond simple term lookup by automatically mapping how terms relate (e.g., SUPERSEDES, DEPENDS_ON, HAS_EXPERT).
  • 🌐 Global Synthesis: Automatically clusters the graph into communities and generates executive summaries and a global project narrative.
  • 🇮🇱 Hebrew & Mixed-Language Support: Specialized Bi-Directional (BIDI) support for Hebrew-English technical documents, ensuring technical terms are correctly extracted from mixed-language contexts.
  • ⚙️ Flexible LLM Backend: Run locally with Ollama/vLLM for privacy, or use OpenAI for scale.
  • 📈 Interactive Visualization: Generate dynamic, relationship-aware graph visualizations via PyVis.
  • 🤖 MCP Server: Integrated Model Context Protocol (MCP) server for seamless integration with AI agents like Claude Desktop or Gemini CLI.
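
To make the relationship mapping above concrete, here is a minimal sketch of typed, directed edges between terms. The TermGraph class and the example term names are hypothetical and are not part of graph-sieve's API; only the three relation names come from the feature list.

```python
from collections import defaultdict

# Relation names taken from the feature list; the full set used by the tool may differ.
RELATION_TYPES = {"SUPERSEDES", "DEPENDS_ON", "HAS_EXPERT"}


class TermGraph:
    """Hypothetical in-memory store of typed, directed edges between terms."""

    def __init__(self):
        # source term -> set of (relation, target) pairs
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation: {relation}")
        self._edges[source].add((relation, target))

    def related(self, source, relation):
        """Return the targets linked from `source` by `relation`, sorted by name."""
        return sorted(t for r, t in self._edges.get(source, ()) if r == relation)


graph = TermGraph()
graph.add("Billing API v2", "SUPERSEDES", "Billing API v1")
graph.add("Billing API v2", "DEPENDS_ON", "Auth Service")
graph.add("Billing API v2", "DEPENDS_ON", "Ledger DB")
```

With these edges, graph.related("Billing API v2", "DEPENDS_ON") lists both dependencies, which is the kind of context a plain glossary lookup would not return.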

🚀 Quick Start

  1. Configure Your LLM: Create a .env file in your working directory:

    LLM_PROVIDER=openai
    OPENAI_API_KEY=your_key_here
    MODEL_NAME=gpt-4o-mini
    

    Or use a local Ollama instance:

    LLM_PROVIDER=ollama
    OLLAMA_BASE_URL=http://localhost:11434
    MODEL_NAME=llama3
    
  2. Scan a Directory:

    graph-sieve-scan ./path/to/documents --db my_knowledge.db
    
  3. Visualize the Results:

    graph-sieve-visualize --db my_knowledge.db
    

🛠️ CLI Command Reference

  • graph-sieve-scan <path>: Extract terms from a directory or file.
    • --db <path>: Path to the SQLite database (default: platform-standard data dir).
    • --seed <path>: High-authority documents to process first.
    • --whitelist <path>: Text file with terms to always include.
    • --retry-failed: Retry processing chunks from the Dead Letter Queue (DLQ).
  • graph-sieve-lookup <term>: Query a term, its definition, and its graph context.
  • graph-sieve-visualize: Generate an interactive HTML graph.
  • graph-sieve-mcp: Launch the MCP server.
  • graph-sieve-whois <term>: Identify experts, owners, and organizations responsible for a term.
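
When driving the scanner from a script, it can help to assemble the argument list in one place. The sketch below only builds the command from the flags listed above; the helper name build_scan_command is hypothetical, and making the path optional is an assumption based on the DLQ section showing --retry-failed without one.

```python
import shlex


def build_scan_command(path=None, db=None, seed=None, whitelist=None, retry_failed=False):
    """Build an argv list for graph-sieve-scan using only the documented flags."""
    argv = ["graph-sieve-scan"]
    if path is not None:
        argv.append(path)
    if db is not None:
        argv += ["--db", db]
    if seed is not None:
        argv += ["--seed", seed]
    if whitelist is not None:
        argv += ["--whitelist", whitelist]
    if retry_failed:
        argv.append("--retry-failed")
    return argv


cmd = build_scan_command("./path/to/documents", db="my_knowledge.db", seed="./specs")
# Print a shell-safe rendering; to execute, pass `cmd` to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing a list to subprocess.run rather than a single string avoids shell quoting problems with paths that contain spaces.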

📖 Advanced Workflow

💎 Seed Documents

Use the --seed flag to process "Golden" documents (specs, architecture docs) before general notes. This sets the ground truth for term definitions and relationships.
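
One way to picture why processing order matters is a first-definition-wins merge, sketched below. This is an assumption about the effect of seeding, not a description of graph-sieve's internals; the function name, file names, and definitions are all made up for illustration.

```python
def build_definitions(seed_docs, other_docs):
    """Merge term definitions, letting documents processed earlier win.

    Each document is a (name, {term: definition}) pair. Seed documents are
    processed first, so their definitions are never overwritten by later notes.
    """
    definitions = {}
    for name, terms in list(seed_docs) + list(other_docs):
        for term, definition in terms.items():
            # setdefault keeps the first definition seen for each term
            definitions.setdefault(term, (definition, name))
    return definitions


seeds = [("architecture_spec.pdf", {"AIP": "AI Platform, the shared model-serving stack"})]
notes = [("meeting_notes.docx", {"AIP": "some AI project", "DLQ": "Dead Letter Queue"})]
merged = build_definitions(seeds, notes)
```

Here the definition of "AIP" comes from the spec, while "DLQ", which the spec never mentions, is still picked up from the notes.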

🔗 Alias Resolution & Canonicalization

Graph-Sieve automatically performs LLM-verified canonicalization. If it finds "AIP" and "AI Platform" in the same context, it will attempt to merge them into a single canonical entry with appropriate aliases.
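
The merge step can be sketched as follows. The verify callable stands in for the LLM check and is purely hypothetical, as are the function name and the candidate pairs; the real pipeline's data model is not documented here.

```python
def canonicalize(terms, candidates, verify):
    """Group terms under canonical names.

    terms:      list of extracted term strings
    candidates: list of (alias, canonical) pairs proposed for merging
    verify:     callable(alias, canonical) -> bool, a stand-in for the LLM check
    Returns {canonical_name: set_of_aliases}.
    """
    canonical_of = {t: t for t in terms}
    for alias, canonical in candidates:
        if not verify(alias, canonical):
            continue
        old, new = canonical_of[alias], canonical_of[canonical]
        # Re-point every term that shared the alias's old canonical name.
        for term in canonical_of:
            if canonical_of[term] == old:
                canonical_of[term] = new
    merged = {}
    for term, root in canonical_of.items():
        aliases = merged.setdefault(root, set())
        if term != root:
            aliases.add(term)
    return merged


result = canonicalize(
    ["AIP", "AI Platform", "DLQ"],
    [("AIP", "AI Platform"), ("DLQ", "AI Platform")],
    # Toy verifier: accepts only the pair from the example above.
    verify=lambda alias, canonical: (alias, canonical) == ("AIP", "AI Platform"),
)
```

The rejected pair leaves "DLQ" as its own entry, which mirrors the "attempt to merge" wording: a candidate is merged only when verification passes.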

🆘 Dead Letter Queue (DLQ)

If an LLM call fails or a chunk is too complex, it's pushed to the DLQ. Use graph-sieve-scan --retry-failed to re-process these chunks after updating your configuration or models.
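
The pattern behind this can be sketched in a few lines: failed chunks are set aside with their error instead of aborting the scan, then re-run later. The function names and the list-based queue are illustrative assumptions; graph-sieve's own DLQ storage is not described in this document.

```python
def process_chunks(chunks, extract, dlq):
    """Run `extract` on each chunk; park failures in `dlq` instead of raising."""
    results = []
    for chunk in chunks:
        try:
            results.append(extract(chunk))
        except Exception as exc:  # broad on purpose: any failure goes to the DLQ
            dlq.append((chunk, str(exc)))
    return results


def retry_failed(dlq, extract):
    """Drain the DLQ and re-process its chunks; new failures are queued again."""
    pending = [chunk for chunk, _error in dlq]
    dlq.clear()
    return process_chunks(pending, extract, dlq)


def strict_extract(chunk):
    # Toy extractor that rejects long chunks, standing in for a failing LLM call.
    if len(chunk) > 5:
        raise ValueError("chunk too complex")
    return chunk.upper()


dlq = []
first_pass = process_chunks(["abc", "toolongchunk"], strict_extract, dlq)
queued = list(dlq)  # snapshot of what was parked after the first pass
# After "updating the configuration", retry with a more capable extractor.
second_pass = retry_failed(dlq, str.upper)
```

Keeping the error message next to each chunk makes it easier to decide whether a retry needs a different model or only a transient failure to clear.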

⚙️ Configuration (Environment Variables)

| Variable        | Description                 | Default                |
|-----------------|-----------------------------|------------------------|
| LLM_PROVIDER    | openai, ollama, or vllm     | openai                 |
| OPENAI_API_KEY  | Required if using OpenAI    | None                   |
| OLLAMA_BASE_URL | URL for Ollama API          | http://localhost:11434 |
| MODEL_NAME      | Model to use for extraction | gpt-4o-mini            |
| STORAGE_DIR     | Directory for graph data    | Platform-specific      |
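
For scripts that wrap the tool, the settings above could be read and validated roughly as shown below. This is a sketch built from the documented variable names and defaults; the SieveConfig class, the load_config function, and the validation rules are assumptions, not graph-sieve's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Optional

SUPPORTED_PROVIDERS = {"openai", "ollama", "vllm"}


@dataclass(frozen=True)
class SieveConfig:
    llm_provider: str
    openai_api_key: Optional[str]
    ollama_base_url: str
    model_name: str
    storage_dir: Optional[str]  # None means "use the platform-specific default"


def load_config(env=None):
    """Read settings from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider}")
    api_key = env.get("OPENAI_API_KEY")
    if provider == "openai" and not api_key:
        raise ValueError("OPENAI_API_KEY is required when LLM_PROVIDER=openai")
    return SieveConfig(
        llm_provider=provider,
        openai_api_key=api_key,
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        model_name=env.get("MODEL_NAME", "gpt-4o-mini"),
        storage_dir=env.get("STORAGE_DIR"),
    )


local = load_config({"LLM_PROVIDER": "ollama", "MODEL_NAME": "llama3"})
```

Accepting a plain mapping keeps the function easy to test without touching the real environment.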

🧩 AI Agent Integration

Add Graph-Sieve to your MCP-compatible agent's configuration:

{
  "mcpServers": {
    "graph-sieve": {
      "command": "graph-sieve-mcp",
      "args": []
    }
  }
}
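
If the agent already has other MCP servers configured, the entry should be merged rather than pasted over the file. A minimal sketch of that merge on an already-parsed configuration follows; the helper name and the "other-tool" entry are hypothetical, and where the configuration file lives depends on the agent.

```python
import copy
import json


def add_graph_sieve_server(config):
    """Return a copy of an MCP client config with the graph-sieve entry added."""
    updated = copy.deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["graph-sieve"] = {"command": "graph-sieve-mcp", "args": []}
    return updated


existing = {"mcpServers": {"other-tool": {"command": "other-tool-mcp", "args": []}}}
merged_config = add_graph_sieve_server(existing)
# Serialize for writing back to the agent's config file.
print(json.dumps(merged_config, indent=2))
```

Working on a copy leaves the original mapping untouched, so a failed write does not corrupt the in-memory configuration.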

License

MIT License. See LICENSE for details.
