Full Spectrum Graph Sieve - Automated Technical Term Extraction and Relationship Mapping

Graph-Sieve 🕸️📊

graph-sieve is a standalone utility and service designed to extract relationship-aware domain knowledge from internal documents (.docx, .pptx, .msg, .pdf). It uses a multi-gate verifiable pipeline with local or remote models (OpenAI, Ollama, vLLM) to build a structured knowledge graph of technical terms and their relationships.
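To make the idea of a multi-gate pipeline feeding a knowledge graph more concrete, here is a minimal sketch. It is not graph-sieve's implementation: the function names (`detect`, `extract`, `validate`, `run_pipeline`), the `Graph` class, and the regex that stands in for a model call are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny property graph: nodes and edges both carry a dict of properties."""
    nodes: dict = field(default_factory=dict)  # term -> properties
    edges: list = field(default_factory=list)  # (source, relation, target, properties)

    def add_edge(self, source, relation, target, **props):
        for term in (source, target):
            self.nodes.setdefault(term, {})
        self.edges.append((source, relation, target, props))


# Hypothetical stand-in for a model call: matches "X uses Y" / "X is part of Y".
PATTERN = re.compile(r"\b([A-Z][A-Za-z0-9]+) (uses|is part of) ([A-Z][A-Za-z0-9]+)\b")


def detect(text):
    """Gate 1: cheap check for whether the text is worth processing further."""
    return PATTERN.search(text) is not None


def extract(text):
    """Gate 2: propose (source, relation, target) candidates."""
    return [match.groups() for match in PATTERN.finditer(text)]


def validate(candidate, text):
    """Gate 3: keep a candidate only if both terms differ and occur in the text."""
    source, _, target = candidate
    return source != target and source in text and target in text


def run_pipeline(text, doc_name):
    graph = Graph()
    if not detect(text):
        return graph
    for candidate in extract(text):
        if validate(candidate, text):
            source, relation, target = candidate
            # Record the originating document as an edge property.
            graph.add_edge(source, relation, target, source_doc=doc_name)
    return graph


graph = run_pipeline("Kafka uses ZooKeeper. Broker is part of Kafka.", "notes.docx")
for edge in graph.edges:
    print(edge)
```

In a real pipeline each gate would typically be a model call or a stricter check; the point of the sketch is only that a candidate must pass every gate before it becomes an edge in the graph.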

Features

  • Multi-Gate Extraction: A robust pipeline (Detection -> Extraction -> Validation) ensuring high-fidelity term capture.
  • Relationship Mapping: Goes beyond simple term lookup by building a property graph of how terms relate.
  • Multi-Format Support: Handles PDF, PPTX, DOCX, MSG, and images (via OCR) using Microsoft MarkItDown.
  • Hebrew & Mixed-Language Handling: Specialized BIDI (Bi-Directional) support for Hebrew-English technical documents, ensuring technical terms are correctly extracted from mixed-language contexts.
  • Flexible LLM Backend: Run locally with Ollama/vLLM for privacy, or use OpenAI for scale.
  • Interactive Visualization: Generate dynamic, relationship-aware graph visualizations.
  • MCP Server: Integrated Model Context Protocol (MCP) server for seamless AI agent integration.
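As an illustration of the mixed-language case mentioned above, the following sketch pulls Latin-script technical terms out of a Hebrew sentence. It is an assumption-laden example, not graph-sieve's BIDI handling: the helper names (`is_mixed`, `latin_terms`) and the regular expressions are hypothetical, and a real extractor would need to cover far more cases.

```python
import re

# Hebrew Unicode block.
HEBREW = re.compile(r"[\u0590-\u05FF]")
LATIN_LETTER = re.compile(r"[A-Za-z]")
# A Latin-script token: starts with a letter, may contain digits and a few
# separators, and ends on a letter or digit (so trailing punctuation is dropped).
LATIN_TERM = re.compile(r"[A-Za-z][A-Za-z0-9_./-]*[A-Za-z0-9]|[A-Za-z]")
# Invisible directional marks that often survive copy/paste or file conversion.
BIDI_CONTROLS = re.compile(r"[\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def is_mixed(text):
    """True if the text contains both Hebrew and Latin letters."""
    return bool(HEBREW.search(text)) and bool(LATIN_LETTER.search(text))


def latin_terms(text):
    """Return Latin-script tokens in logical (stored) order."""
    cleaned = BIDI_CONTROLS.sub("", text)
    return LATIN_TERM.findall(cleaned)


# "The server uses Redis and gRPC for communication."
sample = "השרת משתמש ב-Redis ו-gRPC לתקשורת"
if is_mixed(sample):
    print(latin_terms(sample))  # ['Redis', 'gRPC']
```

The Hebrew prefixes attached with a hyphen (as in "ב-Redis") are left out of the match because the pattern only starts on a Latin letter.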

Installation

pip install graph-sieve

Quick Start

  1. Configure Your LLM: Create a .env file in your working directory:

    LLM_PROVIDER=openai
    OPENAI_API_KEY=your_key_here
    

    Or use Ollama (default):

    LLM_PROVIDER=ollama
    OLLAMA_BASE_URL=http://localhost:11434
    MODEL_NAME=llama3
    
  2. Scan a Directory:

    graph-sieve-scan ./path/to/documents --dict my_dictionary.json
    
  3. Visualize the Results:

    graph-sieve-visualize --dict my_dictionary.json
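
The scan and visualize steps can also be chained from a script. The sketch below only uses the two commands and the `--dict` flag shown above; the helper names (`build_commands`, `run_quick_start`) and the `dry_run` switch are hypothetical conveniences, not part of graph-sieve.

```python
import shlex
import subprocess


def build_commands(doc_dir, dict_path):
    """Return the two Quick Start commands as argument lists."""
    return [
        ["graph-sieve-scan", doc_dir, "--dict", dict_path],
        ["graph-sieve-visualize", "--dict", dict_path],
    ]


def run_quick_start(doc_dir, dict_path, dry_run=True):
    for argv in build_commands(doc_dir, dict_path):
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops at the first step that exits with an error.
            subprocess.run(argv, check=True)


# Dry run by default: prints the commands without executing them.
run_quick_start("./path/to/documents", "my_dictionary.json")
```

Pass `dry_run=False` once the commands are installed and the .env file is in place.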
    

CLI Commands

  • graph-sieve-scan: Extract terms from a directory or file.
  • graph-sieve-lookup: Query terms and their graph context.
  • graph-sieve-visualize: Generate an interactive HTML graph.
  • graph-sieve-mcp: Launch the MCP server.
  • graph-sieve-whois: Find the source document for a specific term.
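
A quick way to confirm that the install put these commands on your PATH is sketched below. It relies only on the standard library; `missing_commands` is a hypothetical helper name, and the check says nothing about whether the commands are configured correctly.

```python
import shutil

COMMANDS = [
    "graph-sieve-scan",
    "graph-sieve-lookup",
    "graph-sieve-visualize",
    "graph-sieve-mcp",
    "graph-sieve-whois",
]


def missing_commands(commands=COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = missing_commands()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All graph-sieve commands are available.")
```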

Configuration (Environment Variables)

Variable          Description                   Default
LLM_PROVIDER      openai, ollama, or vllm       ollama
OPENAI_API_KEY    Required if using OpenAI      None
OLLAMA_BASE_URL   URL for Ollama API            http://localhost:11434
MODEL_NAME        Model to use for extraction   gpt-4o-mini / llama3
STORAGE_DIR       Directory for graph data      Platform-specific
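
For scripts that wrap graph-sieve, it can help to resolve these variables the same way before launching anything. The sketch below mirrors the defaults listed above; it is not graph-sieve's own loader. `load_settings` is a hypothetical helper, lowercasing the provider is an assumption, and `MODEL_NAME` and `STORAGE_DIR` are left as `None` when unset because their effective defaults depend on the provider and platform.

```python
import os

PROVIDERS = {"openai", "ollama", "vllm"}


def load_settings(env=None):
    """Resolve the documented variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    if provider not in PROVIDERS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    api_key = env.get("OPENAI_API_KEY")
    if provider == "openai" and not api_key:
        raise ValueError("OPENAI_API_KEY is required when LLM_PROVIDER=openai")
    return {
        "provider": provider,
        "openai_api_key": api_key,
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "model_name": env.get("MODEL_NAME"),    # None: leave the choice to the tool
        "storage_dir": env.get("STORAGE_DIR"),  # None: platform-specific location
    }


# An explicit mapping keeps the example independent of the real environment.
print(load_settings({"LLM_PROVIDER": "ollama", "MODEL_NAME": "llama3"}))
```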

AI Agent Integration

Claude Desktop / Gemini CLI

To use Graph-Sieve as a tool, add it to your agent's config:

{
  "mcpServers": {
    "graph-sieve": {
      "command": "graph-sieve-mcp",
      "args": []
    }
  }
}
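
If the agent's config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below does that with the standard library. The location of the config file differs per agent and is not specified here, so the path is a parameter; `add_graph_sieve` is a hypothetical helper name, and the demo writes to a temporary file rather than a real config.

```python
import json
import tempfile
from pathlib import Path


def add_graph_sieve(config_path):
    """Add the graph-sieve entry under "mcpServers", keeping existing servers."""
    path = Path(config_path)
    # Assumes the file, if present, contains valid JSON.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["graph-sieve"] = {"command": "graph-sieve-mcp", "args": []}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file that already contains another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "agent_config.json"
    demo_path.write_text(
        json.dumps({"mcpServers": {"other-tool": {"command": "other", "args": []}}}),
        encoding="utf-8",
    )
    merged = add_graph_sieve(demo_path)
    print(sorted(merged["mcpServers"]))  # ['graph-sieve', 'other-tool']
```

Back up the real config file before pointing the helper at it.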

License

MIT License. See LICENSE for details.
