kgnode

Knowledge Graph Agnostic Node for Knowledge-Aware LLM Applications

Overview

kgnode is a Python library that extracts relevant subgraphs from large knowledge graphs using a path-aware Markov chain algorithm for question answering tasks.

Implementation Summary:

  1. Kgnode - work in progress
  2. Initial Dataset: DBLP-QuAD
  3. Knowledge graph embedding ❌
  4. Simple text embedding with basic template ✅
  5. Initial Vector DB: ChromaDB
  6. Framework: LangGraph
  7. Seed node identification strategy:
    • SPARQL text search (1-hop nodes)
    • High-frequency node (degree) semantic search (2-3 hop nodes)
    • Compile VectorDB with top 1 million nodes
  8. Node pruning algorithm: Path-aware Markov chain (relevant subgraph identification)
    • P(v→w) ∝ base_weight(v,w) × f(history,v,w)
    • Initially using P(v→w) ∝ softmax(cos(path_embedding, template_embedding))
    • path_embedding == f(a, r, b, r, v, r, w)
    • Query → template → template_embedding
    • Stops when the transition probability p drops below its value at the previous step, or when the walk reaches 10 hops
  9. Generate SPARQL for answering the query, using the subgraph as context
  10. Generate answer of the query by executing SPARQL and using subgraph
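The transition rule in step 8 can be illustrated with a small, dependency-free sketch. This is not kgnode's implementation: the function names, the toy 2-dimensional embeddings, and the greedy "take the most probable candidate" policy are all assumptions made for illustration. It only shows the softmax-over-cosine scoring and the two stopping conditions listed above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_probs(candidate_path_embeddings, template_embedding):
    """P(v→w) for each candidate w: softmax(cos(path_embedding, template_embedding))."""
    sims = [cosine(p, template_embedding) for p in candidate_path_embeddings]
    return softmax(sims)


def greedy_walk(hops, template_embedding, max_hops=10):
    """Follow the most probable candidate at each hop.

    `hops` is a hypothetical stand-in for graph expansion: one dict per hop,
    mapping a candidate node name to the embedding of the path ending there.
    The walk stops when the best probability falls below the previous step's
    value or when `max_hops` is reached.
    """
    path, prev_p = [], 0.0
    for hop_index, candidates in enumerate(hops):
        if hop_index >= max_hops:
            break
        names = list(candidates)
        probs = transition_probs([candidates[n] for n in names], template_embedding)
        best = max(range(len(names)), key=probs.__getitem__)
        if probs[best] < prev_p:
            break  # probability dropped relative to the previous step
        path.append(names[best])
        prev_p = probs[best]
    return path
```

In this toy setup, a hop with many near-equal candidates spreads the probability mass and therefore ends the walk, which is one way to read the "p gets smaller than previous step" rule.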

Installation

pip install kgnode

Quick Start

from kgnode import KGConfig, get_seed_nodes, get_subgraphs, generate_answer

# Configure for your knowledge graph
config = KGConfig(
    sparql_endpoint="http://localhost:7878/query",
    embedding_model="all-MiniLM-L6-v2"
)

# Find seed nodes for a query
seed_nodes = get_seed_nodes(query="What papers did John Smith publish?", config=config)

# Extract relevant subgraph
subgraphs = get_subgraphs(seed_node=seed_nodes[0], query="...", config=config)

# Generate answer
answer = generate_answer(query="...", config=config)

Folder Structure

kgnode/
├── src/kgnode/
│   ├── __init__.py              # Public API exports
│   ├── seed_finder.py           # Seed node identification
│   ├── subgraph_extraction.py   # Path-aware Markov chain algorithm
│   ├── generator.py             # SPARQL generation and answer generation
│   ├── validator.py             # Subgraph validation
│   ├── keyword_search.py        # Keyword-based entity search
│   ├── chroma_db.py            # Vector database operations
│   └── core/
│       ├── kg_config.py        # Configuration class
│       ├── sparql_query.py     # SPARQL endpoint communication
│       ├── schema_extractor.py # Schema extraction from ontology/SPARQL
│       ├── schema_chromadb.py  # Schema ChromaDB collections
│       └── schema_selector.py  # Query-aware schema selection
├── tests/                       # Unit tests
├── docs/                        # Documentation
└── _data/                       # Data files (not in repo)

Running Oxigraph SPARQL Server

kgnode requires a SPARQL endpoint. We recommend Oxigraph:

# Start server (read-write)
oxigraph_server serve -l ./oxigraph_db --cors

# Start server (read-only)
oxigraph_server serve-read-only -l ./oxigraph_db --cors

# Load dataset (one-time setup)
oxigraph_server load -l ./oxigraph_db -f _data/dblp.nt

# Custom bind address
oxigraph_server serve -l ~/oxigraph_db --bind 127.0.0.1:7878

Default endpoint: http://localhost:7878/query
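To check that the endpoint is reachable independently of kgnode, a query can be sent with the standard library alone. The sketch below assumes the server follows the SPARQL 1.1 Protocol (query passed as a `query` URL parameter, JSON results requested through the `Accept` header); the helper names are hypothetical and are not part of kgnode.

```python
import json
import urllib.parse
import urllib.request

DEFAULT_ENDPOINT = "http://localhost:7878/query"


def build_sparql_request(query, endpoint=DEFAULT_ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query, endpoint=DEFAULT_ENDPOINT, timeout=10):
    """Send a SELECT query and return the list of result bindings."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Example (requires a running server):
#   rows = run_select("SELECT * WHERE { ?s ?p ?o } LIMIT 3")
```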

Public API

Main Pipeline

from kgnode import (
    citable,                    # Check seed node quality
    get_seed_nodes,             # Find seed nodes (keyword + semantic search)
    get_subgraphs,              # Extract subgraph using path-aware Markov chain
    generate_sparql,            # Generate SPARQL from subgraph
    kg_retrieve,                # Full pipeline: query → subgraph → SPARQL → results
    generate_answer,            # End-to-end answer generation
    generate_answer_using_subgraph,  # Answer generation from subgraph
)

VectorDB Operations

from kgnode import (
    compile_chromadb,           # Build vector DB from knowledge graph
    compile_chromadb_from_csv,  # Build from existing CSV
    semantic_search_entities,   # Semantic search for entities
    load_chromadb,              # Load existing ChromaDB collection
    add_or_update_entities,     # Add/update entity embeddings
    delete_entities,            # Remove entities from vector DB
)

Search Operations

from kgnode import search_entities_by_keywords  # SPARQL keyword search

Validation

from kgnode import validate_subgraph  # Validate extracted subgraph

Core Configuration

from kgnode import KGConfig, execute_sparql_query

# Create configuration
config = KGConfig(
    sparql_endpoint="http://localhost:7878/query",
    embedding_model="all-MiniLM-L6-v2",
    openai_model="gpt-4o-mini"
)

# Execute SPARQL queries
results = execute_sparql_query(query="SELECT * WHERE { ?s ?p ?o } LIMIT 10", config=config)

TODOs

LangGraph Integration

  • Orchestrate workflow with LangGraph
  • Add visualization support

Documentation

For detailed usage, API reference, and examples, see docs/USAGE.md or visit the online documentation.

Dataset

DBLP-QuAD - Academic publications knowledge graph

Supported Technologies

Vector Databases

  • ChromaDB ✅ (implemented)
  • Pinecone (planned)
  • Qdrant (planned)

Embedding Models

  • all-MiniLM-L6-v2 ✅ (default, 384 dimensions)
  • google/embeddinggemma-300m (alternative)

License

MIT

Testing

Run All Tests

python tests/test_runner.py

Run Specific Tests

# Run single test file
python tests/test_runner.py chromadb

# Run multiple test files
python tests/test_runner.py chromadb seed_finder subgraph_extraction

# List available tests
python tests/test_runner.py --list

# Run standalone test file
python tests/test_chromadb.py

Prerequisites

  • Oxigraph SPARQL server running at http://localhost:7878/query
  • OPENAI_API_KEY environment variable set
  • ChromaDB created (happens automatically on first run)

Logging

Option 1: Environment Variable (Before Running)

# Show everything (including debug messages)
export KGNODE_LOG_LEVEL=DEBUG
python your_script.py

# Show only warnings and errors (less verbose)
export KGNODE_LOG_LEVEL=WARNING
python your_script.py

# Completely silent
export KGNODE_LOG_LEVEL=CRITICAL
python your_script.py

Option 2: In Your Code

from kgnode.core import set_log_level

# Show debug messages
set_log_level("DEBUG")

# Only warnings and errors
set_log_level("WARNING")

# Completely silent
from kgnode.core import disable_logging
disable_logging()
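For readers who want to see how an environment variable such as KGNODE_LOG_LEVEL can be mapped onto Python's standard logging levels, here is a minimal sketch. It is not kgnode's code: the `resolve_log_level` helper, the `INFO` fallback, and the logger name are assumptions chosen for illustration.

```python
import logging
import os

# Level names accepted in this sketch, mapped to the stdlib constants.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(default="INFO"):
    """Read KGNODE_LOG_LEVEL (case-insensitive); fall back to `default` if unset or unknown."""
    name = os.environ.get("KGNODE_LOG_LEVEL", default).upper()
    return _LEVELS.get(name, _LEVELS[default])


# Hypothetical logger name; a library would normally use its own package name.
logger = logging.getLogger("kgnode_demo")
logger.setLevel(resolve_log_level())
```

Reading the variable once at import time matches the "before running" usage of Option 1; changing the level later is what an in-code call like Option 2 is for.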

Configuration Options:

To Make More Aggressive (even fewer subgraphs):

config = KGConfig.default(
    min_subgraphs=2,                # Stop after just 2
    max_subgraphs=10,               # Lower hard cap
    quality_threshold_ratio=0.75,   # Stop at 75% of median (stricter)
    absolute_prob_threshold=2.0     # Higher absolute floor
)

To Make More Conservative (more subgraphs):

config = KGConfig.default(
    min_subgraphs=5,                # Collect at least 5
    max_subgraphs=20,               # Higher hard cap
    quality_threshold_ratio=0.4,    # Stop at 40% of median (looser)
    absolute_prob_threshold=1.0     # Lower absolute floor
)

To Disable Adaptive Stopping:

config = KGConfig.default(
    min_subgraphs=25,   # Never trigger adaptive stop
    max_subgraphs=25    # Just use hard limit
)

Benefits:

  1. Quality over quantity: Only collects semantically relevant paths
  2. Adaptive to question complexity: Simple questions stop early, hard questions use more budget
  3. Configurable: Can tune without code changes
  4. Fast: Early termination reduces SPARQL queries & embeddings
  5. Clean data: Less noise for SPARQL generation stage
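One plausible reading of how the four options interact is sketched below. This is a guess at the stopping rule based only on the option names and their inline comments, not kgnode's actual logic: the `select_subgraphs` function, its default values, and the choice to combine the two thresholds by taking the larger one are all assumptions.

```python
from statistics import median


def select_subgraphs(scores, min_subgraphs=3, max_subgraphs=15,
                     quality_threshold_ratio=0.5, absolute_prob_threshold=1.5):
    """Keep the best-scoring candidates under an assumed adaptive stopping rule.

    Candidates are visited from highest to lowest score. Collection always
    stops at `max_subgraphs`. Once `min_subgraphs` have been kept, it also
    stops at the first score that falls below either the absolute floor or
    `quality_threshold_ratio` times the median of the scores kept so far.
    """
    kept = []
    for score in sorted(scores, reverse=True):
        if len(kept) >= max_subgraphs:
            break  # hard cap reached
        if len(kept) >= min_subgraphs:
            floor = max(quality_threshold_ratio * median(kept),
                        absolute_prob_threshold)
            if score < floor:
                break  # quality dropped too far; stop early
        kept.append(score)
    return kept
```

Under this reading, setting `min_subgraphs` equal to `max_subgraphs` means the quality check is never reached, which is consistent with the "disable adaptive stopping" configuration above.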
