kgnode
Knowledge Graph Agnostic Node for Knowledge-Aware LLM Applications
Overview
kgnode is a Python library that extracts relevant subgraphs from large knowledge graphs using a path-aware Markov chain algorithm for question answering tasks.
Implementation Summary:
- kgnode: work in progress
- Initial Dataset: DBLP-QuAD
- Knowledge graph embedding ❌
- Simple text embedding with basic template ✅
- Initial Vector DB: ChromaDB
- Framework: LangGraph
- Seed node identification strategy:
  - SPARQL text search (1-hop nodes)
  - High-frequency node (degree) semantic search (2-3 hop nodes)
  - Compile VectorDB with top 1 million nodes
- Node pruning algorithm: Path-aware Markov chain (relevant subgraph identification)
  - P(v→w) ∝ base_weight(v,w) × f(history,v,w)
  - Initially using P(v→w) ∝ softmax(cos(path_embedding, template_embedding))
  - path_embedding = f(a, r, b, r, v, r, w)
  - Query → template → template_embedding
  - Stops when P drops below its value at the previous step or the path reaches 10 hops
- Generate SPARQL for answering the query, using the subgraph as context
- Generate answer of the query by executing SPARQL and using subgraph
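The transition rule and stopping condition listed above can be illustrated with a minimal sketch. This is not kgnode's implementation: the function names (`cosine`, `transition_probs`, `should_stop`) are hypothetical, the 2-dimensional vectors stand in for real path and template embeddings, and the stopping rule assumes that "p" means the probability of the transition chosen at each hop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def transition_probs(path_embeddings, template_embedding):
    """Softmax over cos(path_embedding, template_embedding), one entry per candidate."""
    scores = [cosine(p, template_embedding) for p in path_embeddings]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def should_stop(p_current, p_previous, hop, max_hops=10):
    """Stop at the hop limit, or when the chosen probability drops (assumed reading)."""
    if hop >= max_hops:
        return True
    return p_previous is not None and p_current < p_previous

# Toy embeddings: the first candidate path points the same way as the template.
template = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = transition_probs(candidates, template)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

In this toy setup the candidate aligned with the template receives the highest probability, and a walk would stop as soon as the selected probability falls below the previous hop's value or the hop count reaches 10.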
Installation
pip install kgnode
Quick Start
from kgnode import KGConfig, get_seed_nodes, get_subgraphs, generate_answer
# Configure for your knowledge graph
config = KGConfig(
    sparql_endpoint="http://localhost:7878/query",
    embedding_model="all-MiniLM-L6-v2"
)
# Find seed nodes for a query
seed_nodes = get_seed_nodes(query="What papers did John Smith publish?", config=config)
# Extract relevant subgraph
subgraphs = get_subgraphs(seed_node=seed_nodes[0], query="...", config=config)
# Generate answer
answer = generate_answer(query="...", config=config)
Folder Structure
kgnode/
├── src/kgnode/
│   ├── __init__.py              # Public API exports
│   ├── seed_finder.py           # Seed node identification
│   ├── subgraph_extraction.py   # Path-aware Markov chain algorithm
│   ├── generator.py             # SPARQL generation and answer generation
│   ├── validator.py             # Subgraph validation
│   ├── keyword_search.py        # Keyword-based entity search
│   ├── chroma_db.py             # Vector database operations
│   └── core/
│       ├── kg_config.py         # Configuration class
│       ├── sparql_query.py      # SPARQL endpoint communication
│       ├── schema_extractor.py  # Schema extraction from ontology/SPARQL
│       ├── schema_chromadb.py   # Schema ChromaDB collections
│       └── schema_selector.py   # Query-aware schema selection
├── tests/                       # Unit tests
├── docs/                        # Documentation
└── _data/                       # Data files (not in repo)
Running Oxigraph SPARQL Server
kgnode requires a SPARQL endpoint. We recommend Oxigraph:
# Start server (read-write)
oxigraph_server serve -l ./oxigraph_db --cors
# Start server (read-only)
oxigraph_server serve-read-only -l ./oxigraph_db --cors
# Load dataset (one-time setup)
oxigraph_server load -l ./oxigraph_db -f _data/dblp.nt
# Custom bind address
oxigraph_server serve -l ~/oxigraph_db --bind 127.0.0.1:7878
Default endpoint: http://localhost:7878/query
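To check that the endpoint is reachable independently of kgnode, a request can be built with the Python standard library. The sketch below follows the SPARQL 1.1 Protocol (query passed as the `query` URL parameter, JSON results requested via the `Accept` header). The helper name `build_sparql_request` is hypothetical and not part of kgnode.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_sparql_request(
    "http://localhost:7878/query",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
)
print(req.get_method(), req.full_url)

# To actually send it (requires the server above to be running):
#   import json
#   from urllib.request import urlopen
#   with urlopen(req) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it keeps the example runnable without a live server.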
Public API
Main Pipeline
from kgnode import (
    citable,                         # Check seed node quality
    get_seed_nodes,                  # Find seed nodes (keyword + semantic search)
    get_subgraphs,                   # Extract subgraph using path-aware Markov chain
    generate_sparql,                 # Generate SPARQL from subgraph
    kg_retrieve,                     # Full pipeline: query → subgraph → SPARQL → results
    generate_answer,                 # End-to-end answer generation
    generate_answer_using_subgraph,  # Answer generation from subgraph
)
VectorDB Operations
from kgnode import (
    compile_chromadb,           # Build vector DB from knowledge graph
    compile_chromadb_from_csv,  # Build from existing CSV
    semantic_search_entities,   # Semantic search for entities
    load_chromadb,              # Load existing ChromaDB collection
    add_or_update_entities,     # Add/update entity embeddings
    delete_entities,            # Remove entities from vector DB
)
Search Operations
from kgnode import search_entities_by_keywords # SPARQL keyword search
Validation
from kgnode import validate_subgraph # Validate extracted subgraph
Core Configuration
from kgnode import KGConfig, execute_sparql_query
# Create configuration
config = KGConfig(
    sparql_endpoint="http://localhost:7878/query",
    embedding_model="all-MiniLM-L6-v2",
    openai_model="gpt-4o-mini"
)
# Execute SPARQL queries
results = execute_sparql_query(query="SELECT * WHERE { ?s ?p ?o } LIMIT 10", config=config)
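The return type of `execute_sparql_query` is not documented here. Assuming the results follow the W3C SPARQL 1.1 Query Results JSON Format (which SPARQL endpoints such as Oxigraph can return), a small helper could flatten them into plain dictionaries. The helper `bindings_to_rows` and the sample data are hypothetical and not taken from DBLP.

```python
def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results.get("head", {}).get("vars", [])
    rows = []
    for binding in results.get("results", {}).get("bindings", []):
        # Unbound variables are simply absent from a binding, so skip them.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows

# Hand-written sample in the standard JSON results layout (illustrative only).
sample = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/paper1"},
                "p": {"type": "uri", "value": "http://example.org/title"},
                "o": {"type": "literal", "value": "An Example Paper"},
            },
            {
                "s": {"type": "uri", "value": "http://example.org/paper2"},
                "p": {"type": "uri", "value": "http://example.org/title"},
            },
        ]
    },
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row)
```

If `execute_sparql_query` already returns flattened rows, this step is unnecessary.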
TODOs
LangGraph Integration
- Orchestrate workflow with LangGraph
- Add visualization support
Documentation
For detailed usage, API reference, and examples, see docs/USAGE.md or visit the online documentation.
Dataset
DBLP-QuAD - Academic publications knowledge graph
- Source: https://dblp.org/rdf/
- Download: https://zenodo.org/records/7638511
- Paper: DBLP-QuAD (ECIR 2023)
- Stats: 252M triples, 92M entities, 62 relations
Supported Technologies
Vector Databases
- ChromaDB ✅ (implemented)
- Pinecone (planned)
- Qdrant (planned)
Embedding Models
- all-MiniLM-L6-v2 ✅ (default, 384 dimensions)
- google/embeddinggemma-300m (alternative)
License
MIT
Testing
Run All Tests
python tests/test_runner.py
Run Specific Tests
# Run single test file
python tests/test_runner.py chromadb
# Run multiple test files
python tests/test_runner.py chromadb seed_finder subgraph_extraction
# List available tests
python tests/test_runner.py --list
# Run standalone test file
python tests/test_chromadb.py
Prerequisites
- Oxigraph SPARQL server running at http://localhost:7878/query
- OPENAI_API_KEY environment variable set
- ChromaDB created (happens automatically on first run)