Advanced Knowledge Graph Engine with semantic search and temporal tracking

Knowledge Graph Engine v2

Modern Neo4j-based knowledge graph engine with semantic search capabilities, intelligent relationship management, and performance optimizations.

🎯 Overview

A production-ready knowledge graph system built entirely on Neo4j, which handles both persistent graph storage and vector search. It combines graph database operations with semantic vector search to provide intelligent information storage, retrieval, and reasoning.

✨ Key Features

  • 🏗️ Neo4j-Native Architecture: Complete Neo4j integration for both graph and vector operations
  • 🔍 Enhanced Semantic Search: Improved vector search with dynamic thresholds and contextual boosting
  • 🤖 LLM Integration: OpenAI/Ollama support for entity extraction and query processing
  • ⚔️ Conflict Resolution: Intelligent handling of contradicting information with temporal tracking
  • ⏰ Temporal Tracking: Complete relationship history with date ranges and conflict resolution
  • 🎯 Smart Query Understanding: Context-aware search with semantic category matching
  • 📊 Optimized Performance: 50-74% faster queries with smart caching and lazy loading
  • 🚀 Production Ready: ACID compliance, comprehensive error handling, modern architecture
  • 🏷️ Edge Classification: Intelligent edge categorization with vector similarity (85% threshold)
  • 🔄 Complete CRUD API: Full create, read, update, delete operations for edges and nodes
  • 📦 External Package Support: Clean API exports for use as external dependency

🆕 New in v2.1.0

  • ⚡ Performance Optimizations: GraphQueryOptimizer and Neo4jOptimizer for 50-74% faster queries
  • 💾 Smart Caching: Query result caching with 5-minute TTL for near-instant repeated queries
  • 🔧 Refactored GraphEdge: Lazy loading with safe accessors, 18% smaller codebase
  • 🛠️ Dynamic Relationships: WORKS_AT, LIVES_IN instead of generic RELATES_TO
  • 🐛 Bug Fixes: Fixed "Relationship not populated" errors, enhanced source filtering
  • 🏷️ Edge Classifier System: Vector similarity-based edge classification (replaced LLM approach)
  • 🔄 CRUD Operations: Complete API for edge and node management including merge operations
  • 📦 API Exports: All types exported for external package usage
  • 🌐 Separate API Server: Production-ready FastAPI server as external project
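
The 5-minute query cache mentioned above can be pictured as a small time-to-live (TTL) map. The sketch below is not the engine's actual cache: the class name `TTLCache` and the injectable `clock` parameter are assumptions made for illustration, and only the 300-second default comes from this README.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be checked without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)
```

A repeated query would call `get` first and fall back to the database only on a miss, then store the fresh result with `set`.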

📁 Project Structure

src/                                  # Main source directory
├── kg_engine/                        # Knowledge Graph Engine
│   ├── core/                         # Core engine
│   │   └── engine.py                 # Main KG Engine
│   ├── models/                       # Data models
│   │   ├── models.py                 # Graph data structures
│   │   └── classifier_map.py         # Edge classifier management
│   ├── storage/                      # Storage components
│   │   ├── graph_db.py               # Neo4j graph operations
│   │   ├── neo4j_vector_store.py     # Vector storage
│   │   ├── vector_store.py           # Vector store interface
│   │   └── ...                       # Other storage components
│   ├── llm/                          # LLM integration
│   │   └── llm_interface.py          # OpenAI/Ollama interface
│   ├── config/                       # Configuration
│   │   ├── neo4j_config.py           # Neo4j settings
│   │   └── neo4j_schema.py           # Schema management
│   ├── utils/                        # Utilities
│   │   ├── date_parser.py            # Date parsing utilities
│   │   ├── graph_query_optimizer.py  # Query optimization
│   │   ├── neo4j_optimizer.py        # Neo4j optimizations
│   │   └── classifier_detector.py    # Edge classification
│   └── __init__.py                   # Package exports
├── api/                              # API endpoints
│   └── main.py                       # FastAPI CRUD operations
├── examples/                         # Usage examples
│   ├── examples.py                   # Basic examples
│   ├── bio_example.py                # Biographical demo
│   └── simple_bio_demo.py            # Simple demo
└── tests/                            # Test suite

kg_api_server/                        # Separate API server project
├── app/                              # FastAPI application
│   ├── __init__.py                   # Package init
│   └── main.py                       # API server implementation
├── tests/                            # API tests
├── requirements.txt                  # Dependencies
├── Dockerfile                        # Container configuration
├── docker-compose.yml                # Full stack deployment
└── README.md                         # API documentation

docs/                                 # Comprehensive documentation
├── architecture/                     # System design
├── user-guide/                       # Getting started
├── api/                              # API reference
└── development/                      # Development guides

🚀 Quick Start

Prerequisites

# Install Neo4j (required)
docker run --name neo4j -p7474:7474 -p7687:7687 -d \
    -e NEO4J_AUTH=neo4j/password \
    neo4j:latest

Installation

pip install -e .

Basic Usage

As a Library

from kg_engine import KnowledgeGraphEngineV2, InputItem, Neo4jConfig

# Initialize with Neo4j
engine = KnowledgeGraphEngineV2(
    api_key="your-openai-key",  # or "ollama" for local LLM
    neo4j_config=Neo4jConfig()
)

# Add knowledge
result = engine.process_input([
    InputItem(description="Alice works as a software engineer at Google"),
    InputItem(description="Bob lives in San Francisco")
])

# Search with natural language
response = engine.search("Who works at Google?")
print(response.answer)  # "Alice works as a software engineer at Google."

Using the API Server

# Start the API server
cd kg_api_server
python app/main.py

# Process text via API
curl -X POST "http://localhost:8080/process" \
     -H "Content-Type: application/json" \
     -d '{
       "texts": ["Alice works at Google", "Bob lives in San Francisco"]
     }'

# Search via API
curl -X POST "http://localhost:8080/search" \
     -H "Content-Type: application/json" \
     -d '{"query": "Who works at Google?"}'
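
The same two calls can be prepared from Python with only the standard library. This is a sketch under assumptions: the server address and JSON bodies are copied from the curl examples above, `build_request` and `post_json` are hypothetical helper names, and the response schema is not documented here, so the response is returned as plain parsed JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: address used in the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the API server (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(build_request(path, payload)) as response:
        return json.loads(response.read().decode("utf-8"))


# Requests are only constructed here; nothing is sent until post_json is called.
process_req = build_request("/process", {"texts": ["Alice works at Google"]})
search_req = build_request("/search", {"query": "Who works at Google?"})
```

With the server running, `post_json("/search", {"query": "Who works at Google?"})` would perform the search call.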

🤖 LLM Setup Options

Option 1: OpenAI (Recommended for Production)

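# Shell: make the key available to the process
export OPENAI_API_KEY="your-api-key"

# Python: create the engine with the exported key
import os

engine = KnowledgeGraphEngineV2(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4.1-nano"  # Fast and cost-effective
)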

Option 2: Local Ollama (Privacy & Cost-Free)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start server
ollama serve

# Pull a model
ollama pull llama3.2:3b  # Recommended: good balance of size/performance

# Python: point the engine at the local Ollama server
engine = KnowledgeGraphEngineV2(
    api_key="ollama",
    base_url="http://localhost:11434/v1",
    model="llama3.2:3b"
)

🏗️ Optimized Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   LLM Interface │    │   Graph Database │    │  Vector Store   │
│                 │    │                  │    │                 │
│ • Entity Extract│    │ • Neo4j Native   │    │ • Neo4j Vectors │
│ • Query Parse   │    │ • Query Cache    │    │ • Semantic      │
│ • Answer Gen.   │    │ • Optimizations  │    │ • Search        │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────────┐
                    │ KG Engine v2        │
                    │  (Optimized)        │
                    │                     │
                    │ • Process Input     │
                    │ • Smart Updates     │
                    │ • Hybrid Search     │
                    │ • Query Caching     │
                    │ • Safe Accessors    │
                    └─────────────────────┘

📊 Advanced Features

Edge Classification System

# Automatic edge classification with vector similarity (85% threshold)
engine.process_input([
    InputItem(description="Alice works at Google"),      # → category: "business"
    InputItem(description="Bob lives in Paris"),         # → category: "location"
    InputItem(description="Charlie loves photography")   # → category: "interests"
])

# Similar edges are grouped intelligently
# "works_at", "employed_by", "works_for" → all map to WORKS_AT relationship
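
To make the 85% threshold concrete, here is a minimal sketch of similarity-based classification. It is not the engine's classifier: `classify_edge`, `KNOWN_TYPES`, and the three-dimensional toy vectors are assumptions for illustration, and real embeddings would come from an embedding model.

```python
import math
from typing import Dict, List, Optional, Sequence

SIMILARITY_THRESHOLD = 0.85  # the threshold quoted in this README


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_edge(embedding: Sequence[float],
                  known: Dict[str, List[float]],
                  threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the best-matching known type, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy reference vectors for two relationship types (hypothetical values).
KNOWN_TYPES = {
    "WORKS_AT": [1.0, 0.0, 0.0],
    "LIVES_IN": [0.0, 1.0, 0.0],
}
```

An embedding close to an existing type reuses that type; one that matches nothing returns `None`, which a caller could treat as a signal to register a new relationship type.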

Complete CRUD Operations

# Describe an edge manually (metadata first, then the edge itself)
from kg_engine import EdgeData, EdgeMetadata, RelationshipStatus

metadata = EdgeMetadata(
    summary="John is the CTO of TechCorp",
    confidence=0.95,
    category="business",
    status=RelationshipStatus.ACTIVE
)

edge_data = EdgeData(
    subject="John",
    relationship="WORKS_AS",
    object="CTO at TechCorp",
    metadata=metadata
)

# Node operations
engine.graph_db.merge_nodes_auto("John Smith", "J. Smith")  # Auto merge
engine.graph_db.merge_nodes_manual("John", "Jonathan", "John Smith")  # Manual
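
Conceptually, merging nodes means re-pointing every edge that touches an alias to one canonical node. The sketch below works on plain in-memory triples and makes no claim about how `merge_nodes_auto` or `merge_nodes_manual` are implemented; the function name `merge_nodes` and the triple layout are assumptions.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


def merge_nodes(edges: Iterable[Triple], aliases: Set[str],
                canonical: str) -> List[Triple]:
    """Re-point every edge that touches an alias to the canonical node name."""
    merged: List[Triple] = []
    seen: Set[Triple] = set()
    for subject, relationship, obj in edges:
        triple = (
            canonical if subject in aliases else subject,
            relationship,
            canonical if obj in aliases else obj,
        )
        if triple not in seen:  # drop duplicates created by the merge
            seen.add(triple)
            merged.append(triple)
    return merged
```

For example, merging the aliases `{"John", "Jonathan"}` into `"John Smith"` collapses two identical WORKS_AT edges into one and keeps edges from other nodes pointing at the merged node.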

Intelligent Conflict Resolution

# Initial information
engine.process_input([InputItem(description="Alice lives in Boston")])

# Update with conflicting information (automatically resolves)
engine.process_input([InputItem(description="Alice moved to Seattle in 2024")])

# System automatically:
# 1. Marks old relationship as obsolete
# 2. Adds new relationship as active
# 3. Maintains complete history
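
The three steps listed above can be sketched with a small in-memory history. This is an illustration, not the engine's logic: the `Fact` dataclass, the `apply_update` function, the string statuses, and the assumption that a relationship such as LIVES_IN holds one value at a time are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    relationship: str
    obj: str
    from_date: date
    to_date: Optional[date] = None
    status: str = "active"


def apply_update(history: List[Fact], new_fact: Fact) -> None:
    """Close any conflicting active fact, then append the new one."""
    for fact in history:
        same_slot = (fact.subject == new_fact.subject
                     and fact.relationship == new_fact.relationship)
        if same_slot and fact.status == "active" and fact.obj != new_fact.obj:
            fact.status = "obsolete"             # 1. mark the old relationship obsolete
            fact.to_date = new_fact.from_date    #    and close its date range
    history.append(new_fact)                     # 2. add the new relationship as active
    # 3. nothing is deleted, so the full history stays available
```

Because obsolete facts keep their date range, a question about where Alice lived before the move can still be answered from the same list.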

Optimized Search Performance

# Fast cached queries (< 1ms for repeated searches)
response = engine.search("Who works in technology?")  # First call: ~100ms
response = engine.search("Who works in technology?")  # Cached: < 1ms

# Enhanced semantic understanding with contextual boosting
response = engine.search("Who was born in Europe?")
# ✅ Returns all European births: Berlin, Lyon, Barcelona, Paris

# Safe relationship access (no more "Relationship not populated" errors)
for result in response.results:
    edge = result.triplet.edge
    subject = edge.get_subject_safe()  # Safe accessor
    relationship = edge.get_relationship_safe()  # Safe accessor
    obj = edge.get_object_safe()  # Safe accessor
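
The idea behind lazy loading with safe accessors can be shown in a few lines. The `LazyEdge` class below is a hypothetical stand-in, not the library's GraphEdge: it fetches graph data only on first access and returns `None` instead of raising when the data is missing.

```python
from typing import Callable, Optional, Tuple

GraphData = Tuple[str, str, str]  # (subject, relationship, object)


class LazyEdge:
    """Hypothetical edge that fetches its graph data only on first access."""

    def __init__(self, edge_id: str,
                 loader: Callable[[str], Optional[GraphData]]) -> None:
        self.edge_id = edge_id
        self._loader = loader
        self._data: Optional[GraphData] = None
        self._loaded = False

    def _load_once(self) -> Optional[GraphData]:
        if not self._loaded:
            self._data = self._loader(self.edge_id)  # may return None
            self._loaded = True
        return self._data

    def get_subject_safe(self) -> Optional[str]:
        data = self._load_once()
        return data[0] if data else None

    def get_relationship_safe(self) -> Optional[str]:
        data = self._load_once()
        return data[1] if data else None

    def get_object_safe(self) -> Optional[str]:
        data = self._load_once()
        return data[2] if data else None
```

The loader runs at most once per edge, so search results that are never inspected cost no extra database lookups.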

Temporal Relationship Tracking

# Natural language dates with simple parse_date utility
from kg_engine import parse_date

engine.process_input([
    InputItem(description="Project started", from_date=parse_date("2 months ago")),
    InputItem(description="Alice joined", from_date=parse_date("last week"))
])
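
For intuition about what a relative-date parser has to do, here is a deliberately small sketch. It is not the library's `parse_date`: the function name `parse_relative_date`, the two supported phrasings, and the approximation of a month as 30 days and a year as 365 days are all assumptions.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Approximate unit lengths in days (months and years are rounded).
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def parse_relative_date(text: str, today: Optional[date] = None) -> Optional[date]:
    """Parse phrases like '2 months ago' or 'last week'; return None if unsupported."""
    today = today or date.today()
    text = text.strip().lower()

    match = re.fullmatch(r"(\d+)\s+(day|week|month|year)s?\s+ago", text)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        return today - timedelta(days=count * _UNIT_DAYS[unit])

    match = re.fullmatch(r"last\s+(week|month|year)", text)
    if match:
        return today - timedelta(days=_UNIT_DAYS[match.group(1)])

    return None
```

Passing `today` explicitly keeps the result reproducible, which matters when the parsed value is stored as the start of a relationship's date range.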

📦 Using as External Package

The KG Engine is designed to be used as an external dependency in your projects:

# Import all needed components
from kg_engine import (
    KnowledgeGraphEngineV2, InputItem, Neo4jConfig,
    EdgeData, EdgeMetadata, RelationshipStatus,
    SearchType, parse_date, __version__
)

print(f"Using KG Engine v{__version__}")

# Full API available for external applications
# See kg_api_server/ for a complete FastAPI example

API Server Example

A complete FastAPI server is provided as a separate project in kg_api_server/:

cd kg_api_server
pip install -r requirements.txt
python app/main.py  # Starts at http://localhost:8080

Features:

  • Complete REST API with all CRUD operations
  • Interactive documentation at /docs
  • Docker support for production deployment
  • Comprehensive test suite

📚 Documentation

🚦 Running Examples

# Run basic examples
python src/examples/examples.py

# Run biographical knowledge graph demo  
python src/examples/simple_bio_demo.py

# Verify project structure
python verify_structure.py

Expected output:

✅ Neo4j connection verified
🚀 Knowledge Graph Engine v2 initialized
   - Vector store: kg_v2 (neo4j)
   - Graph database: Neo4j (persistent)
   
=== Example: Semantic Relationship Handling ===
1. Adding: John Smith teaches at MIT
   Result: 1 new edge(s) created
...

🔍 Search Capabilities

The Knowledge Graph Engine v2 features advanced semantic search with:

  • Performance Optimizations: Query caching, lazy loading, and optimized Cypher queries
  • Dynamic Similarity Thresholds: Base threshold of 0.3 with context-specific adjustments
  • Semantic Category Matching: Understands relationships between concepts (e.g., "technology" → "software engineer")
  • Query-Specific Boosting: Different query types get tailored relevance scoring
  • Geographic Intelligence: Recognizes European cities and other geographic relationships
  • Safe Data Access: Robust error handling with safe accessor methods
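
A dynamic similarity threshold can be thought of as a base cutoff plus a per-query-type adjustment. In the sketch below only the base value of 0.3 comes from this README; the query-type names, the adjustment values, and the helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

BASE_THRESHOLD = 0.3  # base value quoted in this README

# Hypothetical adjustments; the engine's real values are not documented here.
ADJUSTMENTS: Dict[str, float] = {
    "entity": 0.2,      # exact-entity lookups can demand closer matches
    "category": -0.05,  # broad category queries tolerate looser matches
}


def threshold_for(query_type: str) -> float:
    """Base threshold plus the adjustment for this query type, clamped to [0, 1]."""
    adjusted = BASE_THRESHOLD + ADJUSTMENTS.get(query_type, 0.0)
    return min(1.0, max(0.0, adjusted))


def filter_hits(hits: List[Tuple[str, float]],
                query_type: str) -> List[Tuple[str, float]]:
    """Keep only (text, score) hits whose score reaches the cutoff."""
    cutoff = threshold_for(query_type)
    return [(text, score) for text, score in hits if score >= cutoff]
```

Lowering the cutoff for broad queries trades precision for recall, which fits questions like "Who works in technology?" where indirect matches are wanted.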

Example Queries

# Technology and profession queries
"Who works in technology?" → Finds software engineers, developers, tech professionals
"Tell me about engineers" → Returns all engineering-related professions

# Geographic queries
"Who was born in Europe?" → Finds Berlin, Lyon, Barcelona, Paris births
"Who lives in Paris?" → Returns all Paris residents

# Activity and interest queries
"What do people do for hobbies?" → Returns all "enjoys" relationships
"Tell me about photographers" → Finds people who enjoy or specialize in photography

# Entity-specific queries
"Tell me about Emma Johnson" → Returns all relationships for Emma

🧪 Testing

Run the comprehensive test suite:

# Core integration tests
python test_neo4j_integration.py

# Performance optimization tests
python test_optimizations.py

# Relationship fix validation
python test_relationship_fix.py

# Edge classifier tests
python test_classifier_system.py

# API export tests
python test_api_exports.py

# Quick validation
python test_quick_relationship_fix.py

# API server tests (from kg_api_server directory)
cd kg_api_server && pytest tests/

📈 Performance Benchmarks

Operation            Before Optimization   After Optimization   Improvement
Entity Exploration   20-50ms               8-15ms               ~60% faster
Vector Search        100-200ms             40-80ms              ~50% faster
Conflict Detection   150-300ms             50-100ms             ~67% faster
Path Finding         80-160ms              25-50ms              ~70% faster
Cached Queries       N/A                   < 1ms                Near-instant

🔧 Development

For development setup and contributing guidelines, see docs/development/README.md.

Key Implementation Details

# Safe edge property access
edge = result.triplet.edge
if edge.has_graph_data():
    subject, relationship, obj = edge.get_graph_data()
else:
    subject = edge.get_subject_safe() or "Unknown"
    relationship = edge.get_relationship_safe() or "Unknown"
    obj = edge.get_object_safe() or "Unknown"

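# Optimized queries with caching (excerpt from inside an engine method;
# the method name here is illustrative)
def explore_entity(self, entity_name):
    cache_key = f"entity_exploration_{entity_name}"
    if cached_result := self.graph_db._get_cache(cache_key):
        return cached_result  # cache hit: skip the database round trip

    result = self.graph_db.get_entity_relationships_optimized(entity_name)
    self.graph_db._set_cache(cache_key, result)
    return result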

License

MIT License

