Advanced Knowledge Graph Engine with Document Processing, Semantic Search and Multi-LLM Integration
LLM Exo-Graph
An advanced knowledge graph engine that externalizes LLM memory into Neo4j, creating a persistent, searchable brain for AI systems.
Why Exo-Graph?
Traditional LLMs have ephemeral memory. LLM Exo-Graph creates an exocortex - an external brain that:
- Persists knowledge across sessions
- Searches with both semantic and graph algorithms
- Connects information through relationships
- Scales beyond context window limitations
The Power of Graph Structure
Subject → Relationship → Object = Triplet(metadata)
Our graph structure captures not just entities, but the rich context of their relationships:
God → CREATED → man = (summary: God created man in his own image) [conf: 0.90]
God → DIVIDED → waters = (summary: God divided the waters) [conf: 0.90]
light → EXISTS → light = (summary: there was light) [conf: 0.90]
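As a rough illustration of the triplet shape shown above, an edge can be modeled as a small record. This is only a sketch: the `Triplet` class and its field names are hypothetical and are not taken from the `llm_exo_graph` package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One graph edge: subject -> relationship -> object, plus metadata.

    Hypothetical illustration; not the library's own data model.
    """
    subject: str
    relationship: str
    obj: str
    summary: str = ""
    confidence: float = 1.0

    def __post_init__(self) -> None:
        # Confidence is treated as a probability-like score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def render(self) -> str:
        """Format the edge in the same notation as the examples above."""
        return (
            f"{self.subject} → {self.relationship} → {self.obj} = "
            f"(summary: {self.summary}) [conf: {self.confidence:.2f}]"
        )


edge = Triplet("God", "CREATED", "man", "God created man in his own image", 0.90)
print(edge.render())
```

Keeping the summary and confidence on the edge itself is what lets the same record feed both the graph store and the vector index.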
Benefits of This Approach
- Enhanced Graph Search
  - Traverse relationships with Cypher queries
  - Find indirect connections (friend-of-friend)
  - Discover patterns and clusters
- Superior Vector Search
  - Summaries provide rich semantic context
  - Embeddings capture relationship meaning
  - Hybrid search combines graph + semantic
- Temporal Intelligence
  - Track relationship changes over time
  - Handle contradictions gracefully
  - Maintain complete history
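To make "hybrid search combines graph + semantic" concrete, here is one possible way to blend the two signals. It is a sketch under stated assumptions: the `hops`, `cosine`, and `hybrid_score` helpers and the `alpha` weight are hypothetical, and the engine's actual ranking may work differently.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hops(adjacency, start, goal):
    """Breadth-first search: edges on the shortest directed path, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def hybrid_score(similarity, distance, alpha=0.7):
    """Blend semantic similarity with graph proximity, 1 / (1 + hops)."""
    proximity = 0.0 if distance is None else 1.0 / (1.0 + distance)
    return alpha * similarity + (1.0 - alpha) * proximity


# John -> OpenAI -> AI Research is an indirect (two-hop) connection.
adjacency = {"John": ["OpenAI"], "OpenAI": ["AI Research"]}
distance = hops(adjacency, "John", "AI Research")
print(distance, hybrid_score(similarity=0.8, distance=distance))
```

A weighted sum is the simplest choice; it keeps unreachable nodes rankable by similarity alone instead of dropping them.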
How It Works
Entity Extraction Pipeline
graph LR
A[Natural Language Input] --> B[LLM Processor]
B --> C{Entity Extraction}
C --> D[Subject Recognition]
C --> E[Relationship Detection]
C --> F[Object Identification]
D --> G[Graph Edge Creation]
E --> G
F --> G
G --> H[Neo4j Storage]
G --> I[Vector Embedding]
I --> J[Semantic Index]
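The "Graph Edge Creation → Neo4j Storage" step in the diagram above could be expressed as a Cypher upsert. The sketch below only builds the query text and parameters; the `Entity` label, the property names, and the `build_merge_edge_query` helper are assumptions for illustration, not the package's actual schema.

```python
import re

# Relationship types such as WORKS_AT: upper-case letters, digits, underscores.
_REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")


def build_merge_edge_query(subject, relationship, obj, summary, confidence):
    """Return (cypher, params) that upserts one subject-relationship-object edge."""
    # Cypher cannot take a relationship type as a query parameter, so the type
    # is validated against a strict pattern before being placed in the text.
    if not _REL_TYPE.fullmatch(relationship):
        raise ValueError(f"unsafe relationship type: {relationship!r}")
    cypher = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[r:{relationship}]->(o) "
        "SET r.summary = $summary, r.confidence = $confidence"
    )
    params = {
        "subject": subject,
        "object": obj,
        "summary": summary,
        "confidence": confidence,
    }
    return cypher, params


cypher, params = build_merge_edge_query(
    "Marie Curie", "DISCOVERED", "radium",
    "Marie Curie discovered radium in 1898", 0.9,
)
print(cypher)
```

With the official Neo4j Python driver, the pair could then be passed to `session.run(cypher, params)`. Using `MERGE` rather than `CREATE` keeps repeated inputs from producing duplicate nodes.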
Entity Standardization Process
graph TD
A[Raw Entity/Relationship] --> B[BiEncoder Embedding]
B --> C[Category Classification]
C --> D{Similarity Check}
D -->|High Similarity| E[Use Existing Standard]
D -->|Low Similarity| F[CrossEncoder Verification]
F --> G{Cross-Validation Score}
G -->|Score > Threshold| H[Merge with Standard]
G -->|Score < Threshold| I[Create New Standard]
E --> J[Standardized Output]
H --> J
I --> J
K[Existing Categories] --> C
L[Cached Embeddings] --> D
style B fill:#e1f5fe,stroke:#01579b,stroke-width:2px
style F fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
style J fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
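The decision logic in the diagram above can be sketched without any model dependencies. Everything here is an assumption made for illustration: the `standardize` function, the `high` and `threshold` values, and the toy vectors. A real setup would take embeddings from the BiEncoder and scores from the CrossEncoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def standardize(raw_name, raw_vec, standards, cross_score, high=0.90, threshold=0.50):
    """Return (standard_name, decision) following the diagram's three branches.

    standards   -- dict mapping existing standard names to embedding vectors
    cross_score -- callable(raw_name, candidate) standing in for a CrossEncoder
    """
    if standards:
        best = max(standards, key=lambda name: cosine(raw_vec, standards[name]))
        if cosine(raw_vec, standards[best]) >= high:
            return best, "use_existing"      # high bi-encoder similarity
        if cross_score(raw_name, best) > threshold:
            return best, "merge"             # cross-encoder confirms the match
    standards[raw_name] = raw_vec            # nothing close enough: new standard
    return raw_name, "create_new"


# Toy data: 2-d vectors and a lookup table instead of real models.
standards = {"WORKS_AT": [1.0, 0.0]}


def toy_cross(a, b):
    return 0.8 if {a, b} == {"IS_EMPLOYED", "WORKS_AT"} else 0.1


print(standardize("EMPLOYED_BY", [0.99, 0.10], standards, toy_cross))
print(standardize("IS_EMPLOYED", [0.70, 0.70], standards, toy_cross))
print(standardize("LIVES_IN", [0.00, 1.00], standards, toy_cross))
```

Running the cheaper bi-encoder comparison first means the slower pairwise check is only needed for borderline candidates.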
Item Processing Workflow
graph TD
A[InputItem] --> B[LLM Entity Extraction]
B --> C[Standardization Process]
C --> D{Negation Detection}
D -->|Positive Statement| E[Duplicate Check]
D -->|Negation| F[Conflict Detection]
E -->|New Relationship| G[Create Edge]
E -->|Duplicate Found| H[Skip/Ignore]
F -->|Conflict Found| I[Temporal Resolution]
F -->|No Conflict| J[Log Error]
G --> K[Neo4j Storage]
I --> L[Obsolete Existing]
L --> M[Update Metadata]
K --> N[Vector Embedding]
M --> N
N --> O[Index Update]
P[Temporal Metadata] --> G
P --> I
Q[Confidence Scoring] --> G
Q --> I
style D fill:#fff3e0,stroke:#e65100,stroke-width:2px
style I fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
style N fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
style J fill:#ffebee,stroke:#c62828,stroke-width:2px
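The branches of the workflow above can be walked through with a small in-memory model. This is a sketch, not the engine's implementation: `Edge`, `EdgeStore`, and the returned status strings are hypothetical, and the input is assumed to be already extracted and standardized.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Edge:
    subject: str
    relationship: str
    obj: str
    start: date
    end: Optional[date] = None  # set when the edge is obsoleted

    @property
    def active(self) -> bool:
        return self.end is None


@dataclass
class EdgeStore:
    edges: List[Edge] = field(default_factory=list)

    def _find_active(self, s, r, o):
        for e in self.edges:
            if e.active and (e.subject, e.relationship, e.obj) == (s, r, o):
                return e
        return None

    def process(self, s, r, o, negated, today):
        """Route one standardized triplet through the workflow."""
        existing = self._find_active(s, r, o)
        if not negated:
            if existing is not None:
                return "skipped_duplicate"      # exact match already stored
            self.edges.append(Edge(s, r, o, start=today))
            return "created"
        if existing is None:
            return "error_no_conflict"          # negation with nothing to negate
        existing.end = today                    # obsolete, but keep as history
        return "obsoleted"


store = EdgeStore()
print(store.process("Alice", "WORKS_AT", "Google", False, date(2024, 1, 1)))
print(store.process("Alice", "WORKS_AT", "Google", True, date(2024, 1, 15)))
print(store.process("Alice", "WORKS_AT", "OpenAI", False, date(2024, 1, 16)))
```

Obsoleted edges are end-dated rather than deleted, which is what makes the full history queryable later.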
Key Processing Features:
- Standardization: Entities and relationships are normalized using BiEncoder + CrossEncoder
- Negation Handling: "Alice no longer works at Google" → obsoletes existing relationship
- Temporal Resolution: Automatic conflict resolution with date-based transitions
- Confidence Scoring: Each relationship has confidence metadata for reliability
- Duplicate Prevention: Exact matches are detected and skipped
- Vector Integration: All changes immediately update semantic search indexes
Quick Start
Prerequisites
# Using Docker (Recommended)
docker-compose up -d
# Or use Neo4j Cloud
# Set NEO4J_URI=neo4j+s://your-instance.neo4j.io
Installation
From PyPI (Recommended):
pip install llm-exo-graph
From Source:
git clone https://github.com/your-org/llm-exo-graph
cd llm-exo-graph
pip install -e .
With Optional Dependencies:
# For document processing
pip install "llm-exo-graph[documents]"
# For development
pip install "llm-exo-graph[dev]"
# All features
pip install "llm-exo-graph[all]"
Basic Usage
from llm_exo_graph import ExoGraphEngine, InputItem
# Initialize with auto-configuration
engine = ExoGraphEngine()
# Or with custom encoder models
config = {
    "encoder_model": "all-mpnet-base-v2",
    "cross_encoder_model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
}
engine = ExoGraphEngine(config=config)
# Feed knowledge
engine.process_input([
    InputItem("Marie Curie discovered radium in 1898"),
    InputItem("Radium glows green in the dark"),
    InputItem("Marie Curie won the Nobel Prize twice"),
])
# Query naturally
response = engine.search("What did Marie Curie discover?")
print(response.answer)
# → "Marie Curie discovered radium in 1898."
MCP Integration (Model Context Protocol)
What is MCP?
MCP enables AI assistants like Claude to directly interact with your knowledge graph via Server-Sent Events (SSE), creating a persistent memory layer that survives across conversations.
Quick Setup with Docker
- Start the MCP Server
  # Use the notebook docker-compose for MCP development
  docker-compose -f docker-compose.notebook.yml up -d
  # This starts:
  # - Neo4j on port 7687/7474
  # - MCP SSE server on port 3000
- Configure Claude Desktop
  // ~/Library/Application Support/Claude/claude_desktop_config.json
  {
    "mcpServers": {
      "exo-graph": {
        "command": "npx",
        "args": [
          "-y",
          "mcp-remote",
          "http://localhost:3000/sse",
          "--allow-http"
        ]
      }
    }
  }
- Restart Claude Desktop - The MCP server will connect automatically
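If the Claude Desktop config file already lists other servers, the `exo-graph` entry from the configuration step has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; the `add_exo_graph_server` helper is hypothetical, and reading or writing the actual file is left out.

```python
import json


def add_exo_graph_server(config, url="http://localhost:3000/sse"):
    """Return a copy of a Claude Desktop config with the exo-graph entry added.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    merged = json.loads(json.dumps(config))  # deep copy of plain JSON data
    servers = merged.setdefault("mcpServers", {})
    servers["exo-graph"] = {
        "command": "npx",
        "args": ["-y", "mcp-remote", url, "--allow-http"],
    }
    return merged


existing = {"mcpServers": {"other-server": {"command": "other"}}}
print(json.dumps(add_exo_graph_server(existing), indent=2))
```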
Graph Data Examples
After setup, Claude can work with rich graph relationships like these from our Biblical knowledge graph:
God → CREATED → man (God created man in his own image) [conf: 0.90]
God → DIVIDED → waters (God divided the waters) [conf: 0.90]
light → EXISTS → light (there was light) [conf: 0.90]
God → SAID → "Let there be light" (God spoke creation into existence) [conf: 0.95]
man → MADE_IN_IMAGE_OF → God (humanity reflects divine nature) [conf: 0.85]
waters → SEPARATED_BY → firmament (division of waters above and below) [conf: 0.88]
Using MCP in Claude
Once configured, Claude gains persistent memory and can:
Store Knowledge Permanently
Claude: "I'll remember that John works at OpenAI as a researcher"
→ Creates: John → WORKS_AT → OpenAI (researcher role) [conf: 0.95]
Query Across Sessions
User: "What did we discuss about John yesterday?"
Claude: "You told me John works at OpenAI as a researcher. I have that stored in the knowledge graph."
Discover Connections
User: "How is John connected to AI research?"
Claude: "Through the knowledge graph, I can see John → WORKS_AT → OpenAI → FOCUSES_ON → AI Research"
Analyze Patterns
User: "Show me all employment relationships you know about"
Claude: "I found 15 employment relationships in the graph, including John at OpenAI, Alice at Google..."
Track Changes Over Time
User: "John left OpenAI and joined Google"
Claude: "I've updated the graph - obsoleted John's OpenAI relationship and created a new Google relationship with today's date."
REST API
Quick API Usage
# Start API server
cd kg_api_server
python app/main.py
# Add knowledge
curl -X POST http://localhost:8080/api/v1/process \
  -H "Content-Type: application/json" \
  -d '{"items": [{"description": "Einstein developed E=mc²"}]}'
# Search (quote the URL so the shell does not interpret the "?")
curl "http://localhost:8080/api/v1/search?query=Einstein"
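The same two calls can be prepared from Python with only the standard library. This sketch reuses the URLs and payload shape from the curl commands above; the `build_process_request` and `build_search_url` helpers are hypothetical, and the response format is not shown because the source does not specify it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080/api/v1"


def build_process_request(descriptions, base_url=BASE_URL):
    """Build the POST /process request carrying one item per description."""
    body = json.dumps({"items": [{"description": d} for d in descriptions]})
    return Request(
        f"{base_url}/process",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_url(query, base_url=BASE_URL):
    """Build the GET /search URL with the query percent-encoded."""
    return f"{base_url}/search?{urlencode({'query': query})}"


req = build_process_request(["Einstein developed E=mc²"])
print(req.get_method(), req.full_url)
print(build_search_url("What did Marie Curie discover?"))
```

With the API server running, `urllib.request.urlopen(req)` would send the request. Encoding the query with `urlencode` avoids the quoting problems that spaces and `?` cause in hand-built URLs.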
API Endpoints
- POST /api/v1/process - Add knowledge
- GET /api/v1/search - Natural language search
- GET /api/v1/entities/{name} - Get entity details
- DELETE /api/v1/edges/{id} - Remove relationships
Visualization
Generate beautiful graph visualizations:
python visualize_graph.py
Creates three outputs in /output:
- knowledge_graph_relationships.txt - Human-readable relationships
- knowledge_graph_static.png - Publication-ready visualization
- knowledge_graph_interactive.html - Interactive exploration
Configuration
Engine Configuration
from llm_exo_graph import ExoGraphEngine, Neo4jConfig, OllamaConfig
# Custom encoder configuration
config = {
    "encoder_model": "all-mpnet-base-v2",  # BiEncoder model
    "cross_encoder_model": "cross-encoder/ms-marco-MiniLM-L-12-v2",  # CrossEncoder model
}
# Initialize with all configurations
engine = ExoGraphEngine(
    llm_config=OllamaConfig(model="llama3.2"),
    neo4j_config=Neo4jConfig(),
    config=config,
)
Available Encoder Models
BiEncoder Models (for semantic embeddings):
- all-MiniLM-L6-v2 (default) - Fast, good quality
- all-mpnet-base-v2 - Higher quality, slower
- sentence-transformers/all-MiniLM-L12-v2 - Balanced
CrossEncoder Models (for relationship validation):
- cross-encoder/ms-marco-MiniLM-L-6-v2 (default) - Fast
- cross-encoder/ms-marco-MiniLM-L-12-v2 - More accurate
- cross-encoder/ms-marco-electra-base - Highest accuracy
Environment Variables
# LLM Configuration (auto-detected)
OPENAI_API_KEY=sk-... # For OpenAI
OLLAMA_BASE_URL=http://localhost:11434 # For Ollama
OLLAMA_MODEL=llama3
# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
NEO4J_DATABASE=neo4j
# Optional
LOG_LEVEL=INFO
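For scripts that need the same settings outside the engine, the variables above can be read with a few lines of standard-library code. This is a sketch: `Neo4jSettings`, `load_neo4j_settings`, and `detect_llm_provider` are hypothetical names, and the detection order (OpenAI before Ollama) is an assumption, since the source only says the LLM is auto-detected.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    uri: str
    username: str
    password: str
    database: str


def load_neo4j_settings(env=None):
    """Read the NEO4J_* variables, defaulting everything except the password."""
    env = os.environ if env is None else env
    return Neo4jSettings(
        uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        username=env.get("NEO4J_USERNAME", "neo4j"),
        password=env["NEO4J_PASSWORD"],  # no default: raise KeyError if unset
        database=env.get("NEO4J_DATABASE", "neo4j"),
    )


def detect_llm_provider(env=None):
    """Guess the provider from which variables are set (assumed precedence)."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("OLLAMA_BASE_URL") or env.get("OLLAMA_MODEL"):
        return "ollama"
    return None


sample_env = {"NEO4J_PASSWORD": "password", "OLLAMA_MODEL": "llama3"}
print(load_neo4j_settings(sample_env), detect_llm_provider(sample_env))
```

Passing the environment as a plain mapping keeps the loader easy to test without touching the real process environment.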
Advanced Features
Custom Model Configuration
Choose encoder models based on your needs:
# High Performance Setup (Fast processing)
fast_config = {
    "encoder_model": "all-MiniLM-L6-v2",
    "cross_encoder_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
}
# High Accuracy Setup (Better quality)
accurate_config = {
    "encoder_model": "all-mpnet-base-v2",
    "cross_encoder_model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
}
# Domain-Specific Setup (for scientific/technical content)
domain_config = {
    "encoder_model": "sentence-transformers/allenai-specter",
    "cross_encoder_model": "cross-encoder/ms-marco-electra-base",
}
engine = ExoGraphEngine(config=accurate_config)
Document Processing
from llm_exo_graph import DocumentProcessor
processor = DocumentProcessor()
results = processor.process_directory("./research_papers/")
Temporal Relationships & Negation Handling
# Example: Career transitions with temporal intelligence
engine.process_input([
    InputItem("Alice works as a software engineer at Google"),
    InputItem("Alice no longer works at Google"),  # Negation - obsoletes previous
    InputItem("Alice started working at OpenAI in January 2024"),  # New relationship
])
# The system automatically:
# 1. Detects "no longer" as negation
# 2. Finds conflicting relationships
# 3. Obsoletes old relationship with end date
# 4. Creates new relationship with start date
Standardization in Action
# These variations are automatically standardized:
engine.process_input([
    InputItem("John works at Microsoft"),
    InputItem("John is employed by Microsoft"),  # Standardized to "WORKS_AT"
    InputItem("John's employer is Microsoft"),  # Also standardized to "WORKS_AT"
])
# Result: All create the same standardized relationship
# John → WORKS_AT → Microsoft (with different summaries)
Conflict Resolution
# Handles contradictions intelligently
history = engine.get_entity_relationships("Alice")
# Shows both relationships with temporal metadata:
# - Alice → WORKS_AT → Google [obsolete: 2024-01-15]
# - Alice → WORKS_AT → OpenAI [active: 2024-01-16]
Examples
- Bible Knowledge Graph
- Bio Research Graph
- Document Processing
- API Integration
Development
Running Tests
pytest tests/
cd kg_api_server && pytest tests/
Contributing
See CONTRIBUTING.md
Performance
- 50-74% faster queries with optimizations
- Batch processing for large datasets
- Intelligent caching layers
- Optimized Neo4j indexes
Package Information
- PyPI: https://pypi.org/project/llm-exo-graph/
- Install: pip install llm-exo-graph
- Version: Check latest on PyPI
- Extras: [documents], [dev], [all]
Community
- Documentation
- Issues
- Discussions
- PyPI Package
License
MIT License - see LICENSE
LLM Exo-Graph - Giving AI a persistent, searchable memory