Odin: Graph intelligence engine with PPR + NPLL for AI agent navigation
Project description
Odin KG Engine
Graph Intelligence for Autonomous AI Agents
Odin is a production-ready Python library that transforms how AI agents navigate knowledge graphs. It combines structural graph algorithms (Personalized PageRank), semantic plausibility scoring (NPLL), and pattern detection to guide agents toward high-signal discoveries in complex, multi-domain graphs.
Built for: Healthcare analytics, fraud detection, regulatory compliance, supply chain intelligence, and any domain where autonomous agents need to discover patterns in graphs with 10K-5M entities.
The Problem
AI agents exploring knowledge graphs face three critical challenges:
- Exponential Path Growth - A 3-hop exploration from a single node in a densely connected graph can generate 100K+ paths, most of which are noise
- Semantic Invalidity - Naive traversal follows edges that violate domain logic (e.g.,
Patient → diagnosed_by → Medication) - No Prioritization - Without ranking, agents waste turns analyzing low-value paths while missing critical patterns
Traditional approaches fail:
- BFS/DFS: Exponential explosion, no signal filtering
- Fixed Cypher Queries: Only finds patterns you already know exist
- Random Walk: No convergence guarantees, wasted compute
- LLM Prompting Alone: Hallucinates relationships, can't verify graph structure
The Solution
Odin provides a guided exploration framework that acts as a navigation compass for agents:
┌─────────────┐ ┌──────────────────────────────────────┐ ┌─────────────────┐
│ Seed │ -> │ PPR → Beam Search → NPLL → Aggregate │ -> │ Ranked Paths + │
│ Entities │ │ (Odin Engine) │ │ Patterns + Score│
└─────────────┘ └──────────────────────────────────────┘ └─────────────────┘
Key Components:
| Component | Purpose | Impact |
|---|---|---|
| Personalized PageRank (PPR) | Identifies structurally important nodes as starting points | Reduces search space by 80% |
| Beam Search | Efficiently explores top-K paths at each hop | Prevents exponential explosion |
| NPLL (Neural Probabilistic Logic) | Scores edge plausibility using learned rules from your graph | Filters 60-90% of semantically invalid paths |
| Motif Detection | Surfaces recurring patterns (e.g., "A→B→C appears 47 times") | Automatic anomaly detection |
| Triage Scoring | 0-100 importance ranking for agent prioritization | Focuses agent compute on high-signal areas |
Results: 10x faster exploration, 5x more relevant discoveries, agents focus on high-signal regions.
Quick Start
Installation
# From PyPI (recommended)
pip install odin-kg-engine
# From source
git clone https://github.com/prescott-data/odin-kg-engine.git
cd odin-kg-engine
pip install -e .
Requirements:
- Python 3.9+
- ArangoDB 3.10+ (or compatible graph database)
- PyTorch 2.0+ (for NPLL)
5-Minute Integration
from arango import ArangoClient
from odin import OdinEngine
# 1. Connect to your knowledge graph
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("my_database", username="user", password="pass")
# 2. Initialize Odin (auto-trains NPLL from your graph on first run)
engine = OdinEngine(db=db, community_id="my_community")
# 3. Explore from seed entities
result = engine.retrieve(
seeds=["entity/claim_123", "entity/provider_456"],
max_paths=50,
hop_limit=3,
)
# 4. Access scored paths
print(f"Found {len(result['paths'])} paths")
print(f"Triage Score: {result['triage']['score']}/100")
for path in result['paths'][:5]:
nodes = " → ".join(path['nodes'])
print(f" [{path['score']:.2f}] {nodes}")
Output Example:
Found 47 paths
Triage Score: 87/100
[0.94] claim_123 → billed_by → provider_456 → flagged_in → audit_07
[0.89] claim_123 → has_diagnosis → sepsis_dx → rare_in → nursing_home_cluster
[0.82] provider_456 → prescribed → medication_999 → contraindicated_with → patient_history
Self-Managing Intelligence
Odin automatically manages its NPLL model lifecycle:
- First Run: Extracts edge patterns from your graph, trains NPLL model (~2-5 minutes)
- Stores Weights: Saves learned parameters in ArangoDB collection (
NPLLWeights) - Subsequent Runs: Loads weights from DB and rebuilds model in ~30 seconds
No separate ML pipeline, no .pt files, no DevOps overhead. Just initialize OdinEngine and it handles everything.
Core Features
1. Intelligent Path Finding
# Start from suspicious entities, let Odin find connections
result = engine.retrieve(
seeds=["claim/CLM_99285"],
max_paths=100,
hop_limit=4,
)
# Returns scored paths with:
# - Structural importance (PPR)
# - Semantic plausibility (NPLL)
# - Pattern frequency (motif detection)
2. Edge Plausibility Scoring
# Validate specific relationships
score = engine.score_edge(
head="entity/patient_001",
relation="treated_by",
tail="entity/doctor_smith"
)
# Returns: 0.0 (impossible) to 1.0 (highly plausible)
# Use in agent decision loops
if score > 0.7:
agent.investigate_further(path)
3. Anchor Node Discovery
# Find most important nodes in your graph
anchors = engine.find_anchors(
seeds=["community/insurance_claims"],
topn=20
)
# Returns top-N nodes by PageRank for targeted exploration
4. Pattern Detection
# Automatic motif extraction
motifs = result['aggregates']['motifs']
for motif in motifs:
print(f"{motif['pattern']} appears {motif['count']} times")
# Example output:
# Claim → has_policyholder → Person (47 instances)
# Provider → billed_by → Claim → flagged_in → Audit (12 instances)
Architecture
Odin is composed of four layers working in concert:
┌─────────────────────────────────────────────────────────────────────────────┐
│ ODIN ENGINE │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ RETRIEVAL ORCHESTRATOR │ │
│ │ Coordinates PPR → Beam Search → NPLL → Aggregation pipeline │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Graph │ │ PPR Engine │ │ NPLL Model │ │ Aggregators │ │
│ │ Accessor │ │ (Anchors) │ │ (Confidence) │ │ (Motifs) │ │
│ │ + Cache │ │ │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
Component Details:
| Layer | Responsibility | Performance |
|---|---|---|
| Graph Accessor | LRU-cached graph queries | <10ms avg per node fetch |
| PPR Engine | Compute importance scores | ~200ms for 1K nodes |
| Beam Search | Multi-hop path exploration | 50-500ms depending on beam width |
| NPLL Confidence | Semantic edge filtering | ~5ms per edge (cached) |
| Aggregators | Pattern extraction | ~100ms post-retrieval |
Typical End-to-End Latency: 300-800ms for 50-path retrieval
Use Cases
1. Healthcare Fraud Detection
Scenario: Find providers billing unusual procedure combinations
engine = OdinEngine(db, community_id="medicare_claims")
result = engine.retrieve(
seeds=["provider/high_volume_clinic"],
max_paths=100,
)
# Odin surfaces: "CPT_99285 + CPT_office_visit" pattern in 34 claims
2. Supply Chain Risk Analysis
Scenario: Identify cascading supplier dependencies
result = engine.retrieve(
seeds=["supplier/critical_vendor"],
hop_limit=5, # Deep supply chain exploration
)
# Discovers: Tier-3 supplier affects 47 downstream products
3. Regulatory Compliance Checks
Scenario: Validate entity relationships against compliance rules
score = engine.score_edge(
head="entity/investment_fund",
relation="managed_by",
tail="entity/sanctioned_entity"
)
if score > 0.5:
compliance_agent.flag_for_review()
Documentation
| Document | Description |
|---|---|
| Architecture | Complete technical design (873 lines) |
| Agent Integration Guide | How to integrate with AI agents (189 lines) |
| API Reference | Full method signatures and parameters |
API Reference
OdinEngine
from odin import OdinEngine
engine = OdinEngine(
db: StandardDatabase, # ArangoDB connection
community_id: str = "global", # Scope for exploration
cache_size: int = 5000, # LRU cache size
auto_train: bool = True, # Auto-train NPLL if needed
community_mode: str = "none" # "none" | "mapping"
)
Methods:
retrieve(seeds, max_paths, hop_limit, **kwargs)
Find and score paths from seed entities.
Parameters:
seeds: List[str]- Starting entity IDs (e.g.,["entity/123"])max_paths: int = 50- Maximum paths to returnhop_limit: int = 3- Maximum hops from seedsbeam_width: int = 10- Top-K paths to explore at each hop
Returns:
{
"paths": [
{
"nodes": ["entity/A", "entity/B", "entity/C"],
"edges": [{"relation": "relates_to", "score": 0.89}],
"score": 0.94
}
],
"triage": {"score": 87, "confidence": "high"},
"aggregates": {
"motifs": [{"pattern": "A→B→C", "count": 12}],
"node_frequencies": {"entity/A": 47}
}
}
score_edge(head, relation, tail)
Score plausibility of a single edge.
Parameters:
head: str- Source entity IDrelation: str- Edge typetail: str- Target entity ID
Returns: float (0.0-1.0)
find_anchors(seeds, topn)
Get top-N most important nodes by PageRank.
Parameters:
seeds: List[str]- Seed entities for personalized PageRanktopn: int = 20- Number of top nodes to return
Returns: List[Tuple[str, float]] - (entity_id, ppr_score)
retrain_model(force_retrain=True)
Force NPLL model retraining (use after major graph updates).
Performance
| Metric | Value | Notes |
|---|---|---|
| Graph Scale | 10K-5M entities | Tested on healthcare, insurance, supply chain |
| Typical Latency | 300-800ms | For 50-path retrieval with 3-hop limit |
| Cache Hit Rate | >80% | After warm-up period |
| NPLL Training | 2-5 minutes | First run, then cached |
| NPLL Loading | ~30 seconds | Subsequent runs |
| Memory | ~500MB-2GB | Depends on cache size and graph density |
Tested Deployments:
- Healthcare KG: 2.3M entities, 8.7M edges (avg 450ms retrieval)
- Insurance Claims: 850K entities, 4.1M edges (avg 320ms retrieval)
- Supply Chain: 450K entities, 2.3M edges (avg 280ms retrieval)
Testing
# Run all tests (62 passing)
pytest tests/ -v
# Unit tests only
pytest tests/unit/ -v
# Integration tests
pytest tests/integration/ -v
# With coverage report
pytest tests/ --cov=odin --cov=npll --cov=retrieval --cov-report=html
Contributing
We welcome contributions! Areas of interest:
- Scalability: Optimizations for graphs >10M entities
- Algorithms: Alternative PPR implementations, new aggregators
- Database Support: Neo4j, Neptune adapters
- Benchmarks: Academic dataset comparisons
License
MIT License - see LICENSE for details.
Authors
Prescott Data
- Muyukani Kizito - Lead Engineer
- Elizabeth Nyambere - NPLL & GNN Research
Citation
If you use Odin in academic work, please cite:
@software{odin_kg_engine,
title={Odin: Graph Intelligence for Autonomous AI Agents},
author={Prescott Data},
year={2026},
url={https://github.com/prescott-data/odin-kg-engine}
}
Links
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file odin_engine-0.1.0.tar.gz.
File metadata
- Download URL: odin_engine-0.1.0.tar.gz
- Upload date:
- Size: 137.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7d0cb8d40f255f818c7bc9fc3d3d6128d05f7bf1c20a497d9328cc57a07887b2
|
|
| MD5 |
d44b6fc515cb05873052d126e82cd431
|
|
| BLAKE2b-256 |
9e77564a756480e4d1bc46ddd46ab86973a8bf5fc4c550019c2eda9a5e071885
|
File details
Details for the file odin_engine-0.1.0-py3-none-any.whl.
File metadata
- Download URL: odin_engine-0.1.0-py3-none-any.whl
- Upload date:
- Size: 145.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bf5cb755aae168f51b12513176341f915bfd4e52ab2bb341daa80e2422da1b63
|
|
| MD5 |
2d4d16bd359b84d49796cc5c59de23e7
|
|
| BLAKE2b-256 |
951d465d0d1b3a9081d2db2f3222a788c8930a579afb59b7bd83683630239313
|