Orchestration layer for the KGRAG(tm) components.
KGRAG — Knowledge Compiler and Federated Retrieval Layer for Ontologically Grounded Domains
Patent Pending — The Knowledge Compiler concept and its execution are the subject of a pending U.S. provisional patent application.
Author: Eric G. Suchanek, PhD · Flux-Frontiers, Liberty TWP, OH
Overview
KGRAG is a federation and orchestration layer for structural knowledge graphs derived from heterogeneous source domains. It integrates PyCodeKG (Python codebase analysis), DocKG (semantic document indexing), MetaboKG (metabolic pathways), DiaryKG (personal diary corpora), AgentKG (conversational memory), FTreeKG (file system trees), MemoryKG (episodic memory), and a growing family of domain-specific backends under a single five-method adapter protocol.
KGRAG treats derived structure as ground truth and uses semantic embeddings strictly as an acceleration layer for locating entry points into that structure. All graph traversal, ranking, and snippet extraction are deterministic. When KGRAG output is passed to a language model for synthesis, the model receives verified facts with full source provenance, not approximate embeddings.
KG Types
Fully Implemented
| Kind | Backend | Description |
|---|---|---|
| `code` | PyCodeKG | Python codebase — AST-extracted modules, classes, functions, call graphs |
| `doc` | DocKG | Document corpus — Markdown/RST/text indexed by topic, section, and entity |
| `meta` | MetaboKG | Metabolic pathways — biochemical reaction networks (KEGG, BioCyc) |
| `diary` | DiaryKG | Personal diary entries — timestamped chunk graphs with temporal edges |
| `agent` | AgentKG | Conversational memory — Turn/Topic/Task/Summary graph (live session) |
| `filetree` | FTreeKG | File system tree — directory/file/module/dependency structure |
| `memory` | MemoryKG | Episodic memory — hybrid semantic + structural graph for conversation/event corpora |
Stub Adapters (protocol boundary, backends under development)
| Kind | Backend | Description |
|---|---|---|
| `gutenberg` | GutenbergKG | Project Gutenberg book corpus — literature indexed by author, genre, and chapter |
| `ia` | IABookKG | Internet Archive book corpus — public-domain books indexed by genre and topic |
| `pdbfile` | — | PDB structure files — 3D atomic coordinates and protein metadata |
| `disulfide` | — | Disulfide bond data — cysteine connectivity in protein structures |
| `verse` | — | Scripture/verse — Book → Chapter → Verse hierarchy and cross-references |
| `person` | — | Personal knowledge — biographical and relational graphs |
| `legal` | — | Legal corpus — statutory codes and regulations (TBD) |
Corpus Abstractions
Generic Corpus — A named collection of any KG instances grouped for scoped federated queries. Useful for project-level or thematic groupings (e.g., "KGRAG_repos" combining code + doc KGs).
Person Corpus — A corpus enriched with personal metadata representing an individual. Aggregates all KGs relevant to a person — diaries, memories, documents, agent sessions, and more — alongside structured personal data (birth year, address, email, contact info).
Features
- Multi-domain federation — Query code, docs, metabolic pathways, diary entries, and conversation history simultaneously
- Five-method adapter protocol — `is_available`, `query`, `pack`, `stats`, `analyze`; add a new domain by implementing five methods
- Unified registry — Persistent SQLite-backed storage of KG locations, metadata, corpora, and person records
- Corpus abstraction — Group KGs into named corpora for scoped federated queries
- Person corpus — Model individuals with personal metadata and their associated KG collections
- Hybrid querying — Semantic seeding via LanceDB + structural BFS traversal
- Context packing — Extract source-grounded snippets with line numbers for direct LLM ingestion
- MCP server — 22 tools exposing registry, corpus, and person operations to any MCP-compatible agent
- CLI tooling — Full CRUD for KGs, corpora, and person corpora; query, pack, analyze, synthesize
- Streamlit dashboard — Interactive browser for exploring and querying registered knowledge graphs
- Deterministic retrieval — Auditable, source-grounded results; zero hallucination at the knowledge layer
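The five method names below come from the feature list; everything else (argument names, defaults, return types, and the toy `InMemoryNotesAdapter` backend) is an assumption, so treat this as a sketch of the protocol's shape rather than the actual `KGAdapter` ABC in `adapters/base.py`.

```python
from abc import ABC, abstractmethod
from typing import Any


class KGAdapter(ABC):
    """Hypothetical shape of the five-method adapter protocol."""

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def query(self, q: str, k: int = 8) -> list[dict[str, Any]]: ...

    @abstractmethod
    def pack(self, q: str, k: int = 8) -> str: ...

    @abstractmethod
    def stats(self) -> dict[str, Any]: ...

    @abstractmethod
    def analyze(self) -> str: ...


class InMemoryNotesAdapter(KGAdapter):
    """Toy backend: keyword counting over a dict of notes (illustrative only)."""

    def __init__(self, notes: dict[str, str]) -> None:
        self.notes = notes

    def is_available(self) -> bool:
        return bool(self.notes)

    def query(self, q: str, k: int = 8) -> list[dict[str, Any]]:
        terms = q.lower().split()
        hits = []
        for node_id, text in sorted(self.notes.items()):
            score = sum(text.lower().count(t) for t in terms)
            if score:
                hits.append({"id": node_id, "score": score})
        # Highest score first, ties broken by id, so the order is deterministic.
        hits.sort(key=lambda h: (-h["score"], h["id"]))
        return hits[:k]

    def pack(self, q: str, k: int = 8) -> str:
        return "\n".join(f"- {h['id']}: {self.notes[h['id']]}" for h in self.query(q, k))

    def stats(self) -> dict[str, Any]:
        return {"nodes": len(self.notes)}

    def analyze(self) -> str:
        return f"{len(self.notes)} notes"


adapter = InMemoryNotesAdapter(
    {"n1": "auth flow uses tokens", "n2": "database setup", "n3": "auth auth"}
)
print([h["id"] for h in adapter.query("auth")])  # ['n3', 'n1']
```

Because an orchestrator written against this interface only ever calls these five methods, adding a domain means adding a subclass, with no orchestrator changes.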
Quick Start
1. Install KGRAG
```bash
pip install 'kg-rag @ git+https://github.com/Flux-Frontiers/KGRAG.git'
# With Streamlit dashboard
pip install 'kg-rag[viz] @ git+https://github.com/Flux-Frontiers/KGRAG.git'
```
2. Register a Knowledge Graph
```bash
# Register a Python codebase (requires pycode-kg built in that repo)
kgrag register my-code code /path/to/my-repo
# Register a document corpus (requires doc-kg built in that repo)
kgrag register my-docs doc /path/to/docs-repo
# Register a diary corpus
kgrag register pepys-diary diary /path/to/diary-repo
```
3. Query Your Graphs
```bash
# Federated query across all registered KGs
kgrag query "authentication flow"
# Federated snippet pack for LLM ingestion
kgrag pack "database connection setup" --out context.md
# Scope to a specific corpus
kgrag query "disulfide bond patterns" --scope my-corpus
kgrag pack "journal entries about travel" --scope alice
```
4. Launch the Dashboard
```bash
kgrag viz
```
CLI Reference
Registry Management
| Command | Description |
|---|---|
| `kgrag register <name> <kind> <path>` | Register a KG instance |
| `kgrag unregister <name>` | Remove a KG from the registry |
| `kgrag list [--kind <kind>]` | List all registered KGs |
| `kgrag info <name>` | Show detailed info for a KG |
| `kgrag status [--stats]` | Check health and live stats |
| `kgrag init` | Interactively register a new KG |
Query & Analysis
| Command | Description |
|---|---|
| `kgrag query <q> [--kind <kind>] [--scope <name>]` | Federated semantic query |
| `kgrag pack <q> [--kind <kind>] [--scope <name>] [--out <file>]` | Snippet pack for LLM |
| `kgrag analyze <name>` | Full analysis report for one KG |
| `kgrag synthesize <q>` | KG-grounded synthesis via local LLM (Ollama) |
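A federated query has to merge per-backend hit lists into one ranking without losing determinism. This sketch assumes each backend already returns scores on a comparable scale (for example cosine similarity); `Hit` and `federate` are hypothetical stand-ins, not KGRAG's `CrossHit` type or orchestrator code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical per-backend hit (KGRAG's CrossHit may differ)."""
    kg_name: str
    node_id: str
    score: float


def federate(results: dict[str, list[Hit]], k: int = 10) -> list[Hit]:
    """Merge per-KG hit lists into one deterministic top-k list."""
    merged = [hit for name in sorted(results) for hit in results[name]]
    # Higher score first; ties broken by KG name, then node id, so identical
    # inputs always produce the identical order.
    merged.sort(key=lambda h: (-h.score, h.kg_name, h.node_id))
    return merged[:k]


results = {
    "my-docs": [Hit("my-docs", "docs/auth.md#login", 0.82)],
    "my-code": [Hit("my-code", "auth.login", 0.82), Hit("my-code", "auth.logout", 0.40)],
}
print([h.node_id for h in federate(results, k=2)])  # ['auth.login', 'docs/auth.md#login']
```

The explicit tie-break keys mean the output never depends on dictionary order or on which backend answered first.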
Corpus Management
| Command | Description |
|---|---|
| `kgrag corpus create <name>` | Create a named corpus |
| `kgrag corpus add <corpus> <kg>` | Add a KG to a corpus |
| `kgrag corpus remove <corpus> <kg>` | Remove a KG from a corpus |
| `kgrag corpus list` | List all corpora |
| `kgrag corpus query <name> <q>` | Query within a corpus |
| `kgrag corpus pack <name> <q>` | Snippet pack within a corpus |
Person Corpus Management
| Command | Description |
|---|---|
| `kgrag person create <name>` | Create a person corpus |
| `kgrag person add <person> <kg>` | Add a KG to a person corpus |
| `kgrag person update <name> [--email ...] [--notes ...]` | Update personal metadata |
| `kgrag person query <name> <q>` | Query across a person's KGs |
| `kgrag person pack <name> <q>` | Snippet pack for a person |
Server & Integration
| Command | Description |
|---|---|
| `kgrag mcp` | Launch MCP server (stdio transport) |
| `kgrag viz` | Launch Streamlit dashboard |
| `kgrag hooks install` | Install pre-commit snapshot hook |
MCP Integration
Launch the MCP server:
```bash
kgrag mcp
```
The server exposes 22 tools to any MCP-compatible agent (Claude Code, Cursor, GitHub Copilot, Cline, Claude Desktop):
Registry tools:
| Tool | Description |
|---|---|
| `kgrag_stats()` | Registry summary: KG count, kinds, built status |
| `kgrag_list([kind])` | List registered KG entries |
| `kgrag_info(name)` | Full detail for a single KG entry |
| `kgrag_query(q, [k, kinds])` | Federated semantic query, JSON result |
| `kgrag_pack(q, [k, kinds])` | Federated snippet pack, Markdown output |
Corpus tools: `kgrag_corpus_list`, `kgrag_corpus_info`, `kgrag_corpus_create`, `kgrag_corpus_delete`, `kgrag_corpus_add`, `kgrag_corpus_remove`, `kgrag_corpus_query`, `kgrag_corpus_pack`

Person tools: `kgrag_person_list`, `kgrag_person_info`, `kgrag_person_create`, `kgrag_person_delete`, `kgrag_person_add`, `kgrag_person_remove`, `kgrag_person_update`, `kgrag_person_query`, `kgrag_person_pack`
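`kgrag_pack` (like `kgrag pack`) returns Markdown. The exact layout KGRAG emits is not specified here, so the following is a hypothetical rendering that shows the idea: each snippet keeps its KG name, file path, and line range, and every line carries its original line number. `Snippet` and `render_pack` are illustrative names.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code fence, built this way to keep this block readable


@dataclass(frozen=True)
class Snippet:
    """Hypothetical snippet record: origin KG, file path, 1-based start line, text."""
    kg_name: str
    path: str
    start_line: int
    text: str


def render_pack(query: str, snippets: list[Snippet]) -> str:
    """Render snippets as Markdown, prefixing each line with its source line number."""
    out = [f"# Pack: {query}", ""]
    for s in snippets:
        lines = s.text.splitlines()
        end = s.start_line + len(lines) - 1
        out.append(f"## {s.kg_name}: {s.path}:{s.start_line}-{end}")
        out.append(FENCE)
        for offset, line in enumerate(lines):
            out.append(f"{s.start_line + offset:>4} | {line}")
        out.append(FENCE)
        out.append("")
    return "\n".join(out)


snippet = Snippet("my-code", "src/db.py", 10, "def connect():\n    return open_db()")
print(render_pack("database connection setup", [snippet]))
```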
Architecture
```text
                       Source Domains
                             ↓
PyCodeKG  DocKG  MetaboKG  DiaryKG  AgentKG  FTreeKG  MemoryKG  GutenbergKG  IABookKG  … (stubs)
                SQLite + LanceDB per backend
                             ↓
┌─────────────────────────────────────────────────────────┐
│ KGAdapter (five-method protocol)                        │
├─────────────────────────────────────────────────────────┤
│ KGRAG Orchestrator · KGRegistry · CorpusRegistry        │
│ PersonCorpusRegistry                                    │
└─────────────────────────────────────────────────────────┘
             ↓                              ↓
     CLI / Python API               MCP Server (stdio)
  (query, pack, analyze)         (AI agents, Claude Code)
```
Design Principles
- Derived structure is authoritative — graphs are extracted from formal sources by deterministic programs; embeddings are derived and disposable
- Semantics accelerate; structure decides — vector search locates entry points; BFS traversal determines what is returned
- Every result is traceable — every node carries a stable identifier encoding its origin
- Determinism over approximation — identical inputs produce identical outputs
- Generality through protocol — five adapter methods; no orchestrator changes needed for new domains
- Independence from language models — the full build and query pipeline runs locally without any LLM call
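The "semantics accelerate; structure decides" principle can be shown with a small in-memory sketch: a cosine-similarity ranking stands in for LanceDB vector search and only chooses seed nodes, and a breadth-first traversal over graph edges decides what is returned. The function name, the dict-based graph, and the `seeds`/`hops` parameters are assumptions for illustration, not KGRAG's implementation.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_query(query_vec: list[float], node_vecs: dict[str, list[float]],
                 edges: dict[str, list[str]], seeds: int = 2,
                 hops: int = 1) -> list[tuple[str, int]]:
    """Semantic seeding, then structural BFS; returns (node_id, hop distance)."""
    # 1. Seed: top-`seeds` nodes by cosine similarity, ties broken by id.
    ranked = sorted(node_vecs, key=lambda n: (-cosine(query_vec, node_vecs[n]), n))
    seed_ids = ranked[:seeds]
    # 2. Expand: breadth-first over edges, at most `hops` edges from a seed.
    dist = {s: 0 for s in seed_ids}
    queue = deque(seed_ids)
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for nbr in sorted(edges.get(node, [])):  # sorted: deterministic visit order
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # 3. Order by hop distance, then id: same inputs, same output.
    return sorted(dist.items(), key=lambda item: (item[1], item[0]))


node_vecs = {
    "auth.login": [1.0, 0.0],
    "auth.logout": [0.9, 0.1],
    "db.connect": [0.0, 1.0],
    "session.create": [0.2, 0.8],
}
edges = {
    "auth.login": ["session.create"],
    "auth.logout": ["session.create"],
    "session.create": ["db.connect"],
}
print(hybrid_query([1.0, 0.0], node_vecs, edges, seeds=1, hops=2))
# [('auth.login', 0), ('session.create', 1), ('db.connect', 2)]
```

Here `auth.logout` is semantically close to the query but is not reachable from the seed, so it is not returned: the embedding found the entry point, and the graph decided the result.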
Project Structure
```text
src/kg_rag/
├── orchestrator.py          # KGRAG — cross-KG orchestrator
├── registry.py              # KGRegistry — SQLite-backed KG registry
├── corpus_registry.py       # CorpusRegistry — named corpus groups
├── person_registry.py       # PersonCorpusRegistry — person-centric corpora
├── primitives.py            # KGKind, KGEntry, CrossHit, CrossSnippet, …
├── embed.py                 # Embedder abstraction (SentenceTransformer, LlamaCpp)
├── adapters/
│   ├── base.py              # KGAdapter ABC (five abstract methods)
│   ├── _stub_adapter.py     # StubKGAdapter base for unbuilt backends
│   ├── pycodekg_adaptor.py  # CodeKGAdapter (code)
│   ├── dockg_adapter.py     # DocKGAdapter (doc)
│   ├── metakg_adapter.py    # MetaKGAdapter (meta / MetaboKG)
│   ├── diary_adapter.py     # DiaryKGAdapter (diary)
│   ├── agent_adapter.py     # AgentKGAdapter (agent)
│   ├── memory_adapter.py    # MemoryKGAdapter (memory)
│   ├── gutenberg_adapter.py # stub (gutenberg)
│   ├── ia_adapter.py        # stub (ia)
│   ├── disulfide_adapter.py # stub
│   ├── pdbfile_adapter.py   # stub
│   ├── verse_adapter.py     # stub
│   ├── legal_adapter.py     # stub
│   └── person_adapter.py    # stub
├── cli/
│   ├── main.py              # root Click group
│   ├── cmd_registry.py      # register, unregister, list, info, status, init
│   ├── cmd_query.py         # query, pack
│   ├── cmd_corpus.py        # corpus CRUD + query/pack
│   ├── cmd_analyze.py       # analyze
│   ├── cmd_synthesize.py    # synthesize (Ollama-grounded)
│   ├── cmd_mcp.py           # mcp
│   ├── cmd_viz.py           # viz (Streamlit)
│   ├── cmd_hooks.py         # hooks install
│   └── cmd_models.py        # models (embedder config)
├── mcp_server.py            # MCP server (22 tools, stdio transport)
└── app.py                   # Streamlit dashboard
```
Installation
Requirements: Python ≥ 3.12, < 3.14
```bash
# Core
pip install 'kg-rag @ git+https://github.com/Flux-Frontiers/KGRAG.git'
# With Streamlit dashboard
pip install 'kg-rag[viz] @ git+https://github.com/Flux-Frontiers/KGRAG.git'
# Poetry
poetry add 'kg-rag @ git+https://github.com/Flux-Frontiers/KGRAG.git'
```
Embedding Backend (ARM / Raspberry Pi)
KGRAG supports llama.cpp-based embedding for low-power deployment. Configure in pyproject.toml:
```toml
[tool.kgrag]
embed_backend = "llama"
llama_model_path = "~/.kgrag/bge-small-en-v1.5-Q8_0.gguf"
```
Related Projects
| Project | Description |
|---|---|
| PyCodeKG | Deterministic knowledge graph for Python codebases |
| DocKG | Semantic knowledge graph for document corpora |
| MetaboKG | Metabolic pathway knowledge graph |
| DiaryKG | Diary and personal journal corpus knowledge graph |
| AgentKG | Conversational memory knowledge graph |
| FTreeKG | File system tree knowledge graph |
| MemoryKG | Episodic memory knowledge graph for conversation and event corpora |
| GutenbergKG | Project Gutenberg book corpus knowledge graph (under development) |
| IABookKG | Internet Archive book corpus knowledge graph (under development) |
License
Elastic License 2.0 — see LICENSE.
Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial internal use is permitted.
File details

Details for the file kg_rag-0.6.0.tar.gz (source distribution).

- Size: 100.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.13 Darwin/25.4.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d6b67c38aaf77e15a01abddce785ef944fc3f35a9791330ea8e7ed4d7ad9ef86` |
| MD5 | `a43f025d42df9dc6e1b8cf25c63f86c8` |
| BLAKE2b-256 | `ab3857a392823ffd4a143daf93a1bd63e25c091619e0e5147baa319e0996028e` |
File details

Details for the file kg_rag-0.6.0-py3-none-any.whl (built distribution).

- Size: 125.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.13 Darwin/25.4.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2d58b6a2af5df61664f5ed8529fa12225d3d6ad9d24a701c0a1961b65a40fad3` |
| MD5 | `2cf9b30d70cf670f87041cd5ccd0a5bc` |
| BLAKE2b-256 | `f411a66d2886a5ef007b13318c501eb3d81e09da1173dbe72b95beea42b1791c` |