A tool to build a searchable knowledge graph from Python repositories
PyCodeKG — A Deterministic Knowledge Graph for Python Codebases with Semantic Indexing and Source-Grounded Snippet Packing
Author: Eric G. Suchanek, PhD, Flux-Frontiers, Liberty TWP, OH
Overview
PyCodeKG constructs a deterministic, explainable knowledge graph from a Python codebase using static analysis. The graph captures structural relationships — definitions, calls, imports, and inheritance — directly from the Python AST, stores them in SQLite, and augments retrieval with vector embeddings via LanceDB.
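To make the idea concrete, here is a minimal sketch of AST-derived edges stored in SQLite, using only the standard library. It is not PyCodeKG's implementation: the `edges` table, the relation names `DEFINES`, `CALLS`, `IMPORTS`, and `INHERITS`, and the sample module are all assumptions chosen for illustration.

```python
import ast
import sqlite3

# Hypothetical sample module used only for this sketch.
SOURCE = '''
import os.path as osp

class Base:
    pass

class Child(Base):
    def run(self):
        return helper(osp.join("a", "b"))

def helper(path):
    return path
'''


def extract_edges(source: str, module: str) -> list[tuple[str, str, str, int]]:
    """Walk the AST and return (src, relation, dst, line) tuples."""
    edges: list[tuple[str, str, str, int]] = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = f"{scope}.{child.name}"
                edges.append((scope, "DEFINES", name, child.lineno))
                if isinstance(child, ast.ClassDef):
                    for base in child.bases:
                        edges.append((name, "INHERITS", ast.unparse(base), child.lineno))
                visit(child, name)  # nested definitions get the new scope
                continue
            if isinstance(child, ast.Import):
                for alias in child.names:
                    edges.append((scope, "IMPORTS", alias.name, child.lineno))
            elif isinstance(child, ast.ImportFrom):
                for alias in child.names:
                    target = f"{child.module or ''}.{alias.name}"
                    edges.append((scope, "IMPORTS", target, child.lineno))
            elif isinstance(child, ast.Call):
                # The callee is recorded as written; resolution is a later step.
                edges.append((scope, "CALLS", ast.unparse(child.func), child.lineno))
            visit(child, scope)

    visit(ast.parse(source), module)
    return edges


edges = extract_edges(SOURCE, "demo")

# Assumed single-table schema; a real store would likely be richer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT, line INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", edges)

callers = conn.execute(
    "SELECT src, line FROM edges WHERE rel = 'CALLS' AND dst = 'helper'"
).fetchall()
print(callers)
```

Because every row comes from a parsed node, running the extractor twice on the same source yields the same edge list, which is what makes the graph auditable.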
No LLM inference required. The CLI is fully usable as a standalone analysis tool: every result is derived from structure, not generated. When used with AI agents, PyCodeKG gives them structurally grounded answers: precise callers, real call chains, exact line numbers. Hallucination-resistant by design.
Structure is treated as ground truth; semantic search is strictly an acceleration layer. The result is a searchable, auditable representation of a codebase that supports precise navigation, contextual snippet extraction, and downstream reasoning without hallucination.
PyCodeKG uses the same architecture as DocKG but targets Python source code rather than document corpora.
Features
- Static analysis pipeline — Three-pass AST extraction: structure, call graph, data-flow
- Deterministic knowledge graph — SQLite-backed canonical store with provenance-tracked edges
- Symbol resolution — RESOLVES_TO edges bridge cross-module call sites via import aliases
- Hybrid query model — Semantic seeding (LanceDB embeddings) + structural expansion (graph traversal)
- Source-grounded snippet packing — Definition and call-site snippets with line numbers
- Precise fan-in lookup — Two-phase reverse traversal resolving cross-module caller chains
- Temporal snapshots — Save and diff graph metrics across commits and versions
- MCP server — Nineteen tools for AI agent integration
- Streamlit web app — Interactive graph browser, hybrid query UI, snippet pack explorer
- 3-D visualizer — PyVista/PyQt5 interactive graph explorer with FunnelLayout and timeline view
- Zero-config MCP setup — Single-line installer configures Claude Code, Kilo Code, GitHub Copilot, and Cline
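The symbol-resolution and fan-in features can be pictured with a small sketch. It shows one plausible reading of the two-phase reverse traversal, not the project's actual algorithm: phase one collects every call-site name that resolves to the target, phase two collects the callers of those names. The module names, the alias table, and the `module:text` stub format are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: what an import pass and a call pass might record.
IMPORT_ALIASES = {
    "app": {"db": "storage.db"},                # import storage.db as db
    "jobs": {"open_db": "storage.db.connect"},  # from storage.db import connect as open_db
}
CALL_SITES = [
    # (caller, caller's module, callee text as written in source)
    ("app.main", "app", "db.connect"),
    ("jobs.nightly", "jobs", "open_db"),
    ("storage.db.init", "storage.db", "connect"),
]
DEFINITIONS = {"app.main", "jobs.nightly", "storage.db.init", "storage.db.connect"}


def build_edges(call_sites, aliases, definitions):
    """Emit CALLS edges, plus RESOLVES_TO edges for alias-based call sites."""
    edges = []
    for caller, module, text in call_sites:
        local = f"{module}.{text}"
        if local in definitions:  # same-module call, already qualified
            edges.append((caller, "CALLS", local))
            continue
        stub = f"{module}:{text}"  # unresolved call-site node
        edges.append((caller, "CALLS", stub))
        head, _, rest = text.partition(".")
        base = aliases.get(module, {}).get(head)
        if base is not None:
            target = f"{base}.{rest}" if rest else base
            if target in definitions:
                edges.append((stub, "RESOLVES_TO", target))
    return edges


def callers_of(edges, target):
    """Two-phase reverse lookup of everything that calls `target`."""
    reverse = defaultdict(lambda: defaultdict(set))
    for src, rel, dst in edges:
        reverse[rel][dst].add(src)
    # Phase 1: the target itself plus every stub that resolves to it.
    names = {target} | reverse["RESOLVES_TO"][target]
    # Phase 2: callers of any of those names.
    found = set()
    for name in names:
        found |= reverse["CALLS"][name]
    return sorted(found)


edges = build_edges(CALL_SITES, IMPORT_ALIASES, DEFINITIONS)
fan_in = callers_of(edges, "storage.db.connect")
print(fan_in)  # ['app.main', 'jobs.nightly', 'storage.db.init']
```

Without the `RESOLVES_TO` step, only the same-module caller `storage.db.init` would be found; the two aliased call sites would be missed.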
Installation
Requirements: Python ≥ 3.12, < 3.14
# pip
pip install pycode-kg
# With Streamlit web visualizer
pip install 'pycode-kg[viz]'
# With 3-D visualizer (PyVista/PyQt5)
pip install 'pycode-kg[viz3d]'
# Poetry
poetry add pycode-kg
For the one-line skill installer (MCP config, Claude slash commands, git hooks), see docs/INSTALLATION.md.
Quick Start
# Index your repo (SQLite + LanceDB in one step)
pycodekg build --repo /path/to/repo
# Natural-language query
pycodekg query "authentication flow"
# Source-grounded snippet pack — paste straight into an LLM prompt
pycodekg pack "database connection setup" --format md --out context.md
# Full architectural analysis
pycodekg analyze /path/to/repo
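As a rough illustration of what "source-grounded" means for a snippet pack, the sketch below takes a definition's line span from the AST and prints it with line numbers and one line of context. The helper names `definition_span` and `format_snippet`, the file name `db.py`, and the output layout are assumptions for this example, not the format PyCodeKG emits.

```python
import ast

# Hypothetical file contents used only for this sketch.
SOURCE = '''import os

def connect(url):
    return url

print(connect("sqlite://"))
'''


def definition_span(source: str, name: str) -> tuple[int, int]:
    """Return the (first, last) line of the named function or class."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == name:
                return node.lineno, node.end_lineno
    raise KeyError(name)


def format_snippet(path: str, source: str, start: int, end: int, context: int = 1) -> str:
    """Render lines start..end plus context lines, each prefixed by its number."""
    lines = source.splitlines()
    first = max(1, start - context)
    last = min(len(lines), end + context)
    width = len(str(last))
    body = "\n".join(f"{n:>{width}} | {lines[n - 1]}" for n in range(first, last + 1))
    return f"{path}:{start}-{end}\n{body}"


start, end = definition_span(SOURCE, "connect")
snippet = format_snippet("db.py", SOURCE, start, end)
print(snippet)
```

The header line carries the exact file and line range, so a reader or an agent can check the snippet against the repository.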
Usage
Build and query
pycodekg build --repo . # full build (SQLite + LanceDB)
pycodekg build --repo . --include-dir src # index a specific subtree
pycodekg query "snapshot freshness comparison" # hybrid semantic + structural search
pycodekg pack "graph build pipeline" --format md # snippet pack for LLM context
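The hybrid search mentioned above combines two steps: pick seed nodes by semantic similarity, then expand along graph edges. The sketch below imitates that shape under heavy simplification. A bag-of-words cosine score stands in for real vector embeddings, and the node descriptions and call edges are invented for the example.

```python
import math
from collections import Counter, deque

# Hypothetical node descriptions and call edges.
DOCS = {
    "auth.login": "validate user password and create session token",
    "auth.hash_password": "hash password with salt",
    "db.connect": "open database connection pool",
    "web.handler": "route http request to view",
}
CALL_EDGES = [
    ("web.handler", "auth.login"),
    ("auth.login", "auth.hash_password"),
    ("auth.login", "db.connect"),
]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def seed(query: str, k: int) -> list[str]:
    """Semantic seeding: the top-k nodes with a non-zero similarity score."""
    q = Counter(query.lower().split())
    scored = sorted(((cosine(q, Counter(d.split())), n) for n, d in DOCS.items()), reverse=True)
    return [name for score, name in scored[:k] if score > 0]


def expand(seeds: list[str], edges: list[tuple[str, str]], hops: int) -> dict[str, int]:
    """Structural expansion: breadth-first walk, returning node -> hop distance."""
    neighbors: dict[str, set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


seeds = seed("user password login", k=1)
result = expand(seeds, CALL_EDGES, hops=1)
print(seeds, result)
```

Here `db.connect` and `web.handler` share no words with the query, yet they are returned because they are one call edge away from the seed. That is the sense in which semantic search only accelerates the lookup while structure decides what is related.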
Analyze codebase health
pycodekg analyze . # full report + JSON snapshot
Snapshots
pycodekg snapshot save 0.18.0 # capture current metrics
pycodekg snapshot list # list all snapshots
pycodekg snapshot diff <key_a> <key_b> # compare two versions
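Conceptually, a snapshot diff compares two saved sets of graph metrics. The sketch below shows that idea with JSON files in a temporary directory. The metric names, the file layout, and the `0.17.0` key are assumptions for illustration and do not describe PyCodeKG's on-disk format.

```python
import json
import tempfile
from pathlib import Path


def save_snapshot(directory: Path, key: str, metrics: dict[str, int]) -> Path:
    """Write one snapshot as <key>.json with sorted keys for stable output."""
    path = directory / f"{key}.json"
    path.write_text(json.dumps(metrics, sort_keys=True, indent=2))
    return path


def diff_snapshots(directory: Path, key_a: str, key_b: str) -> dict[str, int]:
    """Return metric -> (b - a) for every metric whose value changed."""
    a = json.loads((directory / f"{key_a}.json").read_text())
    b = json.loads((directory / f"{key_b}.json").read_text())
    return {
        name: b.get(name, 0) - a.get(name, 0)
        for name in sorted(a.keys() | b.keys())
        if a.get(name, 0) != b.get(name, 0)
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical metrics for two versions.
    save_snapshot(root, "0.17.0", {"nodes": 120, "edges": 340, "modules": 12})
    save_snapshot(root, "0.18.0", {"nodes": 150, "edges": 410, "modules": 12})
    delta = diff_snapshots(root, "0.17.0", "0.18.0")

print(delta)  # {'edges': 70, 'nodes': 30}
```

Unchanged metrics are omitted from the result, which keeps the diff short when most of the graph is stable between versions.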
Visualize
pycodekg viz # Streamlit web app
pycodekg viz3d --layout funnel # 3-D PyVista explorer
pycodekg viz-timeline # metric history timeline
Full flag reference: docs/INSTALLATION.md · Query patterns: docs/CHEATSHEET.md
What Agents Say
From independent assessments run against PyCodeKG's own codebase. See assessments/ for full reports.
"PyCodeKG compresses a multi-step workflow — semantic search, graph expansion, caller tracing, snippet retrieval, and architectural summarization — into a small set of tools that are fast to invoke and easy to chain. In practice, it let me move from broad orientation to intent-driven discovery and then to structural validation without dropping down into manual grep or repeated file reads." — GPT-5 (via Cline)
"What sets it apart from 'search the repo with embeddings' tools is the structural layer… Verdict: 4.5/5 — recommend without reservation for any non-trivial Python codebase." — Claude Opus 4.7
"PyCodeKG is dramatically more effective than traditional grep/file-reading workflows. Unique value: hybrid search combining natural-language intent with precise structural relationships." — Claude Haiku 4.5
"pack_snippets() provided source excerpts around each hit, making the code instantly readable. Context lines and relevance metadata obviated manual file open." — Raptor Mini
Citation
If you use PyCodeKG in your research or project, please cite it:
APA
Suchanek, E. G. (2026). PyCodeKG: Semantic Knowledge Graph for Python Codebases (Version 0.18.0) [Software]. Flux-Frontiers. https://doi.org/10.5281/zenodo.19834777
BibTeX
@software{suchanek_pycode_kg,
author = {Suchanek, Eric G.},
title = {{PyCodeKG}: Semantic Knowledge Graph for Python Codebases},
version = {0.18.0},
year = {2026},
publisher = {Flux-Frontiers},
url = {https://github.com/Flux-Frontiers/pycode_kg},
doi = {10.5281/zenodo.19834777},
}
License
Elastic License 2.0 — free for non-commercial and internal use; commercial redistribution or hosting requires a license from Flux-Frontiers.