Extract knowledge graphs from source code repositories. Rank relevant nodes with Personalized PageRank for LLM context. No LLM dependency — bring your own model.
code2graph
Turn a source code repository into a queryable knowledge graph — no LLM required.
code2graph statically extracts the structure of a codebase — files, modules, functions, classes, calls, dependencies, schemas, infrastructure — as a typed graph of nodes and edges. Rank the most relevant nodes for any query with Personalized PageRank and pass focused context to any LLM.
Pure Python. No LLM dependency. Bring your own model.
Quick start
pip install codebase2graph
# Extract full graph from a repo
codebase2graph /path/to/repo --graph all --output repo.graph.json
# Call graph only
codebase2graph /path/to/repo --graph call --output calls.graph.json
# With actionable summary
codebase2graph /path/to/repo --graph all \
  --output repo.graph.json \
  --summary-output repo.summary.json
from code2graph import build_graph
graph = build_graph("/path/to/repo", graph_type="all")
# graph.nodes — list of Node objects
# graph.edges — list of Edge objects
Why graph-based code context?
| Approach | What you lose |
|---|---|
| Dump entire codebase into prompt | Token budget, focus |
| Embed + search file chunks | Call relationships, module structure, dependency chains |
| code2graph | Nothing — relationships are explicit labeled edges |
The graph knows that auth.login() calls db.query(), which imports connection_pool, which depends on config.DATABASE_URL. Flat file chunks don't.
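As a rough illustration of why explicit edges matter, the sketch below follows that dependency chain with a breadth-first walk. It uses plain dicts in the `from`/`to`/`label` shape shown under "Graph output format" rather than the library's own objects. The `module:` and `constant:` ID prefixes, the `depends_on` label, and the `reachable` helper are assumptions made for this example, not confirmed code2graph output.

```python
from collections import deque

# Hypothetical edges in the documented JSON shape ("from", "to", "label").
# Node ID prefixes other than "function:" and the "depends_on" label are assumed.
edges = [
    {"from": "function:auth.login", "to": "function:db.query", "label": "calls"},
    {"from": "function:db.query", "to": "module:connection_pool", "label": "imports"},
    {"from": "module:connection_pool", "to": "constant:config.DATABASE_URL", "label": "depends_on"},
    {"from": "function:auth.logout", "to": "function:session.clear", "label": "calls"},
]


def reachable(edges, start):
    """Return node IDs reachable from `start`, in breadth-first order."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["from"], []).append(edge["to"])

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


chain = reachable(edges, "function:auth.login")
print(chain)
```

A chunk-based retriever would have to rediscover each of these hops from text similarity alone; with labeled edges the walk is a few dictionary lookups.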
Graph types
| Type | What it extracts |
|---|---|
| folder | Repo, folder, file nodes with contains edges |
| call | Functions/methods with calls and defines edges (Python, JS, TS) |
| entity | Classes, functions, constants with defines and imports edges |
| schema | Database tables, columns, foreign keys (SQL, ORM models) |
| workflow | CI/CD pipelines, GitHub Actions, Makefile targets |
| infra | Dockerfiles, docker-compose, Terraform, Kubernetes manifests |
| security | Hardcoded secrets patterns, dangerous function calls, exposed endpoints |
| web | React/Vue components, routes, API endpoints |
| android | Activities, services, permissions from AndroidManifest.xml |
| decision | ADR-style architecture decisions |
| all | Merged graph from all applicable extractors |
codebase2graph /path/to/repo --graph call --output call.graph.json
codebase2graph /path/to/repo --graph schema --output schema.graph.json
codebase2graph /path/to/repo --graph infra --output infra.graph.json
codebase2graph /path/to/repo --graph all --output full.graph.json
Installation
pip install codebase2graph
No extra dependencies required — all graph types work with the standard install.
Python API
Build a graph
from code2graph import build_graph, Graph, Node, Edge
# Full graph
graph: Graph = build_graph("/path/to/repo", graph_type="all")
# Specific type
call_graph = build_graph("/path/to/repo", graph_type="call")
schema_graph = build_graph("/path/to/repo", graph_type="schema")
Inspect results
print(f"{len(graph.nodes)} nodes, {len(graph.edges)} edges")
# Filter by kind
functions = [n for n in graph.nodes if n.attributes.get("kind") == "function"]
calls = [e for e in graph.edges if e.label == "calls"]
Export
import json
# Convert to a plain dict
d = {"nodes": [vars(n) for n in graph.nodes], "edges": [vars(e) for e in graph.edges]}
# Use a context manager so the file is flushed and closed
with open("graph.json", "w") as f:
    json.dump(d, f, indent=2)
Graph output format
{
  "nodes": [
    {
      "id": "function:auth.login",
      "label": "login",
      "attributes": {
        "kind": "function",
        "module": "auth",
        "file": "src/auth.py",
        "line": 42
      },
      "content": "def login(username, password): ..."
    }
  ],
  "edges": [
    {
      "id": "edge:auth.login:calls:db.query",
      "from": "function:auth.login",
      "to": "function:db.query",
      "label": "calls"
    }
  ],
  "current_node_id": "repo"
}
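The ranking API itself is not documented in this README, so the following is only a minimal sketch of how Personalized PageRank could be run over a graph in the JSON shape above. The `personalized_pagerank` function, its parameters, and the sample node IDs are hypothetical. It uses plain power iteration, follows edges in their stored direction, and returns the mass of dangling nodes to the seeds; the library may make different choices.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank over a {"nodes", "edges"} dict.

    `seeds` are node IDs that receive the restart probability.
    """
    node_ids = [node["id"] for node in graph["nodes"]]
    out_links = {node_id: [] for node_id in node_ids}
    for edge in graph["edges"]:
        # Skip edges that reference nodes missing from the node list.
        if edge["from"] in out_links and edge["to"] in out_links:
            out_links[edge["from"]].append(edge["to"])

    seed_ids = sorted({s for s in seeds if s in out_links})
    if not seed_ids:
        raise ValueError("no seed matches a node ID in the graph")

    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {node_id: 0.0 for node_id in node_ids}
    for seed in seed_ids:
        restart[seed] = 1.0 / len(seed_ids)

    rank = dict(restart)
    for _ in range(iterations):
        updated = {nid: (1.0 - damping) * restart[nid] for nid in node_ids}
        for nid in node_ids:
            targets = out_links[nid]
            if targets:
                share = damping * rank[nid] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for seed in seed_ids:
                    updated[seed] += damping * rank[nid] * restart[seed]
        rank = updated
    return rank


# Hypothetical graph; only the "id", "from", and "to" keys are used here.
graph = {
    "nodes": [
        {"id": "function:auth.login"},
        {"id": "function:db.query"},
        {"id": "function:pool.get"},
        {"id": "function:utils.unused"},
    ],
    "edges": [
        {"from": "function:auth.login", "to": "function:db.query", "label": "calls"},
        {"from": "function:db.query", "to": "function:pool.get", "label": "calls"},
    ],
}

ranks = personalized_pagerank(graph, seeds=["function:auth.login"])
top = sorted(ranks, key=ranks.get, reverse=True)[:3]
print(top)
```

Because the walk follows edge direction, seeding a function ranks what it calls higher than what calls it. Adding each edge in both directions before ranking is one way to surface callers as well.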
CLI reference
codebase2graph <repo> [options]

Arguments:
  repo                          Path to the repository root

Options:
  --graph TYPE                  Graph type: folder, call, entity, schema, workflow,
                                infra, security, web, android, decision, all
                                (default: all)
  --output PATH                 Write graph JSON to this file (default: stdout)
  --pretty                      Pretty-print JSON output
  --summary-output PATH         Write graph summary JSON (entrypoints, fan-in/out nodes)
  --update-existing PATH        Update an existing graph JSON in place
  --update-summary-output PATH  Write update diff summary JSON
  -h, --help                    Show help
Update mode
Rebuild a graph from the current repository state while preserving stable node IDs and custom attributes added outside code2graph:
codebase2graph /path/to/repo --graph all \
  --update-existing repo.graph.json \
  --update-summary-output repo.update.json
Update mode removes stale nodes/edges for deleted or changed code, adds new nodes/edges, and keeps stable IDs for nodes that haven't changed. Custom attributes on existing nodes are preserved.
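To make the behavior described above concrete, here is a small sketch of the same idea applied to two graph snapshots held as plain dicts. It is not the library's update implementation, and the format of the update summary file is not documented here. The `diff_graphs` and `carry_over_attributes` helpers and the `owner` attribute are hypothetical.

```python
def diff_graphs(old, new):
    """Compare two graph dicts by node and edge ID."""
    old_nodes = {node["id"] for node in old["nodes"]}
    new_nodes = {node["id"] for node in new["nodes"]}
    old_edges = {edge["id"] for edge in old["edges"]}
    new_edges = {edge["id"] for edge in new["edges"]}
    return {
        "added_nodes": sorted(new_nodes - old_nodes),
        "removed_nodes": sorted(old_nodes - new_nodes),
        "kept_nodes": sorted(old_nodes & new_nodes),
        "added_edges": sorted(new_edges - old_edges),
        "removed_edges": sorted(old_edges - new_edges),
    }


def carry_over_attributes(old, new):
    """Return `new` with attributes found only in `old` copied onto matching nodes.

    Freshly extracted values win when both snapshots define the same key.
    """
    old_attrs = {node["id"]: node.get("attributes", {}) for node in old["nodes"]}
    merged_nodes = []
    for node in new["nodes"]:
        attrs = {**old_attrs.get(node["id"], {}), **node.get("attributes", {})}
        merged_nodes.append({**node, "attributes": attrs})
    return {**new, "nodes": merged_nodes}


# Hypothetical snapshots; "owner" stands in for a custom attribute added by hand.
old_graph = {
    "nodes": [
        {"id": "function:auth.login", "attributes": {"kind": "function", "line": 42, "owner": "team-auth"}},
        {"id": "function:auth.legacy", "attributes": {"kind": "function", "line": 90}},
    ],
    "edges": [{"id": "edge:auth.login:calls:auth.legacy"}],
}
new_graph = {
    "nodes": [
        {"id": "function:auth.login", "attributes": {"kind": "function", "line": 57}},
        {"id": "function:auth.reset", "attributes": {"kind": "function", "line": 80}},
    ],
    "edges": [{"id": "edge:auth.login:calls:auth.reset"}],
}

summary = diff_graphs(old_graph, new_graph)
updated = carry_over_attributes(old_graph, new_graph)
print(summary["added_nodes"], summary["removed_nodes"])
```

One limitation of this simple merge is that it cannot tell a custom attribute from an extractor attribute that was dropped in the new run; a real implementation would need some way to distinguish the two.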
Use cases
- Code review — extract call graph before/after a PR to see what changed structurally
- LLM code assistance — pass ranked subgraph as context instead of dumping whole files
- Dependency analysis — find all callers of a function, all modules depending on a service
- Security audit — detect hardcoded secrets, dangerous API patterns, exposed endpoints
- Architecture docs — extract infra + schema + decision graphs for living documentation
- Onboarding — give a new developer a ranked subgraph of the most important entry points
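For the dependency analysis case, finding all callers of a function amounts to a reverse lookup over `calls` edges. The sketch below assumes a graph already loaded from the JSON output into a plain dict; the `callers_of` helper and the sample node IDs are illustrative, not part of the package.

```python
from collections import defaultdict


def callers_of(graph, target_id, label="calls"):
    """Return the sorted IDs of nodes with a `label` edge pointing at `target_id`."""
    reverse_index = defaultdict(set)
    for edge in graph["edges"]:
        if edge["label"] == label:
            reverse_index[edge["to"]].add(edge["from"])
    return sorted(reverse_index.get(target_id, set()))


# Hypothetical edges in the documented JSON shape.
graph = {
    "edges": [
        {"from": "function:auth.login", "to": "function:db.query", "label": "calls"},
        {"from": "function:reports.build", "to": "function:db.query", "label": "calls"},
        {"from": "module:reports", "to": "function:db.query", "label": "imports"},
        {"from": "function:db.query", "to": "function:pool.get", "label": "calls"},
    ]
}

callers = callers_of(graph, "function:db.query")
print(callers)
```

Passing a different `label`, such as `imports`, reuses the same lookup to list the modules that depend on a given node.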
Design principles
- Pure Python — no LLM, no cloud, no database required
- Deterministic — same repository state always produces the same graph
- Static analysis only — no code execution, safe to run on any codebase
- Works with any model — output is plain JSON; pass to GPT-4, Claude, Llama, or any other model
- Companion to docs2graph — same node/edge schema, combine code and documentation graphs
Related projects
| Package | What it does |
|---|---|
| docs2graph | Documents → knowledge graph (same node/edge schema) |
| graph2sql | Graph-based schema analysis for text-to-SQL |
Contributing
See CONTRIBUTING.md.
git clone https://github.com/jw-open/code2graph
cd code2graph
pip install -e ".[dev]"
pytest tests/ -v
License
Apache-2.0 — see LICENSE