
Turn any codebase, docs, or images into a queryable knowledge graph


graphify


A Claude Code skill. Type /graphify in Claude Code — it reads your files, builds a knowledge graph, and gives you back structure you didn't know was there.

Andrej Karpathy keeps a /raw folder where he drops papers, tweets, screenshots, and notes. The problem: that folder becomes opaque. You forget what's in it. You can't see what connects. graphify is the answer to that problem.

/graphify ./raw
.graphify/
├── obsidian/        open as Obsidian vault — visual graph, wikilinks, filter by community
├── GRAPH_REPORT.md  what the graph found: god nodes, surprising connections, suggested questions
├── graph.json       persistent graph — query it weeks later without re-reading anything
├── cache/           per-file SHA256 cache — re-runs only process changed files
└── memory/          Q&A results filed back in — what you ask grows the graph on next --update

Why this exists

graphify starts from that observation (the append-only folder that inevitably goes opaque) and builds the missing infrastructure:

| His problem | What graphify adds |
| --- | --- |
| Folder becomes opaque | Community detection surfaces structure automatically |
| Forget what's in it | Persistent graph.json — query weeks later without re-reading |
| Can't see connections | Cross-community surprising connections as a first-class output |
| Claude hallucinates missing links | EXTRACTED / INFERRED / AMBIGUOUS — honest about what was found vs guessed |
| Context resets every session | Memory feedback loop — what you ask grows the graph on --update |
| Only works on text | PDFs, images, screenshots, tweets, any language via vision |

What LLMs get wrong without it: naive summarization fills every gap confidently. You get output that sounds complete, but you can't tell what was actually in the files from what was invented. And next session, it's all gone.

What graphify does differently:

  • Persistent graph — relationships stored in .graphify/graph.json, survive across sessions. Query weeks later without re-reading anything.
  • Honest audit trail — every edge tagged EXTRACTED (explicitly stated), INFERRED (call-graph or reasonable deduction), or AMBIGUOUS (flagged for review). You always know what was found vs invented.
  • Cross-document surprise — Leiden community detection finds clusters, then surfaces cross-community connections: the things you would never think to ask about directly.
  • Feedback loop — every query answer saved to .graphify/memory/. On next --update, that Q&A becomes a node. The graph grows from what you ask, not just what you add.

The result: a navigable map of your corpus that is honest about what it knows and what it guessed.
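Concretely, every edge carries its provenance with it into graph.json. The exact schema is defined by graphify's extraction code; a hypothetical edge entry (field names illustrative, not the guaranteed format) might look like:

```json
{
  "source": "SwinTransformer",
  "target": "WindowAttention",
  "relation": "uses",
  "provenance": "EXTRACTED",
  "evidence": "model.py: WindowAttention instantiated in SwinTransformer.__init__"
}
```

An INFERRED edge would carry the deduction that produced it instead of a source quote, and an AMBIGUOUS edge is the signal to go check the file yourself.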

Install

pip install graphify && graphify install

This copies the skill file into ~/.claude/skills/graphify/ and registers it in ~/.claude/CLAUDE.md. The Python package and all dependencies install automatically on first /graphify run — you never touch pip again.

Then open Claude Code in any directory and type:

/graphify .
Manual install (curl)

Step 1 — copy the skill file

mkdir -p ~/.claude/skills/graphify
curl -fsSL https://raw.githubusercontent.com/safishamsi/graphify/v1/skills/graphify/skill.md \
  > ~/.claude/skills/graphify/SKILL.md

Step 2 — register it in Claude Code

Add this to ~/.claude/CLAUDE.md (create the file if it doesn't exist):

- **graphify** (`~/.claude/skills/graphify/SKILL.md`) — any input to knowledge graph. Trigger: `/graphify`
When the user types `/graphify`, invoke the Skill tool with `skill: "graphify"` before doing anything else.

Usage

All commands are typed inside Claude Code:

/graphify                          # run on current directory
/graphify ./raw                    # run on a specific folder
/graphify ./raw --mode deep        # more aggressive INFERRED edge extraction
/graphify ./raw --update           # re-extract only changed files, merge into existing graph
/graphify ./raw --watch            # notify when new files appear

/graphify add https://arxiv.org/abs/1706.03762        # fetch a paper, save, update graph
/graphify add https://x.com/karpathy/status/...       # fetch a tweet
/graphify add <url> --author "Karpathy" --contributor "safi"

/graphify query "what connects attention to the optimizer?"    # BFS — broad context
/graphify query "how does the encoder reach the loss?" --dfs   # DFS — trace a path
/graphify query "..." --budget 1500                            # cap at N tokens

/graphify path "DigestAuth" "Response"      # shortest path between two concepts
/graphify explain "SwinTransformer"         # plain-language node explanation

/graphify ./raw --html             # also export graph.html (browser, no Obsidian needed)
/graphify ./raw --svg              # also export graph.svg (embeds in Notion, GitHub)
/graphify ./raw --neo4j            # generate cypher.txt for Neo4j import
/graphify ./raw --mcp              # start MCP stdio server for agent access
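The traversal semantics behind `query` and `path` map onto standard graph operations. A sketch with NetworkX on a toy concept graph (node names invented for illustration, standing in for a real graph.json):

```python
import networkx as nx

# Toy concept graph standing in for a loaded .graphify/graph.json
G = nx.Graph()
G.add_edges_from([
    ("DigestAuth", "Request"),
    ("Request", "Response"),
    ("Request", "Client"),
    ("Client", "ConnectionPool"),
])

# BFS (the `query` default): broad context, everything within 2 hops of the seed
context = list(nx.bfs_tree(G, "DigestAuth", depth_limit=2))

# Shortest path (`/graphify path`): the route between two concepts
route = nx.shortest_path(G, "DigestAuth", "Response")
```

DFS (`--dfs`) trades that breadth for depth: it follows one chain of edges as far as it goes before backtracking, which is why it suits "how does X reach Y" questions.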

Works with any mix of file types in the same folder:

| Type | Extensions | How it's extracted |
| --- | --- | --- |
| Code | .py .ts .tsx .js .go .rs .java .c .cpp .rb .cs .kt .scala .php | AST via tree-sitter (deterministic) + call-graph pass (INFERRED) |
| Documents | .md .txt .rst | Concepts + relationships via Claude |
| Papers | .pdf | Citation mining + concept extraction |
| Images | .png .jpg .webp .gif .svg | Claude vision — screenshots, charts, whiteboards, any language |
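For code, the deterministic pass walks the syntax tree and emits definition nodes plus caller-to-callee edges. graphify does this with tree-sitter across 13 languages; the same idea, sketched here with Python's stdlib `ast` module rather than tree-sitter:

```python
import ast

source = """
def attention(q, k, v):
    return q

def encoder(x):
    return attention(x, x, x)
"""

tree = ast.parse(source)

# Definition nodes: one per function found in the tree
nodes = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

# Call-graph edges: caller -> callee, read directly off the syntax tree,
# so nothing here is guessed by a model
edges = []
for fn in ast.walk(tree):
    if isinstance(fn, ast.FunctionDef):
        for call in ast.walk(fn):
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                edges.append((fn.name, call.func.id))
```

Resolving a call through an attribute or an import needs more work than a bare name lookup, which is why cross-file call edges get the INFERRED tag rather than EXTRACTED.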

What you get

After running, Claude outputs three things directly in chat:

God nodes — highest-degree concepts (what everything connects through)

Surprising connections — cross-community edges; relationships between concepts in different clusters that you didn't know to look for

Suggested questions — 4-5 questions the graph is uniquely positioned to answer, with the reason why (which bridge node makes it interesting, which community boundary it crosses)

The full GRAPH_REPORT.md adds community summaries with cohesion scores and a list of ambiguous edges for review.

Key files explained

| File | Purpose |
| --- | --- |
| GRAPH_REPORT.md | The audit report: god nodes, surprising connections, community cohesion scores, ambiguous edge list, suggested questions. |
| graph.json | Persistent graph in node-link format. Load it with NetworkX or push to Neo4j. Survives sessions. |
| obsidian/ | Wikilink vault. Open in Obsidian → enable graph view → see communities as clusters. Filter by tag, search across everything. |
| .graphify/cache/ | SHA256-based per-file cache. A re-run on an unchanged corpus takes seconds. |
| .graphify/memory/ | Q&A feedback loop. Every /graphify query answer is saved here. Next --update extracts it into the graph. |
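Because graph.json is standard node-link JSON, loading it later needs nothing graphify-specific. A sketch with the data inlined for illustration (attribute names like `provenance` and `community` follow the description above, but treat them as assumptions, not a schema guarantee):

```python
import networkx as nx

# Shape of a node-link graph.json, inlined here; in practice you would
# json.load() the file from .graphify/graph.json instead
data = {
    "directed": True,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"id": "Attention", "community": 0},
        {"id": "Adam", "community": 1},
    ],
    "links": [
        {"source": "Attention", "target": "Adam", "provenance": "INFERRED"},
    ],
}

G = nx.node_link_graph(data, directed=True, multigraph=False)
```

From here the whole NetworkX toolbox applies: degree for god nodes, `nx.shortest_path` for routes, or an export straight to Cypher for Neo4j.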

What this skill will NOT do

  • Won't invent edges: AMBIGUOUS exists so uncertain relationships are flagged, not hidden. If the connection isn't clear, it's tagged, not fabricated.
  • Won't claim the graph is useful when it isn't — a corpus over 2M words or 200 files gets a cost warning before proceeding.
  • Won't re-extract unchanged files — SHA256 cache ensures warm re-runs skip everything that hasn't changed.
  • Won't visualize graphs over 5,000 nodes — use --no-viz or query instead.
  • Won't download datasets or set up infrastructure — graphify reads your files. What you put in the folder is what it works with.
  • Won't implement baselines or run experiments — it reads and maps. Analysis is yours.
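The no-re-extraction guarantee is ordinary content hashing. A minimal sketch of the idea (graphify's actual cache layout in .graphify/cache/ may differ):

```python
import hashlib

def digest(data: bytes) -> str:
    # Hash file contents, not mtimes: renames and `touch` don't bust the cache
    return hashlib.sha256(data).hexdigest()

cache = {}  # digest -> extraction result

def extract(data: bytes):
    key = digest(data)
    if key not in cache:
        # Stand-in for the real (expensive, token-burning) extraction pass
        cache[key] = f"extracted:{len(data)} bytes"
    return cache[key]
```

Keying on content rather than path is what makes a warm re-run on an unchanged corpus take seconds: only files whose bytes changed trigger the extraction pass again.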

Design principles

  1. Extraction quality is everything — clustering is downstream of it. A bad graph clusters into bad communities. The AST + call-graph pass exists because deterministic beats probabilistic for code.
  2. Show the numbers — cohesion is 0.91, not "good". Token cost is always printed. You know what you spent.
  3. The best output is what you didn't know — Surprising Connections is not optional. God nodes you probably already suspected. Cross-community edges are what you came for.
  4. The graph earns its complexity — below a certain density, just use Claude directly. The graph adds value when you have more than you can hold in context across sessions.
  5. What you ask grows the graph — query results are filed back in automatically. The corpus is not static.
  6. Honest uncertainty: EXTRACTED, INFERRED, AMBIGUOUS are not cosmetic labels. They are the difference between trusting the graph and being misled by it.

Contributing

Adding worked examples

Worked examples are the most trust-building part of this project. To add one:

  1. Pick a real corpus (people should be able to verify the output)
  2. Run the skill: /graphify <path>
  3. Save the full output to worked/{corpus_slug}/
  4. Write a review.md that honestly evaluates:
    • What the graph got right
    • What edges it correctly flagged AMBIGUOUS
    • Any mistakes or missed connections
    • Any surprising connections that were genuinely surprising
  5. Submit a PR with all of the above

Improving extraction

If you find a file type or language where extraction is poor, open an issue with a minimal reproduction case. The best bug reports include: the input file, the extraction output (.graphify/cache/ entry), and what was missed or invented.

Adding domain knowledge

If corpora in your domain consistently contain structures graphify doesn't extract well (e.g., legal documents, lab notebooks, musical scores), open a discussion with examples.

Worked examples

| Corpus | Type | Reduction | Eval report |
| --- | --- | --- | --- |
| Karpathy repos + 5 research papers + 4 images | Mixed (code + papers + images) | 71.5x | worked/karpathy-repos/review.md |
| httpx (Python HTTP client) | Codebase | | worked/httpx/review.md + GRAPH_REPORT.md |
| Mixed corpus (code + paper + Arabic image) | Multi-type | | worked/mixed-corpus/review.md |

Each includes the full graph output and an honest evaluation of what the skill got right and wrong.

Tech stack

| Layer | Library | Why |
| --- | --- | --- |
| Graph | NetworkX | Pure Python, same internals as MS GraphRAG |
| Community detection | Leiden via graspologic | Better than K-means for sparse graphs |
| Code parsing | tree-sitter | Multi-language AST, deterministic, zero hallucination |
| Extraction | Claude (parallel subagents) | Reads anything, outputs structured graph data |
| Visualization | Obsidian vault | Native graph view, wikilinks, no server needed |
No Neo4j required. No dashboards. No server. Runs entirely locally.

Files

graphify/
├── detect.py     detect file types, auto-exclude venvs/caches/node_modules; scan .graphify/memory/
├── extract.py    AST extraction (13 languages via tree-sitter) + call-graph pass (INFERRED edges)
├── build.py      assemble NetworkX graph from extraction JSON; schema-validates before assembly
├── cluster.py    Leiden community detection, cohesion scoring
├── analyze.py    god nodes, bridge nodes, surprising connections, suggested questions, graph diff
├── report.py     render GRAPH_REPORT.md
├── export.py     Obsidian vault, graph.json, graph.html, graph.svg, Neo4j Cypher, Canvas
├── ingest.py     fetch URLs (arXiv, Twitter/X, PDF, any webpage); save Q&A to .graphify/memory/
├── cache.py      SHA256-based per-file extraction cache; check_semantic_cache / save_semantic_cache
├── security.py   URL validation (http/https only), safe fetch with size cap, path guards, label sanitisation
├── validate.py   JSON schema checks on extraction output
├── serve.py      MCP stdio server — query_graph, get_node, get_neighbors, shortest_path, god_nodes
└── watch.py      fs watcher, writes flag file when new files appear

skills/graphify/
└── skill.md      the Claude Code skill — the full pipeline the agent runs step by step

ARCHITECTURE.md   module responsibilities, extraction schema, how to add a language
SECURITY.md       threat model, mitigations, vulnerability reporting
worked/           eval reports from real corpora (karpathy-repos, httpx, mixed-corpus)
tests/            212 tests, one file per module
pyproject.toml    pip install graphify  |  pip install graphify[mcp,neo4j,pdf,watch]
