Central, multi-repo code knowledge graph for AI agents — Neo4j + Tree-sitter + MCP.

central-code-knowledge-graph

Stop re-reading. Start querying.

AI coding tools re-read your entire codebase on every task. ckg fixes that. One server indexes every repo in your org with Tree-sitter across 26 languages, stores the structural map as a Neo4j property graph, keeps it fresh via incremental ingest + webhooks, and serves precise context to your AI assistant via MCP so it reads only what matters.

One server that:

ingests many repositories (not just one) and keeps them incrementally fresh
stores them as a Neo4j property graph (File, Class, Function, Module + CONTAINS, DEFINES, HAS_METHOD, CALLS, IMPORTS)
exposes REST, GraphQL, MCP/JSON-RPC, and a ckg CLI
supports structural queries (callers, callees, imports, blast radius, downstream dependencies), full-text search, and semantic vector search
generates an architecture map with coupling warnings (cyclic deps, god modules, SDP violations) on every ingest
secures every endpoint with scoped API tokens (argon2id-hashed)
runs as a single docker compose up
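
One of the coupling warnings listed above, cyclic dependencies, can be illustrated with a depth-first search over an import graph. This is a minimal sketch, not the server's implementation: the `find_cycle` helper and the dict-of-lists graph shape are assumptions made for illustration, and the real check presumably runs against the Neo4j graph.

```python
from typing import Dict, List, Optional


def find_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a list of modules, or None if there is none.

    `imports` maps a module name to the modules it imports (hypothetical shape).
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {node: WHITE for node in imports}
    path: List[str] = []                  # modules on the current DFS path

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the path, so the path from dep back to dep is a cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(imports):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    print(find_cycle(graph))
```

The recursive form is fine for small graphs; a very deep import chain would need an iterative traversal to stay under Python's recursion limit.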

Why

Need	How this server delivers
Rock-solid, won't fall over	Stateless API + workers; Neo4j/Postgres/Redis run with healthchecks + `restart: unless-stopped`; horizontal scale via `--scale worker=N`
Fast relationship search for AI agents	Native graph DB (Cypher) + Lucene FTS + vector index — all in Neo4j
Multi-language	Tree-sitter parsers (23): Python, JS/TS (incl. JSX/TSX → React, Angular), Rust, Go, Java, Ruby, C, C++, C#, Kotlin, Scala, Swift, PHP, Solidity, Dart, R, Perl, Lua, Zig, PowerShell, Julia, Nix. Extraction wrappers (3): Vue, Svelte (delegates `<script>` to JS/TS), Jupyter/Databricks `.ipynb` (concatenates code cells, dispatches by kernel). Pluggable — one file under `ckg/parsers/` adds another language
Precise cross-file edges	Opt-in LSP pass (`CKG_LSP_ENABLED=true`) upgrades CALLS edges with language-server-resolved targets. Pyright today; rust-analyzer / gopls / ts-server / jdtls planned. Graph stays functional with no LSP installed
Fast updates	Incremental ingest (`--incremental`): sha-diffs files against the graph, only re-parses what changed. Full reparse stays available as `--full`
Context for AI tools	Built-in MCP HTTP server → Cursor, VS Code, Claude Code drop in
Two query surfaces	REST (`/v1/`) for simple calls + GraphQL (`/v1/graphql`) for composed traversals; both use the same API token
CLI for automation	`ckg` Typer CLI: register, ingest, query, search
Spec-driven	Auto-generated OpenAPI at `/docs`; GraphiQL UI at `/v1/graphql`; ADRs under `docs/adr/`
Whole-codebase index	One Neo4j graph spans all registered repos
Neo4j-backed	Functions, classes, files, imports, calls all stored as labeled nodes + typed relationships
Secure	API tokens with scopes (`admin`, `repo:write`, `repo:read`); hashed at rest
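
The incremental ingest described in the table compares file hashes against what the graph already holds and re-parses only the differences. The sketch below shows that idea in isolation. It assumes SHA-256 over file contents and a plain `path -> sha` dict standing in for the graph; the hash the server actually uses and the names `diff_files` and `Diff` are not taken from the project.

```python
import hashlib
from typing import Dict, NamedTuple, Set


class Diff(NamedTuple):
    added: Set[str]     # paths present now but unknown to the graph
    changed: Set[str]   # paths whose content hash differs
    removed: Set[str]   # paths the graph knows but that no longer exist


def sha_of(content: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def diff_files(stored: Dict[str, str], current: Dict[str, bytes]) -> Diff:
    """Compare stored path->sha pairs with the current path->content pairs."""
    current_shas = {path: sha_of(data) for path, data in current.items()}
    added = set(current_shas) - set(stored)
    removed = set(stored) - set(current_shas)
    changed = {
        path
        for path in set(stored) & set(current_shas)
        if stored[path] != current_shas[path]
    }
    return Diff(added, changed, removed)


if __name__ == "__main__":
    stored = {"a.py": sha_of(b"x = 1\n"), "b.py": sha_of(b"y = 2\n")}
    current = {"a.py": b"x = 1\n", "b.py": b"y = 3\n", "c.py": b"z = 0\n"}
    print(diff_files(stored, current))
```

Only the `added` and `changed` sets would need re-parsing; `removed` paths would have their nodes dropped from the graph.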

Supported languages

Tree-sitter parsers (23): Python · JavaScript (incl. JSX → React) · TypeScript (incl. TSX → Angular) · Rust · Go · Java · Ruby · C · C++ · C# · Kotlin · Scala · Swift · PHP · Solidity · Dart · R · Perl · Lua · Zig · PowerShell · Julia · Nix

Extraction wrappers (3): Vue & Svelte SFCs (delegate <script> to JS/TS) · Jupyter / Databricks .ipynb (concatenate code cells, dispatch by kernel language)

Pluggable — adding another language is one file under ckg/parsers/ and one line in the registry.
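
To make the notebook wrapper concrete, here is a minimal sketch of "concatenate code cells, dispatch by kernel language" using only the standard `.ipynb` JSON layout (`cells`, `cell_type`, `source`, `metadata.kernelspec.language`). The function name `extract_notebook_code` and the fallback to `python` are assumptions; the project's own wrapper may differ.

```python
import json
from typing import Tuple


def extract_notebook_code(raw: str, default_language: str = "python") -> Tuple[str, str]:
    """Return (language, concatenated code) for a notebook given as a JSON string."""
    nb = json.loads(raw)
    meta = nb.get("metadata", {})
    # Prefer the kernel's declared language, then language_info, then a default.
    language = (
        meta.get("kernelspec", {}).get("language")
        or meta.get("language_info", {}).get("name")
        or default_language
    )
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        chunks.append(source)
    return language, "\n\n".join(chunks)


if __name__ == "__main__":
    notebook = json.dumps({
        "metadata": {"kernelspec": {"language": "python"}},
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {"cell_type": "code", "source": ["import os\n", "print(os.getcwd())"]},
            {"cell_type": "code", "source": "x = 1"},
        ],
    })
    print(extract_notebook_code(notebook))
```

The returned language would then select which Tree-sitter parser receives the concatenated source.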

Architecture

                      ┌──────────────┐
   AI agents ───MCP──▶│              │
   CLI (ckg) ──REST──▶│   FastAPI    │──▶ Auth (API tokens, scopes)
   Web UI ─────GQL───▶│              │──▶ Audit log
                      └──────┬───────┘
                             │
            ┌────────────────┼─────────────────────────────┐
            ▼                ▼                             ▼
     ┌────────────┐   ┌─────────────┐              ┌───────────────┐
     │ Neo4j 5    │   │ Postgres    │              │ Redis         │
     │ graph +    │   │ repos +     │              │ cache + queue │
     │ vector +   │   │ tokens +    │              └───────┬───────┘
     │ FTS        │   │ runs +      │                      │
     └────────────┘   │ audit       │              ┌───────▼───────┐
                      └─────────────┘              │ Celery workers│
                                                   │  - clone      │
                                                   │  - parse      │
                                                   │  - embed      │
                                                   │  - write graph│
                                                   └───────┬───────┘
                                                           │
                                                   ┌───────▼───────┐
                                                   │ Tree-sitter   │
                                                   │ parsers       │
                                                   │ 23 languages  │
                                                   │ + 3 wrappers  │
                                                   └───────────────┘

Full design rationale: docs/adr/0001-architecture.md.

Quickstart

1. Prerequisites

Docker Desktop (macOS / Windows) or Docker Engine + Compose v2 (Linux)
8 GB free RAM recommended
Python 3.11+ on the host only if you want the CLI locally

2. Clone and configure

git clone https://github.com/ajankurjain/central-code-knowledge-graph.git
cd central-code-knowledge-graph
cp .env.example .env

Edit .env and replace every change-me-*. Generate strong values with:

python -c "import secrets; print(secrets.token_urlsafe(32))"

3. Start the stack

make up

(or docker compose up -d)

First boot takes 1–3 minutes (image pulls + Neo4j schema init).

curl http://localhost:8080/readyz
# {"ready": true, "checks": {"neo4j": true, "postgres": true, "redis": true}, ...}

Open the auto-generated API docs: http://localhost:8080/docs

Open the web UI: http://localhost:3000 (paste an API token to sign in).

4. Install the CLI

From PyPI (recommended — CLI-only, light install):

pip install central-code-knowledge-graph
# or, isolated:
pipx install central-code-knowledge-graph

Or from a checkout for development:

pip install -e .
# Or with everything (server stack + dev tools):
pip install -e '.[dev]'

Then point the CLI at your server and sign in with the bootstrap token:

export CKG_SERVER=http://localhost:8080
ckg login --token "$(grep '^CKG_BOOTSTRAP_TOKEN=' .env | cut -d= -f2-)"

# Mint a real token, then re-login with it:
ckg token create my-laptop --scope repo:read --scope repo:write
ckg login --token ckg_xxxxxxxxxxxxxxxxxxxx

5. Ingest your first repo

ckg repo register my-repo file:///Users/you/code/my-repo --branch main
ckg repo ingest    my-repo
ckg repo runs      my-repo          # watch progress
ckg graph stats
ckg search keyword "ingest pipeline"
ckg search semantic "where do we parse Tree-sitter trees?"
ckg graph callers     my-repo my.module.foo --depth 2
ckg graph blast       my-repo src/foo/bar.py            # what breaks if bar.py changes
ckg graph downstream  my-repo src/foo/bar.py            # what bar.py depends on

5b. Or pull an entire org / group / workspace at once

Paste a single URL — GitHub org/user, GitLab group/user, Bitbucket workspace, or a JSON/YAML manifest — and ckg discovers every accessible repo, registers them, and queues a full ingest for each.

# Public org, anonymous
ckg source add https://github.com/orgs/anthropics

# Private org with a Personal Access Token (example: read from env)
export CKG_SOURCE_TOKEN="$GH_PAT"
ckg source add https://github.com/orgs/acme --include-forks

# GitLab group (incl. subgroups)
ckg source add https://gitlab.com/groups/gitlab-org

# Bitbucket workspace (token format: "username:app-password")
ckg source add https://bitbucket.org/atlassian --token "$BB_USER:$BB_APP_PASSWORD"

# Manifest URL (JSON or YAML list)
ckg source add https://example.com/all-repos.yaml

ckg source list          # see what you've added
ckg source repos 1       # repos discovered for source 1
ckg source sync 1        # re-discover; queues ingests for newly-added repos
ckg source delete 1 --yes  # CASCADE — drops every repo + graph data this source created

PATs are encrypted at rest with Fernet (key in CKG_SECRET_KEY). They never appear in repos.url — the worker injects them into the clone URL at fetch time.
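
Injecting a credential into the clone URL at fetch time, rather than storing it in the URL, might look like the following. This is a sketch under assumptions: the helper name `with_credentials` is hypothetical, the `user:password` userinfo format mirrors the Bitbucket example above, and decryption of the stored PAT is left out because Fernet lives in a third-party library.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(clone_url: str, userinfo: str) -> str:
    """Return clone_url with `userinfo` ("user" or "user:password") embedded.

    The stored URL stays credential-free; only this derived URL is handed to git.
    """
    parts = urlsplit(clone_url)
    if parts.scheme != "https":
        raise ValueError("credentials are only injected into https URLs")
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    user, sep, password = userinfo.partition(":")
    # Percent-encode so characters such as '@' or '/' cannot break the URL.
    encoded = quote(user, safe="")
    if sep:
        encoded += ":" + quote(password, safe="")
    return urlunsplit(
        (parts.scheme, f"{encoded}@{host}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(with_credentials("https://bitbucket.org/acme/app.git", "alice:app-password"))
```

Building the URL just before the clone keeps the secret out of the repos table and out of any listing that prints repo URLs.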

5c. Keep the graph fresh — polling + webhooks

Two ways to keep ingested repos up-to-date without manual triggers. Polling uses a Celery Beat scheduler (one extra Compose service); webhooks are push-driven by GitHub / GitLab / Bitbucket.

# Polling
ckg source schedule 1 30m       # re-discover source 1 every 30 minutes
ckg repo   poll     my-repo 5m  # incremental ingest of my-repo every 5 minutes

# Webhooks (returns the secret + receiver URL — paste both into the provider)
ckg source webhook  1 --enable

Provider setup:

Provider	Where	Field
GitHub	repo / org Settings → Webhooks	`Payload URL` = `<your-server>/v1/webhooks/<source_id>`; `Content type: application/json`; `Secret` = the printed value; tick just the `push` event
GitLab	project Settings → Webhooks	`URL` = same as above; `Secret token` = the printed value; tick Push events
Bitbucket	workspace Webhooks → Add	`URL` = `<your-server>/v1/webhooks/<source_id>?secret=<paste>`; trigger on Repository push

GitHub signs the request body with HMAC-SHA256, GitLab sends a shared token in a header, and Bitbucket Cloud carries the secret in the URL. The same /v1/webhooks/<source_id> endpoint detects the provider from the request headers automatically.
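
The three verification schemes can be sketched with the standard library. The header names used here (`X-GitHub-Event`, `X-Hub-Signature-256`, `X-Gitlab-Event`, `X-Gitlab-Token`, `X-Event-Key`) are the ones those providers send; the function names and the `query_secret` parameter are assumptions for illustration and are not the server's actual code.

```python
import hashlib
import hmac
from typing import Mapping, Optional


def detect_provider(headers: Mapping[str, str]) -> Optional[str]:
    """Guess the provider from the event header it sends."""
    names = {key.lower() for key in headers}
    if "x-github-event" in names:
        return "github"
    if "x-gitlab-event" in names:
        return "gitlab"
    if "x-event-key" in names:
        return "bitbucket"
    return None


def _same(a: str, b: str) -> bool:
    """Constant-time string comparison."""
    return hmac.compare_digest(a.encode(), b.encode())


def verify(provider: str, secret: str, headers: Mapping[str, str],
           body: bytes, query_secret: Optional[str] = None) -> bool:
    """Check a webhook delivery against the shared secret."""
    lowered = {key.lower(): value for key, value in headers.items()}
    if provider == "github":
        # GitHub: "sha256=" + HMAC-SHA256 of the raw body, keyed by the secret.
        digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
        return _same("sha256=" + digest, lowered.get("x-hub-signature-256", ""))
    if provider == "gitlab":
        # GitLab: the secret token is sent verbatim in a header.
        return _same(secret, lowered.get("x-gitlab-token", ""))
    if provider == "bitbucket":
        # Bitbucket Cloud: the secret travels as a URL query parameter.
        return _same(secret, query_secret or "")
    return False


if __name__ == "__main__":
    body = b'{"ref": "refs/heads/main"}'
    sig = "sha256=" + hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
    headers = {"X-GitHub-Event": "push", "X-Hub-Signature-256": sig}
    print(detect_provider(headers), verify("github", "s3cret", headers, body))
```

The signature must be computed over the raw request bytes, before any JSON parsing, or the digests will not match.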

6. Browse it in the UI

Open http://localhost:3000, paste an API token, and explore:

Dashboard — node/edge/repo/file counts; repo list
Repos — register repos, queue incremental or full ingests, watch run status
Sources — paste a GitHub org / GitLab group / Bitbucket workspace / manifest URL and bulk-add every repo it exposes
Search — keyword (Lucene FTS) or semantic (vector) across all (or one) repos
Graph — force-directed call graph for any function, callers + callees up to depth 4

The UI is a static Next.js bundle served from the web container; the browser hits the API directly using the bearer token kept in localStorage.

7. Hook up your editor

Editor	Guide
Cursor	integrations/cursor/README.md
VS Code (Copilot Chat / Cline / Roo Code)	integrations/vscode/README.md
Claude Code	integrations/claude-code/README.md
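
Under the hood, each of these editors talks to the server's `/v1/mcp` endpoint using JSON-RPC 2.0 messages. The sketch below only builds the request bodies. `tools/list` and `tools/call` are standard MCP methods, but the tool name `callers_of` and its arguments are hypothetical; the real tool names come back from `tools/list`.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids, unique per process


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP endpoint."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


if __name__ == "__main__":
    # Ask the server which tools it exposes.
    print(jsonrpc_request("tools/list"))
    # Call one of them; the name and arguments here are illustrative only.
    print(jsonrpc_request("tools/call", {
        "name": "callers_of",
        "arguments": {"repo": "my-repo", "symbol": "my.module.foo", "depth": 2},
    }))
```

Each body would be sent as a `POST` to `/v1/mcp` with the same API token used for REST.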

What the graph looks like

(Repo)-[:CONTAINS]->(File)-[:DEFINES]->(Class)-[:HAS_METHOD]->(Function)
                          -[:DEFINES]->(Function)-[:CALLS]->(Function)
                          -[:IMPORTS]->(Module|File)

Function nodes carry an embedding vector property indexed for cosine similarity. Names + docs feed Lucene full-text indexes. So one Neo4j store answers all three styles of query (structural / keyword / semantic).
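
A structural query such as "callers up to depth 2" is a bounded traversal over CALLS edges in reverse. The sketch below does that breadth-first over an in-memory edge list. It stands in for what the server would presumably express in Cypher, and the function name and `(caller, callee)` tuple shape are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def callers_of(calls: Iterable[Tuple[str, str]], target: str, depth: int) -> Dict[str, int]:
    """Map each transitive caller of `target` to its hop distance, up to `depth` hops.

    `calls` is an iterable of (caller, callee) pairs, one per CALLS edge.
    """
    reverse: Dict[str, List[str]] = {}
    for caller, callee in calls:
        reverse.setdefault(callee, []).append(caller)

    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == depth:
            continue  # do not expand past the requested depth
        for caller in reverse.get(node, []):
            if caller not in distance:  # first visit is the shortest distance
                distance[caller] = distance[node] + 1
                queue.append(caller)
    del distance[target]
    return distance


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("d", "c"), ("e", "a")]
    print(callers_of(edges, "c", depth=2))
```

Tracking visited nodes keeps the traversal finite even when the call graph contains recursion or mutual calls.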

API surface (short)

Full reference: docs/api.md.

Verb	Path	Purpose
`GET`	`/healthz`	Liveness
`GET`	`/readyz`	Readiness (per-store)
`POST`	`/v1/tokens`	Mint a token (admin)
`GET`	`/v1/tokens`	List tokens (admin)
`DELETE`	`/v1/tokens/{id}`	Revoke (admin)
`POST`	`/v1/repos`	Register a repo
`GET`	`/v1/repos`	List repos
`POST`	`/v1/repos/{id}/ingest`	Queue ingest
`GET`	`/v1/repos/{id}/runs`	Ingest history
`GET`	`/v1/graph/stats`	Graph counts
`GET`	`/v1/graph/callers_of`	Transitive callers
`GET`	`/v1/graph/callees_of`	Transitive callees
`GET`	`/v1/graph/imports_of`	Imports for a file
`GET`	`/v1/graph/blast_radius`	Files affected if this file changes (upstream callers)
`GET`	`/v1/graph/downstream_dependencies`	Files this file depends on (outgoing callees)
`GET`	`/v1/graph/file`	Symbols in a file
`GET`	`/v1/search/keyword`	Lucene FTS
`GET`	`/v1/search/semantic`	Vector cosine
`POST`	`/v1/mcp`	MCP JSON-RPC for IDEs
`POST`	`/v1/graphql`	GraphQL endpoint (open in browser for GraphiQL UI)
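
Calling one of the graph endpoints from a script could look like this. The paths come from the table above and the bearer-token header matches how the web UI authenticates, but the query parameter names (`repo`, `symbol`, `depth`) are assumptions; docs/api.md or the OpenAPI page at `/docs` is the authority on the real ones.

```python
from urllib.parse import urlencode
from urllib.request import Request


def graph_request(server: str, token: str, endpoint: str, **params: object) -> Request:
    """Build an authenticated GET request for /v1/graph/<endpoint>."""
    url = f"{server.rstrip('/')}/v1/graph/{endpoint}"
    query = urlencode(params)
    if query:
        url += "?" + query
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


if __name__ == "__main__":
    req = graph_request(
        "http://localhost:8080", "ckg_example",
        "callers_of", repo="my-repo", symbol="my.module.foo", depth=2,
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req) and json.load() the response.
```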

Roadmap

Phase 1 — Foundation, auth, Python/JS/TS ingest, REST + MCP, CLI
Phase 2 — Incremental updates (per-file sha diff), GraphQL endpoint, Rust/Go/Java/Ruby parsers
Phase 3 — C/C++ parsers; opt-in LSP precision pass (pyright today; rust-analyzer / gopls / ts-server / jdtls planned)
Phase 4 — Next.js web UI: token login, dashboard, repo management, search (keyword + semantic), force-directed function call-graph viz
Phase 5 — Multi-tenant orgs/users, k8s/Helm, OpenTelemetry, Neo4j Causal Cluster

Development

Backend:

pip install -e '.[dev]'
pytest -q
ruff check ckg

Web UI:

cd web
npm install --legacy-peer-deps
NEXT_PUBLIC_CKG_API=http://localhost:8080 npm run dev
# open http://localhost:3000

Project layout:

ckg/
├── api/        # FastAPI app + routes (REST + GraphQL + MCP)
├── auth.py     # API tokens, principal, scopes
├── cli/        # `ckg` Typer CLI
├── config.py   # Pydantic settings
├── db/         # neo4j / postgres / redis clients + schema
├── lsp/        # Opt-in LSP precision pass (Phase 3)
├── parsers/    # tree-sitter parsers, one per language
├── services/   # ingest, embeddings, lsp_resolve
└── worker/     # Celery app + tasks
web/            # Next.js 15 + Tailwind + react-force-graph-2d (Phase 4)
docker/         # API + worker + web Dockerfiles
docs/           # ADRs, deployment, API
integrations/   # cursor / vscode / claude-code MCP snippets
tests/          # pytest

Security

API tokens are 32-byte URL-safe random strings prefixed ckg_, never stored in plaintext — only argon2id hashes are persisted.
The bootstrap token (.env) is your only way in on day 0; rotate it immediately after minting a scoped token.
All non-health endpoints require a token; CORS is restricted to CKG_CORS_ORIGINS.
.env is git-ignored. Do not commit it. Do not paste tokens into chats.
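
The token format described above (32 random bytes, URL-safe, `ckg_` prefix) and the scope model can be sketched as follows. This is an illustration only: treating `admin` as implying every other scope is an assumption, and argon2id hashing is omitted because it needs a third-party library.

```python
import secrets
from typing import Set

PREFIX = "ckg_"


def mint_token() -> str:
    """Generate a token: the prefix plus 32 random bytes, URL-safe base64 encoded."""
    return PREFIX + secrets.token_urlsafe(32)


def allows(granted: Set[str], required: str) -> bool:
    """Return True if the granted scopes satisfy the required one.

    Assumption: `admin` grants everything; other scopes must match exactly.
    """
    return "admin" in granted or required in granted


if __name__ == "__main__":
    token = mint_token()
    print(token[:8] + "...", len(token))
    print(allows({"repo:read"}, "repo:write"))
```

Only a hash of the minted value would be persisted, so the plaintext token can be shown to the user exactly once.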

If you find a security issue, please open a private vulnerability report on GitHub.

Pre-commit credential audit

A small audit script refuses to commit credentials, IDE-assistant configs (.claude/, CLAUDE.md, .mcp.json, .cursor/, .continue/, .aider*, .windsurf/), or files matching common secret patterns (GitHub PAT, OpenAI key, AWS access key, Slack token, JWT, PEM private key):

./scripts/audit-secrets.sh

# install as a git pre-commit hook (recommended):
ln -sf ../../scripts/audit-secrets.sh .git/hooks/pre-commit
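
The pattern-matching part of such an audit can be illustrated in a few lines. This is not the contents of scripts/audit-secrets.sh: the regular expressions below are commonly used approximations of those credential formats, and the `scan` helper is hypothetical.

```python
import re
from typing import List

# Approximate shapes of a few well-known credential formats.
PATTERNS = {
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "slack_token": re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
}


def scan(text: str) -> List[str]:
    """Return the names of all patterns that match somewhere in `text`."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    sample = "key = '-----BEGIN RSA PRIVATE KEY-----'"
    hits = scan(sample)
    print(hits or "clean")
    # A pre-commit hook would exit non-zero when any staged file has hits.
```

Regex scanning produces false positives and misses novel formats, so it complements rather than replaces keeping secrets out of the working tree.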

License

MIT — see LICENSE.

Project details

Release history

0.1.6	May 12, 2026
0.1.5	May 12, 2026
0.1.4	May 12, 2026
0.1.3	May 12, 2026
0.1.2	May 12, 2026
0.1.1	May 12, 2026 (this version)
0.1.0	May 11, 2026

Download files

Source Distribution

central_code_knowledge_graph-0.1.1.tar.gz (92.3 kB view details)

Uploaded May 12, 2026 Source

Built Distribution

central_code_knowledge_graph-0.1.1-py3-none-any.whl (124.0 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file central_code_knowledge_graph-0.1.1.tar.gz.

File metadata

Download URL: central_code_knowledge_graph-0.1.1.tar.gz
Upload date: May 12, 2026
Size: 92.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for central_code_knowledge_graph-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5ba8ef2c3ea118b3d5764e4e1d04a27faa2f567ea294c83cb11b0707d71e3ee1`
MD5	`888cd0afd34ad1409a3f620a0d15d31e`
BLAKE2b-256	`49d818e2a387779355b6a1d9a327dddd1f6e603559f1792447b17e0c182a0f04`

Provenance

The following attestation bundles were made for central_code_knowledge_graph-0.1.1.tar.gz:

Publisher: publish.yml on ajankurjain/central-code-knowledge-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: central_code_knowledge_graph-0.1.1.tar.gz
- Subject digest: 5ba8ef2c3ea118b3d5764e4e1d04a27faa2f567ea294c83cb11b0707d71e3ee1
- Sigstore transparency entry: 1516493284
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: ajankurjain/central-code-knowledge-graph@8576fd7a14ef372a09b9918ff71134911f147f13
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ajankurjain
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8576fd7a14ef372a09b9918ff71134911f147f13
- Trigger Event: push

File details

Details for the file central_code_knowledge_graph-0.1.1-py3-none-any.whl.

File metadata

Download URL: central_code_knowledge_graph-0.1.1-py3-none-any.whl
Upload date: May 12, 2026
Size: 124.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for central_code_knowledge_graph-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bef5fab15522227771f1e96fd2aed121b5122181a65bb6c9bbfc75001cb61055`
MD5	`2d249125dc23a78cbbb20a194cdaf18d`
BLAKE2b-256	`c00c1ced8d937a29fe787d9caa5523eb3d8c9c49369e26aebc87d4e04b736858`

Provenance

The following attestation bundles were made for central_code_knowledge_graph-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ajankurjain/central-code-knowledge-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: central_code_knowledge_graph-0.1.1-py3-none-any.whl
- Subject digest: bef5fab15522227771f1e96fd2aed121b5122181a65bb6c9bbfc75001cb61055
- Sigstore transparency entry: 1516493934
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: ajankurjain/central-code-knowledge-graph@8576fd7a14ef372a09b9918ff71134911f147f13
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ajankurjain
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8576fd7a14ef372a09b9918ff71134911f147f13
- Trigger Event: push
