Central, multi-repo code knowledge graph for AI agents — Neo4j + Tree-sitter + MCP.

central-code-knowledge-graph

Stop re-reading. Start querying.

AI coding tools re-read your entire codebase on every task. ckg fixes that. One server indexes every repo in your org with Tree-sitter across 26 languages, stores the structural map as a Neo4j property graph, keeps it fresh via incremental ingest + webhooks, and serves precise context to your AI assistant via MCP so it reads only what matters.

One server that:

ingests many repositories (not just one) and keeps them incrementally fresh
stores them as a Neo4j property graph (File, Class, Function, Module + CONTAINS, DEFINES, HAS_METHOD, CALLS, IMPORTS)
exposes REST, GraphQL, MCP/JSON-RPC, and a ckg CLI
supports structural queries (callers, callees, imports, blast radius, downstream dependencies), full-text search, and semantic vector search
generates an architecture map with coupling warnings (cyclic deps, god modules, SDP violations) on every ingest
secures every endpoint with scoped API tokens (argon2id-hashed)
runs as a single docker compose up

Why

Need	How this server delivers
Rock-solid, won't fall over	Stateless API + workers; Neo4j/Postgres/Redis run with healthchecks + `restart: unless-stopped`; horizontal scale via `--scale worker=N`
Fast relationship search for AI agents	Native graph DB (Cypher) + Lucene FTS + vector index — all in Neo4j
Multi-language	Tree-sitter parsers (23): Python, JS/TS (incl. JSX/TSX → React, Angular), Rust, Go, Java, Ruby, C, C++, C#, Kotlin, Scala, Swift, PHP, Solidity, Dart, R, Perl, Lua, Zig, PowerShell, Julia, Nix. Extraction wrappers (3): Vue, Svelte (delegates `<script>` to JS/TS), Jupyter/Databricks `.ipynb` (concatenates code cells, dispatches by kernel). Pluggable — one file under `ckg/parsers/` adds another language
Precise cross-file edges	Opt-in LSP pass (`CKG_LSP_ENABLED=true`) upgrades CALLS edges with language-server-resolved targets. Pyright today; rust-analyzer / gopls / ts-server / jdtls planned. Graph stays functional with no LSP installed
Fast updates	Incremental ingest (`--incremental`): sha-diffs files against the graph, only re-parses what changed. Full reparse stays available as `--full`
Context for AI tools	Built-in MCP HTTP server → Cursor, VS Code, Claude Code drop in
Two query surfaces	REST (`/v1/`) for simple calls + GraphQL (`/v1/graphql`) for composed traversals; both use the same API token
Per-token usage analytics	Every authenticated request is logged into `api_calls`; the `/integrations` page surfaces 24-h totals, top callers, top endpoints with p95 latency, and a live tail
CLI for automation	`ckg` Typer CLI: register, ingest, query, search
Spec-driven	Auto-generated OpenAPI at `/docs`; GraphiQL UI at `/v1/graphql`; ADRs under `docs/adr/`
Whole-codebase index	One Neo4j graph spans all registered repos
Neo4j-backed	Functions, classes, files, imports, calls all stored as labeled nodes + typed relationships
Secure	API tokens with scopes (`admin`, `repo:write`, `repo:read`); hashed at rest
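
The incremental ingest row above boils down to a hash comparison. A minimal sketch, assuming SHA-256 content hashes and a plain dict standing in for the hashes already stored in the graph (the function names are hypothetical, not ckg internals):

```python
import hashlib
from pathlib import Path


def file_sha(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_graph(current: dict[str, str], stored: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths by comparing {path: sha} maps.

    `stored` stands in for the hashes recorded at the last ingest.
    """
    added = sorted(p for p in current if p not in stored)
    changed = sorted(p for p, sha in current.items() if p in stored and stored[p] != sha)
    removed = sorted(p for p in stored if p not in current)
    return {"added": added, "changed": changed, "removed": removed}


# Toy hashes: only b.py changed, d.py is new, c.py was deleted.
stored = {"a.py": "111", "b.py": "222", "c.py": "333"}
current = {"a.py": "111", "b.py": "999", "d.py": "444"}
plan = diff_against_graph(current, stored)
print(plan)
```

Only the `added` and `changed` paths would need re-parsing; `removed` paths would have their nodes dropped.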

Supported languages

Tree-sitter parsers (23): Python · JavaScript (incl. JSX → React) · TypeScript (incl. TSX → Angular) · Rust · Go · Java · Ruby · C · C++ · C# · Kotlin · Scala · Swift · PHP · Solidity · Dart · R · Perl · Lua · Zig · PowerShell · Julia · Nix

Extraction wrappers (3): Vue & Svelte SFCs (delegate <script> to JS/TS) · Jupyter / Databricks .ipynb (concatenate code cells, dispatch by kernel language)
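
To make the notebook wrapper concrete, here is a rough sketch of "concatenate code cells, dispatch by kernel language" using only the standard nbformat JSON layout (`cells`, `cell_type`, `source`, `metadata.kernelspec.language`). The function name and the `python` fallback are assumptions for illustration, not the actual ckg wrapper:

```python
import json


def extract_notebook_code(raw: str) -> tuple[str, str]:
    """Return (kernel language, concatenated code cells) for a notebook JSON string."""
    nb = json.loads(raw)
    # Fallback to "python" is an assumption for notebooks without kernelspec metadata.
    language = nb.get("metadata", {}).get("kernelspec", {}).get("language", "python")
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows `source` to be a single string or a list of lines.
        chunks.append("".join(source) if isinstance(source, list) else source)
    return language, "\n\n".join(chunks)


notebook = json.dumps({
    "metadata": {"kernelspec": {"language": "python"}},
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["import os\n", "x = 1"]},
        {"cell_type": "code", "source": "print(x)"},
    ],
})
language, code = extract_notebook_code(notebook)
print(language)
print(code)
```

The returned language would then pick which Tree-sitter parser receives the concatenated code.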

Pluggable — adding another language is one file under ckg/parsers/ and one line in the registry.

Architecture

                      ┌──────────────┐
   AI agents ───MCP──▶│              │
   CLI (ckg) ──REST──▶│   FastAPI    │──▶ Auth (API tokens, scopes)
   Web UI ────GQL───▶ │              │──▶ Audit log
                      └──────┬───────┘
                             │
            ┌────────────────┼─────────────────────────────┐
            ▼                ▼                             ▼
     ┌────────────┐   ┌─────────────┐              ┌───────────────┐
     │ Neo4j 5    │   │ Postgres    │              │ Redis         │
     │ graph +    │   │ repos +     │              │ cache + queue │
     │ vector +   │   │ tokens +    │              └───────┬───────┘
     │ FTS        │   │ runs +      │                      │
     └────────────┘   │ audit       │              ┌───────▼───────┐
                      └─────────────┘              │ Celery workers│
                                                   │  - clone      │
                                                   │  - parse      │
                                                   │  - embed      │
                                                   │  - write graph│
                                                   └───────┬───────┘
                                                           │
                                                   ┌───────▼───────┐
                                                   │ Tree-sitter   │
                                                   │ parsers       │
                                                   │ 23 languages  │
                                                   │ + 3 wrappers  │
                                                   └───────────────┘

Full design rationale: docs/adr/0001-architecture.md.

Quickstart

1. Prerequisites

	Required	Notes
Docker	Docker Desktop (macOS / Windows) or Docker Engine + Compose v2 (Linux)	Must be running before step 3. Confirm with `docker info`.
RAM	8 GB free	Neo4j wants 2 GB, sentence-transformers ~500 MB on first warmup
Disk	~3 GB free	Base images (Neo4j, Postgres, Redis, Python, Node) total ~2 GB. Plus your repo clones under the `repo_data` volume.
Network	Outbound HTTPS	First boot pulls images from Docker Hub + npm + PyPI
Python	3.11+ (host)	Only if you want to install the CLI on your laptop. Not needed otherwise — `make up` runs everything in containers.

2. Clone and configure

git clone https://github.com/ajankurjain/central-code-knowledge-graph.git
cd central-code-knowledge-graph
cp .env.example .env

Replace every change-me-* placeholder in .env with a strong random value. The snippet below fills them all in one shot and leaves you with a ready-to-go .env for local dev:

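python3 - <<'PY'
# Swap each change-me-* placeholder in .env for a freshly generated secret.
import base64, os, secrets
from pathlib import Path

subs = {
    "change-me-please-bootstrap-token": secrets.token_urlsafe(32),
    # Fernet keys are 32 random bytes, URL-safe base64-encoded.
    "change-me-please-fernet-key":     base64.urlsafe_b64encode(os.urandom(32)).decode(),
    "change-me-neo4j-password":        secrets.token_urlsafe(24),
    "change-me-postgres-password":     secrets.token_urlsafe(24),
}
env_path = Path(".env")
env = env_path.read_text()
for placeholder, value in subs.items():
    env = env.replace(placeholder, value)
env_path.write_text(env)
PY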
chmod 600 .env

⚠️ Keep .env out of git. It's already in .gitignore, and the pre-commit hook (scripts/audit-secrets.sh) refuses any commit that contains it.

3. Start the stack

Make sure Docker Desktop is running first (docker info should succeed), then pick one of two paths:

Fast path — pull pre-built images from GitHub Container Registry (linux/amd64 + linux/arm64, signed with the release tag):

docker compose pull           # pulls api / worker / web from ghcr.io
make up                       # starts the stack

Pin a specific release with CKG_IMAGE_TAG=0.1.5 docker compose pull; the default is latest. Browse all tags at github.com/ajankurjain/central-code-knowledge-graph/pkgs/container/central-code-knowledge-graph%2Fapi.

Local-dev path — build from the checkout (use this when you've edited code):

make up
# or: docker compose up -d --build

First boot takes 5–10 minutes when building locally — it pulls ~2 GB of base images and builds the api / worker / web / beat images. Subsequent make up runs are ~10 seconds. The pre-built ghcr.io path skips the build phase entirely.

Confirm everything came up healthy:

docker compose ps
# all containers should show "running" and (healthy):
# ckg-api-1, ckg-beat-1, ckg-neo4j-1, ckg-postgres-1, ckg-redis-1, ckg-web-1, ckg-worker-1

Health check from outside:

curl http://localhost:8080/readyz
# {"ready":true,"checks":{"neo4j":true,"postgres":true,"redis":true},"version":"0.1.6"}
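
If you want to script that check (say, in a deploy gate), the response shape shown above is easy to inspect. A small sketch that only assumes the `checks` object from the sample output; the helper name is made up:

```python
import json


def failing_checks(readyz_body: str) -> list[str]:
    """Return the names of backing stores whose readiness check is not true."""
    payload = json.loads(readyz_body)
    checks = payload.get("checks", {})
    return sorted(name for name, ok in checks.items() if not ok)


# Sample bodies modelled on the /readyz output above.
healthy = '{"ready":true,"checks":{"neo4j":true,"postgres":true,"redis":true},"version":"0.1.6"}'
degraded = '{"ready":false,"checks":{"neo4j":true,"postgres":false,"redis":true},"version":"0.1.6"}'
print(failing_checks(healthy))
print(failing_checks(degraded))
```

Feed it the body returned by `curl http://localhost:8080/readyz` and fail the script when the list is non-empty.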

URLs:

Service	URL
Web UI	http://localhost:3000
API (Swagger UI)	http://localhost:8080/docs
GraphQL (GraphiQL)	http://localhost:8080/v1/graphql
Neo4j Browser	http://localhost:7474 (login `neo4j` / value of `NEO4J_PASSWORD` from `.env`)
Postgres	`localhost:5433` (mapped off default port to avoid clashes)
Redis	`localhost:6379`

4. Sign in

Grab the bootstrap token from .env:

grep ^CKG_BOOTSTRAP_TOKEN .env | cut -d= -f2-

Then either:

a) Use the web UI — open http://localhost:3000/login, paste the token, click Sign in. The Dashboard lights up.

b) Use the ckg CLI:

# From PyPI (light install — CLI only, talks to the Docker server):
pip install central-code-knowledge-graph

# Or pipx for an isolated install:
pipx install central-code-knowledge-graph

# Or editable install from a checkout for development:
pip install -e '.[dev]'

# Then:
export CKG_SERVER=http://localhost:8080
ckg login --token "$(grep ^CKG_BOOTSTRAP_TOKEN .env | cut -d= -f2-)"
ckg status        # should print graph counts

The bootstrap token has admin scope and is meant for one-time setup — mint a scoped token and use that going forward:

ckg token create my-laptop --scope repo:read --scope repo:write
# copy the printed `ckg_…` token, then:
ckg login --token ckg_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

5. Ingest your first repo

# Pick a local repo to index. `file:///abs/path` clones-in-place, no network.
ckg repo register my-repo file:///Users/you/code/my-repo --branch main

# Run a full ingest.
ckg repo ingest    my-repo --full
ckg repo runs      my-repo                 # watch progress; a small repo finishes in seconds

# Verify the graph populated.
ckg graph stats
# → {"nodes": 3288, "edges": 6676, "repos": 1, "files": 80}  (example)

# Search.
ckg search keyword  "ingest pipeline"
ckg search semantic "where do we parse Tree-sitter trees?"

# Structural queries.
ckg graph callers    my-repo my.module.foo --depth 2
ckg graph blast      my-repo src/foo/bar.py        # files that break if bar.py changes
ckg graph downstream my-repo src/foo/bar.py        # files bar.py depends on

You can do the same from the web UI under Repos → Register → fill the form, then click ingest Δ or full reparse. Watch progress on the repo detail page (auto-refreshes while a run is in flight).

5b. Or pull an entire org / group / workspace at once

Paste a single URL — GitHub org/user, GitLab group/user, Bitbucket workspace, or a JSON/YAML manifest — and ckg discovers every accessible repo, registers them, and queues a full ingest for each.

# Public org, anonymous
ckg source add https://github.com/orgs/anthropics

# Private org with a Personal Access Token (example: read from env)
export CKG_SOURCE_TOKEN="$GH_PAT"
ckg source add https://github.com/orgs/acme --include-forks

# GitLab group (incl. subgroups)
ckg source add https://gitlab.com/groups/gitlab-org

# Bitbucket workspace (token format: "username:app-password")
ckg source add https://bitbucket.org/atlassian --token "$BB_USER:$BB_APP_PASSWORD"

# Manifest URL (JSON or YAML list)
ckg source add https://example.com/all-repos.yaml

ckg source list          # see what you've added
ckg source repos 1       # repos discovered for source 1
ckg source sync 1        # re-discover; queues ingests for newly-added repos
ckg source delete 1 --yes  # CASCADE — drops every repo + graph data this source created

PATs are encrypted at rest with Fernet (key in CKG_SECRET_KEY). They never appear in repos.url — the worker injects them into the clone URL at fetch time.
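
Injecting a credential at fetch time, and stripping it again before anything is logged, can be sketched with the standard library alone. This is an illustration, not the worker's code: the helper names are hypothetical, and the `oauth2` username is the GitLab convention (other hosts may expect a different one).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def _host(parts) -> str:
    """Rebuild host[:port] without any userinfo."""
    host = parts.hostname or ""
    return f"{host}:{parts.port}" if parts.port else host


def inject_token(clone_url: str, token: str, user: str = "oauth2") -> str:
    """Return clone_url with user:token userinfo added (token is percent-encoded)."""
    parts = urlsplit(clone_url)
    netloc = f"{user}:{quote(token, safe='')}@{_host(parts)}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def scrub_userinfo(url: str) -> str:
    """Return url with any user:password@ prefix removed, safe to persist or log."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, _host(parts), parts.path, parts.query, parts.fragment))


stored_url = "https://gitlab.com/acme/app.git"
fetch_url = inject_token(stored_url, "glpat-abc")
print(fetch_url)
print(scrub_userinfo(fetch_url))
```

The stored URL stays credential-free; only the short-lived fetch URL carries the token.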

5c. Keep the graph fresh — polling + webhooks

Two ways to keep ingested repos up-to-date without manual triggers. Polling uses a Celery Beat scheduler (one extra Compose service); webhooks are push-driven by GitHub / GitLab / Bitbucket.

# Polling
ckg source schedule 1 30m       # re-discover source 1 every 30 minutes
ckg repo   poll     my-repo 5m  # incremental ingest of my-repo every 5 minutes

# Webhooks (returns the secret + receiver URL — paste both into the provider)
ckg source webhook  1 --enable

Provider setup:

Provider	Where	Field
GitHub	repo / org Settings → Webhooks	`Payload URL` = `<your-server>/v1/webhooks/<source_id>`; `Content type: application/json`; `Secret` = the printed value; tick just the `push` event
GitLab	project Settings → Webhooks	`URL` = same as above; `Secret token` = the printed value; tick Push events
Bitbucket	workspace Webhooks → Add	`URL` = `<your-server>/v1/webhooks/<source_id>?secret=<paste>`; trigger on Repository push

GitHub uses HMAC-SHA256 of the body, GitLab a shared-token header, Bitbucket Cloud the URL-embedded secret. The same /v1/webhooks/<id> endpoint detects the provider from headers automatically.

6. Browse it in the UI

Open http://localhost:3000, paste an API token, and explore:

Dashboard — node/edge/repo/file counts + an Integrations row (API calls 24h, top caller, sources, tokens) + paginated repo list with filter
Repos — register repos, queue incremental or full ingests, watch run status
Sources — paste a GitHub org / GitLab group / Bitbucket workspace / manifest URL and bulk-add every repo it exposes; per-source live progress bar + editable branch override + recent-failure disclosure
Integrations — one screen: backend health (Neo4j / Postgres / Redis), copy-paste connection URLs for MCP / GraphQL / REST, connected bulk sources, active AI-client tokens, and 24-hour usage analytics (top callers, top endpoints with p95 latency, live request tail)
Search — keyword (Lucene FTS) or semantic (vector) across all (or one) repos
Graph — force-directed call graph; pick a repo to see its top-20 most-connected functions as click-to-fill entry points, drill into callers + callees up to depth 4
Architecture — auto-generated module map (Louvain on file-level call/import edges, Maven/Gradle layout aware, directory-tree fallback for thin graphs) + coupling-smell warnings (cyclic deps, god modules, SDP violations)

The UI is a static Next.js bundle served from the web container; the browser hits the API directly using the bearer token kept in localStorage.
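
As background for the SDP warnings on the Architecture page: the Stable Dependencies Principle is usually checked with the instability metric I = Ce / (Ca + Ce), where Ce counts outgoing and Ca incoming dependencies. The sketch below flags edges that point at a less stable module. It illustrates the metric only; the module names are invented and ckg's own thresholds and clustering are not shown here.

```python
from collections import defaultdict


def instability(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Map each module to I = Ce / (Ca + Ce) over distinct dependency edges."""
    fan_out, fan_in = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        fan_out[src].add(dst)
        fan_in[dst].add(src)
    modules = set(fan_out) | set(fan_in)
    return {m: len(fan_out[m]) / (len(fan_in[m]) + len(fan_out[m])) for m in modules}


def sdp_violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return edges where a module depends on one that is less stable than itself."""
    scores = instability(edges)
    return sorted((s, d) for s, d in set(edges) if s != d and scores[d] > scores[s])


# Hypothetical module graph: (dependent, dependency).
edges = [
    ("api", "core"), ("worker", "core"),
    ("core", "plugins"),
    ("plugins", "http"), ("plugins", "yaml"),
]
print(sdp_violations(edges))
```

Here `core` (I = 1/3) depends on `plugins` (I = 2/3), so that single edge is reported.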

7. Hook up your editor

Editor	Guide
Cursor	integrations/cursor/README.md
VS Code (Copilot Chat / Cline / Roo Code)	integrations/vscode/README.md
Claude Code	integrations/claude-code/README.md

Day-2 operations

make logs                 # tail every service
make restart              # restart api + worker only
docker compose stop       # park everything; data volumes persist
make up                   # bring it back
make clean                # WARNING: removes volumes — wipes graph + Postgres
make psql                 # psql shell inside the postgres container
make neo4j-shell          # cypher-shell inside the neo4j container

Contributing — pre-push check

CI runs ruff + pytest on every push. Run the same gate locally before you push so you don't bounce off red builds:

make check        # lint + tests, same as CI
make lint         # ruff only
make test         # pytest only

One-time install of the git hook that runs make check automatically before every push (no-op when the diff is README-only):

make install-hooks

Bypass for a single push (e.g. README hotfix while the stack is down): SKIP_CKG_PREPUSH=1 git push.

Troubleshooting

Things that bit me during local setup — keep this open the first time you run.

Symptom	Diagnosis / fix
`docker: command not found`	Docker Desktop isn't on PATH. macOS shortcut: `export PATH="/Applications/Docker.app/Contents/Resources/bin:$PATH"`.
`docker info` fails / "Cannot connect to the Docker daemon"	Docker Desktop is installed but not running. Launch the Docker app and wait ~10s.
`make up` errors with `neo4j password required`	You skipped step 2: `.env` doesn't exist (or still has `change-me-*` placeholders for the strict-required vars). Re-run the Python snippet in step 2.
`ckg-web-1` stays in Created state and never starts	The image was never built. Run `docker compose build web && docker compose up -d web`.
`ckg-neo4j-1` flaps Restarting with `Unrecognized setting. No declared setting with name: PASSWORD`	Old compose file. Pull main — fixed in v0.1.1 by renaming the healthcheck env vars to `CKG_HEALTHCHECK_*`.
API container loops with `TypeError: APIRouter.__init__() got an unexpected keyword argument 'graphiql'`	strawberry-graphql renamed the arg. Fixed in v0.1.1. Pull main.
Worker / beat crash with `exec: "celery": executable file not found in $PATH`	Dockerfile didn't install `[server]` extras. Fixed in v0.1.1. Pull main + `docker compose build --no-cache worker beat`.
Ingest reports `files_skipped` for every file, `files_parsed: 0`	tree-sitter-language-pack 1.x compatibility issue. Fixed in v0.1.1 by pinning to 0.7-0.9. Pull main + rebuild api/worker.
GitHub README badge stuck on a stale version	GitHub's camo proxy caches images by URL. Bump the URL slightly (e.g. change `cacheSeconds=N` to a different N) to force a refetch.
Forgot the bootstrap token	`grep ^CKG_BOOTSTRAP_TOKEN .env | cut -d= -f2-`
Want to wipe the graph and start over	`make clean && make up && python … (regenerate .env)`. Note: this also drops the Postgres data, so all minted tokens go too.
Forgot which port is which	All ports are configurable via `.env` (`CKG_API_PORT`, `CKG_WEB_PORT`). Defaults: 8080 / 3000 / 7474 (Neo4j) / 5433 (Postgres) / 6379 (Redis).
Run integration tests against the live stack	`docker compose exec api pytest tests/integration/ -q` (after `make up`).

What the graph looks like

(Repo)-[:CONTAINS]->(File)-[:DEFINES]->(Class)-[:HAS_METHOD]->(Function)
                          -[:DEFINES]->(Function)-[:CALLS]->(Function)
                          -[:IMPORTS]->(Module|File)

Function nodes carry an embedding vector property indexed for cosine similarity. Names and docs feed Lucene full-text indexes. One Neo4j store therefore answers all three styles of query (structural / keyword / semantic).
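
A transitive-callers lookup over that schema might be phrased as the Cypher built below. Treat it as a sketch: the `repo` and `qname` property names and the depth cap are assumptions, and the real queries may differ. Variable-length bounds cannot be passed as Cypher parameters, so the depth is validated and inlined while the values stay parameterized.

```python
def callers_query(depth: int, max_depth: int = 4) -> str:
    """Build a Cypher query for callers of a function up to `depth` hops away."""
    if not isinstance(depth, int) or not 1 <= depth <= max_depth:
        raise ValueError(f"depth must be an integer between 1 and {max_depth}")
    return (
        f"MATCH (caller:Function)-[:CALLS*1..{depth}]->(target:Function) "
        # `repo` and `qname` are assumed property names.
        "WHERE target.repo = $repo AND target.qname = $qname "
        "RETURN DISTINCT caller.qname AS caller"
    )


query = callers_query(2)
params = {"repo": "my-repo", "qname": "my.module.foo"}
print(query)
```

You could paste the printed query into Neo4j Browser and supply the two parameters to try it against your own graph.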

API surface (short)

Full reference: docs/api.md.

Verb	Path	Purpose
`GET`	`/healthz`	Liveness
`GET`	`/readyz`	Readiness (per-store)
`POST`	`/v1/tokens`	Mint a token (admin)
`GET`	`/v1/tokens`	List tokens (admin)
`DELETE`	`/v1/tokens/{id}`	Revoke (admin)
`POST`	`/v1/repos`	Register a repo
`GET`	`/v1/repos`	List repos
`POST`	`/v1/repos/{id}/ingest`	Queue ingest
`GET`	`/v1/repos/{id}/runs`	Ingest history
`GET`	`/v1/graph/stats`	Graph counts
`GET`	`/v1/graph/callers_of`	Transitive callers
`GET`	`/v1/graph/callees_of`	Transitive callees
`GET`	`/v1/graph/imports_of`	Imports for a file
`GET`	`/v1/graph/blast_radius`	Files affected if this file changes (upstream callers)
`GET`	`/v1/graph/downstream_dependencies`	Files this file depends on (outgoing callees)
`GET`	`/v1/graph/file`	Symbols in a file
`GET`	`/v1/graph/entry_points`	Top-N most-connected functions in a repo (`/graph` page suggestions)
`GET`	`/v1/search/keyword`	Lucene FTS
`GET`	`/v1/search/semantic`	Vector cosine
`POST`	`/v1/repos/{id}/architecture`	Recompute cluster map (synchronous, returns `ArchStats`)
`GET`	`/v1/repos/{id}/architecture`	Read clusters + edges + `edge_source`
`GET`	`/v1/repos/{id}/architecture/warnings`	Coupling warnings (high fan-out / cyclic / low cohesion / SDP violation)
`GET`	`/v1/sources`	List bulk sources
`POST`	`/v1/sources`	Register a bulk source (auto-detects kind; supports `default_branch_override`)
`GET`	`/v1/sources/{id}/progress`	Live ingest progress (indexed / queued / running / failed + `recent_failures`)
`PUT`	`/v1/sources/{id}/branch`	Change a source's per-repo branch override
`POST`	`/v1/sources/{id}/sync`	Trigger discovery + ingest queueing
`GET`	`/v1/analytics/summary`	Sources / tokens / repos / ingests counts for the `/integrations` page
`GET`	`/v1/analytics/usage`	24-h API usage: totals, top callers, top endpoints (p95 latency), live tail
`POST`	`/v1/mcp`	MCP JSON-RPC for IDEs
`POST`	`/v1/graphql`	GraphQL endpoint (open in browser for GraphiQL UI)
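
Calling any of these from a script needs nothing beyond the standard library. A minimal sketch, assuming the token is sent as an `Authorization: Bearer` header (the UI section describes it as a bearer token) and using a placeholder token; the helper name is hypothetical:

```python
from urllib.request import Request


def api_request(path: str, token: str, server: str = "http://localhost:8080",
                method: str = "GET") -> Request:
    """Build an authenticated request for a ckg REST path."""
    url = server.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method=method)


req = api_request("/v1/graph/stats", "ckg_example")  # placeholder token
print(req.get_method(), req.full_url)

# To send it against a running stack:
#   from urllib.request import urlopen
#   with urlopen(req, timeout=10) as resp:
#       print(resp.read().decode())
```

The same helper works for the other `GET` routes; query parameters for each are listed in docs/api.md.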

Releases

Current: v0.1.6 on PyPI · full notes at Releases.

Version	Highlights
v0.1.6	New `/savings` page + dashboard glance row: turns the `api_calls` log into a concrete dollar figure of "tokens AI agents would have spent reading source files but didn't, because they asked ckg instead." Six built-in price cards (Claude Sonnet/Opus/Haiku 4, GPT-4o, GPT-4o mini, Gemini 2.5 Pro), 24h / 7d / 30d window toggle, daily bar chart, breakdown by integration (MCP / GraphQL / REST), by team (API token), and by endpoint. Backed by new `GET /v1/analytics/savings`. Heuristics + price cards live in `ckg/services/savings.py`; only routes that genuinely replace source-reading by an agent (callers_of / blast_radius / search / architecture / mcp / graphql) count, and CRUD endpoints contribute zero so dashboard polling can't inflate the numbers. Zero-state banner explains the empty case clearly. New Docker image pipeline: a GitHub Actions workflow (`docker-publish.yml`) builds linux/amd64 + linux/arm64 images of api / worker / web on every v* tag and pushes them to `ghcr.io/ajankurjain/central-code-knowledge-graph/`; `docker-compose.yml` now declares both `image:` and `build:` so `docker compose pull && make up` skips the local build phase entirely. New `make check` target (ruff + pytest, exactly what CI runs) and `make install-hooks` that installs a repo-managed pre-push hook to stop CI bouncing: it runs `make check` automatically when a push touches Python or workflows, and is a no-op for README-only changes.
v0.1.5	New `/integrations` page: one screen showing backend service health (Neo4j / Postgres / Redis / version), copy-paste connection URLs for MCP / GraphQL / REST clients, connected integrations (bulk sources by kind, AI clients with active-token sample, inbound webhooks) and usage analytics. Backed by new `GET /v1/analytics/summary`. New usage tracking: every authenticated, non-meta API request is logged into an indexed `api_calls` table by a FastAPI middleware (skips `/healthz`, `/readyz`, `/v1/analytics/`, `/docs`); `authenticate()` now stashes the resolved Principal on `request.state.principal` so the middleware attributes calls to a token without re-parsing the header. New `GET /v1/analytics/usage` aggregates the last 24 h: total calls, calls/hr, distinct active tokens, error rate, top callers (10), top endpoints (12) with `PERCENTILE_CONT(0.95)` for an honest p95 latency, and a live tail of the last 20 requests. UI panels render every 15 s. New `ckg.prune_api_calls` beat task drops `api_calls` rows older than 7 days hourly. Dashboard got a new Integrations row centred on usage: API calls 24h (with calls/hr + error-rate tone), top caller, sources, API tokens. `ckg.__version__` now reads from `importlib.metadata` so `/readyz` reports the actual installed version (was stuck at `0.1.0`).
v0.1.4	`POST /v1/repos/{id}/architecture` now runs synchronously in the API handler and returns the full `ArchStats` payload — the prior Celery-queued flow could sit unprocessed for minutes behind a backlog of `ingest_repo` tasks. New `PermanentIngestError` is caught specifically in the worker and not retried (a misconfigured branch no longer burns three queue slots over 90 s). Operator-selectable branch on the Add Source form + a new `PUT /v1/sources/{id}/branch` endpoint and inline-editable branch column on the existing sources table. Empty checkout (configured branch contains no source) now raises with a clear error message listing the actual branches available on `origin`. `SourceProgress.recent_failures[]` returns up to 5 latest-failed runs with their error text so the `/sources` UI can show "X failed — see reasons" with the actual cause. `_file_dependency_edges` strips a layered Maven/Gradle layout prefix set (`src/main/{java,kotlin,scala,groovy}/`, `src/test/*/`, `src/`, `app/`, `lib/`, `pkg/`, `internal/`) before `/ → .` conversion so Java import edges actually resolve. Directory-tree fallback in `compute_architecture` produces a structural cluster map for any multi-file repo whose call / import graph is too thin to cluster on. `ArchStats.edge_source` (`"calls+imports"` / `"directory_fallback"` / `"no_files"`) is persisted on each `Cluster.edge_source` and surfaced via the GET response so the UI can render the appropriate caveat banner. `Warning.detail` is JSON-encoded into `detail_json` on write and decoded back on read — fixes the `CypherTypeError: Property values can only be of primitive types or arrays thereof` crash that previously killed every compute that produced warnings. `/arch` page reworked into a real state machine with elapsed-time spinner, in-line error display, and "click to set right branch" actionable banners when the result is `no_files` or `clusters: 0`.
v0.1.3	Self-hosted GitLab support (`gitlab_instance` kind + bring-your-own-base-URL). Worker now scrubs `<scheme>://user:pw@…` userinfo and bare GitHub / GitLab PAT shapes before persisting clone errors, so a failed `git clone https://oauth2:glpat-…@host/path` no longer leaks the token into `ingest_runs.error`. `credentialed_clone_url_for_repo` falls back to a host-aware token injector when the source's `kind` is unknown to the running worker. Per-source live progress bar on `/sources` driven by a new `GET /v1/sources/{id}/progress` endpoint (indexed / queued / running / failed counts + last-sync / last-ingest timestamps + 5 s auto-poll while in flight). New `ckg.reconcile_stuck_ingests` beat task every 60 s — re-publishes orphaned `queued` rows (DB-vs-broker desync) and reaps zombie `running` rows so the progress bar never silently freezes. Searchable repo combobox on `/repos`, `/arch`, `/graph` with `INDEXED` / `NOT INDEXED` badges + refresh icon. Dashboard and `/repos` table now paginated (10 / 25 per page) with id / url / language filter. `/graph` shows top-20 most-connected functions as click-to-fill entry points when no qname is set — backed by `GET /v1/graph/entry_points` — plus a back-to-functions button. `tree-sitter>=0.25.2,<0.26` (csharp grammar v15 ABI). `python` parser detects `async` either as the node type OR as a child keyword. `lua` parser covers modern (`function_declaration`, `local_function`) and legacy node types.
v0.1.2	Per-repo PAT for cloning private repos (`POST /v1/repos {token}` + `PUT …/credentials` + Credentials panel in the web UI + `ckg repo register --token` / `ckg repo credentials`). Idempotent `ADD COLUMN IF NOT EXISTS` migration so existing installs pick up new columns on restart. Per-row ingest feedback on the /repos page. Runs-table error column now collapsible with full text. Web register form auto-slugifies with live preview. `tree-sitter>=0.24,<0.25` (csharp grammar v15).
v0.1.1	Live-verified runtime fixes: Dockerfile installs `[server]` extras so `celery` is on the worker's PATH; Neo4j healthcheck creds exposed via `CKG_HEALTHCHECK_*` (was conflicting with Neo4j's `NEO4J_*`-as-setting parsing); strawberry-graphql `graphql_ide=` arg compatibility; `tree-sitter-language-pack` pinned to the 0.x line where the parser objects still expose `.parse()`.
v0.1.0	Initial release.

Upgrade:

pip install --upgrade central-code-knowledge-graph
# or, in a Docker checkout:
git pull && make build && make restart

Roadmap

Phase 1 — Foundation, auth, Python/JS/TS ingest, REST + MCP, CLI
Phase 2 — Incremental updates (per-file sha diff), GraphQL endpoint, Rust/Go/Java/Ruby parsers
Phase 3 — C/C++ parsers; opt-in LSP precision pass (pyright today; rust-analyzer / gopls / ts-server / jdtls planned)
Phase 4 — Next.js web UI: token login, dashboard, repo management, search (keyword + semantic), force-directed function call-graph viz
Phase 5 — Multi-tenant orgs/users, k8s/Helm, OpenTelemetry, Neo4j Causal Cluster

Development

Backend:

pip install -e '.[dev]'
pytest -q
ruff check ckg

Web UI:

cd web
npm install --legacy-peer-deps
NEXT_PUBLIC_CKG_API=http://localhost:8080 npm run dev
# open http://localhost:3000

Project layout:

ckg/
├── api/        # FastAPI app + routes (REST + GraphQL + MCP)
├── auth.py     # API tokens, principal, scopes
├── cli/        # `ckg` Typer CLI
├── config.py   # Pydantic settings
├── db/         # neo4j / postgres / redis clients + schema
├── lsp/        # Opt-in LSP precision pass (Phase 3)
├── parsers/    # tree-sitter parsers, one per language
├── services/   # ingest, embeddings, lsp_resolve
└── worker/     # Celery app + tasks
web/            # Next.js 15 + Tailwind + react-force-graph-2d (Phase 4)
docker/         # API + worker + web Dockerfiles
docs/           # ADRs, deployment, API
integrations/   # cursor / vscode / claude-code MCP snippets
tests/          # pytest

Security

API tokens are 32-byte URL-safe random strings prefixed ckg_, never stored in plaintext — only argon2id hashes are persisted.
The bootstrap token (.env) is your only way in on day 0; rotate it immediately after minting a scoped token.
All non-health endpoints require a token; CORS is restricted to CKG_CORS_ORIGINS.
.env is git-ignored. Do not commit it. Do not paste tokens into chats.

If you find a security issue, please open a private vulnerability report on GitHub.

Pre-commit credential audit

A small audit script refuses to commit credentials, IDE-assistant configs (.claude/, CLAUDE.md, .mcp.json, .cursor/, .continue/, .aider*, .windsurf/), or files matching common secret patterns (GitHub PAT, OpenAI key, AWS access key, Slack token, JWT, PEM private key):

./scripts/audit-secrets.sh

# install as a git pre-commit hook (recommended):
ln -sf ../../scripts/audit-secrets.sh .git/hooks/pre-commit

License

MIT — see LICENSE.

Release history

Version	Date
0.1.6	May 12, 2026
0.1.5	May 12, 2026
0.1.4	May 12, 2026
0.1.3	May 12, 2026
0.1.2	May 12, 2026
0.1.1	May 12, 2026
0.1.0	May 11, 2026
