Code->CPG chunker: tree-sitter symbol + relation extraction, size-capped chunks, ProximaRecord projection. Shared by Victor, ProximaDB SDK, and AnvaiOps.
victor-codegraph
Shared code → Code-Property-Graph chunker: tree-sitter symbol + relation extraction,
size-capped embeddable chunks, and a ProximaRecord projection. One chunker, three
consumers — Victor (owner), the ProximaDB SDK ([codegraph] extra), and AnvaiOps (SaaS
code-graph vertical).
Design: ProximaDB ADR-029 (authoritative) · Victor ADR-014 (owner/donor) · AnvaiOps ADR-0018 (consumer). This package is the TD-CG1 scaffold.
Why
The same tree-sitter code→symbol+relation chunker existed twice (ProximaDB SDK code.py
and Victor victor-coding) and was about to be written a third time in AnvaiOps. This
package is the single neutral home. It merges the best of both donors and fixes their two
gaps:
- Size-capping: ProximaDB's code.py emitted one chunk per symbol with no size bound (a huge function became a huge chunk). Here, oversized symbols are body-split with overlap (LlamaIndex CodeSplitter discipline). See sizing.py.
- Real JS/TS: the donor JS/TS parser was a stub returning no symbols. Here JS/TS get a real tree-sitter extractor (functions, classes, methods, const … = () =>, imports).
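The size-capping idea above can be illustrated with a minimal sketch. This is not the code in sizing.py: the function name `split_with_overlap`, its parameters, and the whitespace-based token count are all assumptions made for illustration. It splits on line boundaries only, whereas the package describes AST-aligned splitting.

```python
def split_with_overlap(lines, max_tokens=512, overlap_lines=2, count_tokens=None):
    """Split a symbol body into line-aligned pieces under a token budget.

    Hypothetical helper for illustration; not the victor_codegraph API.
    """
    if count_tokens is None:
        # Crude proxy: whitespace-separated words. A real tokenizer would differ.
        def count_tokens(text):
            return len(text.split())

    pieces = []
    start = 0
    while start < len(lines):
        end = start
        used = 0
        while end < len(lines):
            cost = count_tokens(lines[end])
            # Always take at least one line, so a single over-budget line
            # is emitted alone (a limitation of this sketch).
            if end > start and used + cost > max_tokens:
                break
            used += cost
            end += 1
        pieces.append(lines[start:end])
        if end >= len(lines):
            break
        # Step back to create the overlap, but always move forward.
        start = max(end - overlap_lines, start + 1)
    return pieces
```

Consecutive pieces share `overlap_lines` lines, so a statement near a split point appears with context on both sides. The trade-off is some duplicated text in the embedded chunks.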
Install
Not yet published to PyPI — use an editable install from the monorepo for now. Consumers
(Victor, the ProximaDB SDK, AnvaiOps) reference it editable until the first victor-codegraph-v*
release is cut.
# dev: editable, with tree-sitter grammars + test tooling
make -C victor-codegraph dev # = pip install -e ../victor-contracts && pip install -e ".[dev]"
# minimal: Python-only (stdlib ast) path, zero native deps
pip install -e ./victor-codegraph
# once published:
# pip install victor-codegraph # Python path
# pip install "victor-codegraph[treesitter]" # + multi-language grammars
Releasing
CI: .github/workflows/ci-codegraph.yml runs the suite (editable install, grammars on) for every
PR touching victor-codegraph/**. Publishing: push a tag victor-codegraph-v0.1.0 to trigger
.github/workflows/release-codegraph.yml, which builds and publishes via PyPI Trusted Publishing
(OIDC — no API token). Configure the publisher once on PyPI (owner vjsingh1984, repo victor,
workflow release-codegraph.yml, environments pypi / testpypi); see the header of that workflow.
Use
from victor_codegraph import chunk, parse, to_proxima_records, ChunkConfig
# Size-capped, embeddable chunks:
chunks = chunk(source, file_path="app/service.py", config=ChunkConfig(max_chunk_tokens=512))
# Symbols + relations:
parsed = parse(source, file_path="app/service.py")
# Project to the ProximaDB substrate-keystone record shape (one symbol = row+node+vector):
records = to_proxima_records(parsed, repo_graph_id="myrepo", branch_id="main",
                             embedder=my_embed_fn)  # embedder optional
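To make "symbols + relations" concrete, here is a small sketch of the kind of extraction involved, using only the standard-library ast module. It is an illustration under assumptions: `extract_relations` and the tuple shapes are hypothetical, and the actual return type of `parse` is not shown in this README.

```python
import ast


def extract_relations(source):
    """Return (symbols, relations) for functions, classes, and methods.

    Hypothetical helper; relations are (source, kind, target) tuples.
    """
    tree = ast.parse(source)
    symbols = []
    relations = []
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    def visit(node, owner):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, definitions):
                name = f"{owner}.{child.name}" if owner else child.name
                symbols.append(name)
                if owner:
                    relations.append((owner, "CONTAINS", name))
                if isinstance(child, ast.ClassDef):
                    # Only simple base names are handled in this sketch.
                    for base in child.bases:
                        if isinstance(base, ast.Name):
                            relations.append((name, "EXTENDS", base.id))
                visit(child, name)
            else:
                # Record calls to plain names made inside a symbol.
                if (owner and isinstance(child, ast.Call)
                        and isinstance(child.func, ast.Name)):
                    relations.append((owner, "CALLS", child.func.id))
                visit(child, owner)

    visit(tree, "")
    return symbols, relations
```

Call targets are recorded by name only. Resolving them to a definition (imports, attribute calls, aliases) is deliberately out of scope here.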
Design principles (the "best posture" this encodes)
- Chunk at symbol granularity (not statement, not fixed-size).
- AST-aligned and size-capped — never split mid-statement, never exceed the budget.
- Extract relations (CALLS/EXTENDS/CONTAINS/…) and project to a CPG.
- Deterministic IDs + content hash → idempotent incremental re-index.
- Graceful fallback chain: python-ast → tree-sitter → sliding-window.
- Token budget matched to the embedding model (BGE-small 384-d ≈ 512 tokens).
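The "deterministic IDs + content hash" principle can be sketched as follows. The names `symbol_id`, `content_hash`, and `plan_reindex`, the choice of SHA-256, and the key layout are assumptions for illustration; the package may derive its IDs differently.

```python
import hashlib


def symbol_id(repo_graph_id, branch_id, file_path, qualified_name):
    """Derive a stable ID from the symbol's identity, not its content."""
    key = "\x1f".join([repo_graph_id, branch_id, file_path, qualified_name])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


def content_hash(text):
    """Hash the symbol body so unchanged symbols can be skipped."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(index, symbols):
    """Compare a stored {id: hash} index with freshly parsed (id, text) pairs.

    Returns (ids to upsert, ids to delete). Hypothetical helper.
    """
    seen = set()
    to_upsert = []
    for sid, text in symbols:
        seen.add(sid)
        if index.get(sid) != content_hash(text):
            to_upsert.append(sid)
    to_delete = sorted(set(index) - seen)
    return to_upsert, to_delete
```

Because the ID depends only on where the symbol lives and the hash only on its text, re-running the indexer over an unchanged file yields no writes, which is what makes the re-index idempotent.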
Status
0.1.0 — TD-CG1 scaffold. Python (stdlib ast) is the primary, fully-offline path.
Multi-language extraction is best-effort via tree-sitter; deeper per-language relation
extraction (the donor parsers' Rust/Go/Java specifics) lands incrementally.
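A fallback chain of the kind described can be sketched with the standard library alone. This example shows only two stages (stdlib ast, then a sliding window); a tree-sitter stage would be one more entry in the `extractors` list and is omitted so the sketch has no native dependencies. All function names and default sizes here are hypothetical.

```python
import ast


def sliding_window(source, window=40, overlap=5):
    """Last-resort chunker: fixed-size line windows with overlap."""
    lines = source.splitlines()
    step = max(window - overlap, 1)
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + window]))
        if start + window >= len(lines):
            break
    return chunks


def python_ast_chunks(source):
    """One chunk per top-level function or class; raises on non-Python input."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment:
                chunks.append(segment)
    return chunks


def chunk_with_fallback(source, extractors):
    """Try each (name, extractor) in order; fall back to sliding windows."""
    for name, extractor in extractors:
        try:
            chunks = extractor(source)
        except Exception:
            continue  # this stage cannot handle the input; try the next one
        if chunks:
            return name, chunks
    return "sliding-window", sliding_window(source)
```

Returning the name of the stage that succeeded lets a caller record which path produced each chunk, which is useful when judging extraction quality per language.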
File details
Details for the file victor_codegraph-0.1.1.tar.gz.
File metadata
- Download URL: victor_codegraph-0.1.1.tar.gz
- Upload date:
- Size: 23.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fc5e40f246e72f793e960a459db229ea2594584131121c2757bf2bd6f2e3bb2 |
| MD5 | 3a1fe2e9e4e61dc2df2478b20749ee86 |
| BLAKE2b-256 | 0a3e5d30373ee695d91e21fe22170d88700ba38674a6b1c6a15aa63f20cdb9f2 |
Provenance
The following attestation bundles were made for victor_codegraph-0.1.1.tar.gz:
Publisher: release-codegraph.yml on vjsingh1984/victor
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: victor_codegraph-0.1.1.tar.gz
- Subject digest: 4fc5e40f246e72f793e960a459db229ea2594584131121c2757bf2bd6f2e3bb2
- Sigstore transparency entry: 1997604785
- Permalink: vjsingh1984/victor@bf018507c3ece74bb619b5ac557ccd41027c3f6c
- Branch / Tag: refs/tags/victor-codegraph-v0.1.1
- Owner: https://github.com/vjsingh1984
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-codegraph.yml@bf018507c3ece74bb619b5ac557ccd41027c3f6c
- Trigger Event: push
File details
Details for the file victor_codegraph-0.1.1-py3-none-any.whl.
File metadata
- Download URL: victor_codegraph-0.1.1-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b11356e94565afd69647fa44749caa97859aaa493c307d922b08110fdb491ba5 |
| MD5 | 21ef93419eadadcd8ad6beb419f0465b |
| BLAKE2b-256 | 113011c258d2b36eeb61a7ede523721a68f8c91bc788d54200677fbfedfb8263 |
Provenance
The following attestation bundles were made for victor_codegraph-0.1.1-py3-none-any.whl:
Publisher: release-codegraph.yml on vjsingh1984/victor
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: victor_codegraph-0.1.1-py3-none-any.whl
- Subject digest: b11356e94565afd69647fa44749caa97859aaa493c307d922b08110fdb491ba5
- Sigstore transparency entry: 1997604847
- Permalink: vjsingh1984/victor@bf018507c3ece74bb619b5ac557ccd41027c3f6c
- Branch / Tag: refs/tags/victor-codegraph-v0.1.1
- Owner: https://github.com/vjsingh1984
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-codegraph.yml@bf018507c3ece74bb619b5ac557ccd41027c3f6c
- Trigger Event: push