Local Java code intelligence indexer backed by a graph database
CodeSpine
CodeSpine is a Java-native code intelligence graph for coding agents.
It indexes your Java codebase into a graph, then serves high-signal retrieval and analysis APIs over CLI + MCP for refactoring, impact analysis, architecture navigation, and safe change planning.
Why CodeSpine
Most tools answer "where is this symbol?". CodeSpine answers:
- What depends on this?
- What else changed with this historically?
- Is this dead or framework-exempt?
- Which architectural cluster/flow is this in?
- What changed between branches at symbol granularity?
Core Capabilities
1) Hybrid Search (BM25 + Vector + Fuzzy + RRF)
- Lexical ranking (BM25-based)
- Semantic matching (local embeddings)
- Typo-tolerant fuzzy matching
- Reciprocal Rank Fusion with ranking multipliers
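To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over three ranked lists. It is not CodeSpine's implementation: the function name, the constant k=60 (a commonly used RRF default), the per-symbol multipliers argument, and the sample symbol ids are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, multipliers=None):
    """Fuse several ranked lists of symbol ids into one ranking.

    rankings: dict of ranker name -> list of symbol ids, best first.
    multipliers: optional dict of symbol id -> score multiplier (hypothetical).
    """
    multipliers = multipliers or {}
    scores = defaultdict(float)
    for ranked in rankings.values():
        for rank, symbol in enumerate(ranked, start=1):
            # Each ranker contributes 1 / (k + rank) for every symbol it returned.
            scores[symbol] += 1.0 / (k + rank)
    for symbol in scores:
        scores[symbol] *= multipliers.get(symbol, 1.0)
    # Highest fused score first; ties broken by symbol id for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical result lists from the lexical, semantic, and fuzzy rankers.
rankings = {
    "bm25": ["PaymentService#process", "PaymentValidator#validate", "Order#total"],
    "vector": ["PaymentValidator#validate", "PaymentService#process"],
    "fuzzy": ["PaymentValidator#validate"],
}
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

A symbol that appears near the top of several lists outranks one that tops a single list, which is why the fused order favors PaymentValidator#validate here.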
2) Impact Analysis
- Traverses call graph + type/inheritance edges + coupling edges
- Groups results by depth (1, 2, 3+)
- Carries confidence (1.0, 0.8, 0.5) per edge
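The traversal described above can be sketched as a breadth-first walk over "who depends on this" edges. This is an illustration only: the README lists the confidence values but not which edge kind receives which, so the mapping below is an assumption, as are the function name, the graph shape, and the sample symbols.

```python
from collections import deque

# Assumed mapping of edge kind -> confidence (not confirmed by the README).
EDGE_CONFIDENCE = {"calls": 1.0, "type": 0.8, "coupling": 0.5}


def impact(dependents, start, max_depth=4):
    """Group everything that depends on `start` by traversal depth.

    dependents: dict of node -> list of (dependent_node, edge_kind).
    """
    groups = {"1": [], "2": [], "3+": []}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for dependent, kind in dependents.get(node, []):
            if dependent in seen:
                continue
            seen.add(dependent)
            bucket = str(depth + 1) if depth + 1 < 3 else "3+"
            groups[bucket].append((dependent, EDGE_CONFIDENCE[kind]))
            queue.append((dependent, depth + 1))
    return groups


# Hypothetical reverse-dependency graph.
graph = {
    "Service#processPayment": [("Controller#pay", "calls"), ("ServiceImpl", "type")],
    "Controller#pay": [("Router#dispatch", "calls")],
    "Router#dispatch": [("App#main", "calls")],
    "ServiceImpl": [("Billing.java", "coupling")],
}
result = impact(graph, "Service#processPayment")
print(result)
```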
3) Java-Aware Dead Code Detection
- Not just zero-callers: includes exemption passes for:
  - constructors, tests, main(String[] args)
  - override/interface contracts
  - common lifecycle/framework annotations
  - reflection/bean-friendly method patterns
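As a rough sketch of how exemption passes might sit on top of a zero-callers check, the snippet below filters a list of method records. The record fields, the annotation set, and the accessor regex are hypothetical; the README does not list the annotations or patterns CodeSpine actually uses, and this sketch identifies main by name only.

```python
import re

# Illustrative annotation names only; not CodeSpine's actual list.
ILLUSTRATIVE_ANNOTATIONS = {"Override", "Test", "PostConstruct", "PreDestroy", "Bean"}
# Getter/setter style names that reflection-based frameworks often call.
BEAN_ACCESSOR = re.compile(r"^(get|set|is)[A-Z]")


def exemption_reason(method):
    """Return why a method is exempt from dead-code reporting, or None."""
    if method.get("is_constructor"):
        return "constructor"
    if method["name"] == "main":
        return "main"
    hits = ILLUSTRATIVE_ANNOTATIONS & set(method.get("annotations", ()))
    if hits:
        return "annotation:" + sorted(hits)[0]
    if method.get("overrides"):
        return "override"
    if BEAN_ACCESSOR.match(method["name"]):
        return "bean-pattern"
    return None


def dead_code(methods):
    """Methods with zero callers that no exemption pass rescues."""
    return [
        m["name"]
        for m in methods
        if m.get("callers", 0) == 0 and exemption_reason(m) is None
    ]


methods = [
    {"name": "main", "callers": 0},
    {"name": "getTotal", "callers": 0},
    {"name": "init", "callers": 0, "annotations": ["PostConstruct"]},
    {"name": "legacyHelper", "callers": 0},
    {"name": "process", "callers": 3},
]
print(dead_code(methods))
```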
4) Execution Flow Tracing
- Detects framework-agnostic entry points (main, tests, public roots)
- BFS flow traces with depth
- Flow classification (intra_community, cross_community)
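A BFS flow trace with depth and classification can be sketched as follows. The classification rule used here (a flow is intra_community when every visited symbol belongs to one community, otherwise cross_community) is an assumption about what the labels mean, and the function name, call graph, and community ids are made up for the example.

```python
from collections import deque


def trace_flow(calls, entry, community_of, max_depth=6):
    """Walk the call graph breadth-first from `entry`, recording depths."""
    steps = [(entry, 0)]
    seen = {entry}
    queue = deque([(entry, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding at the depth limit
        for callee in calls.get(node, []):
            if callee not in seen:
                seen.add(callee)
                steps.append((callee, depth + 1))
                queue.append((callee, depth + 1))
    # Assumed rule: one community touched means intra, more than one means cross.
    communities = {community_of[name] for name, _ in steps}
    kind = "intra_community" if len(communities) == 1 else "cross_community"
    return {"entry": entry, "steps": steps, "kind": kind}


# Hypothetical call graph and symbol-to-community mapping.
calls = {"App#main": ["Service#run"], "Service#run": ["Repo#load", "Audit#log"]}
community_of = {"App#main": 0, "Service#run": 0, "Repo#load": 0, "Audit#log": 1}
flow = trace_flow(calls, "App#main", community_of)
print(flow)
```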
5) Community Detection
- Leiden-based clustering when dependencies are present
- Heuristic fallback when Leiden stack is unavailable
- Queryable symbol-to-community mapping
6) Git Change Coupling
- Mines recent git history (default 6 months)
- Links co-changing files with coupling strength
- Surfaces hidden dependencies in impact workflows
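One way to turn commit history into coupling edges is to count how often two files change in the same commit. The sketch below works on an in-memory list of commits rather than calling git, and the strength formula (co-changes divided by the larger of the two files' change counts) is an assumption; the README does not define how CodeSpine computes strength. The thresholds mirror the CLI defaults for --min-strength and --min-cochanges.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_strength=0.3, min_cochanges=3):
    """commits: iterable of file lists, one list per commit."""
    file_changes = Counter()
    pair_changes = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_changes.update(unique)
        # Every unordered pair of files touched by the same commit co-changed once.
        pair_changes.update(combinations(unique, 2))
    coupled = []
    for (a, b), together in pair_changes.items():
        # Assumed strength definition, not taken from CodeSpine.
        strength = together / max(file_changes[a], file_changes[b])
        if together >= min_cochanges and strength >= min_strength:
            coupled.append((a, b, together, round(strength, 2)))
    return sorted(coupled, key=lambda row: (-row[3], row[0], row[1]))


# Hypothetical history: A.java and B.java usually change together.
commits = [
    ["A.java", "B.java"],
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["A.java"],
    ["C.java"],
]
pairs = change_coupling(commits)
print(pairs)
```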
7) Watch Mode
- Live file watching for changed Java files
- Incremental reindexing
- Periodic global refresh phases (community/flow/deadcode/coupling)
8) Branch Diff (Symbol-Level)
- Uses git worktrees
- Diffs class/method symbols (added, removed, modified)
- Uses normalized structural hashes to reduce formatting-only noise
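The idea behind a normalized structural hash is that reformatting a method should not mark it as modified. A minimal sketch, assuming normalization means dropping comments and whitespace differences before hashing (CodeSpine's actual normalization is not described here, and the regex approach below is naive: it does not protect comment markers or spacing inside string literals):

```python
import hashlib
import re


def structural_hash(java_source):
    """Hash a symbol body after removing comments and whitespace differences."""
    no_block = re.sub(r"/\*.*?\*/", " ", java_source, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", " ", no_block)
    # Split into identifiers, numbers, and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_$][\w$]*|\d+|\S", no_line)
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def diff_symbols(base, head):
    """base/head: dict of symbol id -> source text on each branch."""
    added = sorted(set(head) - set(base))
    removed = sorted(set(base) - set(head))
    modified = sorted(
        s for s in set(base) & set(head)
        if structural_hash(base[s]) != structural_hash(head[s])
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical symbol tables for two branches.
base = {
    "A#f()": "void f(){return;}",
    "A#g()": "void g(){}",
    "A#old()": "void old(){}",
}
head = {
    "A#f()": "void f() {\n    return; // done\n}",  # formatting and comment only
    "A#g()": "void g(){ f(); }",                     # real change
    "A#h()": "void h(){}",
}
changes = diff_symbols(base, head)
print(changes)
```

Here A#f() differs only in layout and a comment, so it is not reported as modified.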
Performance Model
CodeSpine includes:
- Hash-based incremental invalidation (only changed files reindexed)
- Persistent embedding cache (sqlite) for repeat semantic queries
- Transactional write path during indexing to reduce commit overhead
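Hash-based invalidation can be sketched as comparing each file's content digest with the digest recorded on the previous run. The function names and the manifest shape (a dict of relative path to SHA-256 digest) are assumptions for illustration, not CodeSpine's storage format.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_reindex(root, previous):
    """Compare current .java files with the digests from the last run.

    previous: dict of relative path -> digest.
    Returns (current manifest, changed or new paths, deleted paths).
    """
    root = Path(root)
    current = {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*.java"))
    }
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    deleted = sorted(set(previous) - set(current))
    return current, changed, deleted


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "A.java").write_text("class A {}")
    (demo / "B.java").write_text("class B {}")
    manifest, first_changed, _ = plan_reindex(demo, {})
    (demo / "A.java").write_text("class A { int x; }")
    _, second_changed, _ = plan_reindex(demo, manifest)
    print(first_changed, second_changed)
```

Only A.java shows up in the second plan, since B.java's digest is unchanged.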
Install
Local editable install
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
If your environment is externally managed (PEP 668), use a virtualenv as above.
You can also use pip3:
pip3 install -e .
Install from GitHub
pip install "git+https://github.com/vinayak3022/codeSpine.git"
or
pip3 install "git+https://github.com/vinayak3022/codeSpine.git"
Install from PyPI (after first release is published)
pip install codespine
or
pip3 install codespine
Optional extras
- pip install -e .[ml] for local embedding model dependencies
- pip install -e .[community] for Leiden community detection stack
- pip install -e .[full] for all optional features
Quick Start
# 1) index a repo
codespine analyse /path/to/java-project --full
# 2) search by concept/typo/name
codespine search "payment validation typo procss" --k 20 --json
# 3) get actionable context in one call
codespine context "processPayment" --max-depth 3 --json
# 4) estimate blast radius before refactor
codespine impact com.example.Service#processPayment(java.lang.String) --max-depth 4 --json
Example output:
$ codespine analyse .
Walking files... 142 files found
Parsing code... 142/142
Tracing calls... 847 calls resolved
Analyzing types... 234 type relationships
Detecting communities... 8 clusters found
Detecting execution flows... 34 processes found
Finding dead code... 12 unreachable symbols
Analyzing git history... 18 coupled file pairs
Generating embeddings... 623 vectors stored
Done in 4.2s - 623 symbols, 1847 edges, 8 clusters, 34 flows
CLI Commands
Indexing and Retrieval
- codespine analyse <path> [--full|--incremental]
- codespine search <query> [--k 20] [--json]
- codespine context <query> [--max-depth 3] [--json]
Analysis
- codespine impact <symbol> [--max-depth 4] [--json]
- codespine deadcode [--limit 200] [--json]
- codespine flow [--entry <symbol>] [--max-depth 6] [--json]
- codespine community [--symbol <symbol>] [--json]
- codespine coupling [--months 6] [--min-strength 0.3] [--min-cochanges 3] [--json]
Operations
- codespine watch [--path .] [--global-interval 30]
- codespine diff <base>..<head> [--json]
- codespine cypher <query> [--json]
- codespine list [--json]
- codespine stats
- codespine status [--json]
- codespine setup
- codespine clean [--force]
MCP Service
- codespine start
- codespine stop
- codespine serve (alias of start)
- codespine mcp (foreground stdio MCP)
MCP JSON (Paste Into mcp.json)
Use this if your MCP client supports stdio servers:
{
"mcpServers": {
"codespine": {
"command": "codespine",
"args": ["mcp"]
}
}
}
If codespine is not on your PATH, use an absolute path for command, for example:
- macOS/Linux: "/Users/<you>/path/to/venv/bin/codespine"
- Windows: "C:\\Users\\<you>\\path\\to\\venv\\Scripts\\codespine.exe" (each backslash is doubled because the value is a JSON string)
Optional working directory (recommended for repo-scoped usage):
{
"mcpServers": {
"codespine": {
"command": "codespine",
"args": ["mcp"],
"cwd": "/absolute/path/to/your/repo"
}
}
}
MCP Tool Surface
- search_hybrid(query, k=20)
- get_symbol_context(query, max_depth=3)
- get_impact(symbol, max_depth=4)
- detect_dead_code(limit=200)
- trace_execution_flows(entry_symbol=None, max_depth=6)
- get_symbol_community(symbol)
- get_change_coupling(symbol=None, months=6, min_strength=0.3, min_cochanges=3)
- compare_branches(base_ref, head_ref)
- get_codebase_stats()
- run_cypher(query)
Runtime Artifacts
- Graph DB: ~/.codespine_db
- MCP PID: ~/.codespine.pid
- Log file: ~/.codespine.log
- Embedding cache: ~/.codespine_embedding_cache.sqlite3
Architecture
- codespine/indexer: Java parsing, symbols, call/type resolution
- codespine/db: Kuzu schema and persistence
- codespine/search: BM25/fuzzy/vector/RRF ranking
- codespine/analysis: impact/deadcode/flow/community/coupling/context
- codespine/diff: branch comparison at symbol level
- codespine/watch: incremental watch pipeline
- codespine/mcp: MCP tool server
- codespine/noise: noise blocklists for cleaner call graphs
Security and Governance
- Security policy: SECURITY.md
- Contributions: CONTRIBUTING.md
- Code of conduct: CODE_OF_CONDUCT.md
- Branch protection runbook: docs/GITHUB_HARDENING.md
Publish to PyPI
This repo includes a release workflow:
Recommended setup (one-time):
- Create project on PyPI with the same name (codespine) or update project.name if unavailable.
- In PyPI, configure Trusted Publisher for this GitHub repo/workflow.
- In GitHub, keep the pypi environment enabled for publishing.
Release flow:
- Bump version in pyproject.toml.
- Push commit + tag (for example v0.1.1).
- Create a GitHub Release for that tag.
- Workflow builds and publishes to PyPI.
Compatibility
gindex.py is retained as a compatibility shim for one release cycle.