AST-based code analysis toolkit: symbol extraction, dependency graphs, semantic search, impact analysis, and repository mapping.
CervellaSwarm Code Intelligence
Find any symbol, trace any dependency, map any repository. Built on tree-sitter.
pip install cervellaswarm-code-intelligence
What It Does
Extract symbols from source code, build dependency graphs with PageRank scoring, and answer questions like:
- Where is UserService defined?
- What calls authenticate()? What does it call?
- How risky is it to refactor DatabasePool?
- What are the most important symbols in this repo?
Quick Start
Find symbols across your codebase
from cervellaswarm_code_intelligence import SemanticSearch
search = SemanticSearch("/path/to/your/repo")
# Where is this symbol defined?
location = search.find_symbol("UserService")
# => ("/path/to/your/repo/app/services.py", 42)
# Who calls this function?
callers = search.find_callers("authenticate")
# => [("app/auth.py", 15, "login"), ("app/api.py", 88, "verify_token")]
# What does this function call?
callees = search.find_callees("login")
# => ["authenticate", "generate_token", "log_attempt"]
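To give a feel for the kind of lookup behind find_callers, here is a minimal sketch that uses Python's standard-library ast module as a stand-in for tree-sitter. It is not the package's implementation: find_callers_in_source and SAMPLE are hypothetical names, and calls are matched by name only, so a method with the same name on an unrelated object is reported as well.

```python
import ast

# Hypothetical sample source; line numbers below refer to this string.
SAMPLE = '''
def authenticate(user, password):
    return True

def login(user, password):
    return authenticate(user, password)

def verify_token(token):
    return session.authenticate(token, None)
'''


def find_callers_in_source(source: str, target: str) -> list[tuple[int, str]]:
    """Return (line, enclosing_function) pairs for calls to `target`, matched by name."""
    tree = ast.parse(source)
    results = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Note: a call inside a nested function is attributed to both the
        # inner and the outer function in this simplified version.
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                callee = node.func
                if isinstance(callee, ast.Name):
                    name = callee.id          # plain call: authenticate(...)
                else:
                    name = getattr(callee, "attr", None)  # method call: obj.authenticate(...)
                if name == target:
                    results.append((node.lineno, func.name))
    return sorted(results)


print(find_callers_in_source(SAMPLE, "authenticate"))
# => [(6, 'login'), (9, 'verify_token')]
```

The second hit comes from session.authenticate, which may or may not be the same function. Name-based matching cannot tell the difference.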
Estimate impact of code changes
from cervellaswarm_code_intelligence import ImpactAnalyzer
analyzer = ImpactAnalyzer("/path/to/your/repo")
result = analyzer.estimate_impact("DatabasePool")
print(result.risk_level) # => "high"
print(result.risk_score) # => 0.62
print(result.callers_count) # => 14
print(result.files_affected) # => 7
print(result.reasons)
# => ["14 callers - high impact",
# "Used in 7 files - moderate scope",
# "Class type - changes may affect multiple methods"]
Generate repository maps within token budgets
from cervellaswarm_code_intelligence import RepoMapper
mapper = RepoMapper("/path/to/your/repo")
repo_map = mapper.build_map(token_budget=2000)
print(repo_map)
# => # REPOSITORY MAP
#
# ## app/auth.py
#
# def login(username: str, password: str) -> Token
# def verify_token(token: str) -> bool
# class AuthMiddleware
# ...
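One way to fit a map into a token budget is to rank signatures and keep the best ones that fit. The sketch below assumes a greedy selection by importance score and a 4-characters-per-token estimate. The function names and the scores are hypothetical, and the package's actual selection strategy may differ.

```python
def estimate_tokens(text: str) -> int:
    """Approximate the token count using a 4-characters-per-token heuristic."""
    return max(1, len(text) // 4)


def select_within_budget(entries: list[tuple[str, float]], token_budget: int) -> list[str]:
    """Greedily keep the highest-scoring signatures whose estimated cost fits the budget."""
    chosen = []
    used = 0
    for signature, _score in sorted(entries, key=lambda e: e[1], reverse=True):
        cost = estimate_tokens(signature)
        if used + cost <= token_budget:
            chosen.append(signature)
            used += cost
    return chosen


# Hypothetical (signature, importance) pairs.
entries = [
    ("def login(username: str, password: str) -> Token", 0.5),
    ("def verify_token(token: str) -> bool", 0.3),
    ("class AuthMiddleware", 0.2),
]

for line in select_within_budget(entries, token_budget=2000):
    print(line)
```

With a generous budget every entry is kept in score order. With a tight one, a lower-ranked but shorter signature can still be included after a longer one is skipped.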
Extract symbols from a single file
from cervellaswarm_code_intelligence import SymbolExtractor, TreesitterParser
parser = TreesitterParser()
extractor = SymbolExtractor(parser)
symbols = extractor.extract_symbols("app/models.py")
for symbol in symbols:
    print(f"{symbol.type:10} {symbol.name:20} line {symbol.line}")
# => class User line 5
# function create_user line 28
# function get_user_by_email line 45
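For readers who want to see the idea without installing anything, the following sketch extracts top-level classes and functions from Python source using the standard-library ast module. It stands in for the tree-sitter based extractor and is not its implementation. SimpleSymbol and extract_top_level_symbols are hypothetical names that mirror only the name, type, and line fields shown above.

```python
import ast
from dataclasses import dataclass


@dataclass
class SimpleSymbol:
    """Hypothetical, reduced stand-in for the package's Symbol class."""
    name: str
    type: str
    line: int


def extract_top_level_symbols(source: str) -> list[SimpleSymbol]:
    """Collect top-level class and function definitions with their line numbers."""
    tree = ast.parse(source)
    symbols = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            symbols.append(SimpleSymbol(node.name, "class", node.lineno))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(SimpleSymbol(node.name, "function", node.lineno))
    return symbols


SOURCE = "class User:\n    pass\n\ndef create_user(name):\n    return User()\n"

for symbol in extract_top_level_symbols(SOURCE):
    print(f"{symbol.type:10} {symbol.name:20} line {symbol.line}")
```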
Build and analyze dependency graphs
from cervellaswarm_code_intelligence import DependencyGraph, Symbol
graph = DependencyGraph()
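# Add symbols and references.
# login_symbol and auth_symbol are Symbol instances, for example
# taken from the list returned by SymbolExtractor.extract_symbols().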
graph.add_symbol(login_symbol)
graph.add_symbol(auth_symbol)
graph.add_reference("auth.py:login", "auth.py:verify_credentials")
# Compute importance via PageRank
scores = graph.compute_importance()
# Get the most important symbols
top_10 = graph.get_top_symbols(n=10)
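PageRank over a call graph rewards symbols that many others depend on. The sketch below is a plain power-iteration version written for illustration. The damping factor of 0.85, the iteration count, and the edge format (caller mapped to a list of callees) are assumptions, not the package's documented parameters.

```python
def pagerank(edges: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Compute PageRank scores for a graph given as {source: [targets]}."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(scores[node] for node in nodes if not edges.get(node))
        new_scores = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in edges.items():
            if targets:
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical call graph: caller -> callees.
calls = {
    "login": ["authenticate", "generate_token"],
    "verify_token": ["authenticate"],
    "authenticate": [],
}

scores = pagerank(calls)
top = sorted(scores, key=scores.get, reverse=True)
print(top[0])  # => authenticate
```

Here authenticate ranks first because both login and verify_token point to it, while generate_token receives only half of login's share.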
CLI Tools
Three command-line tools are included:
# Find where a symbol is defined, who calls it, what it calls
cervella-search /path/to/repo UserService
cervella-search /path/to/repo authenticate callers
cervella-search /path/to/repo login callees
# Estimate impact of modifying a symbol
cervella-impact /path/to/repo DatabasePool
# Risk: HIGH (0.62) - 14 callers, 7 files affected
# Generate a repository map within a token budget
cervella-map --repo-path /path/to/repo --budget 2000 --output repo_map.md
cervella-map --repo-path /path/to/repo --filter "**/*.py" --stats
Architecture
Source Files (.py, .ts, .tsx, .js, .jsx)
              |
      TreesitterParser          -- Parse into AST
              |
      SymbolExtractor           -- Extract functions, classes, interfaces
         |         |
PythonExtractor  TypeScriptExtractor
              |
      DependencyGraph           -- Build edges, compute PageRank
              |
  +-----------+-----------+
  |           |           |
SemanticSearch  RepoMapper  ImpactAnalyzer
5 layers, 14 modules, 4 external dependencies.
Supported Languages
| Language | Extensions | Functions | Classes | Interfaces | Types | References |
|---|---|---|---|---|---|---|
| Python | .py | Yes | Yes | -- | -- | Yes |
| TypeScript | .ts, .tsx | Yes | Yes | Yes | Yes | Yes |
| JavaScript | .js, .jsx | Yes | Yes | -- | -- | Yes |
Other languages: contributions welcome. The extractor architecture is designed for easy addition of new language backends.
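As an illustration of how extension-based dispatch might look, the sketch below maps the extensions from the table to language names. The mapping dictionary and this standalone detect_language function are hypothetical; the package's own TreesitterParser.detect_language may behave differently.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the Supported Languages table; the dict itself is illustrative.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "javascript",
    ".jsx": "javascript",
}


def detect_language(path: str) -> Optional[str]:
    """Return the language name for a file path, or None if the extension is unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


print(detect_language("app/models.py"))  # => python
print(detect_language("main.go"))        # => None
```

In a design like this, supporting a new language would mean adding its extensions to the mapping and providing a matching extractor.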
API Reference
Core Classes
| Class | Purpose | Key Methods |
|---|---|---|
| Symbol | Data class for extracted symbols | .name, .type, .file, .line, .signature, .references |
| TreesitterParser | Parse source files into ASTs | .parse_file(path), .detect_language(path) |
| SymbolExtractor | Extract symbols from parsed files | .extract_symbols(path), .clear_cache() |
| DependencyGraph | Build and analyze dependency graphs | .add_symbol(), .compute_importance(), .get_top_symbols(n) |
| SemanticSearch | High-level code navigation | .find_symbol(), .find_callers(), .find_callees(), .find_references() |
| ImpactAnalyzer | Risk assessment for code changes | .estimate_impact(name), .find_dependencies(path), .find_dependents(path) |
| RepoMapper | Generate token-budgeted repo maps | .build_map(budget), .get_stats() |
Risk Score Algorithm
Impact analysis computes risk as: min(base + caller_factor + type_factor, 1.0)
| Factor | Range | Source |
|---|---|---|
| base | 0.0 - 0.3 | PageRank importance score |
| caller_factor | 0.0 - 0.4 | min(callers / 20, 0.4) |
| type_factor | 0.0 - 0.3 | Symbol type (class=0.3, interface=0.25, function=0.2) |
Risk levels: low (< 0.3), medium (0.3-0.5), high (0.5-0.7), critical (> 0.7).
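The formula and thresholds above can be written out as a small function. This is a literal reading of the table, not the package's code. It assumes that base is already scaled to the 0.0 - 0.3 range, that symbol types not listed contribute 0.0, and that a score of exactly 0.5 counts as high. Applied literally, the table gives a class with 14 callers a score of at least 0.7, so the package's actual caller scaling may differ.

```python
# Type factors as listed in the table; unlisted types default to 0.0 (assumption).
TYPE_FACTORS = {"class": 0.3, "interface": 0.25, "function": 0.2}


def risk_score(base: float, callers: int, symbol_type: str) -> float:
    """Combine the three factors and cap the result at 1.0."""
    caller_factor = min(callers / 20, 0.4)
    type_factor = TYPE_FACTORS.get(symbol_type, 0.0)
    return min(base + caller_factor + type_factor, 1.0)


def risk_level(score: float) -> str:
    """Map a score to a level; treating exactly 0.5 as 'high' is an assumption."""
    if score < 0.3:
        return "low"
    if score < 0.5:
        return "medium"
    if score <= 0.7:
        return "high"
    return "critical"


# A function with 4 callers and a base of 0.05: 0.05 + 0.2 + 0.2 = 0.45
score = risk_score(base=0.05, callers=4, symbol_type="function")
print(round(score, 2), risk_level(score))  # => 0.45 medium
```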
Limitations
- Language support: Python, TypeScript, and JavaScript only. No Go, Rust, Java, C++.
- Reference extraction: Based on name matching within AST, not full type resolution. This means it can produce false positives for common names.
- Performance: Builds a full in-memory index on initialization. For very large repositories (10k+ files), the initial scan may take several seconds.
- Token estimation: Uses a 4-chars-per-token heuristic, which is approximate.
Development
# Clone and install in development mode
git clone https://github.com/rafapra3008/cervellaswarm.git
cd cervellaswarm/packages/code-intelligence
pip install -e ".[dev]"
# Run tests (395 tests, ~0.5s)
pytest
# Run with coverage
pytest --cov=cervellaswarm_code_intelligence --cov-report=term-missing
Part of CervellaSwarm
This package is the code intelligence engine of CervellaSwarm, a multi-agent AI coordination system. It works standalone -- no other CervellaSwarm packages are required.
License