sourcecode
Deterministic codebase context for AI coding agents.
Turn any repository into structured, reproducible context optimized for AI coding agents — in one command.
pip install sourcecode
sourcecode . --agent
{
  "project": {
    "type": "api",
    "summary": "Python REST API built with FastAPI and SQLAlchemy. Layered architecture with domain, service, and infrastructure layers.",
    "primary_stack": "python",
    "frameworks": ["FastAPI", "SQLAlchemy"]
  },
  "entry_points": [
    { "path": "src/app/main.py", "kind": "server", "confidence": "high" }
  ],
  "architecture": "FastAPI application. Clean Architecture with domain, application, and infrastructure layers. Hub modules: schema.py, models.py.",
  "key_dependencies": [
    { "name": "fastapi", "declared_version": ">=0.100", "role": "runtime" },
    { "name": "sqlalchemy", "declared_version": "^2.0", "role": "runtime" },
    { "name": "pydantic", "declared_version": "^2.0", "role": "runtime" }
  ],
  "confidence_summary": { "overall": "high" }
}
The problem
AI coding agents are only as good as the context they receive. In large, real-world repositories, that context is usually incomplete, noisy, or stale.
- Agents start blind. Without repo structure, they hallucinate imports, file paths, and architecture decisions.
- Context is noisy. Raw file trees contain benchmark dirs, generated files, tooling configs, and docs that consume tokens without helping.
- Architecture is invisible. LLMs see files, not systems. They miss layers, plugin systems, entry points, and runtime topology.
- Context decays. What you paste today is stale tomorrow. There's no reproducible baseline.
- Manual context doesn't scale. Handcrafting prompts per project is engineering debt that grows with every new agent, team, and task.
The solution
sourcecode analyzes your repository and produces a structured, reproducible context package — ready to inject into any AI coding agent.
What it does:
- Detects stacks, frameworks, entry points, and project type across 10+ languages
- Infers runtime topology: which packages are core, which are plugins, which are noise
- Ranks files by operational relevance for agents: git churn + runtime proximity + bootstrap signal
- Suppresses non-runtime noise: benchmarks, docs, tooling, generated files
- Produces structured JSON/YAML that agents can reason over, not raw file trees
- Runs deterministically — same repo, same output, every time
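The noise suppression and ranking described in the list above can be pictured with a small sketch. This is not the tool's implementation: the `FileSignals` type, the `NOISE_DIRS` set, the 0..1 signal scale, and the choice to sum the three signals are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical noise directories; the tool's real suppression list is not shown here.
NOISE_DIRS = {"benchmarks", "docs", "tools", "generated"}


@dataclass(frozen=True)
class FileSignals:
    path: str
    git_churn: float          # assumed to be normalized to 0..1
    runtime_proximity: float  # assumed: 1.0 for an entry point, lower further away
    bootstrap: float          # assumed: 1.0 if the file takes part in startup


def is_noise(path: str) -> bool:
    """True when any directory component of the path is a noise directory."""
    return any(part in NOISE_DIRS for part in path.split("/")[:-1])


def rank(files: list[FileSignals]) -> list[str]:
    """Drop noise, then order by summed signals; ties break on path so the
    result is the same on every run."""
    kept = [f for f in files if not is_noise(f.path)]
    kept.sort(key=lambda f: (-(f.git_churn + f.runtime_proximity + f.bootstrap), f.path))
    return [f.path for f in kept]
```

Breaking ties on the path is what keeps the ordering reproducible when two files score the same.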
What it outputs:
- project_summary: one-sentence natural language description
- architecture_summary: runtime topology (layers, plugin systems, entry flows)
- entry_points: where execution actually starts (production, not benchmarks)
- key_dependencies: runtime dependencies with role classification
- relevant_files: ranked by usefulness for coding tasks, not folder position
- confidence_summary: detection quality and analysis gaps
All fields are stable, machine-readable, and designed for LLM consumption.
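As a rough sketch of what machine-readable means in practice, the snippet below pulls a few values out of a parsed context. The key names are taken from the example output at the top of this page; `SAMPLE` is a trimmed copy of that example and `summarize` is a hypothetical helper, not part of the package.

```python
# Trimmed copy of the example output shown at the top of this page.
SAMPLE = {
    "project": {
        "type": "api",
        "primary_stack": "python",
        "frameworks": ["FastAPI", "SQLAlchemy"],
    },
    "entry_points": [
        {"path": "src/app/main.py", "kind": "server", "confidence": "high"}
    ],
    "key_dependencies": [
        {"name": "fastapi", "declared_version": ">=0.100", "role": "runtime"},
        {"name": "sqlalchemy", "declared_version": "^2.0", "role": "runtime"},
        {"name": "pydantic", "declared_version": "^2.0", "role": "runtime"},
    ],
    "confidence_summary": {"overall": "high"},
}


def summarize(context: dict) -> dict:
    """Collect a few fields, tolerating keys that are absent."""
    project = context.get("project", {})
    return {
        "stack": project.get("primary_stack", ""),
        "frameworks": list(project.get("frameworks", [])),
        "entry_paths": [ep["path"] for ep in context.get("entry_points", []) if "path" in ep],
        "runtime_deps": [
            dep["name"]
            for dep in context.get("key_dependencies", [])
            if dep.get("role") == "runtime"
        ],
    }
```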
Install
pip install sourcecode
Requires Python 3.9+. No API keys. No network calls. Runs locally.
Quickstart
Basic analysis:
sourcecode .
Agent-optimized output (structured, noise-free, gap-aware):
sourcecode . --agent
Task-specific context for coding agents:
# Explain the project architecture
sourcecode . prepare-context explain
# Find likely bug locations
sourcecode . prepare-context fix-bug
# Onboard a new agent to the codebase
sourcecode . prepare-context onboard
# Surface structural issues before a refactor
sourcecode . prepare-context refactor
Pipe directly into Claude Code or any agent:
sourcecode . --agent | claude -p "Review the architecture and suggest improvements"
Write to file for session injection:
sourcecode . --agent --output context.json
Include git activity signals:
sourcecode . --agent --git-context
Use cases
Claude Code
# Start every session with full context
sourcecode . --agent > .claude/context.json
# Append a compact snapshot to CLAUDE.md for persistent context
sourcecode . --agent --compact >> CLAUDE.md
Cursor / Windsurf / Copilot
# Generate context snapshot before starting a feature
sourcecode . --agent --git-context --output .cursor/context.json
OpenAI / Anthropic API
import json
import subprocess

context = json.loads(
    subprocess.check_output(["sourcecode", ".", "--agent"])
)
system_prompt = f"""
You are working on: {context['project']['summary']}
Architecture: {context['architecture']}
Entry points: {[ep['path'] for ep in context['entry_points']]}
"""
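In a long-running service it may help to wrap that call so a missing binary, a non-zero exit, or malformed output surfaces as one clear error. The wrapper below is a sketch under that assumption; `load_context` is a hypothetical name, and it takes the full command as an argument so it can be exercised without the CLI installed.

```python
import json
import subprocess


def load_context(argv: list[str]) -> dict:
    """Run a command that prints JSON on stdout and return the parsed result."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    except FileNotFoundError as exc:
        raise RuntimeError(f"command not found: {argv[0]}") from exc
    except subprocess.CalledProcessError as exc:
        detail = (exc.stderr or "").strip()
        raise RuntimeError(f"{argv[0]} exited with {exc.returncode}: {detail}") from exc
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError("command output was not valid JSON") from exc


# Intended usage, assuming sourcecode is on PATH:
# context = load_context(["sourcecode", ".", "--agent"])
```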
CI / CD pipelines
# .github/workflows/context.yml
- name: Generate codebase context
  run: sourcecode . --agent --output context.json
- name: AI-assisted code review
  run: |
    CONTEXT=$(cat context.json)
    # Inject into your preferred AI review step
Onboarding new engineers
# Generate human-readable architecture summary
sourcecode . prepare-context onboard --llm-prompt
Architecture audits
sourcecode . --agent --architecture --graph-modules --dependencies
How it works
sourcecode runs a local, static analysis pipeline on your repository:
Repository
│
├── Scanner # File tree, manifests, workspace detection
├── Stack Detectors # Language, framework, package manager detection
├── Entry Points # Production entry points (not benchmarks/docs)
├── Git Analyzer # Churn hotspots, uncommitted changes
├── Relevance Scorer # Combines runtime proximity, git churn, bootstrap signal
└── Serializer # Structured JSON/YAML output
No LLM calls. No network requests. No sampling. Fully deterministic.
The same repository produces the same output on every run — which means agents can cache it, diff it, and rely on it.
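One way to take advantage of that property is to fingerprint each snapshot and compare fingerprints between runs. The helpers below are an illustrative sketch, not part of sourcecode; `fingerprint` and `changed_keys` are hypothetical names, and serializing with sorted keys is an assumption that makes the hash independent of key order.

```python
import hashlib
import json

_MISSING = object()  # sentinel so a missing key differs from an explicit null


def fingerprint(context: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(old: dict, new: dict) -> list[str]:
    """Top-level keys that were added, removed, or changed between two snapshots."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k, _MISSING) != new.get(k, _MISSING))
```

A cache can key on the fingerprint, and a CI step can skip re-injecting context when it has not changed.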
Output modes
| Mode | Use case | Size |
|---|---|---|
| `sourcecode .` | Full analysis | Full |
| `sourcecode . --agent` | AI agent injection | ~600–1000 tokens |
| `sourcecode . --compact` | Prompts, handoffs | ~500–700 tokens |
| `sourcecode . prepare-context <task>` | Task-specific context | ~800–1200 tokens |
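If a prompt has a fixed token budget, the sizes above can drive the choice of mode. The sketch below uses the upper bound of each documented range; the selection policy and the `pick_mode` name are assumptions, and the full analysis is left out because the table gives no token figure for it.

```python
from typing import Optional

# Upper bounds of the approximate ranges in the table above, largest first.
MODES = [
    ("sourcecode . prepare-context <task>", 1200),
    ("sourcecode . --agent", 1000),
    ("sourcecode . --compact", 700),
]


def pick_mode(token_budget: int) -> Optional[str]:
    """Return the mode with the largest documented upper bound that still fits,
    or None when even the compact mode may exceed the budget."""
    for command, upper_bound in MODES:
        if upper_bound <= token_budget:
            return command
    return None
```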
Available flags
| Flag | Description |
|---|---|
| `--agent` | Structured, noise-free output for AI agents. Auto-enables `--dependencies`, `--env-map`, `--code-notes`. |
| `--dependencies` | Direct dependencies with versions and role classification. |
| `--git-context` | Recent commits, change hotspots, uncommitted files. |
| `--architecture` | Layer inference: MVC, layered, hexagonal, domain-based. |
| `--graph-modules` | Module import graph and call relationships. |
| `--semantics` | Cross-file symbol resolution and call graph. |
| `--env-map` | All environment variables referenced in source. |
| `--code-notes` | TODOs, FIXMEs, HACKs, and Architecture Decision Records. |
| `--compact` | Minimal output for token-constrained prompts. |
| `--format yaml` | YAML instead of JSON. |
| `--output PATH` | Write to file instead of stdout. |
Full reference: sourcecode --help
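Because `--agent` switches on three other flags, scripts that log or compare invocations may want to expand it first. The helper below only models the behavior documented in the table; it is a sketch with a hypothetical name (`effective_flags`) and does not reflect how the CLI parses its arguments.

```python
# Flags that the table above says --agent enables automatically.
AGENT_IMPLIES = frozenset({"--dependencies", "--env-map", "--code-notes"})


def effective_flags(flags: set[str]) -> list[str]:
    """Expand --agent into the flags it implies and return a sorted list."""
    expanded = set(flags)
    if "--agent" in expanded:
        expanded |= AGENT_IMPLIES
    return sorted(expanded)
```

For example, `effective_flags({"--agent", "--dependencies"})` and `effective_flags({"--agent"})` return the same list, which makes two such invocations easy to recognize as equivalent.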
Prepare-context tasks
| Task | What it produces |
|---|---|
| `explain` | Architecture + entry points + key dependencies |
| `fix-bug` | Risk-ranked files + suspected areas + code annotations |
| `refactor` | Structural issues + improvement opportunities |
| `generate-tests` | Untested source files + test gap analysis |
| `onboard` | Full project understanding for new agents/developers |
| `review-pr` | Changed files + architectural impact |
| `delta` | Git-changed files only (incremental context) |
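When a task name comes from user input or a config file, it can be checked against the table before the CLI is called. This is a minimal sketch: `prepare_context_argv` is a hypothetical helper, and the argument order simply follows the Quickstart examples.

```python
# Task names copied from the table above.
TASKS = ("explain", "fix-bug", "refactor", "generate-tests", "onboard", "review-pr", "delta")


def prepare_context_argv(task: str, repo: str = ".") -> list[str]:
    """Build the argument list for a prepare-context call, rejecting unknown tasks."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of: {', '.join(TASKS)}")
    return ["sourcecode", repo, "prepare-context", task]
```

The returned list can be passed straight to `subprocess.run`.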
Philosophy
Determinism over approximation. Every run on the same repository produces the same output. Agents, pipelines, and teams can depend on that.
Runtime topology over file trees. What matters is where execution starts, what calls what, and which modules are actually critical — not alphabetical file lists.
Noise suppression by default. Benchmark dirs, generated files, tooling configs, and docs are suppressed unless explicitly requested. Agents get signal, not inventory.
Local-first, privacy-respecting. No code leaves your machine. No API keys required. Analysis is fully offline.
Composable, not monolithic. Output is structured data. Pipe it, transform it, inject it, cache it. It's infrastructure, not a magic black box.
Confidence-aware. Every analysis includes a confidence summary and gap list. Agents know what they don't know.
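The confidence principle suggests a simple gate on the consumer side: only hand an agent the entry points that clear a threshold. The sketch below reads the `confidence_summary.overall` and per-entry `confidence` keys from the example output; the `low` and `medium` levels and the `trusted_entry_points` helper are assumptions, since only `high` appears in that example.

```python
# Only "high" appears in the example output; "low" and "medium" are assumed levels.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def trusted_entry_points(context: dict, minimum: str = "high") -> list[str]:
    """Paths of entry points at or above the threshold, or an empty list when
    the overall confidence is below it (or missing)."""
    threshold = LEVELS[minimum]
    overall = context.get("confidence_summary", {}).get("overall")
    if LEVELS.get(overall, -1) < threshold:
        return []
    return [
        ep["path"]
        for ep in context.get("entry_points", [])
        if LEVELS.get(ep.get("confidence"), -1) >= threshold
    ]
```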
Supported languages and stacks
| Language | Package detection | Entry points | Frameworks |
|---|---|---|---|
| Python | `pyproject.toml`, `requirements.txt`, `setup.py` | CLI, scripts, `__main__` | FastAPI, Django, Flask, Typer, Click |
| Node.js | `package.json`, lock files | `main`, `bin`, `scripts` | Express, Next.js, Fastify, NestJS, React, Vue |
| Go | `go.mod` | `main.go`, `cmd/` | Standard library, Gin, Echo |
| Rust | `Cargo.toml` | `main.rs`, `lib.rs` | Tokio, Actix, Axum |
| Java | `pom.xml`, `build.gradle` | Spring Boot, Quarkus, Micronaut | Spring, Quarkus |
| Kotlin | `build.gradle.kts` | Spring Boot, Ktor | Spring, Ktor |
| .NET / C# | `.csproj`, `.sln` | `Program.cs` | ASP.NET, Blazor |
| PHP | `composer.json` | `index.php` | Laravel, Symfony |
| Ruby | `Gemfile` | `config.ru` | Rails, Sinatra |
| Dart | `pubspec.yaml` | `main.dart` | Flutter |
Monorepos with mixed stacks are fully supported.
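The package-detection column can be read as a lookup from manifest file to language. The sketch below applies that lookup to a list of repository paths and groups the hits by directory, which is roughly what mixed-stack detection has to do in a monorepo. It is an illustration built from the table, not the tool's detector; lock files are omitted because the table does not name them, and `detect_stacks` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Exact manifest names from the table above.
MANIFESTS = {
    "pyproject.toml": "Python", "requirements.txt": "Python", "setup.py": "Python",
    "package.json": "Node.js",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pom.xml": "Java", "build.gradle": "Java",
    "build.gradle.kts": "Kotlin",
    "composer.json": "PHP",
    "Gemfile": "Ruby",
    "pubspec.yaml": "Dart",
}
# .NET projects are identified by file extension rather than a fixed name.
SUFFIXES = {".csproj": ".NET / C#", ".sln": ".NET / C#"}


def detect_stacks(paths: list[str]) -> dict[str, list[str]]:
    """Map each detected language to the directories that hold its manifests."""
    found: dict[str, list[str]] = {}
    for path in sorted(paths):  # sorted input keeps the output reproducible
        p = PurePosixPath(path)
        language = MANIFESTS.get(p.name) or SUFFIXES.get(p.suffix)
        if language:
            found.setdefault(language, []).append(str(p.parent))
    return found
```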
Roadmap
Now — Core stability
- Ranking improvements (git churn, runtime proximity)
- Better architecture inference
- Broader language coverage
Next — Agent integrations
- MCP server for native Claude Code integration
- VS Code extension
- Context diffing (compare before/after changes)
- Incremental updates (delta mode improvements)
Later — Team features
- Shared context snapshots
- Architecture drift detection
- CI integration templates
- Governance and compliance context
Focus is on adoption and utility. No monetization until the core is genuinely useful to the community.
Contributing
We welcome contributions. See CONTRIBUTING.md for setup, testing, and guidelines.
Quick start for contributors:
git clone https://github.com/sourcecode-ai/sourcecode
cd sourcecode
pip install -e ".[dev]"
pytest tests/
Security
sourcecode analyzes local repositories. It does not transmit code, paths, or analysis results to any external service. See SECURITY.md for our security policy and responsible disclosure process.
Privacy
Telemetry is opt-in only and disabled by default. If you choose to enable it, only anonymous usage metadata is collected — never code, paths, or content. See docs/privacy.md for full details.
sourcecode telemetry status # check current setting
sourcecode telemetry enable # opt in
sourcecode telemetry disable # opt out
License
Apache License 2.0. See LICENSE for details.
Built for the age of AI coding agents.
GitHub · PyPI · Documentation