Deterministic codebase context for AI coding agents

These details have not been verified by PyPI

Project description

sourcecode

Turn any repository into structured, reproducible context optimized for AI coding agents — in one command.

pip install sourcecode
sourcecode . --agent

{
  "project": {
    "type": "api",
    "summary": "Python REST API built with FastAPI and SQLAlchemy. Layered architecture with domain, service, and infrastructure layers.",
    "primary_stack": "python",
    "frameworks": ["FastAPI", "SQLAlchemy"]
  },
  "entry_points": [
    { "path": "src/app/main.py", "kind": "server", "confidence": "high" }
  ],
  "architecture": "FastAPI application. Clean Architecture with domain, application, and infrastructure layers. Hub modules: schema.py, models.py.",
  "key_dependencies": [
    { "name": "fastapi", "declared_version": ">=0.100", "role": "runtime" },
    { "name": "sqlalchemy", "declared_version": "^2.0", "role": "runtime" },
    { "name": "pydantic", "declared_version": "^2.0", "role": "runtime" }
  ],
  "confidence_summary": { "overall": "high" }
}
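
Because the output is plain JSON, a consumer can read it with the standard library alone. The sketch below embeds a trimmed copy of the sample above and pulls out the entry points an agent would likely open first. The helper name `high_confidence_entry_points` is hypothetical and not part of sourcecode.

```python
import json

# Trimmed copy of the sample output shown above.
SAMPLE = """
{
  "project": {"type": "api", "primary_stack": "python",
              "frameworks": ["FastAPI", "SQLAlchemy"]},
  "entry_points": [
    {"path": "src/app/main.py", "kind": "server", "confidence": "high"}
  ],
  "key_dependencies": [
    {"name": "fastapi", "declared_version": ">=0.100", "role": "runtime"}
  ],
  "confidence_summary": {"overall": "high"}
}
"""


def high_confidence_entry_points(context: dict) -> list[str]:
    """Return the paths of entry points marked with high confidence."""
    return [
        ep["path"]
        for ep in context.get("entry_points", [])
        if ep.get("confidence") == "high"
    ]


context = json.loads(SAMPLE)
paths = high_confidence_entry_points(context)
print(paths)
```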

The problem

AI coding agents are only as good as the context they receive. In large, real-world repositories, that context is often incomplete, noisy, or out of date.

Agents start blind. Without repo structure, they hallucinate imports, file paths, and architecture decisions.
Context is noisy. Raw file trees contain benchmark dirs, generated files, tooling configs, and docs that consume tokens without helping.
Architecture is invisible. LLMs see files, not systems. They miss layers, plugin systems, entry points, and runtime topology.
Context decays. What you paste today is stale tomorrow. There's no reproducible baseline.
Manual context doesn't scale. Handcrafting prompts per project is engineering debt that grows with every new agent, team, and task.

The solution

sourcecode analyzes your repository and produces a structured, reproducible context package — ready to inject into any AI coding agent.

What it does:

Detects stacks, frameworks, entry points, and project type across 10+ languages
Infers runtime topology: which packages are core, which are plugins, which are noise
Ranks files by operational relevance for agents, combining git churn, runtime proximity, and bootstrap signal
Suppresses non-runtime noise: benchmarks, docs, tooling, generated files
Produces structured JSON/YAML that agents can reason over, not raw file trees
Runs deterministically — same repo, same output, every time
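
To make the ranking idea concrete, here is a minimal sketch of a deterministic relevance ranker. It is not the tool's actual scorer: the `FileSignals` type, the 0 to 1 normalization, and the weights are all assumptions for illustration. It shows how the three signals could be combined and how a tie-break on the path keeps the ordering reproducible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileSignals:
    path: str
    git_churn: float          # normalized 0..1 (hypothetical scale)
    runtime_proximity: float  # normalized 0..1 (hypothetical scale)
    bootstrap: float          # normalized 0..1 (hypothetical scale)


# Hypothetical weights; the real scorer may combine the signals differently.
WEIGHTS = {"git_churn": 0.3, "runtime_proximity": 0.5, "bootstrap": 0.2}


def score(f: FileSignals) -> float:
    """Weighted sum of the three signals."""
    return (
        WEIGHTS["git_churn"] * f.git_churn
        + WEIGHTS["runtime_proximity"] * f.runtime_proximity
        + WEIGHTS["bootstrap"] * f.bootstrap
    )


def rank(files: list[FileSignals]) -> list[str]:
    # Sort by descending score, then by path, so ties never depend on input order.
    ordered = sorted(files, key=lambda f: (-score(f), f.path))
    return [f.path for f in ordered]


files = [
    FileSignals("src/app/main.py", 0.4, 1.0, 1.0),
    FileSignals("src/app/schema.py", 0.8, 0.6, 0.0),
    FileSignals("src/app/models.py", 0.8, 0.6, 0.0),
]
ranking = rank(files)
```

Sorting on `(-score, path)` rather than on the score alone is what makes the result independent of filesystem traversal order.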

What it outputs:

project_summary — one-sentence natural language description
architecture_summary — runtime topology: layers, plugin systems, entry flows
entry_points — where execution actually starts (production, not benchmarks)
key_dependencies — runtime dependencies with role classification
relevant_files — ranked by usefulness for coding tasks, not folder position
confidence_summary — detection quality and analysis gaps

All fields are stable, machine-readable, and designed for LLM consumption.

Install

pip install sourcecode

Requires Python 3.9+. No API keys. No network calls unless you opt in to telemetry. Runs locally.

Quickstart

Basic analysis:

sourcecode .

Agent-optimized output (structured, noise-free, gap-aware):

sourcecode . --agent

Task-specific context for coding agents:

# Explain the project architecture
sourcecode . prepare-context explain

# Find likely bug locations
sourcecode . prepare-context fix-bug

# Onboard a new agent to the codebase
sourcecode . prepare-context onboard

# Surface structural issues and refactoring opportunities
sourcecode . prepare-context refactor

Pipe directly into Claude Code or any agent:

sourcecode . --agent | claude -p "Review the architecture and suggest improvements"

Write to file for session injection:

sourcecode . --agent --output context.json

Include git activity signals:

sourcecode . --agent --git-context

Use cases

Claude Code

# Start every session with full context
mkdir -p .claude
sourcecode . --agent > .claude/context.json

# Use with CLAUDE.md for persistent context
# (appends on every run, so remove the previous block before regenerating)
sourcecode . --agent --compact >> CLAUDE.md
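
Appending with `>>` adds a fresh copy of the context on every run. One way around that, sketched here under assumptions, is to keep the generated text between a pair of marker lines and replace it in place. The marker strings and the function names are hypothetical; sourcecode does not define them.

```python
from pathlib import Path

# Hypothetical marker lines; any unique pair would work.
BEGIN = "<!-- sourcecode:begin -->"
END = "<!-- sourcecode:end -->"


def upsert_block(document: str, payload: str) -> str:
    """Replace the marked block if present, otherwise append it."""
    block = f"{BEGIN}\n{payload.strip()}\n{END}"
    start = document.find(BEGIN)
    end = document.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Keep everything outside the markers untouched.
        return document[:start] + block + document[end + len(END):]
    separator = "" if not document or document.endswith("\n") else "\n"
    return document + separator + block + "\n"


def update_file(path: Path, payload: str) -> None:
    """Read the file (if any), refresh the marked block, and write it back."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(upsert_block(existing, payload), encoding="utf-8")
```

In practice `payload` would be the captured stdout of `sourcecode . --agent --compact`, and `path` would be `Path("CLAUDE.md")`.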

Cursor / Windsurf / Copilot

# Generate context snapshot before starting a feature
mkdir -p .cursor
sourcecode . --agent --git-context --output .cursor/context.json

OpenAI / Anthropic API

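import json
import subprocess

# Run the CLI and parse its JSON output (raises CalledProcessError on failure)
context = json.loads(
    subprocess.check_output(["sourcecode", ".", "--agent"], text=True)
)

# Build a system prompt from the structured fields
entry_points = ", ".join(ep["path"] for ep in context["entry_points"])
system_prompt = f"""
You are working on: {context['project']['summary']}
Architecture: {context['architecture']}
Entry points: {entry_points}
"""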

CI / CD pipelines

# .github/workflows/context.yml
- name: Generate codebase context
  run: sourcecode . --agent --output context.json

- name: AI-assisted code review
  run: |
    CONTEXT=$(cat context.json)
    # Inject into your preferred AI review step

Onboarding new engineers

# Generate human-readable architecture summary
sourcecode . prepare-context onboard --llm-prompt

Architecture audits

sourcecode . --agent --architecture --graph-modules --dependencies

How it works

sourcecode runs a local, static analysis pipeline on your repository:

Repository
    │
    ├── Scanner          # File tree, manifests, workspace detection
    ├── Stack Detectors  # Language, framework, package manager detection
    ├── Entry Points     # Production entry points (not benchmarks/docs)
    ├── Git Analyzer     # Churn hotspots, uncommitted changes
    ├── Relevance Scorer # Combines runtime proximity, git churn, and bootstrap signal
    └── Serializer       # Structured JSON/YAML output

No LLM calls. No network requests. No sampling. Fully deterministic.

The same repository state produces the same output on every run, so agents can cache it, diff it, and rely on it.
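
Deterministic output is what makes caching and diffing practical. The sketch below is one possible way a consumer could do both: it hashes a canonical JSON encoding to get a cache key and lists the top-level keys that changed between two snapshots. The function names are hypothetical and not part of sourcecode.

```python
import hashlib
import json


def context_digest(context: dict) -> str:
    """Hash a canonical JSON encoding so key order and whitespace do not matter."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(old: dict, new: dict) -> list[str]:
    """List top-level keys whose values differ between two snapshots."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

If two runs on the same commit ever produce different digests, either the repository state changed (for example, uncommitted files) or determinism was broken somewhere.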

Output modes

Mode	Use case	Size
`sourcecode .`	Full analysis	Full
`sourcecode . --agent`	AI agent injection	~600–1000 tokens
`sourcecode . --compact`	Prompts, handoffs	~500–700 tokens
`sourcecode . prepare-context <task>`	Task-specific context	~800–1200 tokens
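
The sizes above are approximate. To check whether a given context fits a prompt budget without pulling in a tokenizer, a rough character-based estimate can help. The four-characters-per-token ratio below is a common rule of thumb, not something sourcecode reports, and real counts vary by model.

```python
import json

# Rough rule of thumb, not a real tokenizer; actual counts vary by model.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_budget(context: dict, budget_tokens: int) -> bool:
    """Check a compactly serialized context against a token budget."""
    compact = json.dumps(context, separators=(",", ":"))
    return estimate_tokens(compact) <= budget_tokens
```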

Available flags

Flag	Description
`--agent`	Structured, noise-free output for AI agents. Auto-enables `--dependencies`, `--env-map`, `--code-notes`.
`--dependencies`	Direct dependencies with versions and role classification.
`--git-context`	Recent commits, change hotspots, uncommitted files.
`--architecture`	Layer inference: MVC, layered, hexagonal, domain-based.
`--graph-modules`	Module import graph and call relationships.
`--semantics`	Cross-file symbol resolution and call graph.
`--env-map`	All environment variables referenced in source.
`--code-notes`	TODOs, FIXMEs, HACKs, and Architecture Decision Records.
`--compact`	Minimal output for token-constrained prompts.
`--format yaml`	YAML instead of JSON.
`--output PATH`	Write to file instead of stdout.

Full reference: sourcecode --help
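
The table says `--agent` auto-enables three other flags. The sketch below models that one implication so a wrapper script can tell which sections to expect in the output. How the CLI resolves this internally is not documented here, so treat `effective_flags` as a hypothetical helper.

```python
# Implication taken from the flag table: --agent turns on three other flags.
IMPLIES = {"--agent": ("--dependencies", "--env-map", "--code-notes")}


def effective_flags(requested: list[str]) -> list[str]:
    """Return the requested flags plus anything they imply, without duplicates."""
    result: list[str] = []
    for flag in requested:
        for candidate in (flag, *IMPLIES.get(flag, ())):
            if candidate not in result:
                result.append(candidate)
    return result
```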

Prepare-context tasks

Task	What it produces
`explain`	Architecture + entry points + key dependencies
`fix-bug`	Risk-ranked files + suspected areas + code annotations
`refactor`	Structural issues + improvement opportunities
`generate-tests`	Untested source files + test gap analysis
`onboard`	Full project understanding for new agents/developers
`review-pr`	Changed files + architectural impact
`delta`	Git-changed files only — incremental context
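
When calling `prepare-context` from a script, it helps to reject unknown task names before spawning a process. This is a minimal wrapper sketch: the task names come from the table above, while the function names are hypothetical.

```python
import subprocess

# Task names copied from the table above.
TASKS = ("explain", "fix-bug", "refactor", "generate-tests", "onboard", "review-pr", "delta")


def prepare_context_argv(task: str, repo: str = ".") -> list[str]:
    """Build the argument list for a prepare-context run, rejecting unknown tasks."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {', '.join(TASKS)}")
    return ["sourcecode", repo, "prepare-context", task]


def prepare_context(task: str, repo: str = ".") -> str:
    """Run the CLI and return its stdout; requires sourcecode on PATH."""
    result = subprocess.run(
        prepare_context_argv(task, repo), check=True, capture_output=True, text=True
    )
    return result.stdout
```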

Philosophy

Determinism over approximation. Every run on the same repository produces the same output. Agents, pipelines, and teams can depend on that.

Runtime topology over file trees. What matters is where execution starts, what calls what, and which modules are actually critical — not alphabetical file lists.

Noise suppression by default. Benchmark dirs, generated files, tooling configs, and docs are suppressed unless explicitly requested. Agents get signal, not inventory.
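
As an illustration of what path-based noise suppression can look like, here is a small filter. The directory names and suffixes are assumptions chosen for the example; they are not the rules sourcecode actually applies.

```python
from pathlib import PurePosixPath

# Hypothetical patterns for illustration only.
NOISE_DIRS = {"benchmarks", "docs", "node_modules", "dist", "build"}
NOISE_SUFFIXES = (".min.js", ".lock", ".generated.py")


def is_noise(path: str) -> bool:
    """True if any parent directory or the file suffix matches a noise pattern."""
    p = PurePosixPath(path)
    if any(part in NOISE_DIRS for part in p.parts[:-1]):
        return True
    return p.name.endswith(NOISE_SUFFIXES)


def suppress(paths: list[str]) -> list[str]:
    """Drop noise paths and return the rest in sorted order."""
    return sorted(p for p in paths if not is_noise(p))
```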

Local-first, privacy-respecting. No code leaves your machine. No API keys required. Analysis is fully offline.

Composable, not monolithic. Output is structured data. Pipe it, transform it, inject it, cache it. It's infrastructure, not a magic black box.

Confidence-aware. Every analysis includes a confidence summary and gap list. Agents know what they don't know.
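
An agent harness can use the confidence summary as a gate before acting on the context. The sketch below reads `confidence_summary.overall` as it appears in the sample output. Only the value "high" is shown in this document, so the "low" and "medium" levels and their ordering are assumptions.

```python
# Only "high" appears in the sample output; the other levels are assumed.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def meets_confidence(context: dict, minimum: str = "medium") -> bool:
    """True if the overall confidence is at least `minimum`; unknown values fail."""
    overall = context.get("confidence_summary", {}).get("overall")
    return LEVELS.get(overall, -1) >= LEVELS[minimum]
```

Treating a missing or unrecognized value as a failure is the conservative choice: the agent falls back to exploring the repository itself instead of trusting an unverified summary.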

Supported languages and stacks

Language	Package detection	Entry points	Frameworks
Python	`pyproject.toml`, `requirements.txt`, `setup.py`	CLI, scripts, `__main__`	FastAPI, Django, Flask, Typer, Click
Node.js	`package.json`, lock files	`main`, `bin`, scripts	Express, Next.js, Fastify, NestJS, React, Vue
Go	`go.mod`	`main.go`, `cmd/`	Standard library, Gin, Echo
Rust	`Cargo.toml`	`main.rs`, `lib.rs`	Tokio, Actix, Axum
Java	`pom.xml`, `build.gradle`	Spring Boot, Quarkus, Micronaut	Spring, Quarkus
Kotlin	`build.gradle.kts`	Spring Boot, Ktor	Spring, Ktor
.NET / C#	`.csproj`, `.sln`	`Program.cs`	ASP.NET, Blazor
PHP	`composer.json`	`index.php`	Laravel, Symfony
Ruby	`Gemfile`	`config.ru`	Rails, Sinatra
Dart	`pubspec.yaml`	`main.dart`	Flutter

Monorepos with mixed stacks are fully supported.
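
The package detection column amounts to a lookup from manifest file names to stacks. The sketch below encodes that column as a table and applies it to a list of paths, which is roughly how mixed-stack monorepos can be recognized. It is a simplified stand-in for the real detectors; `detect_stacks` is a hypothetical name, and lock files are left out.

```python
# Manifest names taken from the table above.
MANIFESTS = {
    "pyproject.toml": "Python", "requirements.txt": "Python", "setup.py": "Python",
    "package.json": "Node.js", "go.mod": "Go", "Cargo.toml": "Rust",
    "pom.xml": "Java", "build.gradle": "Java", "build.gradle.kts": "Kotlin",
    "composer.json": "PHP", "Gemfile": "Ruby", "pubspec.yaml": "Dart",
}
# .NET projects are identified by extension rather than a fixed file name.
SUFFIXES = {".csproj": ".NET / C#", ".sln": ".NET / C#"}


def detect_stacks(paths: list[str]) -> list[str]:
    """Return the sorted set of stacks implied by the manifests in `paths`."""
    found = set()
    for path in paths:
        base = path.rsplit("/", 1)[-1]
        if base in MANIFESTS:
            found.add(MANIFESTS[base])
            continue
        for suffix, stack in SUFFIXES.items():
            if base.endswith(suffix):
                found.add(stack)
    return sorted(found)
```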

Roadmap

Now — Core stability

Ranking improvements (git churn, runtime proximity)
Better architecture inference
Broader language coverage

Next — Agent integrations

MCP server for native Claude Code integration
VS Code extension
Context diffing (compare before/after changes)
Incremental updates (delta mode improvements)

Later — Team features

Shared context snapshots
Architecture drift detection
CI integration templates
Governance and compliance context

Focus is on adoption and utility. No monetization until the core is genuinely useful to the community.

Contributing

We welcome contributions. See CONTRIBUTING.md for setup, testing, and guidelines.

Quick start for contributors:

git clone https://github.com/sourcecode-ai/sourcecode
cd sourcecode
pip install -e ".[dev]"
pytest tests/

Security

sourcecode analyzes local repositories. It does not transmit code, paths, or analysis results to any external service. See SECURITY.md for our security policy and responsible disclosure process.

Privacy

Telemetry is opt-in only and disabled by default. If you choose to enable it, only anonymous usage metadata is collected — never code, paths, or content. See docs/privacy.md for full details.

sourcecode telemetry status   # check current setting
sourcecode telemetry enable   # opt in
sourcecode telemetry disable  # opt out

License

Apache License 2.0. See LICENSE for details.

Built for the age of AI coding agents.

Project details

Release history

1.31.30

May 26, 2026

1.31.29

May 26, 2026

1.31.28

May 26, 2026

1.31.27

May 26, 2026

1.31.26

May 26, 2026

1.31.25

May 25, 2026

1.31.24

May 25, 2026

1.31.23

May 25, 2026

1.31.22

May 25, 2026

1.31.21

May 25, 2026

1.31.20

May 25, 2026

1.31.18

May 24, 2026

1.31.17

May 24, 2026

1.31.16

May 24, 2026

1.31.15

May 24, 2026

1.31.14

May 24, 2026

1.31.13

May 24, 2026

1.31.12

May 22, 2026

1.31.11

May 22, 2026

1.31.10

May 22, 2026

1.31.9

May 21, 2026

1.31.8

May 21, 2026

1.31.7

May 21, 2026

1.31.6

May 21, 2026

1.31.5

May 21, 2026

1.31.4

May 21, 2026

1.31.3

May 21, 2026

1.31.2

May 21, 2026

1.31.1

May 21, 2026

1.31.0

May 20, 2026

1.30.30

May 20, 2026

1.30.29

May 19, 2026

1.30.28

May 19, 2026

1.30.27

May 18, 2026

1.30.26

May 18, 2026

1.30.25

May 18, 2026

1.30.24

May 18, 2026

1.30.23

May 18, 2026

1.30.22

May 18, 2026

1.30.21

May 18, 2026

1.30.20

May 18, 2026

1.30.19

May 18, 2026

1.30.18

May 18, 2026

1.30.17

May 18, 2026

1.30.16

May 18, 2026

1.30.15

May 17, 2026

1.30.14

May 17, 2026

1.30.13

May 16, 2026

1.30.12

May 16, 2026

1.30.11

May 16, 2026

1.30.10

May 16, 2026

1.30.9

May 16, 2026

1.30.8

May 16, 2026

1.30.7

May 16, 2026

1.30.6

May 16, 2026

1.30.5

May 16, 2026

1.30.4

May 16, 2026

1.30.3

May 16, 2026

1.30.2

May 16, 2026

1.30.1

May 16, 2026

1.30.0

May 16, 2026

1.29.0

May 16, 2026

1.28.0

May 16, 2026

1.27.0

May 16, 2026

1.26.0

May 13, 2026

1.24.0

May 13, 2026

1.23.0

May 13, 2026

1.22.0

May 13, 2026

1.21.0

May 13, 2026

1.20.0

May 13, 2026

1.19.0

May 13, 2026

1.18.0

May 13, 2026

1.17.0

May 13, 2026

1.16.0

May 13, 2026

1.15.1

May 13, 2026

1.15.0

May 13, 2026

1.14.0

May 12, 2026

1.13.0

May 12, 2026

1.12.0

May 11, 2026

1.11.0

May 11, 2026

1.10.0

May 9, 2026

1.9.0

May 9, 2026

1.8.0

May 8, 2026

1.7.0

May 8, 2026

1.6.0

May 8, 2026

1.5.0

May 8, 2026

1.4.0

May 8, 2026

1.3.0

May 8, 2026

1.2.0

May 8, 2026

1.1.0

May 8, 2026

1.0.0

May 5, 2026

0.49.0

May 5, 2026

0.48.0

May 5, 2026

0.47.0

May 5, 2026

0.46.0

May 5, 2026

0.45.0

May 5, 2026

0.44.0

May 4, 2026

0.43.0

May 4, 2026

0.42.0

May 4, 2026

0.41.0

May 3, 2026

0.39.0

May 3, 2026

0.38.0

May 3, 2026

0.37.0

May 3, 2026

0.36.0

May 3, 2026

0.35.0

May 3, 2026

0.34.0

May 3, 2026

0.33.0

May 2, 2026

This version

0.32.0

May 1, 2026

0.31.0

May 1, 2026

0.30.0

May 1, 2026

0.29.0

May 1, 2026

0.28.0

Apr 30, 2026

0.27.0

Apr 29, 2026

0.26.0

Apr 29, 2026

0.25.0

Apr 29, 2026

0.24.0

Apr 29, 2026

0.23.0

Apr 26, 2026

0.22.0

Apr 25, 2026

0.21.0

Apr 25, 2026

0.20.0

Apr 24, 2026

0.19.0

Apr 24, 2026

0.18.0

Apr 24, 2026

0.17.0

Apr 24, 2026

0.15.1

Apr 23, 2026

0.15.0

Apr 23, 2026

0.14.0

Apr 23, 2026

0.13.0

Apr 18, 2026

0.12.0

Apr 18, 2026

0.11.0

Apr 15, 2026

0.10.0

Apr 14, 2026

0.9.0

Apr 11, 2026

0.8.0

Apr 10, 2026

0.7.0

Apr 10, 2026

0.6.0

Apr 8, 2026

Download files

Source Distribution

sourcecode-0.32.0.tar.gz (251.0 kB)

Uploaded May 1, 2026 Source

Built Distribution

sourcecode-0.32.0-py3-none-any.whl (193.7 kB)

Uploaded May 1, 2026 Python 3

File details

Details for the file sourcecode-0.32.0.tar.gz.

File metadata

Download URL: sourcecode-0.32.0.tar.gz
Upload date: May 1, 2026
Size: 251.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for sourcecode-0.32.0.tar.gz
Algorithm	Hash digest
SHA256	`24050163fe68e78d770a38ffb3b3c6853ac7a45c767824197026d736eb9591a9`
MD5	`4918b92ec3fa449ab6438d89612cc138`
BLAKE2b-256	`9cf2d014771cd398049c4eaf5348400d72d9880b54757b54a7c888e10b95de0d`

File details

Details for the file sourcecode-0.32.0-py3-none-any.whl.

File metadata

Download URL: sourcecode-0.32.0-py3-none-any.whl
Upload date: May 1, 2026
Size: 193.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for sourcecode-0.32.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cdd6b0304f9ceb00a13c0d4b95fd688c3dab09ebe8f8ebdb52004f3bf7efc347`
MD5	`0119bd7fbab3265f9c25527fb9c55945`
BLAKE2b-256	`9e013843f87ce3e0ce7e8f14b46dd6e2fac4c0572b79ac71d0a66e790b0e0f18`
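
The SHA256 digests listed above can be checked against a downloaded file with the standard library. This is a minimal sketch; the function names are hypothetical, and the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `verify(Path("sourcecode-0.32.0.tar.gz"), "24050163fe68e78d770a38ffb3b3c6853ac7a45c767824197026d736eb9591a9")`.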

sourcecode 0.32.0
