Generates a structured context map of software projects for AI agents
sourcecode
sourcecode generates a structured project context map so an agent can quickly understand a repository's stack, entry points, and overall shape. It is designed to be injected into AI development agents as initial session context.
Installation
pip install sourcecode
Requires Python 3.9+.
Quick Start
Analyze the current directory as JSON:
sourcecode .
Generate a compact view for prompts or handoff (~500-700 tokens):
sourcecode --compact .
Analyze another directory and write YAML to a file:
sourcecode --format yaml --output sourcecode.yaml /path/to/project
Include direct dependencies, exact versions, and transitive dependencies when compatible lockfiles are available:
sourcecode . --dependencies
Include an internal module graph with imports and structural relations:
sourcecode . --graph-modules
Extract docstrings, signatures, and comments from Python and JS/TS modules:
sourcecode . --docs
Control how deep the documentation extraction goes:
sourcecode . --docs --docs-depth module # module-level docs only
sourcecode . --docs --docs-depth symbols # modules + functions/classes (default)
sourcecode . --docs --docs-depth full # all symbols including methods
Show the version:
sourcecode --version
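The same commands can be driven from a script. The following is a minimal sketch, assuming `sourcecode` is installed and on `PATH`; the helper names `build_command` and `load_context` are hypothetical and not part of the package.

```python
import json
import subprocess
from typing import List, Optional


def build_command(path: str = ".", compact: bool = False,
                  extra_flags: Optional[List[str]] = None) -> List[str]:
    """Assemble the argv for a sourcecode run (flags as listed in this README)."""
    cmd = ["sourcecode"]
    if compact:
        cmd.append("--compact")
    cmd.extend(extra_flags or [])
    cmd.append(path)
    return cmd


def load_context(path: str = ".", **kwargs) -> dict:
    """Run the CLI and parse stdout as JSON (JSON is the default output format)."""
    result = subprocess.run(build_command(path, **kwargs),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `load_context(".", compact=True, extra_flags=["--dependencies"])` would run `sourcecode --compact --dependencies .` and return the parsed mapping.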
What It Detects
- Stacks: Node.js, Python, Go, Rust, Java, PHP, Ruby, and Dart.
- Frameworks associated with each stack when enough signals are present.
- `project_type`: `webapp`, `api`, `library`, `cli`, `fullstack`, `monorepo`, or `unknown`.
- Relevant `entry_points`, such as `main.py`, `cmd/api/main.go`, or `app/page.tsx`.
- Workspace roots in multi-stack or monorepo repositories.
CLI Options
| Option | Default | Description |
|---|---|---|
| `PATH` | `.` | Directory to analyze. |
| `--format json\|yaml` | `json` | Output format. |
| `--output PATH` | stdout | Write to a file instead of stdout. |
| `--compact` | off | Reduced output (~500-700 tokens): `schema_version`, `project_type`, `project_summary`, `architecture_summary`, `stacks`, `entry_points`, `file_tree_depth1`, and `dependency_summary` when available. |
| `--dependencies` | off | Include direct dependencies, resolved versions, and transitive relationships when lockfiles make that possible. Also populates `key_dependencies`. |
| `--graph-modules` | off | Include a structural module graph with imports and simple relations. |
| `--graph-detail high\|medium\|full` | `high` | Graph detail level: summarized (`high`), balanced (`medium`), or full-fidelity (`full`). |
| `--max-nodes INTEGER` | none | Cap graph size in `high` and `medium` modes. Min: 1. |
| `--graph-edges imports,calls,contains,extends` | none | Override the default edge kinds for the selected detail level. |
| `--docs` | off | Include extracted documentation: docstrings, signatures, and comments from Python and JS/TS modules and symbols. |
| `--docs-depth module\|symbols\|full` | `symbols` | Documentation extraction depth: module-level only, modules and top-level symbols (functions/classes), or all symbols including methods. |
| `--full-metrics` | off | Include code quality metrics: LOC, symbols, complexity, tests, and coverage per file. |
| `--semantics` | off | Include semantic call graph, cross-file symbol linking, and advanced import resolution. |
| `--architecture` | off | Architectural inference: groups files into functional domains, detects layer patterns (MVC, layered, hexagonal, fullstack), and infers approximate bounded contexts. Optionally uses `--graph-modules` for more precise bounded context inference. |
| `--git-context` / `-g` | off | Include git context: recent commits, most-changed files (hotspots), uncommitted changes, contributors, and a natural-language summary. |
| `--git-depth INTEGER` | `20` | Number of recent commits to include with `--git-context`. Range: 1–100. |
| `--git-days INTEGER` | `90` | Time window in days for hotspot detection with `--git-context`. Range: 1–3650. |
| `--env-map` | off | Map all environment variables referenced in source code: key name, required/optional status, inferred type (`string`, `int`, `bool`, `url`, `path`, `enum`), functional category (`database`, `auth`, `cache`, `storage`, `service`, `observability`, `feature_flag`, `server`, `general`), default value when present, and source file locations. Supplements with descriptions from `.env.example`, `.env.sample`, and similar reference files. |
| `--code-notes` | off | Extract inline code annotations (TODO, FIXME, HACK, NOTE, DEPRECATED, WARNING, XXX, BUG, OPTIMIZE) with file path, line number, annotation text, and the nearest enclosing function or class. Also detects Architecture Decision Records (ADRs) in `docs/decisions/`, `docs/adr/`, `adr/`, and similar directories, extracting title, status, and summary. |
| `--depth INTEGER` | `4` | Maximum file tree depth. Range: 1–20. |
| `--no-redact` | off | Disable secret redaction (enabled by default). |
| `--version` | n/a | Show version and exit. |
Output Fields
The full schema (SourceMap) includes the following fields:
Always present
| Field | Type | Description |
|---|---|---|
| `metadata` | object | Schema version, timestamp, sourcecode version, and analyzed path. |
| `file_tree` | object | Repository tree where `null` represents a file and `{}` represents a directory. |
| `file_paths` | array | Flat list of all project paths derived from `file_tree`, with forward-slash separators. Always present; respects `--depth`. |
| `project_summary` | string\|null | Deterministic natural-language description of the project. Includes: manifest/README description when available, detected architecture pattern (`layered`, `mvc`, `hexagonal`, `fullstack`), business domain names inferred from directory structure, entry points (when no domains are detected), and dependency count. Present when stacks are detected. |
| `architecture_summary` | string\|null | Static summary of the main execution flow, orchestrated modules, and output produced by the project. Present when enough structural evidence is available. |
| `stacks` | array | Stack detections with `confidence`, `frameworks`, `manifests`, `primary`, `root`, `workspace`, and `signals`. |
| `project_type` | string\|null | Overall project classification. |
| `entry_points` | array | Detected entry points by stack. |
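The relationship between `file_tree` and `file_paths` can be illustrated with a short sketch. It assumes that non-empty directories appear as nested mappings and that directory paths are listed alongside files; neither detail is confirmed above, and `flatten_tree` is a hypothetical helper, not the tool's own code.

```python
from typing import Dict, List, Optional

Tree = Dict[str, Optional[dict]]


def flatten_tree(tree: Tree, prefix: str = "") -> List[str]:
    """Return forward-slash paths for every entry in a file_tree-style mapping."""
    paths: List[str] = []
    for name, child in sorted(tree.items()):
        path = f"{prefix}{name}"
        paths.append(path)
        if isinstance(child, dict):
            # A mapping is a directory; an empty one contributes no further paths.
            paths.extend(flatten_tree(child, prefix=f"{path}/"))
    return paths


example_tree = {"pyproject.toml": None, "src": {"main.py": None}, "tests": {}}
print(flatten_tree(example_tree))
```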
With --dependencies
| Field | Type | Description |
|---|---|---|
| `dependencies` | array | Dependency records with declared and resolved versions, scope, and manifest path. |
| `dependency_summary` | object | Summary with ecosystem coverage, counts (total, direct, transitive), sources, and known limitations. |
| `key_dependencies` | array | Top-15 direct dependencies from manifest or lockfile sources, sorted by primary ecosystem first then alphabetically. Only populated when `--dependencies` is active. |
With --graph-modules
| Field | Type | Description |
|---|---|---|
| `module_graph` | object | Structural graph with nodes, edges, and analysis summary. |
| `module_graph_summary` | object | Compact graph summary (node/edge counts, layers, main flows, truncation status). |
With --docs
| Field | Type | Description |
|---|---|---|
| `docs` | array | Extracted `DocRecord` objects for each documented symbol. |
| `doc_summary` | object | Summary with total count, languages, depth used, truncation status, and limitations. |
With --env-map
| Field | Type | Description |
|---|---|---|
| `env_map` | array | One `EnvVarRecord` per detected environment variable. |
| `env_summary` | object | Summary: total count, required vs optional counts, categories present, and example files found. Also included in `--compact` output when the flag is active. |
Each EnvVarRecord:
| Field | Description |
|---|---|
| `key` | Environment variable name (e.g. `DATABASE_URL`). |
| `required` | `true` if the code reads the variable without a fallback default (`os.environ["KEY"]`, `process.env.KEY`). `false` if a default was found (`os.getenv("KEY", "default")`). |
| `default` | Default value detected in code, or `null` if none. |
| `type_hint` | Inferred type: `string`, `int`, `bool`, `url`, `path`, or `enum`. Derived from the key name (e.g. `_PORT` → `int`, `_URL` → `url`, `ENABLE_*` → `bool`). |
| `category` | Functional group inferred from the key name: `database`, `cache`, `storage`, `auth`, `service`, `observability`, `feature_flag`, `server`, or `general`. |
| `description` | Description extracted from a comment above the variable in `.env.example`, or `null`. |
| `files` | Up to 10 `"path:line"` references where the variable is read in source code. Empty for variables found only in `.env.example`. |
Supported languages: Python (os.getenv, os.environ), JavaScript/TypeScript (process.env), Go (os.Getenv, os.LookupEnv), Ruby (ENV[], ENV.fetch), Java (System.getenv), PHP (getenv, $_ENV), Rust (env::var).
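Name-based type inference of this kind can be sketched as follows. Only the three rules quoted for `type_hint` (`_PORT`, `_URL`, `ENABLE_*`) come from this README; the `string` fallback and the function name `infer_type_hint` are assumptions, and the real tool evidently applies further rules (for instance to produce `path` and `enum`).

```python
def infer_type_hint(key: str) -> str:
    """Guess a type_hint from an environment variable name."""
    if key.endswith("_PORT"):
        return "int"
    if key.endswith("_URL"):
        return "url"
    if key.startswith("ENABLE_"):
        return "bool"
    # Assumed fallback: the README does not say what unmatched names map to.
    return "string"
```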
With --code-notes
| Field | Type | Description |
|---|---|---|
| `code_notes` | array | One `CodeNote` per annotation found in source files. |
| `code_adrs` | array | One `AdrRecord` per Architecture Decision Record detected. |
| `code_notes_summary` | object | Summary: total count, counts by kind, top files with most annotations, and ADR count. Also included in `--compact` output when the flag is active. |
Each CodeNote:
| Field | Description |
|---|---|
| `kind` | Annotation type: `TODO`, `FIXME`, `HACK`, `NOTE`, `DEPRECATED`, `WARNING`, `XXX`, `BUG`, or `OPTIMIZE`. |
| `path` | File path relative to project root. |
| `line` | 1-based line number. |
| `text` | Annotation text (truncated to 200 characters). |
| `symbol` | Name of the nearest enclosing function or class found by scanning backward up to 25 lines. `null` at module level. |
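The backward scan described for `symbol` can be approximated in a few lines. This is a sketch under stated assumptions: it handles only Python `#` comments, and the regular expressions and the names `find_symbol` and `extract_notes` are hypothetical rather than the tool's implementation.

```python
import re
from typing import List, Optional

KINDS = "TODO|FIXME|HACK|NOTE|DEPRECATED|WARNING|XXX|BUG|OPTIMIZE"
NOTE_RE = re.compile(rf"#\s*({KINDS})\b[:\s]*(.*)")
SYMBOL_RE = re.compile(r"^\s*(?:async\s+def|def|class)\s+(\w+)")


def find_symbol(lines: List[str], index: int, window: int = 25) -> Optional[str]:
    """Scan backward up to `window` lines for the nearest def/class header."""
    for i in range(index, max(index - window, 0) - 1, -1):
        match = SYMBOL_RE.match(lines[i])
        if match:
            return match.group(1)
    return None  # nothing found in range: treated as module level


def extract_notes(source: str, path: str) -> List[dict]:
    """Build CodeNote-shaped dicts for each annotation comment in `source`."""
    lines = source.splitlines()
    notes = []
    for index, line in enumerate(lines):
        match = NOTE_RE.search(line)
        if match:
            notes.append({
                "kind": match.group(1),
                "path": path,
                "line": index + 1,                     # 1-based line number
                "text": match.group(2).strip()[:200],  # truncated to 200 chars
                "symbol": find_symbol(lines, index),
            })
    return notes
```

Because the scan ignores indentation, it is a heuristic: a note placed after a function ends but within 25 lines of its header would still be attributed to that function.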
Each AdrRecord:
| Field | Description |
|---|---|
| `path` | File path relative to project root. |
| `title` | Title extracted from the first heading (`# ...`) in the file. |
| `status` | Normalized status: `accepted`, `proposed`, `deprecated`, or `superseded`. `null` if no status line was found. |
| `summary` | First paragraph of body text after the title. `null` if not parseable. |
ADR detection looks for Markdown files in docs/decisions/, docs/adr/, adr/, decisions/, and architecture/decisions/, and for files with names matching ADR-*.md, 0001-*.md, or DECISION-*.md patterns.
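The path rules above can be expressed as a small predicate. This sketch assumes the listed directories are matched relative to the repository root and that `0001-*.md` stands for any four-digit numeric prefix; `is_adr_candidate` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

ADR_DIRS = ("docs/decisions", "docs/adr", "adr", "decisions", "architecture/decisions")
# Assumption: "0001-*.md" is read as "any four digits, a hyphen, then anything".
ADR_NAME_RE = re.compile(r"^(ADR-.*|\d{4}-.*|DECISION-.*)\.md$")


def is_adr_candidate(path: str) -> bool:
    """Return True if a repo-relative, forward-slash path looks like an ADR file."""
    p = PurePosixPath(path)
    if p.suffix != ".md":
        return False
    parent = str(p.parent)
    in_adr_dir = any(parent == d or parent.startswith(d + "/") for d in ADR_DIRS)
    return in_adr_dir or bool(ADR_NAME_RE.match(p.name))
```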
With --git-context
| Field | Type | Description |
|---|---|---|
| `git_context` | object | Git context with recent commits, change hotspots, uncommitted changes, contributors, and a natural-language summary. |
git_context fields:
| Field | Description |
|---|---|
| `branch` | Current branch name. |
| `recent_commits` | Up to `--git-depth` commits (default 20). Each includes `hash`, `message`, `author`, `date`, and `files_changed` (capped at 10 per commit). |
| `change_hotspots` | Up to 20 files sorted by commit frequency within the `--git-days` window (default 90 days). Each includes `file`, `commit_count`, and `last_changed`. |
| `uncommitted_changes` | Object with `staged`, `unstaged`, and `untracked` file lists from `git status`. |
| `contributors` | Unique author names active within the `--git-days` window. |
| `git_summary` | Deterministic natural-language summary: branch, pending changes, top hotspots, and last commit. |
| `limitations` | List of errors or degraded analysis signals (e.g. `no_git_repo`, `git_not_found`). |
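Hotspot ranking by commit frequency can be illustrated with plain `git log` output. The sketch below is not how sourcecode is known to compute `change_hotspots`; it assumes the input text comes from something like `git log --since="90 days ago" --name-only --pretty=format:`, which prints only the changed file names for each commit, and `count_hotspots` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple


def count_hotspots(name_only_log: str, limit: int = 20) -> List[Tuple[str, int]]:
    """Count how often each file appears in `git log --name-only` output."""
    counts = Counter(
        line.strip() for line in name_only_log.splitlines() if line.strip()
    )
    # Most frequently changed first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]
```

Each file is listed once per commit that touched it, so the count per file equals its commit count within the window.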
Compact Mode
--compact returns a reduced JSON view optimized for LLM prompts (~500-700 tokens). It excludes the full dependencies list, docs, and module_graph, while retaining the fields most useful for project orientation. When optional flags are also active, their summaries are included: dependency_summary (with --dependencies), env_summary (with --env-map), and code_notes_summary (with --code-notes).
Real output from a Python FastAPI project:
{
"schema_version": "1.0",
"project_type": "api",
"project_summary": "API en Python (FastAPI). Entry points: src/main.py. 12 dependencias (python).",
"architecture_summary": null,
"stacks": [
{
"stack": "python",
"detection_method": "manifest",
"confidence": "high",
"frameworks": [
{ "name": "FastAPI", "source": "pyproject.toml" }
],
"package_manager": null,
"manifests": ["pyproject.toml"],
"primary": true,
"root": ".",
"workspace": null,
"signals": ["manifest:pyproject.toml", "framework:FastAPI", "entry:src/main.py"]
}
],
"entry_points": [
{
"path": "src/main.py",
"stack": "python",
"kind": "cli",
"source": "manifest"
}
],
"file_tree_depth1": {
"pyproject.toml": null,
"src": {},
"tests": {}
},
"dependency_summary": null
}
When --compact --dependencies is used, dependency_summary is populated instead of null.
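One way to use the compact view is to render it as a short plain-text preamble for a prompt. This is a sketch that relies only on field names shown in the example above; `compact_to_preamble` and the wording of each output line are illustrative choices, not part of the tool.

```python
def compact_to_preamble(compact: dict) -> str:
    """Render a few compact-mode fields as plain text for a prompt."""
    stacks = ", ".join(s["stack"] for s in compact.get("stacks", []))
    entries = ", ".join(e["path"] for e in compact.get("entry_points", []))
    top_level = ", ".join(sorted(compact.get("file_tree_depth1", {})))
    lines = [
        f"Project type: {compact.get('project_type') or 'unknown'}",
        f"Summary: {compact.get('project_summary') or 'n/a'}",
        f"Stacks: {stacks or 'none detected'}",
        f"Entry points: {entries or 'none detected'}",
        f"Top-level entries: {top_level or 'n/a'}",
    ]
    return "\n".join(lines)
```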
Docs Mode
--docs extracts docstrings, function signatures, and comments from Python and JS/TS source files. Each extracted record is a DocRecord:
{
"symbol": "create_user",
"kind": "function",
"language": "python",
"path": "src/users.py",
"doc_text": "Create a new user in the database.\n\nReturns the created user ID.",
"signature": "def create_user(name: str, email: str) -> int",
"source": "docstring",
"importance": "high",
"workspace": null
}
DocRecord fields
| Field | Description |
|---|---|
| `symbol` | Symbol name (module name, function name, class name, etc.). |
| `kind` | Kind of symbol: `module`, `function`, `class`, `method`, or similar. |
| `language` | Source language: `python`, `javascript`, `typescript`. |
| `path` | File path relative to the project root, forward-slash separated. |
| `doc_text` | Extracted docstring or comment text. `null` if unavailable. |
| `signature` | Function or class signature as found in source. `null` for modules. |
| `source` | How the doc was obtained: `docstring`, `comment`, or `unavailable`. Records with `source="unavailable"` are not emitted in `docs[]`; they appear only in `doc_summary.limitations`. |
| `importance` | Inferred priority: `high`, `medium`, or `low`. |
| `workspace` | Workspace path for monorepo packages, `null` for single-workspace projects. |
Importance inference rules
- `high`: the file path matches a project entry point, or the module is at depth 1 in the file tree (e.g., `src/main.py`).
- `medium`: the file is at depth 2, or the symbol kind is `class` or `function` (not a method).
- `low`: methods and utilities in deeper subdirectories.
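Read as an ordered rule list where the first match wins, the importance rules might look like the sketch below. The ordering, the definition of depth as the number of `/` separators (so `src/main.py` is depth 1), and the name `infer_importance` are assumptions; the README does not say how root-level files are ranked.

```python
from typing import Iterable


def infer_importance(path: str, kind: str, entry_points: Iterable[str] = ()) -> str:
    """Apply the three importance rules in order; the first match wins."""
    depth = path.count("/")  # assumed depth measure: src/main.py -> 1
    if path in set(entry_points) or depth == 1:
        return "high"
    if depth == 2 or kind in ("class", "function"):
        return "medium"
    return "low"
```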
--docs-depth levels
- `module`: extracts module-level docstrings only. One record per file.
- `symbols` (default): module-level plus top-level functions and classes.
- `full`: all of the above plus methods inside classes.
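For Python sources, the three depth levels map naturally onto the standard `ast` module. The following sketch shows the idea and is not the tool's extractor: `extract_docs` is a hypothetical name, the `Class.method` naming is an assumption, and unlike the real output it keeps records whose `doc_text` is `None`.

```python
import ast
from typing import List

DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def extract_docs(source: str, module_name: str, depth: str = "symbols") -> List[dict]:
    """Collect docstrings at the requested depth: module, symbols, or full."""
    tree = ast.parse(source)
    records = [{"symbol": module_name, "kind": "module",
                "doc_text": ast.get_docstring(tree)}]
    if depth == "module":
        return records
    for node in tree.body:  # top-level statements only
        if not isinstance(node, DEF_NODES):
            continue
        kind = "class" if isinstance(node, ast.ClassDef) else "function"
        records.append({"symbol": node.name, "kind": kind,
                        "doc_text": ast.get_docstring(node)})
        if depth == "full" and isinstance(node, ast.ClassDef):
            for item in node.body:  # methods defined directly in the class
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    records.append({"symbol": f"{node.name}.{item.name}",
                                    "kind": "method",
                                    "doc_text": ast.get_docstring(item)})
    return records
```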
Output — Full Schema Examples
Dependencies
{
"dependencies": [
{
"name": "fastapi",
"ecosystem": "python",
"scope": "direct",
"declared_version": ">=0.115",
"resolved_version": "0.115.2",
"source": "lockfile",
"parent": null,
"manifest_path": "poetry.lock",
"workspace": null
},
{
"name": "starlette",
"ecosystem": "python",
"scope": "transitive",
"declared_version": null,
"resolved_version": "0.38.6",
"source": "lockfile",
"parent": "fastapi",
"manifest_path": "poetry.lock",
"workspace": null
}
],
"dependency_summary": {
"requested": true,
"total_count": 2,
"direct_count": 1,
"transitive_count": 1,
"ecosystems": ["python"],
"sources": ["lockfile"],
"limitations": []
}
}
Dependency analysis is offline and conservative: if a lockfile does not expose a reliable transitive graph, sourcecode reports direct dependencies and records the limitation instead of guessing.
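A consumer can use the `scope` and `parent` fields from the example above to see which transitive packages each direct dependency pulls in. This is a minimal sketch that handles a single level of parentage only; `group_transitives` is a hypothetical helper.

```python
from typing import Dict, List


def group_transitives(dependencies: List[dict]) -> Dict[str, List[str]]:
    """Map each direct dependency to the transitive packages naming it as parent."""
    tree: Dict[str, List[str]] = {
        d["name"]: [] for d in dependencies if d["scope"] == "direct"
    }
    for dep in dependencies:
        parent = dep.get("parent")
        # Transitives whose parent is itself transitive are skipped in this sketch.
        if dep["scope"] == "transitive" and parent in tree:
            tree[parent].append(f'{dep["name"]}=={dep["resolved_version"]}')
    return tree
```

Applied to the two records in the example, this yields `{"fastapi": ["starlette==0.38.6"]}`.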
Module Graph
{
"module_graph": {
"nodes": [
{
"id": "module:app",
"kind": "module",
"language": "python",
"path": "app",
"symbol": null,
"display_name": "app",
"workspace": null,
"importance": "high"
}
],
"edges": [],
"summary": {
"requested": true,
"node_count": 1,
"edge_count": 0,
"languages": ["python"],
"methods": ["ast"],
"main_flows": [],
"layers": ["app"],
"entry_points_count": 1,
"truncated": false,
"detail": "high",
"max_nodes_applied": 80,
"edge_kinds": ["imports"],
"limitations": []
}
},
"module_graph_summary": {
"requested": true,
"node_count": 1,
"edge_count": 0,
"main_flows": [],
"layers": ["app"],
"entry_points_count": 1,
"truncated": false,
"limitations": []
}
}
--graph-modules is tiered for LLM workflows:
- `high`: summarized graph, modules only, imports only, directory collapsing when useful. Default.
- `medium`: balanced graph with key functions and selected call edges.
- `full`: full-fidelity graph, equivalent to exhaustive AST analysis.
Graph analysis is offline and conservative. sourcecode prefers partial but defensible edges over pretending to build a perfect semantic call graph, and records parse failures, unresolved imports, or analysis budgets in module_graph.summary.limitations.
Monorepo Support
In a monorepo, each stack includes its own root and workspace, and one of them is marked as primary. Entry point and doc record paths are prefixed with the workspace path so they are relative to the repository root.
{
"project_type": "monorepo",
"project_summary": "Monorepo con 2 workspaces en Node.js, Python.",
"stacks": [
{
"stack": "nodejs",
"primary": true,
"root": "apps/web",
"workspace": "apps/web"
},
{
"stack": "python",
"primary": false,
"root": "packages/api",
"workspace": "packages/api"
}
],
"entry_points": [
{ "path": "apps/web/app/page.tsx", "stack": "nodejs", "kind": "web", "source": "manifest" },
{ "path": "packages/api/main.py", "stack": "python", "kind": "cli", "source": "manifest" }
]
}
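Because entry point paths carry the workspace prefix, they can be grouped per workspace with a simple prefix match. The sketch below uses only fields shown in the example; `entry_points_by_workspace` is a hypothetical helper.

```python
from typing import Dict, List


def entry_points_by_workspace(source_map: dict) -> Dict[str, List[str]]:
    """Group entry point paths under the workspace whose path prefixes them."""
    workspaces = [s["workspace"] for s in source_map.get("stacks", [])
                  if s.get("workspace")]
    grouped: Dict[str, List[str]] = {ws: [] for ws in workspaces}
    for entry in source_map.get("entry_points", []):
        # Longest matching prefix wins, in case one workspace nests inside another.
        matches = [ws for ws in workspaces if entry["path"].startswith(ws + "/")]
        if matches:
            grouped[max(matches, key=len)].append(entry["path"])
    return grouped
```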
LLM Usage Tips
Different modes optimize for different tradeoffs between context size and depth of information.
--compact — minimal context (~500-700 tokens)
Best for: initial orientation, deciding what to explore next, fast handoffs between agents.
Includes: project_summary (instant project description), architecture_summary (execution-oriented static summary), stacks, entry_points, file_tree_depth1, and dependency_summary when --dependencies was also requested.
sourcecode --compact .
sourcecode --compact --dependencies .
Full output — deep analysis
Best for: thorough codebase understanding, architecture analysis, onboarding a new agent to an unfamiliar project.
sourcecode .
sourcecode --dependencies --graph-modules .
--docs --docs-depth symbols — API contracts
Best for: understanding what a module exports and how to call it, without reading source files. The default depth (symbols) covers top-level functions and classes — the most useful level for most agentic tasks.
sourcecode --docs .
sourcecode --docs --docs-depth full . # include methods
project_summary field
project_summary is always generated when stacks are detected. It provides instant project context without requiring the LLM to parse the full structure. The content adapts to what is detectable:
- If `pyproject.toml`, `package.json`, or the README provides a description, it leads the summary; stack, architecture pattern, and domains are appended as context.
- If the directory structure reveals a known architecture pattern (`layered`, `mvc`, `hexagonal`, `fullstack`), it is included.
- Business domain names (directories that are not architectural layers or generic utilities) replace entry points when two or more distinct domains are detected.
Example values:
- `"API en Python (FastAPI) con arquitectura layered. Dominios: auth, users, billing. 12 dependencias (python)."` (structured project)
- `"API en Python (FastAPI). Entry points: src/main.py. 12 dependencias (python)."` (flat project, no domains detected)
- `"Aplicacion web en Node.js (Next.js) con arquitectura mvc. Dominios: products, orders. 24 dependencias (nodejs)."`
- `"Monorepo con 2 workspaces en Node.js, Python."`
architecture_summary field
architecture_summary is a static 3-5 line summary oriented to execution flow. It answers what the main entry point does, which modules it orchestrates, and what the project produces. In compact mode it takes the place that the lower-signal file_paths list used to occupy.
key_dependencies field
When --dependencies is active, key_dependencies contains the top-15 direct dependencies sorted by primary ecosystem first. Use it to understand core library choices without scanning hundreds of transitive records.
--git-context — temporal project context
Best for: understanding recent activity before touching code, debugging regressions, onboarding to an active repository.
Answers questions a static analysis cannot: what changed recently, which files are actively maintained, whether there is uncommitted work in progress, and who is driving the project.
sourcecode --git-context .
sourcecode --git-context --git-depth 10 --git-days 30 .
git_summary condenses the key signals into a single line:
"Rama main. 3 cambios pendientes (staged: 1, unstaged: 2, untracked: 0). Archivos más activos: src/cli.py (18 commits), src/schema.py (14 commits). Último commit: 2026-04-22 — docs(13): add gap closure plan."
Combine with --compact for fast handoffs that include both static structure and recent activity:
sourcecode --compact --dependencies --git-context .
Note: git_context is excluded from the --compact token budget but is always present in full output when the flag is active.
--env-map — configuration surface
Best for: onboarding a new agent to an unfamiliar project, understanding what environment a service requires to run, reviewing configuration completeness before deployment.
Answers: what variables does this project expect, which ones are required vs optional, where are they read, and what type and category are they?
sourcecode --env-map .
sourcecode --compact --env-map . # configuration surface in ~700 tokens
Example output (excerpt):
{
"env_map": [
{
"key": "DATABASE_URL",
"required": true,
"default": null,
"type_hint": "url",
"category": "database",
"description": "PostgreSQL connection string",
"files": ["src/db.py:12", "src/config.py:5"]
},
{
"key": "LOG_LEVEL",
"required": false,
"default": "INFO",
"type_hint": "enum",
"category": "observability",
"description": null,
"files": ["src/logger.py:3"]
}
],
"env_summary": {
"requested": true,
"total": 14,
"required_count": 6,
"optional_count": 8,
"categories": ["auth", "database", "observability", "server"],
"example_files_found": [".env.example"]
}
}
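A typical use of `env_map` is a pre-flight check that every required variable is set. This is a sketch built on the `key` and `required` fields shown above; `missing_required` is a hypothetical helper, not something the package provides.

```python
import os
from typing import List, Mapping, Optional


def missing_required(env_map: List[dict],
                     environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """List required variables from env_map that are absent from the environment."""
    env = os.environ if environ is None else environ
    return sorted(r["key"] for r in env_map if r["required"] and r["key"] not in env)
```

Passing an explicit mapping as `environ` makes the check easy to test without touching the real process environment.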
--code-notes — technical debt and intent
Best for: understanding known issues before modifying code, identifying deprecated APIs, discovering design decisions embedded in comments, and locating ADRs when they exist.
Answers: what do the authors know is broken or suboptimal, what is explicitly marked for removal, what architectural decisions were recorded, and which areas carry the most annotation debt?
sourcecode --code-notes .
sourcecode --compact --code-notes . # debt overview in compact form
Example output (excerpt):
{
"code_notes": [
{
"kind": "FIXME",
"path": "src/payments.py",
"line": 42,
"text": "currency conversion is broken for EUR",
"symbol": "process_payment"
},
{
"kind": "DEPRECATED",
"path": "src/auth.py",
"line": 18,
"text": "use AuthService instead",
"symbol": "UserService"
}
],
"code_adrs": [
{
"path": "docs/decisions/0001-use-postgresql.md",
"title": "ADR-0001: Use PostgreSQL as primary database",
"status": "accepted",
"summary": "PostgreSQL was chosen for its JSONB support and strong ACID guarantees."
}
],
"code_notes_summary": {
"requested": true,
"total": 23,
"by_kind": {"TODO": 10, "FIXME": 7, "HACK": 3, "DEPRECATED": 2, "WARNING": 1},
"top_files": ["src/payments.py", "src/legacy.py"],
"adr_count": 3
}
}
Combine flags for a comprehensive project handoff:
sourcecode --compact --env-map --code-notes --git-context .
Development
Editable install with development dependencies:
pip install -e ".[dev]"
Local validation:
ruff check src tests
mypy src
pytest -q
Detailed schema reference: docs/schema.md.