Deterministic codebase context for AI coding agents

sourcecode

Deterministic, behavior-aware codebase context for AI agents and PR review.


What is it?

sourcecode analyzes a repository and produces structured JSON or YAML designed to be fed directly to AI agents or language models. It solves the "stuff the whole repo into the prompt" problem by extracting a deterministic, high-signal summary: stack detection, entry points, dependencies, git hotspots, inline annotations, and confidence metadata.

For PR review specifically, sourcecode extracts execution paths: ordered chains from entry point through service to data access, with runtime signals (auth guards, cache short-circuits, async execution) anchored to the specific step where they affect behavior. A reviewer sees what the system does under this change, not just which files changed.

Optimized for Java/Spring Boot monorepos. Works on any codebase.


Installation

Homebrew (macOS / Linux)

brew tap haroundominique/sourcecode
brew install sourcecode

pip / pipx

pip install sourcecode
# or with isolation:
pipx install sourcecode

Verify

sourcecode version
# sourcecode 1.31.13

Quickstart

# High-signal summary (1000–3000 tokens depending on repo size) — recommended starting point
sourcecode --compact

# Add git hotspots and uncommitted file count
sourcecode --compact --git-context

# Analyze a specific path
sourcecode /path/to/repo --compact

# Copy result to clipboard
sourcecode --compact --copy

# Structured output for AI agents (identity, entry points, dependencies, confidence)
sourcecode --agent

# Only process git-modified files (forces compact output)
sourcecode --changed-only

Example output for a Spring Boot project (--compact):

{
  "project_type": "api",
  "stacks": [{ "stack": "java", "detection_method": "manifest", "confidence": "high",
               "primary": true, "frameworks": ["Spring Boot", "MyBatis"] }],
  "entry_points": {
    "bootstrap": ["src/main/java/io/spring/RealWorldApplication.java"],
    "security":  ["src/main/java/io/spring/api/security/WebSecurityConfig.java"],
    "controllers": { "count": 8, "sample": ["src/main/java/io/spring/api/ArticleApi.java"] }
  },
  "key_dependencies": [
    { "name": "org.mybatis.spring.boot:mybatis-spring-boot-starter",
      "version": "2.2.2", "risk_flags": ["spring-boot-2.x-eol"] }
  ],
  "language_version": "11",
  "deployment": { "spring_boot_version": "2.6.3", "packaging": "jar" },
  "mybatis": { "mapper_interfaces": 4, "xml_files": 4 },
  "confidence_summary": { "overall": "high", "stack": "high", "entry_points": "high" }
}
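
As a rough illustration of consuming this output downstream, the sketch below parses a trimmed copy of the example above and pulls out the primary stack, its frameworks, and any dependency risk flags. The summarize helper is hypothetical and not part of sourcecode; only the JSON field names come from the example.

```python
import json

# Trimmed copy of the example output above; a real run would supply the full document.
SAMPLE = """
{
  "project_type": "api",
  "stacks": [{"stack": "java", "detection_method": "manifest", "confidence": "high",
              "primary": true, "frameworks": ["Spring Boot", "MyBatis"]}],
  "key_dependencies": [
    {"name": "org.mybatis.spring.boot:mybatis-spring-boot-starter",
     "version": "2.2.2", "risk_flags": ["spring-boot-2.x-eol"]}
  ],
  "confidence_summary": {"overall": "high", "stack": "high", "entry_points": "high"}
}
"""


def summarize(report: dict) -> dict:
    """Collect the primary stack, its frameworks, and every dependency risk flag."""
    primary = next((s for s in report.get("stacks", []) if s.get("primary")), None)
    flags = sorted({flag
                    for dep in report.get("key_dependencies", [])
                    for flag in dep.get("risk_flags", [])})
    return {
        "stack": primary["stack"] if primary else None,
        "frameworks": primary.get("frameworks", []) if primary else [],
        "risk_flags": flags,
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

Using .get with defaults keeps the helper tolerant of fields that are absent for non-Java projects.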

Flags reference

| Flag | Alias | Default | Description |
| --- | --- | --- | --- |
| --compact | | off | High-signal summary (1000–3000 tokens): stacks, entry points, dependencies, risk flags, confidence, gaps. Includes security_surface, mybatis, and transactional_boundaries for Java projects. |
| --agent | | off | Structured noise-free JSON for AI agents: identity, entry points, dependencies, confidence, gaps. Auto-enables dependency, env-var, and code-notes analysis. |
| --full | | off | Remove truncation limits on transactional_boundaries, mybatis.dto_mappers, and other capped lists. |
| --git-context | -g | off | Include git activity: recent commits, change hotspots, and uncommitted changes. |
| --changed-only | | off | Limit output to git-modified files (staged, unstaged, untracked). Forces compact output. |
| --depth | | 4 | File tree traversal depth (1–20). Java/Maven projects auto-adjust to 12. |
| --format | -f | json | Output format: json or yaml. |
| --output | -o | stdout | Write output to a file instead of stdout. |
| --copy | -c | off | Copy output to clipboard after a successful run. No-op when --output is set or clipboard is unavailable. |
| --no-redact | | off | Disable automatic secret redaction. Output may contain sensitive values. |
| --version | -v | | Show version and exit. |
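
When calling the CLI from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only flags from the table above; build_command is a hypothetical helper, not something sourcecode ships.

```python
import shlex


def build_command(path=".", compact=False, git_context=False,
                  changed_only=False, fmt="json", output=None):
    """Assemble an argv list using only flags listed in the table above."""
    if fmt not in ("json", "yaml"):
        raise ValueError("format must be 'json' or 'yaml'")
    argv = ["sourcecode", path]
    if compact:
        argv.append("--compact")
    if git_context:
        argv.append("--git-context")
    if changed_only:
        argv.append("--changed-only")
    argv += ["--format", fmt]
    if output:
        argv += ["--output", output]
    return argv


cmd = build_command("/path/to/repo", compact=True, git_context=True, fmt="yaml")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True) on a machine where sourcecode is installed.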

prepare-context — task-specific context

Generates a focused context bundle for a specific AI coding task. More targeted than --compact: each task re-ranks files according to its own signal priorities.

sourcecode prepare-context TASK [PATH] [OPTIONS]

Tasks

| Task | What it surfaces | Primary use |
| --- | --- | --- |
| explain | Architecture, entry points, key dependencies | Onboarding an LLM to a new project |
| onboard | Full structural context: entry points, architecture, key files, dependencies | New developer or agent joining the codebase |
| fix-bug | Files ranked by risk (annotations, churn, uncommitted changes), suspected areas | Debugging session |
| refactor | Structural problems, improvement opportunities, high-annotation files | Code quality review |
| generate-tests | Source files without test pairs, coverage gap analysis | Writing missing tests |
| review-pr | Execution paths with per-step runtime signals, security/transactional impact, test coverage gaps | Pre-merge behavior review |
| delta | Changed files with multi-hop impact analysis, structural import graph, system-level impact summary | Incremental CI/review context |

Options

| Option | Description |
| --- | --- |
| --since REF | Git ref to compare against (e.g. HEAD~3, main, v1.2.0). Required for delta; optional for review-pr, which falls back to uncommitted working-tree changes when it is omitted; ignored for other tasks. |
| --symptom TEXT | (fix-bug only) Keyword hint for the bug. Boosts matching files and surfaces related code notes. |
| --format TEXT | Output format: json (default) or github-comment (Markdown PR comment, review-pr only). |
| --llm-prompt | Append a ready-to-use LLM prompt to the output. |
| --dry-run | Show what would be analyzed without running it. |
| --copy / -c | Copy output to clipboard after a successful run. |
| --output / -o | Write output to a file. |
| --task-help | List all tasks with descriptions and exit. |

Examples

# Explain the current repo
sourcecode prepare-context explain

# Focus on bug-prone files, with a symptom hint
sourcecode prepare-context fix-bug --symptom "NullPointerException in OrderService"

# Incremental context: files changed since branch diverged from main
sourcecode prepare-context delta . --since main

# Onboard with a ready-to-paste LLM prompt
sourcecode prepare-context onboard --llm-prompt

# PR analysis as a GitHub Markdown comment (paste directly into PR)
sourcecode prepare-context review-pr --since main --format github-comment

# List all tasks
sourcecode prepare-context --task-help
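
For scripted use, the task and option rules described above can be checked before the CLI is invoked. This is a sketch under the assumption that the constraints are exactly as documented here (delta needs --since, --symptom applies to fix-bug only); prepare_context_argv is a hypothetical wrapper.

```python
TASKS = {"explain", "onboard", "fix-bug", "refactor",
         "generate-tests", "review-pr", "delta"}


def prepare_context_argv(task, path=None, since=None, symptom=None, llm_prompt=False):
    """Build a prepare-context argv list, enforcing the documented option rules."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if task == "delta" and not since:
        raise ValueError("delta requires --since REF")
    if symptom and task != "fix-bug":
        raise ValueError("--symptom applies to fix-bug only")
    argv = ["sourcecode", "prepare-context", task]
    if path:
        argv.append(path)
    if since:
        argv += ["--since", since]
    if symptom:
        argv += ["--symptom", symptom]
    if llm_prompt:
        argv.append("--llm-prompt")
    return argv


delta_cmd = prepare_context_argv("delta", ".", since="main")
print(" ".join(delta_cmd))
```

Failing early in the wrapper gives a clearer error in CI logs than a rejected CLI invocation would.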

delta — incremental impact analysis

The delta task is the recommended mode for CI pipelines and PR reviews. It goes beyond listing changed files: it builds a structural import graph and propagates impact transitively up to 3 hops.

sourcecode prepare-context delta [PATH] --since REF

Output fields:

| Field | Description |
| --- | --- |
| changed_files | Files modified in the git range |
| relevant_files | Changed files + files pulled in by the import graph (scored by artifact type and hop distance) |
| impact_summary | Human-readable summary: artifact types changed and active risk areas |
| affected_modules | DDD domain modules touched by the change |
| risk_areas | Per-area severity breakdown (security, api, persistence, etc.) |
| change_type | Closed taxonomy: behavioral_change, structural_change, configuration_change, dependency_change, security_change |
| system_impact | Subsystems affected, behavioral changes, runtime impact notes |
| dependency_graph_summary | Verified structural import edges (hop 1–3) and propagation_depth. Only real imports: no heuristics, no test files. |
| impact_score_per_file | Per-file numeric impact score (0–1) |
| since | The git ref used |
| gaps | What the analysis could not determine |
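
One way to use these fields in a CI step is to rank files by impact and keep only those above a threshold. The sketch below assumes impact_score_per_file is a JSON object mapping file path to score; the exact shape is not specified here, so treat that as an assumption. The sample data and rank_impacted helper are hypothetical.

```python
def rank_impacted(delta: dict, threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (path, score) pairs at or above the threshold, highest score first."""
    # Assumption: impact_score_per_file maps file path -> score in the range 0-1.
    scores = delta.get("impact_score_per_file", {})
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(path, score) for path, score in ranked if score >= threshold]


# Hypothetical delta output, trimmed to the fields used here.
delta = {
    "since": "main",
    "impact_score_per_file": {
        "AuthFilter.java": 0.63,
        "ApiGateway.java": 0.44,
        "SecurityConfig.java": 0.90,
    },
}

for path, score in rank_impacted(delta):
    print(f"{score:.2f}  {path}")
```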

How the import graph works:

  1. Each changed file is classified by artifact type (controller, service, repository, security, spring_config, etc.).
  2. A BFS traversal walks the import graph repo-wide (not restricted to the same module), up to 3 hops deep.
  3. dependency_graph_summary.edges only contains verified import / @Autowired / constructor-injection relationships. Test files and heuristic proximity matches are excluded from edges (they appear in relevant_files only if they have real imports of changed files).
  4. Score decays 30% per hop: a directly-changed SecurityConfig.java scores 0.90; its direct importer scores 0.63; a transitive importer scores 0.44.
# A changed file whose impact propagates across the import graph (3 hops)
sourcecode prepare-context delta . --since main

# Output includes:
# dependency_graph_summary.edges:
#   hop-1: OrderService.java → OrderRepository.java
#   hop-2: OrderRepository.java → OrderController.java
#   hop-3: OrderController.java → OrderFacade.java
# propagation_depth: 3
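
The traversal and scoring steps above can be sketched as a bounded breadth-first search with a 0.7 multiplier per hop. This illustrates the described behavior only and is not sourcecode's implementation; the importer map, the file names other than SecurityConfig.java, and the propagate function are hypothetical.

```python
from collections import deque

DECAY = 0.7      # 30% decay per hop, as described above
MAX_HOPS = 3     # traversal depth limit described above


def propagate(changed: str, importers: dict, base_score: float) -> dict:
    """Breadth-first walk from a changed file to the files that import it."""
    scores = {changed: base_score}
    queue = deque([(changed, 0)])
    while queue:
        node, hop = queue.popleft()
        if hop == MAX_HOPS:
            continue  # do not expand beyond the hop limit
        for importer in importers.get(node, []):
            if importer not in scores:  # first visit is the shortest path in BFS
                scores[importer] = round(base_score * DECAY ** (hop + 1), 2)
                queue.append((importer, hop + 1))
    return scores


# Hypothetical importer map: each key is imported by the files in its list.
importers = {
    "SecurityConfig.java": ["AuthFilter.java"],
    "AuthFilter.java": ["ApiGateway.java"],
}

scores = propagate("SecurityConfig.java", importers, 0.90)
print(scores)
```

With a base score of 0.90 this reproduces the 0.90, 0.63, 0.44 sequence quoted in step 4.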

review-pr — behavior-aware PR analysis

Extracts execution paths: ordered chains from entry point through service to data access layer, with runtime signals anchored to the specific step where they affect behavior.

sourcecode prepare-context review-pr [PATH] --since REF
# or against uncommitted working-tree changes:
sourcecode prepare-context review-pr

execution_paths schema:

{
  "execution_paths": [
    {
      "name": "Order",
      "entry_point": {
        "step": "OrderController.createOrder",
        "notes": [
          { "note": "condition: authorization check present (@PreAuthorize / @Secured)",
            "epistemic_level": "STRUCTURAL SIGNAL" }
        ]
      },
      "path": [
        {
          "step": "ShippingService.process",
          "notes": [
            { "note": "branch: Spring cache annotation present — downstream call may be short-circuited",
              "epistemic_level": "STRUCTURAL SIGNAL" },
            { "note": "async: @Async annotation present — runs in separate thread",
              "epistemic_level": "STRUCTURAL SIGNAL" }
          ]
        },
        { "step": "OrderRepository.save", "notes": [] }
      ],
      "end_state": "DB write",
      "end_state_epistemic_level": "INFERRED (LOW CONFIDENCE)"
    }
  ]
}
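
To show how the schema might be consumed, here is a small sketch that flattens one execution path into readable lines, keeping each note next to the step it belongs to. The sample is trimmed from the schema above; render_path is a hypothetical helper.

```python
def render_path(path: dict) -> list[str]:
    """Flatten one execution path into text lines, one step per line plus its notes."""
    lines = []
    steps = [path["entry_point"], *path.get("path", [])]
    for step in steps:
        lines.append(step["step"])
        for note in step.get("notes", []):
            lines.append(f"    [{note['epistemic_level']}] {note['note']}")
    lines.append(f"=> {path['end_state']} [{path['end_state_epistemic_level']}]")
    return lines


# Trimmed from the schema example above.
sample = {
    "name": "Order",
    "entry_point": {
        "step": "OrderController.createOrder",
        "notes": [
            {"note": "condition: authorization check present (@PreAuthorize / @Secured)",
             "epistemic_level": "STRUCTURAL SIGNAL"}
        ],
    },
    "path": [{"step": "OrderRepository.save", "notes": []}],
    "end_state": "DB write",
    "end_state_epistemic_level": "INFERRED (LOW CONFIDENCE)",
}

rendered = render_path(sample)
print("\n".join(rendered))
```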

Path rules:

  • One path per changed entry point — most-evident downstream call, not all branches
  • Each step requires direct code evidence: field injection, constructor param, method call, or type annotation
  • notes are scanned from that step's own source file — no cross-contamination between steps
  • Path terminates where evidence ends; no gap-filling by naming convention or module similarity

Runtime signals detected per step:

| Signal | Example code | Note emitted | Epistemic level |
| --- | --- | --- | --- |
| Auth guard | @PreAuthorize, @Secured | condition: authorization check present (@PreAuthorize / @Secured) | STRUCTURAL SIGNAL |
| Auth context read | isAuthenticated(), SecurityContextHolder | condition: reads authentication context | STRUCTURAL SIGNAL |
| Feature flag | featureFlag.isEnabled(), FeatureToggle | condition: feature flag gates execution | INFERRED (LOW CONFIDENCE) |
| Null/empty guard | if (x == null) return | condition: null/empty guard with early return | STRUCTURAL SIGNAL |
| Spring cache | @Cacheable, @CacheEvict | branch: Spring cache annotation present — downstream call may be short-circuited | STRUCTURAL SIGNAL |
| Manual cache | cache.get(), cacheManager. | branch: manual cache lookup detected — downstream call may be short-circuited | INFERRED (LOW CONFIDENCE) |
| Optional absence | Optional<>, .orElseThrow() | branch: Optional type in use — result may be absent | STRUCTURAL SIGNAL |
| Async thread | @Async | async: @Async annotation present — runs in separate thread | STRUCTURAL SIGNAL |
| CompletableFuture | CompletableFuture, .supplyAsync() | async: CompletableFuture detected — non-blocking execution | STRUCTURAL SIGNAL |
| Event publishing | publishEvent(), applicationEventPublisher | async: Spring application event emitted | STRUCTURAL SIGNAL |
| Kafka | kafkaTemplate., KafkaProducer | async: Kafka producer detected | STRUCTURAL SIGNAL |
| RabbitMQ | rabbitTemplate., amqpTemplate. | async: RabbitMQ producer detected | STRUCTURAL SIGNAL |
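
To make the per-step idea concrete, the sketch below scans a single source file's text for three of the signals in the table and emits the matching note and level. It is a simplified pattern-matching illustration, not how sourcecode detects signals internally; the regular expressions, the scan function, and the Java snippet are all assumptions.

```python
import re

# (pattern, note, epistemic level) for three signals taken from the table above.
SIGNALS = [
    (re.compile(r"@PreAuthorize|@Secured"),
     "condition: authorization check present (@PreAuthorize / @Secured)",
     "STRUCTURAL SIGNAL"),
    (re.compile(r"featureFlag\.isEnabled\(|FeatureToggle"),
     "condition: feature flag gates execution",
     "INFERRED (LOW CONFIDENCE)"),
    (re.compile(r"kafkaTemplate\.|KafkaProducer"),
     "async: Kafka producer detected",
     "STRUCTURAL SIGNAL"),
]


def scan(source: str) -> list[dict]:
    """Return one note per signal whose pattern occurs in this file's text."""
    return [{"note": note, "epistemic_level": level}
            for pattern, note, level in SIGNALS
            if pattern.search(source)]


# Hypothetical Java source for a single step.
JAVA = """
@PreAuthorize("hasRole('ADMIN')")
public void publish(Order order) {
    kafkaTemplate.send("orders", order);
}
"""

notes = scan(JAVA)
for entry in notes:
    print(entry["epistemic_level"], "|", entry["note"])
```

Scanning one file at a time mirrors the rule that notes come from a step's own source file only.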

Epistemic contract:

Every output field in review-pr carries an explicit epistemic_level:

| Level | Meaning |
| --- | --- |
| FACT | Directly observed in diff (file present, config changed) |
| STRUCTURAL SIGNAL | Annotation or type-system evidence in source (@Service, @Transactional, injection) |
| INFERRED (LOW CONFIDENCE) | Heuristic pattern match; no full structural proof |
| OMITTED | Insufficient evidence; field not emitted |

No field blends certainty levels without labeling. end_state (e.g. "DB write") is always accompanied by end_state_epistemic_level: "INFERRED (LOW CONFIDENCE)" — it is a keyword-match heuristic, not an AST-verified fact.
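
A consumer can use the labels to decide how much weight to give each note. The sketch below filters notes by a minimum level. The numeric ranking is an assumption based on the order of the table above, not something the tool defines, and notes_at_least is a hypothetical helper.

```python
# Assumed ordering, strongest first, following the table above.
RANK = {"FACT": 3, "STRUCTURAL SIGNAL": 2, "INFERRED (LOW CONFIDENCE)": 1}


def notes_at_least(notes: list[dict], minimum: str) -> list[dict]:
    """Keep notes whose epistemic level is at or above the given minimum."""
    floor = RANK[minimum]
    return [n for n in notes if RANK.get(n["epistemic_level"], 0) >= floor]


sample_notes = [
    {"note": "condition: reads authentication context",
     "epistemic_level": "STRUCTURAL SIGNAL"},
    {"note": "condition: feature flag gates execution",
     "epistemic_level": "INFERRED (LOW CONFIDENCE)"},
]

strong = notes_at_least(sample_notes, "STRUCTURAL SIGNAL")
print([n["note"] for n in strong])
```

Unknown levels rank below every known level, so an unexpected label is dropped rather than trusted.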

Other review-pr output fields:

| Field | Description |
| --- | --- |
| review_hotspots | Top changed files ranked by impact score |
| suggested_review_order | Security → API → Service → Persistence → Config |
| security_impact | Changed security-classified files (epistemic_level: STRUCTURAL SIGNAL) + risk note (INFERRED (LOW CONFIDENCE)) |
| transactional_impact | Changed service/business-logic files with possible transaction boundary effect |
| test_coverage_risk | Changed source files with no corresponding test (epistemic_level: INFERRED (LOW CONFIDENCE)) |
| affected_modules | DDD domain modules touched by the change |

Output schema

All outputs include a confidence_summary block with overall, stack, and entry_points confidence levels (high / medium / low), plus an analysis_gaps list describing what could not be analyzed and why.
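
A pipeline might check the confidence block before handing the output to an agent. The sketch below reports which confidence_summary entries fall below a required level and returns analysis_gaps alongside them. The sample report and gap text are hypothetical, and gap entries are treated as opaque values because their shape is not specified here.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}


def check_confidence(report: dict, required: str = "high"):
    """Return (entries below the required level, analysis gaps)."""
    summary = report.get("confidence_summary", {})
    weak = {key: value for key, value in summary.items()
            if LEVELS.get(value, 0) < LEVELS[required]}
    return weak, report.get("analysis_gaps", [])


# Hypothetical report, trimmed to the two blocks used here.
report = {
    "confidence_summary": {"overall": "medium", "stack": "high", "entry_points": "low"},
    "analysis_gaps": ["hypothetical gap description"],
}

weak, gaps = check_confidence(report)
print("below required level:", weak)
print("gaps:", gaps)
```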

Java/Spring-specific fields

When a Java manifest (pom.xml or build.gradle) is detected, the output includes additional fields:

| Field | Description |
| --- | --- |
| language_version | Java version from maven.compiler.source or equivalent |
| deployment.spring_boot_version | Spring Boot version |
| deployment.packaging | jar or war |
| deployment.app_server_hint | weblogic, wildfly, etc. (when detectable) |
| security_surface.resource_names | Values of @M3FiltroSeguridad(nombreRecurso=...) annotations across all controllers |
| mybatis | Mapper interface / XML file pairing summary |
| transactional_boundaries | Classes annotated with @Transactional |
| deployment_risks | Static risk flags: spring-boot-2.x-eol, legacy-java-runtime, legacy-app-server-deployment |

Telemetry

Anonymous, opt-in telemetry collects: version, OS, commands used, flags, duration, repo size range, and errors. No source code, paths, secrets, or output content is ever collected.

sourcecode telemetry status    # current setting
sourcecode telemetry enable    # opt in
sourcecode telemetry disable   # opt out (permanent)

Alternatively, disable telemetry by setting the environment variable:

export SOURCECODE_TELEMETRY=0

Configuration

sourcecode config    # show version, config file path, telemetry status
