Language-agnostic code context extraction from Git diffs for LLM agents.

code-context-py

code-context-py turns Git and GitHub diffs into structured, LLM-ready code context.

It is designed for documentation agents, code review agents, changelog generators, release note tools and internal engineering assistants that need to understand what changed without dumping an entire repository into a prompt.

Features

  • Parses unified diffs and GitHub patch responses.
  • Loads changed files at the target commit/ref.
  • Includes exact file patch excerpts, including removed and added lines.
  • Extracts full enclosing blocks around changed lines.
  • Finds related same-file blocks through changed identifiers.
  • Follows local dependencies from imports, includes, templates and aliases.
  • Resolves common project aliases from tsconfig.json, jsconfig.json and Composer psr-4.
  • Searches repository-wide references when the provider can list files.
  • Detects symbols across language families with a generic adapter system.
  • Includes a WordPress-aware adapter for hooks, filters, CPTs, taxonomies, shortcodes, enqueued assets and template parts.
  • Produces a full debug JSON report when debug=True.
  • Includes the exact compact LLM pack inside the debug JSON.
  • Provides a configurable to_llm_pack(...) output that prioritizes the most important context.
  • Works without native parser dependencies, while allowing custom adapters for deeper language-specific analysis.
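
As a rough illustration of the first feature, the sketch below shows one way to read a unified diff and collect the new-side line numbers of added lines per file. It is not the library's parser: the function name `added_lines_by_file` is hypothetical, and the code assumes Git's default `b/` path prefix and ignores edge cases such as renames or added lines whose content itself starts with `++ `.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches a hunk header such as "@@ -10,4 +12,6 @@" and captures the new-side start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines_by_file(diff_text: str) -> Dict[str, List[int]]:
    """Map each changed file to the new-side line numbers of its added lines.

    Hypothetical helper for illustration; not part of code-context-py.
    """
    result: Dict[str, List[int]] = defaultdict(list)
    current_file = None
    new_line = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            target = line[4:].strip()
            # "/dev/null" marks a deleted file; "b/" is Git's default prefix.
            current_file = None if target == "/dev/null" else target.removeprefix("b/")
            in_hunk = False
            continue
        match = HUNK_RE.match(line)
        if match and current_file is not None:
            new_line = int(match.group(1))
            in_hunk = True
        elif in_hunk and current_file is not None:
            if line.startswith("+"):
                result[current_file].append(new_line)
                new_line += 1
            elif line.startswith("-") or line.startswith("\\"):
                # Removed lines and "\ No newline at end of file" markers
                # do not advance the new-side counter.
                continue
            else:
                new_line += 1  # context line
    return dict(result)
```

The resulting line numbers are the kind of input that enclosing-block extraction needs: each one points at a line in the file as it exists at the target ref.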

Installation

pip install code-context-py

For development from source:

git clone https://github.com/mlinaresweb/code-context-py.git
cd code-context-py
pip install -e .

Quick Start

from code_context_py import CallbackFileProvider, ExtractionOptions, build_context_from_diff

def load_file_at_ref(path: str) -> str | None:
    # Return the contents of `path` at the target commit/ref, or None if it is unavailable.
    ...

provider = CallbackFileProvider(load_file_at_ref)

report = build_context_from_diff(
    diff_text,
    provider,
    options=ExtractionOptions(debug=True),
)

full_context = report.to_prompt()
llm_pack = report.to_llm_pack(max_chars=70000, max_block_chars=14000)
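
The `load_file_at_ref` callback above is left as a stub. One possible way to fill it in, assuming a local clone and the `git` executable on `PATH`, is to shell out to `git show <ref>:<path>`. The factory name `make_git_show_loader` is hypothetical, and returning `None` on failure is an assumption based on the callback's `str | None` signature.

```python
import subprocess
from typing import Callable, Optional


def make_git_show_loader(repo_path: str, ref: str) -> Callable[[str], Optional[str]]:
    """Build a callback that loads a file's text at `ref` (hypothetical helper)."""

    def load_file_at_ref(path: str) -> Optional[str]:
        try:
            result = subprocess.run(
                ["git", "show", f"{ref}:{path}"],
                cwd=repo_path,
                capture_output=True,
                encoding="utf-8",
                errors="replace",  # tolerate files that are not valid UTF-8
                check=False,
            )
        except OSError:
            # Git is not installed, or repo_path does not exist.
            return None
        # A non-zero exit code usually means the path is missing at that ref.
        return result.stdout if result.returncode == 0 else None

    return load_file_at_ref
```

The returned function has the same shape as the stub, so it could be passed to `CallbackFileProvider` in its place.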

With debug=True, the library writes:

code-context-debug/{diff_hash}.json

The debug JSON contains the full structured report and the compact LLM pack generated from that report.

Local Git Provider

from code_context_py import LocalGitFileProvider, build_context_from_diff

provider = LocalGitFileProvider("/path/to/repo", "commit-sha")
report = build_context_from_diff(diff_text, provider)
pack = report.to_llm_pack(max_chars=70000)

LocalGitFileProvider can read files from a commit and list repository files, enabling dependency traversal and repository-wide reference search.
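
For intuition about what "listing repository files" at a commit involves, here is a minimal sketch built on `git ls-tree`. It is not `LocalGitFileProvider`'s implementation: the function name `list_files_at_ref` is hypothetical, and the code assumes `git` is available on `PATH`.

```python
import subprocess
from typing import List


def list_files_at_ref(repo_path: str, ref: str) -> List[str]:
    """List all file paths tracked at `ref` (hypothetical helper, not the library API)."""
    try:
        result = subprocess.run(
            # -r recurses into subdirectories; -z separates paths with NUL bytes
            # so unusual file names are not quoted or split incorrectly.
            ["git", "ls-tree", "-r", "--name-only", "-z", ref],
            cwd=repo_path,
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            check=False,
        )
    except OSError:
        # Git is not installed, or repo_path does not exist.
        return []
    if result.returncode != 0:
        return []
    return [path for path in result.stdout.split("\0") if path]
```

A provider that can produce such a list makes repository-wide reference search possible, because every candidate file is known before any content is loaded.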

GitHub Commit

from code_context_py import build_context_from_github_commit

pack = build_context_from_github_commit(
    "owner/repo",
    "commit-sha",
    token="github-token",
    max_chars=70000,
)

Private GitHub Repositories

Private repositories work through the GitHub API as long as the token can read the repository contents.

You can pass the token explicitly:

from code_context_py import build_context_from_github_commit

pack = build_context_from_github_commit(
    "owner/private-repo",
    "commit-sha",
    token="github-token-with-repo-read-access",
    max_chars=70000,
)

Or set one of these environment variables:

export GITHUB_TOKEN="github-token-with-repo-read-access"
# or
export GH_TOKEN="github-token-with-repo-read-access"

Then call the API without passing token:

pack = build_context_from_github_commit("owner/private-repo", "commit-sha", max_chars=70000)
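
The token lookup described above can be pictured roughly as follows. This is a sketch, not the library's code: the helper name `resolve_github_token` is hypothetical, and the precedence shown (explicit argument, then `GITHUB_TOKEN`, then `GH_TOKEN`) is an assumption, since the documentation only says that either variable may be set.

```python
import os
from typing import Optional


def resolve_github_token(explicit: Optional[str] = None) -> Optional[str]:
    """Pick a GitHub token from an explicit argument or the environment.

    Hypothetical helper; the precedence order is an assumption.
    """
    if explicit:
        return explicit
    for name in ("GITHUB_TOKEN", "GH_TOKEN"):
        value = os.environ.get(name, "").strip()
        if value:
            return value
    return None
```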

The token needs permission to read repository metadata and contents. For classic personal access tokens, use the repo scope for private repositories. For fine-grained tokens, grant read access to Contents and Metadata for the target repositories.

You can check access first:

from code_context_py import can_access_github_repository

if not can_access_github_repository("owner/private-repo", token="github-token"):
    raise RuntimeError("Token cannot read this repository")
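
If you want to see what such a check might involve without the library, the sketch below queries the GitHub REST endpoint `GET /repos/{owner}/{repo}` using only the standard library. It is an independent illustration, not how `can_access_github_repository` is implemented; the names `build_repo_request` and `repo_is_readable` are hypothetical, and the request only confirms that repository metadata is readable.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_repo_request(repo: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a repository's metadata (hypothetical helper)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"https://api.github.com/repos/{repo}", headers=headers)


def repo_is_readable(repo: str, token: Optional[str] = None, timeout: float = 10.0) -> bool:
    """Return True if the repository metadata can be fetched with this token."""
    try:
        with urllib.request.urlopen(build_repo_request(repo, token), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # GitHub answers 404 for private repositories the token cannot see.
        return False
```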

GitHub Compare

from code_context_py import build_context_from_github_compare

pack = build_context_from_github_compare(
    "owner/repo",
    "base-ref",
    "head-ref",
    token="github-token",
    max_chars=70000,
)

Debug Mode

Debug mode is enabled from code, not from environment variables:

ExtractionOptions(debug=True)                                   # default location
ExtractionOptions(debug=True, debug_dir="debug/code-context")   # custom output directory
ExtractionOptions(debug=True, debug_path="debug/context.json")  # explicit output file

The debug report includes:

  • changed files
  • exact patch excerpts
  • symbols
  • enclosing blocks
  • related blocks
  • dependency blocks
  • repository reference blocks
  • graph edges
  • warnings
  • the compact LLM pack
  • the LLM pack length
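
Because the debug report is plain JSON, it can be inspected with the standard library. The sketch below loads the most recently modified file from a debug directory. The helper name `load_latest_debug_report` is hypothetical, and no particular key names are assumed, since the report schema is not described here.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_latest_debug_report(debug_dir: str = "code-context-debug") -> Optional[Any]:
    """Load the newest *.json file in debug_dir, or None if there is none.

    Hypothetical helper for inspection; not part of code-context-py.
    """
    directory = Path(debug_dir)
    if not directory.is_dir():
        return None
    candidates = sorted(directory.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        return None
    with candidates[-1].open(encoding="utf-8") as handle:
        return json.load(handle)
```

A quick way to explore the structure is to print the top-level keys of the returned object, if it is a dictionary, before drilling into individual sections.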

LLM Pack

to_prompt() returns the full extracted report. Use it for inspection or for very large context windows.

to_llm_pack(...) returns a prioritized, compact prompt pack for LLMs:

pack = report.to_llm_pack(
    max_chars=70000,
    max_block_chars=14000,
)

Priority order:

  1. Exact file patches.
  2. Changed enclosing blocks.
  3. Dependency blocks.
  4. Same-file related blocks.
  5. Repository-wide references.
  6. File preludes.

The pack includes a manifest and, when the budget is too small to fit everything, a list of omitted sections. The full context remains available in the debug JSON.
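
To make the budgeting idea concrete, here is a simplified sketch of priority-based packing under a character budget. It is not the library's algorithm: the `Block` type, the `PRIORITY` table, and `pack_blocks` are hypothetical names, and the greedy "truncate each block, then skip what does not fit" strategy is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Lower number = higher priority, mirroring the order listed above.
PRIORITY = {
    "patch": 1,
    "enclosing": 2,
    "dependency": 3,
    "related": 4,
    "reference": 5,
    "prelude": 6,
}


@dataclass
class Block:
    kind: str   # one of the PRIORITY keys
    label: str  # short name used in the omitted list
    text: str


def pack_blocks(blocks: List[Block], max_chars: int, max_block_chars: int) -> Tuple[str, List[str]]:
    """Greedily pack blocks by priority; return (pack_text, omitted_labels)."""
    parts: List[str] = []
    omitted: List[str] = []
    used = 0
    # sorted() is stable, so blocks of the same kind keep their input order.
    for block in sorted(blocks, key=lambda b: PRIORITY[b.kind]):
        text = block.text[:max_block_chars]  # cap each block individually
        cost = len(text) + (1 if parts else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            omitted.append(block.label)
            continue
        parts.append(text)
        used += cost
    return "\n".join(parts), omitted
```

In this sketch a lower-priority block can still be included after a higher-priority one was skipped, as long as it fits in the remaining budget. Whether the library behaves the same way is not stated.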

WordPress Support

The default adapter registry includes WordPress-aware extraction for PHP themes/plugins and mixed WordPress codebases.

It detects and relates context around:

  • add_action
  • add_filter
  • register_post_type
  • register_taxonomy
  • add_shortcode
  • wp_enqueue_script
  • wp_enqueue_style
  • get_template_part
  • locate_template
  • PHP require / include
  • theme templates
  • template parts
  • JS/CSS/SCSS assets referenced by changed code

The WordPress adapter is still dependency-free. For highly specialized projects, you can register your own adapter.
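
A dependency-free approach to this kind of detection typically relies on pattern matching rather than a PHP parser. The sketch below finds `add_action` and `add_filter` calls whose callback is given as a string. It is an illustration only, not the adapter's actual logic: `HOOK_RE` and `find_hooks` are hypothetical names, and array or closure callbacks are deliberately not handled.

```python
import re
from typing import List, Tuple

# Captures: registration function, hook name, callback name (string callbacks only).
HOOK_RE = re.compile(
    r"""\b(add_action|add_filter)\s*\(\s*['"]([^'"]+)['"]\s*,\s*['"]([^'"]+)['"]"""
)


def find_hooks(php_source: str) -> List[Tuple[str, str, str]]:
    """Return (function, hook, callback) triples found in PHP source text."""
    return [(m.group(1), m.group(2), m.group(3)) for m in HOOK_RE.finditer(php_source)]
```

Once a hook and its callback name are known, the callback's definition can be looked up elsewhere in the repository and attached as related context.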

Custom Adapters

from code_context_py import AdapterRegistry, build_context_from_diff

registry = AdapterRegistry()
registry.register(MyLanguageAdapter())

report = build_context_from_diff(diff_text, provider, adapters=registry)

Adapters can customize:

  • file support detection
  • enclosing range detection
  • related range detection
  • symbol extraction
  • import/template/dependency extraction
  • tokenization
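
The adapter interface itself is not documented here, so the sketch below does not implement it. Instead it shows, as a standalone function, the kind of logic an adapter's enclosing range detection might contain for an indentation-based language. The names `indent_of` and `enclosing_block_range` are hypothetical, and the heuristic (the nearest unindented line above is the block header) ignores decorators and nested scopes.

```python
from typing import List, Tuple


def indent_of(line: str) -> int:
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())


def enclosing_block_range(lines: List[str], changed_line: int) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the top-level block
    containing `changed_line`. Hypothetical helper, not the library API.
    """
    start = changed_line - 1
    # Walk up to the nearest non-blank, unindented line: the block header.
    while start > 0 and (not lines[start].strip() or indent_of(lines[start]) > 0):
        start -= 1
    # Walk down while lines are blank or indented, i.e. still inside the block.
    end = start
    for i in range(start + 1, len(lines)):
        if lines[i].strip() and indent_of(lines[i]) == 0:
            break  # next top-level statement starts here
        if lines[i].strip():
            end = i  # remember the last non-blank line of the block
    return start + 1, end + 1
```

For example, given a file where a function spans lines 3 to 5, a change on line 4 yields `(3, 5)`, which is the range that would be extracted as the full enclosing block.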

CLI

code-context-py --diff change.patch --repo-path /path/to/repo --ref commit-sha
code-context-py --diff github --github-repo owner/repo --github-ref commit-sha --github-token "$GITHUB_TOKEN"
code-context-py --diff github-compare --github-repo owner/repo --github-base main --github-head feature --github-token "$GITHUB_TOKEN"

Useful options:

--max-chars 70000
--max-block-chars 14000
--max-changed-files 50
--max-blocks-per-file 8
--dependency-depth 2
--debug
--debug-dir code-context-debug
--debug-path debug/context.json
--llm-pack-max-chars 70000
--llm-pack-max-block-chars 14000

Publishing

python -m pip install --upgrade build twine
python -m build
python -m twine upload dist/*

Design Philosophy

No generic tool can provide perfect semantic call graphs for every programming language and framework without language-specific parsers. code-context-py is built to be robust in real mixed repositories by combining:

  • exact diff data
  • full changed blocks
  • dependency traversal
  • repository-wide reference search
  • framework-aware adapters
  • compact prompt packing
  • full debug inspection

This makes it useful immediately across many languages while keeping a clean path for deeper adapters where a project needs more precision.
