Language-agnostic code context extraction from Git diffs for LLM agents.
code-context-py
code-context-py turns Git and GitHub diffs into structured, LLM-ready code context.
It is designed for documentation agents, code review agents, changelog generators, release note tools and internal engineering assistants that need to understand what changed without dumping an entire repository into a prompt.
Features
- Parses unified diffs and GitHub patch responses.
- Loads changed files at the target commit/ref.
- Includes exact file patch excerpts, including removed and added lines.
- Extracts full enclosing blocks around changed lines.
- Finds related same-file blocks through changed identifiers.
- Follows local dependencies from imports, includes, templates and aliases.
- Resolves common project aliases from tsconfig.json, jsconfig.json and Composer psr-4.
- Searches repository-wide references when the provider can list files.
- Detects symbols across language families with a generic adapter system.
- Includes a WordPress-aware adapter for hooks, filters, CPTs, taxonomies, shortcodes, enqueued assets and template parts.
- Produces a full debug JSON report when debug=True.
- Includes the exact compact LLM pack inside the debug JSON.
- Provides a configurable to_llm_pack(...) output that prioritizes the most important context.
- Works without native parser dependencies, while allowing custom adapters for deeper language-specific analysis.
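Everything starts from the diff itself. As an illustration only (this is not the library's parser), the sketch below maps each file in a unified diff to the new-side line numbers of added lines and the old-side line numbers of removed lines, driven by the @@ hunk headers. The changed_lines name is hypothetical.

```python
import re

# Hunk header, e.g. "@@ -10,7 +10,9 @@ def foo():". A missing count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> dict[str, dict[str, list[int]]]:
    """Map each new-side path to added (new numbering) and removed (old numbering) lines."""
    result: dict[str, dict[str, list[int]]] = {}
    current = None
    old_no = new_no = old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: only file headers and hunk headers matter.
            if line.startswith("+++ "):
                path = line[4:].split("\t")[0]
                key = path[2:] if path.startswith("b/") else path
                current = result.setdefault(key, {"added": [], "removed": []})
            elif current is not None and (m := HUNK_RE.match(line)):
                old_no, old_left = int(m.group(1)), int(m.group(2) or 1)
                new_no, new_left = int(m.group(3)), int(m.group(4) or 1)
            continue
        if line.startswith("+"):
            current["added"].append(new_no)
            new_no, new_left = new_no + 1, new_left - 1
        elif line.startswith("-"):
            current["removed"].append(old_no)
            old_no, old_left = old_no + 1, old_left - 1
        elif not line.startswith("\\"):  # skip "\ No newline at end of file"
            # Context line: advances both sides.
            old_no, new_no = old_no + 1, new_no + 1
            old_left, new_left = old_left - 1, new_left - 1
    return result
```

Tracking the remaining hunk counts keeps a removed line such as "--- x" from being mistaken for a file header. The resulting line numbers are what enclosing-block extraction would start from.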
Installation
pip install code-context-py
For development from source:
git clone https://github.com/mlinaresweb/code-context-py.git
cd code-context-py
pip install -e .
Quick Start
from pathlib import Path

from code_context_py import CallbackFileProvider, ExtractionOptions, build_context_from_diff

def load_file_at_ref(path: str) -> str | None:
    # Return the file's contents at the target commit/ref, or None if it does not exist there.
    ...

diff_text = Path("change.patch").read_text()  # the unified diff to analyze
provider = CallbackFileProvider(load_file_at_ref)
report = build_context_from_diff(
    diff_text,
    provider,
    options=ExtractionOptions(debug=True),
)
full_context = report.to_prompt()
llm_pack = report.to_llm_pack(max_chars=70000, max_block_chars=14000)
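The load_file_at_ref callback is left as a stub above. One way to fill it for a local clone is to shell out to git show REF:PATH. This is a sketch under that assumption; the make_git_loader helper is hypothetical, and for plain local repositories LocalGitFileProvider (below) already covers this case.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def make_git_loader(repo_path: str, ref: str) -> Callable[[str], str | None]:
    """Build a load_file_at_ref-style callback that reads files with `git show`."""

    def load_file_at_ref(path: str) -> str | None:
        try:
            result = subprocess.run(
                ["git", "-C", repo_path, "show", f"{ref}:{path}"],
                capture_output=True,
                check=True,
            )
        except (subprocess.CalledProcessError, FileNotFoundError):
            # CalledProcessError: the path does not exist at `ref` (e.g. a deleted
            # file), the ref is unknown, or repo_path is not a repository.
            # FileNotFoundError: the git executable is not installed.
            return None
        return result.stdout.decode("utf-8", errors="replace")

    return load_file_at_ref
```

Usage would then be provider = CallbackFileProvider(make_git_loader("/path/to/repo", "commit-sha")). Paths passed to the callback are assumed to be relative to the repository root, which is what git show expects.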
With debug=True, the library writes:
code-context-debug/{diff_hash}.json
The debug JSON contains the full structured report and the compact LLM pack generated from that report.
Local Git Provider
from code_context_py import LocalGitFileProvider, build_context_from_diff
provider = LocalGitFileProvider("/path/to/repo", "commit-sha")
report = build_context_from_diff(diff_text, provider)
pack = report.to_llm_pack(max_chars=70000)
LocalGitFileProvider can read files from a commit and list repository files, enabling dependency traversal and repository-wide reference search.
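Once a provider can list files, a repository-wide reference search can be as simple as scanning every file for a changed identifier as a whole word. The sketch below shows that idea with a hypothetical find_references helper and a plain dict standing in for the provider; the library's own search may filter or rank results differently.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable


def find_references(
    identifier: str,
    paths: Iterable[str],
    read_file: Callable[[str], str | None],
) -> list[tuple[str, int]]:
    """Return (path, line_number) pairs where `identifier` appears as a whole word."""
    # \b boundaries stop "get_user" from matching inside "get_user_id".
    # Note: \b assumes the identifier starts and ends with word characters.
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    hits: list[tuple[str, int]] = []
    for path in paths:
        text = read_file(path)
        if text is None:  # file missing at this ref
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, number))
    return hits


files = {"a.py": "def get_user():\n    pass\n", "b.py": "x = get_user()\ny = get_user_id()\n"}
print(find_references("get_user", files, files.get))
```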
GitHub Commit
from code_context_py import build_context_from_github_commit
pack = build_context_from_github_commit(
"owner/repo",
"commit-sha",
token="github-token",
max_chars=70000,
)
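GitHub's "Get a commit" REST endpoint returns a files array in which each patch field holds only the hunks, without diff --git/---/+++ headers. The sketch below shows one way to rebuild ordinary unified-diff text from such a response. It is an illustration of handling GitHub patch responses, not necessarily what build_context_from_github_commit does internally, and github_files_to_diff is a hypothetical name.

```python
def github_files_to_diff(files: list[dict]) -> str:
    """Rebuild unified-diff text from the `files` array of a GitHub commit response."""
    chunks = []
    for f in files:
        patch = f.get("patch")
        if not patch:
            # GitHub may omit `patch` for binary files and very large diffs.
            continue
        new_path = f["filename"]
        # Renamed files carry their old name in `previous_filename`.
        old_path = f.get("previous_filename", new_path)
        status = f.get("status")
        old_header = "/dev/null" if status == "added" else f"a/{old_path}"
        new_header = "/dev/null" if status == "removed" else f"b/{new_path}"
        chunks.append(
            f"diff --git a/{old_path} b/{new_path}\n"
            f"--- {old_header}\n"
            f"+++ {new_header}\n"
            f"{patch}\n"
        )
    return "".join(chunks)
```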
Private GitHub Repositories
Private repositories work through the GitHub API as long as the token can read the repository contents.
You can pass the token explicitly:
from code_context_py import build_context_from_github_commit
pack = build_context_from_github_commit(
"owner/private-repo",
"commit-sha",
token="github-token-with-repo-read-access",
max_chars=70000,
)
Or set one of these environment variables:
export GITHUB_TOKEN="github-token-with-repo-read-access"
# or
export GH_TOKEN="github-token-with-repo-read-access"
Then call the API without passing token:
pack = build_context_from_github_commit("owner/private-repo", "commit-sha", max_chars=70000)
The token needs permission to read repository metadata and contents. For classic personal access tokens, use the repo scope for private repositories. For fine-grained tokens, grant read access to Contents and Metadata for the target repositories.
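The token lookup described above can be pictured as a small resolution function. This is a sketch, not the library's code: the helper name is hypothetical, and the assumption that an explicit token wins and that GITHUB_TOKEN is checked before GH_TOKEN is ours, since the precedence between the two variables is not documented here.

```python
from __future__ import annotations

import os
from typing import Mapping


def resolve_github_token(
    explicit: str | None = None, env: Mapping[str, str] | None = None
) -> str | None:
    """Assumed precedence: explicit argument, then GITHUB_TOKEN, then GH_TOKEN."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    for name in ("GITHUB_TOKEN", "GH_TOKEN"):
        value = env.get(name)
        if value:  # ignore variables that are set but empty
            return value
    return None
```

Passing env explicitly makes the function easy to test without touching the real process environment.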
You can check access first:
from code_context_py import can_access_github_repository
if not can_access_github_repository("owner/private-repo", token="github-token"):
    raise RuntimeError("Token cannot read this repository")
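Under the hood, a check like this can be answered by a single GitHub REST call, GET /repos/{owner}/{repo}, which succeeds only when the token can see the repository. The standard-library sketch below illustrates that request; it is not can_access_github_repository's implementation, and both helper names are hypothetical.

```python
import urllib.error
import urllib.request


def repo_access_request(repo: str, token: str) -> urllib.request.Request:
    """Build GET https://api.github.com/repos/{owner}/{repo} with token auth."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def token_can_read(repo: str, token: str, timeout: float = 10.0) -> bool:
    try:
        with urllib.request.urlopen(repo_access_request(repo, token), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # GitHub answers 404 (not 403) for private repositories the token cannot
        # see, and 401 for invalid tokens. Network failures (URLError) are left
        # to propagate so they are not mistaken for "no access".
        return False
```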
GitHub Compare
from code_context_py import build_context_from_github_compare
pack = build_context_from_github_compare(
"owner/repo",
"base-ref",
"head-ref",
token="github-token",
max_chars=70000,
)
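For reference, GitHub exposes comparisons at GET /repos/{owner}/{repo}/compare/{base}...{head}, and requesting the application/vnd.github.diff media type returns a plain unified diff instead of JSON. The sketch below fetches that diff with the standard library; it illustrates the underlying endpoint rather than how build_context_from_github_compare is written, and the helper names are hypothetical.

```python
import urllib.request


def compare_diff_request(repo: str, base: str, head: str, token: str) -> urllib.request.Request:
    """GET /repos/{owner}/{repo}/compare/{base}...{head}, asking for raw diff text."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/compare/{base}...{head}",
        headers={
            "Authorization": f"Bearer {token}",
            # The diff media type makes GitHub return a unified diff, not JSON.
            "Accept": "application/vnd.github.diff",
        },
    )


def fetch_compare_diff(repo: str, base: str, head: str, token: str, timeout: float = 30.0) -> str:
    with urllib.request.urlopen(compare_diff_request(repo, base, head, token), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The three-dot form compares head against the merge base of the two refs, so changes that landed on base after the branch point are not reported as part of the comparison.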
Debug Mode
Debug mode is enabled from code, not from environment variables:
ExtractionOptions(debug=True)
ExtractionOptions(debug=True, debug_dir="debug/code-context")
ExtractionOptions(debug=True, debug_path="debug/context.json")
The debug report includes:
- changed files
- exact patch excerpts
- symbols
- enclosing blocks
- related blocks
- dependency blocks
- repository reference blocks
- graph edges
- warnings
- the compact LLM pack
- the LLM pack length
LLM Pack
to_prompt() returns the full extracted report. Use it for inspection or for very large context windows.
to_llm_pack(...) returns a prioritized, compact prompt pack for LLMs:
pack = report.to_llm_pack(
max_chars=70000,
max_block_chars=14000,
)
Priority order:
- Exact file patches.
- Changed enclosing blocks.
- Dependency blocks.
- Same-file related blocks.
- Repository-wide references.
- File preludes.
The pack includes a manifest and an omitted-section list when the budget is too small. The full context remains available in the debug JSON.
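The priority order amounts to a greedy packing rule: take sections from most to least important, cap each block, and record whatever no longer fits. The sketch below shows that idea with a hypothetical Section type and kind names; it is not to_llm_pack's implementation, which also emits a manifest.

```python
from dataclasses import dataclass

# Hypothetical section kinds, in the priority order listed above.
PRIORITY = ["patch", "changed_block", "dependency", "related", "reference", "prelude"]


@dataclass
class Section:
    kind: str  # must be one of PRIORITY, otherwise PRIORITY.index raises ValueError
    title: str
    text: str


def pack_sections(
    sections: list[Section], max_chars: int, max_block_chars: int
) -> tuple[str, list[str]]:
    """Greedily add sections by priority; return (packed_text, omitted_titles)."""
    # sorted() is stable, so sections of the same kind keep their input order.
    ordered = sorted(sections, key=lambda s: PRIORITY.index(s.kind))
    parts: list[str] = []
    omitted: list[str] = []
    used = 0
    for section in ordered:
        body = section.text
        if len(body) > max_block_chars:
            # Cap oversized blocks (plus a short marker) instead of dropping them.
            body = body[:max_block_chars] + "\n[truncated]"
        chunk = f"## {section.title}\n{body}\n"
        if used + len(chunk) > max_chars:
            # No room for this section; record it, later sections may still fit.
            omitted.append(section.title)
            continue
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts), omitted
```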
WordPress Support
The default adapter registry includes WordPress-aware extraction for PHP themes/plugins and mixed WordPress codebases.
It detects and relates context around:
- add_action
- add_filter
- register_post_type
- register_taxonomy
- add_shortcode
- wp_enqueue_script
- wp_enqueue_style
- get_template_part
- locate_template
- PHP require/include
- theme templates
- template parts
- JS/CSS/SCSS assets referenced by changed code
Like the rest of the library, this WordPress support needs no native parser dependencies. For highly specialized projects, you can register your own adapter.
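WordPress hooks are usually connected by literal names: code that registers a callback with add_action or add_filter relates to code that fires the same name with do_action or apply_filters. The simplified regex sketch below extracts those names; it is not the library's WordPress adapter, find_hooks is a hypothetical name, and hook names built from variables are deliberately skipped.

```python
import re

# Captures the function and a literal (single- or double-quoted) hook name.
HOOK_RE = re.compile(
    r"""\b(add_action|add_filter|do_action|apply_filters)\s*\(\s*(['"])([^'"]+)\2"""
)


def find_hooks(php_source: str) -> list[tuple[str, str]]:
    """Return (function, hook_name) pairs for hooks called with a literal name."""
    return [(m.group(1), m.group(3)) for m in HOOK_RE.finditer(php_source)]


src = "<?php\nadd_action( 'init', 'mytheme_register' );\ndo_action('init');\n"
print(find_hooks(src))
```

Matching registrations against firings with the same name gives the kind of cross-file relation the adapter description refers to.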
Custom Adapters
from code_context_py import AdapterRegistry, build_context_from_diff
registry = AdapterRegistry()
registry.register(MyLanguageAdapter())  # MyLanguageAdapter: your own adapter class
report = build_context_from_diff(diff_text, provider, adapters=registry)
Adapters can customize:
- file support detection
- enclosing range detection
- related range detection
- symbol extraction
- import/template/dependency extraction
- tokenization
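As an example of what "enclosing range detection" can mean for an indentation-based language, the heuristic below walks up from a changed line to the nearest less-indented line (the block header) and then down while lines stay more indented. It is a hypothetical sketch, not the library's default adapter or its adapter interface; it counts indentation characters, so mixed tabs and spaces are not handled.

```python
def enclosing_range(lines: list[str], changed: int) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the indented block around `changed`."""

    def indent(s: str) -> int:
        return len(s) - len(s.lstrip())

    i = changed - 1
    base = indent(lines[i])
    start = i  # a top-level changed line is its own header
    # Walk upward to the nearest non-blank, less-indented line.
    for j in range(i - 1, -1, -1):
        if lines[j].strip() and indent(lines[j]) < base:
            start = j
            break
    header_indent = indent(lines[start])
    end = start
    # Walk downward while lines are more indented than the header; blank lines
    # are skipped but never become the end of the block.
    for j in range(start + 1, len(lines)):
        if not lines[j].strip():
            continue
        if indent(lines[j]) > header_indent:
            end = j
        else:
            break
    return start + 1, end + 1
```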
CLI
code-context-py --diff change.patch --repo-path /path/to/repo --ref commit-sha
code-context-py --diff github --github-repo owner/repo --github-ref commit-sha --github-token "$GITHUB_TOKEN"
code-context-py --diff github-compare --github-repo owner/repo --github-base main --github-head feature --github-token "$GITHUB_TOKEN"
Useful options:
--max-chars 70000
--max-block-chars 14000
--max-changed-files 50
--max-blocks-per-file 8
--dependency-depth 2
--debug
--debug-dir code-context-debug
--debug-path debug/context.json
--llm-pack-max-chars 70000
--llm-pack-max-block-chars 14000
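The --dependency-depth option suggests that dependency traversal is bounded by a number of import hops from the changed files. The breadth-first sketch below shows such a bounded walk, assuming a hypothetical imports_of callback that returns the local files a given file imports; the library's own traversal may differ.

```python
from collections import deque
from typing import Callable, Iterable


def follow_dependencies(
    start_files: Iterable[str],
    imports_of: Callable[[str], Iterable[str]],
    max_depth: int,
) -> dict[str, int]:
    """Return {path: hops}, with changed files at 0 and nothing beyond max_depth."""
    start = list(start_files)
    depth = {path: 0 for path in start}
    queue = deque(start)
    while queue:
        path = queue.popleft()
        if depth[path] >= max_depth:
            continue  # do not expand past the depth limit
        for dep in imports_of(path):
            if dep not in depth:  # already reached at a lower depth; also breaks cycles
                depth[dep] = depth[path] + 1
                queue.append(dep)
    return depth
```

Breadth-first order guarantees each file is recorded at its shortest hop distance, so a depth limit of 2 keeps exactly the files reachable in at most two imports.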
Publishing
python -m pip install --upgrade build twine
python -m build
python -m twine upload dist/*
Design Philosophy
No generic tool can provide perfect semantic call graphs for every programming language and framework without language-specific parsers. code-context-py is built to be robust in real mixed repositories by combining:
- exact diff data
- full changed blocks
- dependency traversal
- repository-wide reference search
- framework-aware adapters
- compact prompt packing
- full debug inspection
This makes it useful immediately across many languages while keeping a clean path for deeper adapters where a project needs more precision.
Download files
File details
Details for the file code_context_py-1.0.3.tar.gz.
File metadata
- Download URL: code_context_py-1.0.3.tar.gz
- Upload date:
- Size: 26.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9bee9f9b7df8c63fbd78648fa2ebab1548ea07a102d3bb7854dd9f2c61901d1 |
| MD5 | 0ab948f034f09c11ecda1c29cbe381ad |
| BLAKE2b-256 | 0a541a75b513288b1bc1cbe2258bdccb80b83f117ed0b73204c29b39a25fea9c |
File details
Details for the file code_context_py-1.0.3-py3-none-any.whl.
File metadata
- Download URL: code_context_py-1.0.3-py3-none-any.whl
- Upload date:
- Size: 24.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 300628f554d2cf7e9993cc89ca1b43760b4c4c4694bd7200b662a4118d5c4095 |
| MD5 | 50c0624f6870fe8169d6f38cc5045dc2 |
| BLAKE2b-256 | e3d1a3ff93f56416d6f29777d6f18866aee2168af882ade5c3954e356e96a603 |