Build perfect LLM context from your Python codebase — automatically.

ctxeng — Python Context Engineering Library

Stop copy-pasting files into ChatGPT.
Build the perfect LLM context from your codebase, automatically.


Context engineering is the new prompt engineering. The quality of your LLM's output depends heavily on what you put in the context window, often more than on how you phrase the question.

ctxeng solves this automatically:

  • Scans your codebase and scores every file for relevance to your query
  • Ranks by signal: keyword overlap, AST symbols, path relevance, and git recency
  • Fits the budget: smart truncation keeps the best parts within the target model's token limit
  • Ships ready to paste: XML, Markdown, or plain-text output that works with Claude, GPT-4o, Gemini, and other models

Zero required dependencies. Works with any LLM.


Installation

pip install ctxeng

For accurate token counting (strongly recommended):

pip install "ctxeng[tiktoken]"

For one-line LLM calls:

pip install "ctxeng[anthropic]"    # Claude
pip install "ctxeng[openai]"       # GPT-4o
pip install "ctxeng[all]"          # everything

Quickstart

Python API

from ctxeng import ContextEngine

engine = ContextEngine(root=".", model="claude-sonnet-4")
ctx = engine.build("Fix the authentication bug in the login flow")

print(ctx.summary())
# Context summary (12,340 tokens / 197,440 budget):
#   Included : 8 files
#   Skipped  : 23 files (over budget)
#   [████████  ] 0.84  src/auth/login.py
#   [███████   ] 0.71  src/auth/middleware.py
#   [█████     ] 0.53  src/models/user.py
#   [████      ] 0.41  tests/test_auth.py
#   ...

# Paste directly into your LLM
print(ctx.to_string())

Fluent Builder API

from ctxeng import ContextBuilder

ctx = (
    ContextBuilder(root=".")
    .for_model("gpt-4o")
    .only("**/*.py")
    .exclude("tests/**", "migrations/**")
    .from_git_diff()                        # only changed files
    .with_system("You are a senior Python engineer. Be concise.")
    .build("Refactor the payment module to use async/await")
)

print(ctx.to_string("markdown"))

One-line LLM call

from ctxeng import ContextEngine
from ctxeng.integrations import ask_claude

engine = ContextEngine(".", model="claude-sonnet-4")
ctx = engine.build("Why is the test_login test failing?")

response = ask_claude(ctx)
print(response)

CLI

# Build context for a query and print to stdout
ctxeng build "Fix the auth bug"

# Focused on git-changed files only
ctxeng build "Review my changes" --git-diff

# Target a specific model with markdown output
ctxeng build "Refactor this" --model gpt-4o --fmt markdown

# Save to file
ctxeng build "Explain the payment flow" --output context.md

# Project stats
ctxeng info

How It Works

Your codebase                    ctxeng                      Your LLM
─────────────                ────────────────            ────────────────
src/auth/login.py  ─┐
src/models/user.py ─┤  1. Score files         2. Fit budget     <context>
src/api/routes.py  ─┼─► vs query + git  ─►   smart truncate ─► <file>...</file>
tests/test_auth.py ─┤     recency + AST        token-aware       <file>...</file>
...500 more files  ─┘                                           </context>
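
The `<context>` / `<file>` layout in the diagram can be approximated in a few lines. This is a minimal sketch, not ctxeng's actual serializer: the `render_xml` helper and the `path` attribute are assumptions made for illustration, and file contents are deliberately left unescaped so the model sees the code as written.

```python
from xml.sax.saxutils import quoteattr


def render_xml(files):
    """Wrap (path, text) pairs in a <context> envelope.

    Hypothetical helper: the tag names follow the diagram above, while
    the `path` attribute is an assumption, not ctxeng's documented format.
    """
    parts = ["<context>"]
    for path, text in files:
        # quoteattr adds the surrounding quotes and escapes the value
        parts.append(f"<file path={quoteattr(path)}>")
        parts.append(text.rstrip("\n"))
        parts.append("</file>")
    parts.append("</context>")
    return "\n".join(parts)


sample = render_xml([
    ("src/auth/login.py", "def login(user):\n    return True\n"),
    ("src/models/user.py", "class User:\n    pass\n"),
])
print(sample)
```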

Scoring signals

Each file gets a relevance score from 0 → 1, combining:

Signal            What it measures
----------------  ----------------------------------------------------------
Keyword overlap   How many query terms appear in the file content
AST symbols       Class/function/import names that match the query (Python)
Path relevance    Filename and directory names matching query tokens
Git recency       Files touched in recent commits score higher
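
To make the signals concrete, here is a rough sketch of how three of them could be combined into a single 0 → 1 score. It is not ctxeng's scoring code: the weights, the stopword list, the substring matching, and every function name below are assumptions, and git recency is left out because it needs a repository.

```python
import ast
import re

# Hypothetical stopword list and weights, chosen only for illustration
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}
WEIGHTS = {"keyword": 0.5, "symbols": 0.3, "path": 0.2}


def terms(text):
    """Lowercase alphanumeric tokens with common stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def fraction_matched(query_terms, candidates):
    """Share of query terms contained in at least one candidate token."""
    if not query_terms:
        return 0.0
    hits = sum(1 for q in query_terms if any(q in c for c in candidates))
    return hits / len(query_terms)


def symbol_names(source):
    """Class, function, and import names found in Python source."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name.lower())
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names.update(alias.name.lower() for alias in node.names)
    return names


def score_file(query, path, source):
    """Weighted average of the three signals; always between 0 and 1."""
    q = terms(query)
    signals = {
        "keyword": fraction_matched(q, terms(source)),
        "symbols": fraction_matched(q, symbol_names(source)),
        "path": fraction_matched(q, terms(path)),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


login_src = "class LoginHandler:\n    def authenticate(self, user):\n        return user.login\n"
chart_src = "def render_chart(data):\n    return data\n"
query = "Fix the login bug"

login_score = score_file(query, "src/auth/login.py", login_src)
chart_score = score_file(query, "src/ui/chart.py", chart_src)
print(login_score > chart_score)  # True
```

Because the weights sum to 1 and each signal is a fraction, the combined score stays in the 0 → 1 range described above.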

Token budget optimization

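Files are ranked by score and added greedily until the token budget is full. A file that does not fit whole is smart-truncated rather than dropped outright: ctxeng keeps the head and the tail and cuts the middle. The top of a file holds imports and class definitions, and the tail tends to hold recent changes, so both ends are high-signal. Files that cannot be included at all are reported in ctx.skipped_files.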
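
The greedy fill and head + tail truncation can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the 4-characters-per-token estimate, the truncation marker, the `min_tokens` cutoff, and the function names are all hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, assumed here instead of a real tokenizer
MARKER = "\n# ... [middle truncated] ...\n"


def estimate_tokens(text):
    """Approximate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def truncate_head_tail(text, max_tokens):
    """Keep the start and end of `text`, replacing the middle with a marker."""
    if estimate_tokens(text) <= max_tokens:
        return text
    max_chars = max_tokens * CHARS_PER_TOKEN
    keep = max_chars - len(MARKER)
    if keep <= 0:
        return text[:max_chars]
    tail_len = keep // 2
    head_len = keep - tail_len
    return text[:head_len] + MARKER + text[len(text) - tail_len:]


def fit_to_budget(scored_files, budget, min_tokens=16):
    """Fill the budget in score order; truncate the first file that overflows.

    `scored_files` is a list of (path, score, text) tuples.
    Returns (included, skipped): included holds (path, text) pairs.
    """
    included, skipped = [], []
    remaining = budget
    for path, _score, text in sorted(scored_files, key=lambda f: f[1], reverse=True):
        cost = estimate_tokens(text)
        if cost <= remaining:
            included.append((path, text))
            remaining -= cost
        elif remaining >= min_tokens:
            cut = truncate_head_tail(text, remaining)
            included.append((path, cut))
            remaining -= estimate_tokens(cut)
        else:
            skipped.append(path)
    return included, skipped


files = [
    ("c.py", 0.1, "c" * 400),
    ("a.py", 0.9, "a" * 400),
    ("b.py", 0.5, "HEAD" + "m" * 792 + "TAIL"),
]
included, skipped = fit_to_budget(files, budget=150)
print([path for path, _ in included], skipped)
```

In this run `a.py` fits whole, `b.py` is cut down to the remaining budget with its first and last characters preserved, and `c.py` is skipped because nothing is left.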


Examples

Debug a failing test

from ctxeng import ContextBuilder
from ctxeng.integrations import ask_claude

ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .include_files("tests/test_payment.py", "src/payment/service.py")
    .with_system("You are a Python debugging expert.")
    .build("test_charge_user is failing with a KeyError on 'amount'")
)
response = ask_claude(ctx)

Code review on a PR

from ctxeng import ContextBuilder

# Only include what changed in this branch vs main
ctx = (
    ContextBuilder(".")
    .for_model("gpt-4o")
    .from_git_diff(base="main")
    .with_system("Do a thorough code review. Flag security issues first.")
    .build("Review this pull request")
)

Explain an unfamiliar codebase

from ctxeng import ContextEngine

engine = ContextEngine(
    root="/path/to/project",
    model="gemini-1.5-pro",  # 1M token window → include everything
)
ctx = engine.build("Give me a high-level architecture overview")
print(ctx.to_string())

Targeted refactor

from ctxeng import ContextBuilder

ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .only("src/database/**/*.py")
    .exclude("**/*_test.py")
    .build("Convert all raw SQL queries to use SQLAlchemy ORM")
)
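
For readers wondering how `only` and `exclude` patterns like these might select files, here is a small standalone sketch. It assumes gitignore-style semantics (`**/` spans zero or more directories, `*` stays within one path segment); ctxeng's real matching rules may differ, and `glob_to_regex` and `select` are hypothetical helpers.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob with `**` support into a compiled regex."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def select(paths, only=(), exclude=()):
    """Keep paths matching any `only` pattern and no `exclude` pattern."""
    inc = [glob_to_regex(p) for p in only]
    exc = [glob_to_regex(p) for p in exclude]
    return [
        p for p in paths
        if (not inc or any(r.fullmatch(p) for r in inc))
        and not any(r.fullmatch(p) for r in exc)
    ]


paths = [
    "src/database/session.py",
    "src/database/models/user.py",
    "src/database/models/user_test.py",
    "src/api/routes.py",
    "src/database/schema.sql",
]
chosen = select(paths, only=["src/database/**/*.py"], exclude=["**/*_test.py"])
print(chosen)
```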

API Reference

ContextEngine

ContextEngine(
    root=".",               # Project root
    model="claude-sonnet-4",# Sets token budget automatically
    budget=None,            # Or explicit TokenBudget(total=50_000)
    max_file_size_kb=500,   # Skip files larger than this
    include_patterns=None,  # ["**/*.py"] — only these files
    exclude_patterns=None,  # ["tests/**"] — skip these
    use_git=True,           # Use git recency signal
)
engine.build(
    query="",               # What you want the LLM to do
    files=None,             # Explicit list of paths (skips auto-discovery)
    git_diff=False,         # Only changed files
    git_base="HEAD",        # Diff base ref
    system_prompt="",       # System prompt (counts against budget)
    fmt="xml",              # "xml" | "markdown" | "plain"
)
# → Context

ContextBuilder (fluent API)

ctx = (
    ContextBuilder(root=".")
    .for_model("gpt-4o")
    .with_budget(total=50_000, reserved_output=4096)
    .only("**/*.py", "**/*.yaml")
    .exclude("tests/**", "migrations/**")
    .include_files("src/specific.py")
    .from_git_diff(base="main")
    .with_system("You are an expert Python engineer.")
    .max_file_size(200)     # KB
    .no_git()
    .build("query")
)
# → Context

Context

ctx.to_string(fmt="xml")    # → str ready to paste into an LLM
ctx.summary()               # → human-readable summary with token counts
ctx.files                   # → list[ContextFile], sorted by relevance
ctx.skipped_files           # → files that didn't fit the budget
ctx.total_tokens            # → estimated token usage
ctx.budget.available        # → remaining token budget

TokenBudget

TokenBudget.for_model("claude-sonnet-4")  # auto-detect limit
TokenBudget(total=50_000, reserved_output=2048, reserved_system=512)

Supported models (auto-detected): claude-opus-4, claude-sonnet-4, claude-haiku-4, gpt-4o, gpt-4-turbo, gpt-4, gpt-3.5-turbo, gemini-1.5-pro, gemini-1.5-flash, llama-3.
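
The budget arithmetic can be illustrated with a small stand-in class. This is a sketch, not the real `TokenBudget`: `BudgetSketch` is hypothetical, and treating 2048 and 512 as default reservations is an assumption inferred from the quickstart summary, where a 200K-token model shows a 197,440-token budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetSketch:
    """Hypothetical stand-in for TokenBudget, for illustration only."""

    total: int
    reserved_output: int = 2048  # assumed default: room for the model's reply
    reserved_system: int = 512   # assumed default: room for the system prompt

    @property
    def available(self):
        """Tokens left for file content after both reservations."""
        return max(0, self.total - self.reserved_output - self.reserved_system)


budget = BudgetSketch(total=200_000)
print(budget.available)  # 197440

small = BudgetSketch(total=50_000, reserved_output=4096)
print(small.available)   # 45392
```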


CLI Reference

ctxeng [--root PATH] <command> [options]

Commands:
  build   Build context for a query
  info    Show project info and file stats

build options:
  --model, -m     Target model (default: claude-sonnet-4)
  --fmt, -f       Output format: xml | markdown | plain (default: xml)
  --output, -o    Write to file instead of stdout
  --only          Glob patterns to include
  --exclude       Glob patterns to exclude
  --files         Explicit file list
  --git-diff      Only include git-changed files
  --git-base      Git base ref (default: HEAD)
  --system        System prompt text
  --budget        Override total token budget
  --no-git        Disable git recency scoring
  --max-size      Max file size in KB (default: 500)

Supported Models

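Model                                            Context window       Auto-detected
-----------------------------------------------  -------------------  -------------
claude-opus-4, claude-sonnet-4, claude-haiku-4   200K                 yes
gpt-4o, gpt-4-turbo                              128K                 yes
gpt-4                                            8K                   yes
gpt-3.5-turbo                                    16K                  yes
gemini-1.5-pro, gemini-1.5-flash                 1M                   yes
llama-3                                          32K (safe default)   yes
any other                                        32K (safe default)   no (falls back)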
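
A lookup with a fallback like the one in this table might look like the sketch below. The prefix matching, the `context_window` helper, and the reading of "K" as 1,000 tokens are assumptions for illustration; the library may store exact limits and match names differently.

```python
# Window sizes copied from the table above, reading "K" as 1,000 tokens (assumption)
CONTEXT_WINDOWS = {
    "claude-opus-4": 200_000,
    "claude-sonnet-4": 200_000,
    "claude-haiku-4": 200_000,
    "gpt-4o": 128_000,
    "gpt-4-turbo": 128_000,
    "gpt-4": 8_000,
    "gpt-3.5-turbo": 16_000,
    "gemini-1.5-pro": 1_000_000,
    "gemini-1.5-flash": 1_000_000,
    "llama-3": 32_000,
}
DEFAULT_WINDOW = 32_000  # the "safe default" row


def context_window(model):
    """Return the window for the longest known prefix of `model`."""
    name = model.lower()
    # Longest names first so "gpt-4o" and "gpt-4-turbo" win over "gpt-4"
    for known in sorted(CONTEXT_WINDOWS, key=len, reverse=True):
        if name.startswith(known):
            return CONTEXT_WINDOWS[known]
    return DEFAULT_WINDOW


for model in ("gpt-4o", "gpt-4", "claude-sonnet-4", "some-local-model"):
    print(model, context_window(model))
```

Checking longer names first matters here: a plain first-match scan could resolve `gpt-4o` to the 8K entry for `gpt-4`.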

Why not just paste files manually?

You could. But you'll hit these problems immediately:

  • Token limit errors — too many files, context overflows
  • Irrelevant noise — wrong files dilute signal, hurt output quality
  • Stale context — you forget to update when code changes
  • Manual effort — figuring out which files matter takes time

ctxeng solves all four. The right files, in the right order, trimmed to fit, every time.


Roadmap

  • Semantic similarity scoring (optional embedding model)
  • ctxeng watch — auto-rebuild context on file changes
  • VSCode extension
  • Import graph analysis (include files imported by relevant files)
  • .ctxengignore file support
  • Streaming context into LLM APIs

Contributing

PRs welcome! See CONTRIBUTING.md.

git clone https://github.com/sayeem3051/python-context-engineer
cd python-context-engineer
pip install -e ".[dev]"
pytest

License

MIT. Use freely, modify as needed, contribute back if you can.


If ctxeng saved you time, please ⭐ the repo — it helps others find it.
