Skip to main content

Codebase context compiler for AI agents

Project description

codectx

PyPI Downloads Python License

A CLI tool that compiles repository context for AI agents — ranking files by importance, compressing them into structured summaries, and emitting a single markdown document optimized for LLM reasoning.

The problem

Feeding a raw repository to an AI agent wastes context. Most files are noise — tests, docs, boilerplate, lockfiles. The signal (core architecture, key abstractions, dependency structure) is buried.

Naive approaches either blow past context limits or arbitrarily truncate. Neither helps an agent reason about a codebase.

How codectx works

codectx treats context generation as a compilation step:

  1. Scans the repository and builds a dependency graph
  2. Scores every file by fan-in centrality, git commit frequency, and entry-point proximity
  3. Assigns tiers — top 15% get structured summaries, next 30% get signatures, rest get one-liners
  4. Enforces a token budget, highest-signal files first
  5. Emits a structured CONTEXT.md an agent can reason from immediately

The output is not a source dump. Core files get AST-derived structured summaries: purpose, internal dependencies, public types, and function signatures — at roughly 10% of the token cost of the raw source.

Benchmark

Tested against five well-known Python repos. Naive baseline counts all source files excluding tests, docs, and examples — the same file set codectx analyzes.

Repo Naive tokens codectx tokens Reduction
fastapi 224k 89k 60%
requests 41k 7k 83%
typer 80k 36k 54%
rich 354k 28k 92%
httpx 64k 6k 91%

Average reduction: 76%. Every repo fits within a 128k context window. Naive baseline exceeds it for fastapi and rich.

Token Comparison

Repos with lower reduction (fastapi, typer) contain large entry point files that codectx includes as full source by design — the CLI surface is exactly what an agent needs to see in full.

Installation

Requires Python 3.10+.

pip install codectx
uv add codectx
pipx install codectx

Quick start

# analyze the current repository
codectx analyze .

# adjust token budget
codectx analyze . --tokens 60000

# custom output path
codectx analyze . --output my-context.md

# watch mode — regenerate on file changes
codectx watch .

# semantic search ranking
codectx analyze . --query "authentication flows"

# task-specific ranking profiles
codectx analyze . --task architecture
codectx analyze . --task refactor
codectx analyze . --task debug
codectx analyze . --task feature

Output

CONTEXT.md is structured with fixed sections in a consistent order:

Section Content
ARCHITECTURE Auto-generated project description and subsystem map
ENTRY_POINTS Full source for entry point files (cli.py, main.py, etc.)
SYMBOL_INDEX All public symbols across Tier 1 and Tier 2 files
IMPORTANT_CALL_PATHS Execution paths traced from entry points
CORE_MODULES AST-driven structured summaries for highest-ranked files
SUPPORTING_MODULES Function signatures and docstrings for mid-ranked files
DEPENDENCY_GRAPH Mermaid diagram of module relationships, cyclic deps flagged
RANKED_FILES Every file's score, tier, and token cost — fully auditable
PERIPHERY One-line summaries for remaining files

A structured summary for a core module looks like this:

### `src/myapp/core/engine.py`

**Purpose:** Main execution engine for request processing.

**Depends on:** `core.models`, `core.cache`, `utils.retry`

**Types:**
- `Engine` — methods: process, shutdown, health_check

**Functions:**
- `def process(request: Request, timeout: float = 30.0) -> Response`
  Handle a single request through the full processing pipeline.
- `def shutdown(graceful: bool = True) -> None`
  Stop accepting new requests and drain the queue.

Configuration

Create .codectx.toml in your project root:

[codectx]
token_budget = 120000
output_file = "CONTEXT.md"
extra_ignore = ["**/generated/**", "*.draft.py"]

CLI flags override config file values. Supported task profiles for --task: default, debug, feature, architecture, refactor.

Supported languages

Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby. Language detection is automatic from file extension via tree-sitter.

How it works

Repository
    ↓
[Walker]      → scan files, apply .gitignore + .ctxignore
    ↓
[Parser]      → extract imports and symbols via tree-sitter
    ↓
[Graph]       → build dependency graph with rustworkx
    ↓
[Ranker]      → score files by fan-in, git frequency, entry proximity
    ↓
[Compressor]  → assign tiers, enforce token budget
    ↓
[Summarizer]  → generate AST-driven structured summaries
    ↓
[Formatter]   → emit structured markdown
    ↓
CONTEXT.md

See ARCHITECTURE.md for detail on each stage. See DECISIONS.md for reasoning behind key design choices.

Development setup

git clone https://github.com/hey-granth/codectx.git
cd codectx
uv sync

Or with pip:

pip install -e ".[dev]"

Tests

pytest
pytest --cov=src/codectx   # with coverage

Type checking and linting

mypy src
ruff check src tests
ruff format src tests

Contributing

Issues and pull requests are welcome. The project prioritizes correctness, performance, and maintainability. File an issue before starting significant work.

License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

codectx-0.2.0.tar.gz (926.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

codectx-0.2.0-py3-none-any.whl (54.9 kB view details)

Uploaded Python 3

File details

Details for the file codectx-0.2.0.tar.gz.

File metadata

  • Download URL: codectx-0.2.0.tar.gz
  • Upload date:
  • Size: 926.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for codectx-0.2.0.tar.gz
Algorithm Hash digest
SHA256 0665499a7ee25f419bf8e112ffec3c196c2906c08d0751d84f106f1860f5b446
MD5 5eb305bfd6c2aa22269cea58537079db
BLAKE2b-256 0d5e163d4d85a51de967558a4b33e56763d72aa9e31cebb045f2fb60fe702b42

See more details on using hashes here.

File details

Details for the file codectx-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: codectx-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 54.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for codectx-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6b3176ecfa8c12423773c3666085e7ce0bcfb51d80cb9a15cb5d23403bda90a6
MD5 a01ff82b90fc39d375e5bea70ef8597f
BLAKE2b-256 9de46e77c242cf35cb7cdb0af0cee03a08e9c459255fc7e9dc77615ce02aef6b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page