Context Packer
Intelligently package codebases into optimized context for Large Language Models (LLMs). Context Packer uses hybrid ranking (semantic search + PageRank) to select the most relevant files within your token budget, with comprehensive fallback strategies for production reliability.
Features
- Hybrid Ranking: Combines semantic search with structural analysis (PageRank) to identify the most important code
- Token Budget Management: Precise token counting using tiktoken to fit LLM context windows
- Dual Output Formats:
  - XML for one-shot paste workflows (Claude.ai, ChatGPT)
  - ZIP for multi-turn upload workflows (Cursor, Claude Code)
- Production Ready: Automatic fallback strategies for every component
- Incremental Indexing: Build indexes once, reuse for fast queries
- Flexible Configuration: Customize weights, filters, and backends via YAML
Installation
Context Packer offers three installation tiers based on your needs:
Minimal (Core Only)
Basic functionality with regex-based parsing and file size ranking:
pip install ctx-packer
Includes: tiktoken, PyYAML, lxml, typer, rich
Fast (Recommended)
Core + fallback backends for semantic search and graph analysis:
pip install "ctx-packer[fast]"
Adds: faiss-cpu (vector search), networkx (graph analysis)
All (Full Features)
All features including primary backends for optimal performance:
pip install "ctx-packer[all]"
Adds: python-igraph (fast PageRank), sentence-transformers (local embeddings), py-tree-sitter (accurate AST parsing)
Quick Start
1. Index Your Repository
Build indexes for semantic search and dependency analysis:
ctx-packer index /path/to/your/repo
This creates a .ctx-pack/ directory with:
- vector.idx: Semantic search index
- graph.pkl: Dependency graph with PageRank scores
- metadata.json: Staleness detection metadata
- logs/: Execution logs
2. Generate Context Pack
Create an optimized context pack for LLM review:
# Generate ZIP output (default)
ctx-packer pack /path/to/your/repo
# Generate XML output for paste workflows
ctx-packer pack /path/to/your/repo --format xml
# Specify token budget
ctx-packer pack /path/to/your/repo --budget 50000
# Query with natural language
ctx-packer query "authentication and user management" --format zip
3. Use the Output
For XML output: Copy the contents of the generated repomix-output.xml (written to the output directory, ./output by default) and paste them into Claude.ai or ChatGPT.
For ZIP output: Upload ctx-packer.zip from the output directory to Cursor or Claude Code. The archive includes:
- files/: Selected source files with preserved directory structure
- REVIEW_CONTEXT.md: Manifest with importance scores and reading order
CLI Commands
ctx-packer index
Build and save indexes for later queries:
ctx-packer index <repo_path> [OPTIONS]
Options:
- --config PATH: Custom configuration file (default: .ctx-pack.yaml)
- --verbose: Enable detailed logging with timing information
ctx-packer query
Search indexed repository and generate output:
ctx-packer query <query_text> [OPTIONS]
Options:
- --format {xml|zip}: Output format (default: zip)
- --budget INT: Token budget (default: 100000)
- --config PATH: Custom configuration file
- --output PATH: Output directory (default: ./output)
- --verbose: Enable detailed logging
ctx-packer pack
Full workflow: index + query + pack:
ctx-packer pack <repo_path> [OPTIONS]
Options:
- --query TEXT: Natural language query for semantic search
- --changed-files PATH: File with list of changed files (one per line)
- --format {xml|zip}: Output format (default: zip)
- --budget INT: Token budget (default: 100000)
- --config PATH: Custom configuration file
- --output PATH: Output directory (default: ./output)
- --verbose: Enable detailed logging
Configuration
Create a .ctx-pack.yaml file in your repository root to customize behavior:
# Output settings
format: zip # xml | zip
token_budget: 100000
output_path: ./output
# Scoring weights (must sum to 1.0)
semantic_weight: 0.6
pagerank_weight: 0.4
# File filtering
include_tests: false
include_patterns:
- "**/*.py"
- "**/*.js"
- "**/*.ts"
exclude_patterns:
- "*.min.js"
- "node_modules/**"
- "__pycache__/**"
- ".git/**"
# Backend selection (auto | primary | fallback)
backends:
  vector_index: auto # auto | leann | faiss
  graph: auto # auto | igraph | networkx
  embeddings: auto # auto | local | api
# Embeddings configuration
embeddings:
  model: all-MiniLM-L6-v2
  device: cpu
  batch_size: 32
  api_provider: openai
  api_key_env: OPENAI_API_KEY
# Performance tuning
performance:
  max_workers: 4
  cache_embeddings: true
  incremental_index: true
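The comment that the scoring weights must sum to 1.0 implies a check when the configuration is loaded. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); `validate_weights` and its fallback defaults are hypothetical illustrations, not part of Context Packer's API.

```python
import math


def validate_weights(config):
    """Return (semantic_weight, pagerank_weight) from a parsed .ctx-pack.yaml dict.

    Hypothetical helper: missing keys fall back to the example values (0.6 / 0.4).
    Raises ValueError if a weight is negative or the two do not sum to 1.0.
    """
    semantic = float(config.get("semantic_weight", 0.6))
    pagerank = float(config.get("pagerank_weight", 0.4))
    if semantic < 0 or pagerank < 0:
        raise ValueError("scoring weights must be non-negative")
    # Decimal fractions are not represented exactly in binary floating point,
    # so compare the sum with a small tolerance instead of using ==.
    if not math.isclose(semantic + pagerank, 1.0, abs_tol=1e-9):
        raise ValueError(
            f"semantic_weight + pagerank_weight must equal 1.0, got {semantic + pagerank}"
        )
    return semantic, pagerank
```

Failing fast here keeps a typo such as `semantic_weight: 0.5` with `pagerank_weight: 0.4` from silently shrinking every importance score.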
See .ctx-pack.yaml.example for detailed documentation of all options.
How It Works
Context Packer uses a multi-stage pipeline to select the most relevant code:
1. AST Parsing
Parse source code into structured chunks with metadata:
- Primary: py-tree-sitter (accurate, 40+ languages)
- Fallback: Regex patterns (Python, JavaScript, TypeScript)
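When py-tree-sitter is unavailable, the regex fallback has to find definition boundaries from text alone. Below is a minimal sketch for Python source, assuming a chunk simply runs from one `def`/`class` header to the line before the next one; `Chunk` and `regex_chunks` are hypothetical names, not Context Packer's implementation. It also shows why the fallback is less accurate than an AST: nested definitions are flattened, and a class chunk is cut off at its first method.

```python
import re
from dataclasses import dataclass

# Matches `def`, `async def`, and `class` headers at any indentation level.
_DEF_RE = re.compile(r"^[ \t]*(async[ \t]+def|def|class)[ \t]+(\w+)", re.MULTILINE)


@dataclass
class Chunk:
    kind: str        # "function" or "class"
    name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def regex_chunks(source: str) -> list[Chunk]:
    """Split Python source into flat chunks, one per def/class header."""
    matches = list(_DEF_RE.finditer(source))
    total_lines = len(source.splitlines())
    chunks = []
    for i, match in enumerate(matches):
        # Line number = newlines before the header's line start, plus one.
        start = source.count("\n", 0, match.start()) + 1
        if i + 1 < len(matches):
            # End on the line just before the next header.
            end = source.count("\n", 0, matches[i + 1].start())
        else:
            end = total_lines
        kind = "class" if match.group(1) == "class" else "function"
        chunks.append(Chunk(kind, match.group(2), start, end))
    return chunks
```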
2. Semantic Indexing
Build vector embeddings for semantic search:
- Primary: LEANN (97% storage savings, graph-based)
- Fallback: FAISS (battle-tested HNSW index)
- Embeddings: sentence-transformers (local) or OpenAI API (fallback)
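Whichever backend stores the vectors, a query reduces to nearest-neighbour search by similarity between the query embedding and each chunk embedding. FAISS (HNSW) and LEANN approximate this search to stay fast on large indexes; the brute-force sketch below computes exact cosine similarity over every stored vector, which is only practical for small indexes. The toy two-dimensional vectors stand in for sentence-transformers output, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Exact search: score every stored vector and keep the k most similar.

    `index` maps a chunk id (here a file path) to its embedding vector.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


index = {"auth.py": [1.0, 0.0], "db.py": [0.0, 1.0], "session.py": [0.7, 0.7]}
print(top_k([1.0, 0.1], index, k=2))
```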
3. Dependency Graph
Analyze code structure and compute PageRank:
- Primary: python-igraph (compiled C core, <1s for 10k files)
- Fallback: NetworkX (pure Python, <10s for 10k files)
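Both backends compute the same quantity; what differs is speed. The sketch below is plain power-iteration PageRank in pure Python, assuming an edge A → B means "module A imports module B", so heavily imported modules accumulate rank. The damping factor of 0.85 is the conventional default, not a value documented for Context Packer, and `pagerank` here is a hypothetical helper rather than either library's function.

```python
def pagerank(imports, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank over an import graph.

    `imports` maps each module to the modules it imports; an edge A -> B passes
    part of A's rank to B, so widely imported modules score highest.
    """
    nodes = set(imports)
    for targets in imports.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    # Deduplicate targets (preserving order) so repeated imports count once.
    out_links = {node: list(dict.fromkeys(imports.get(node, ()))) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Modules that import nothing ("dangling" nodes) spread their rank evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

With a real backend, `networkx.pagerank(G, alpha=0.85)` or igraph's `Graph.pagerank(damping=0.85)` on the same directed graph replaces this loop.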
4. Hybrid Ranking
Merge semantic and structural scores:
importance_score = semantic_weight × semantic_score + pagerank_weight × pagerank_score
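The formula only makes sense if both scores are on comparable scales: PageRank values sum to 1 across all files and are usually tiny, while cosine similarities lie in [-1, 1]. The sketch below assumes each signal is min-max normalized to [0, 1] before weighting; that normalization step is an assumption, not documented behaviour, and `min_max` and `hybrid_scores` are hypothetical helpers.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; every key gets 0.0 if all scores are equal."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # A signal that gives every file the same score cannot rank them.
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_scores(semantic, pagerank, semantic_weight=0.6, pagerank_weight=0.4):
    """importance = semantic_weight * semantic + pagerank_weight * pagerank, per file."""
    sem = min_max(semantic)
    pr = min_max(pagerank)
    files = set(sem) | set(pr)
    # A file missing from one signal contributes 0 for that signal.
    return {
        path: semantic_weight * sem.get(path, 0.0) + pagerank_weight * pr.get(path, 0.0)
        for path in files
    }
```

For example, a file that best matches the query but is rarely imported can still outrank a central utility module, because the 0.6 semantic weight dominates.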
5. Budget Selection
A greedy knapsack algorithm maximizes total importance within the token budget, which is split as:
- 80% budget for file content
- 20% reserved for metadata and manifest
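A common greedy heuristic for this kind of knapsack is to sort by value per unit of cost (here, importance per token) and take items while they fit. Whether Context Packer orders by this density or by raw importance is not stated, so treat the ordering below as an assumption; `select_files` is a hypothetical helper that applies the 80% content share from the list above.

```python
def select_files(candidates, token_budget, content_fraction=0.8):
    """Greedy knapsack: pick files by importance per token until the content budget is spent.

    `candidates` is a list of (path, importance, token_count) tuples. Only
    `content_fraction` of the budget goes to file content; the rest is left
    for the manifest and metadata.
    """
    content_budget = int(token_budget * content_fraction)
    # Highest importance-per-token first; raw importance breaks ties.
    ranked = sorted(
        candidates,
        key=lambda c: (c[1] / max(c[2], 1), c[1]),
        reverse=True,
    )
    selected, used = [], 0
    for path, _importance, tokens in ranked:
        # Skip files that do not fit, but keep trying smaller ones.
        if used + tokens <= content_budget:
            selected.append(path)
            used += tokens
    return selected, used


files = [("core.py", 0.9, 500), ("utils.py", 0.6, 200), ("big.py", 0.95, 700), ("tiny.py", 0.1, 50)]
print(select_files(files, token_budget=1000))
```

Greedy selection is not guaranteed optimal (exact 0/1 knapsack needs dynamic programming), but it runs in O(n log n) and is usually close enough when individual files are small relative to the budget.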
6. Output Generation
Package selected files in chosen format:
- XML: Single file with Repomix-style structure
- ZIP: Preserved directory structure + manifest
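The ZIP layout described under "Use the Output" (selected files under files/ plus a REVIEW_CONTEXT.md manifest) can be produced with the standard library alone. This is a sketch under assumptions: the manifest's table layout and the `build_zip` helper are hypothetical, since the exact format of REVIEW_CONTEXT.md is not documented here.

```python
import io
import zipfile


def build_zip(files, scores):
    """Pack selected files under files/ plus a REVIEW_CONTEXT.md manifest.

    `files` maps POSIX-style relative paths to their text; `scores` maps the same
    paths to importance scores. The manifest lists files in descending importance,
    which doubles as a suggested reading order.
    """
    reading_order = sorted(files, key=lambda path: scores.get(path, 0.0), reverse=True)
    manifest = ["# Review Context", "", "| Order | File | Importance |", "|---|---|---|"]
    for position, path in enumerate(reading_order, start=1):
        manifest.append(f"| {position} | files/{path} | {scores.get(path, 0.0):.3f} |")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("REVIEW_CONTEXT.md", "\n".join(manifest) + "\n")
        for path in reading_order:
            # Keeping the relative path preserves the repository's directory structure.
            archive.writestr(f"files/{path}", files[path])
    return buffer.getvalue()
```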
Fallback Strategy
Context Packer never fails due to missing dependencies. Each component has automatic fallbacks:
Level 1: igraph + LEANN + local embeddings (optimal)
↓ igraph fails
Level 2: NetworkX + LEANN + local embeddings
↓ LEANN fails
Level 3: NetworkX + FAISS + local embeddings
↓ local embeddings OOM
Level 4: NetworkX + FAISS + API embeddings
↓ API fails
Level 5: NetworkX + TF-IDF (no embeddings)
↓ NetworkX too slow
Level 6: File size ranking only (no graph)
All fallback transitions are logged with actionable suggestions.
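For the missing-dependency case, each step down the chain amounts to "use the first backend whose package can be imported, and log the downgrade". A minimal sketch, assuming candidates are checked with `importlib.util.find_spec` on top-level module names; `pick_backend` and the log wording are hypothetical. Runtime failures such as out-of-memory errors, API errors, or slowness would instead need a try/except around the actual call, which this sketch does not cover.

```python
import importlib.util
import logging

logger = logging.getLogger("ctx_packer_sketch")


def pick_backend(component, candidates):
    """Return the first backend in `candidates` whose module can be imported.

    `candidates` is an ordered list of (backend_name, top_level_module); a module
    of None marks a built-in option that is always available (e.g. file-size ranking).
    """
    primary = candidates[0][0]
    for name, module in candidates:
        if module is None or importlib.util.find_spec(module) is not None:
            if name != primary:
                logger.warning("%s: %s not available, using %s fallback", component, primary, name)
            return name
    raise RuntimeError(f"no usable backend for {component}")


backend = pick_backend("graph", [("igraph", "igraph"), ("networkx", "networkx"), ("file-size ranking", None)])
print(backend)
```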
Performance
Performance targets with primary backends:
- Indexing: <5 minutes for 10,000 files
- Query: <10 seconds for 10,000 files
- Parsing: <5 seconds per 1,000 lines of code
- Token Counting: ±2% accuracy vs actual LLM count
Fallback backends maintain functionality within 2x of primary performance.
Examples
Code Review Workflow
# Index your repository once
ctx-packer index ~/projects/myapp
# Generate context for PR review
ctx-packer pack ~/projects/myapp \
--query "authentication changes" \
--changed-files changed.txt \
--format zip \
--budget 50000
# Upload ctx-packer.zip to Cursor for review
Bug Investigation
# Find relevant code for a bug
ctx-packer pack ~/projects/myapp \
--query "database connection pooling and timeout handling" \
--format xml \
--budget 30000
# Paste repomix-output.xml into Claude.ai
Documentation Generation
# Select core API files
ctx-packer pack ~/projects/myapp \
--query "public API endpoints and data models" \
--format zip \
--budget 80000
Development
Running Tests
# Install development dependencies
pip install -e ".[dev,all]"
# Run all tests
pytest
# Run with coverage
pytest --cov=context_packer --cov-report=html
# Run property-based tests only
pytest -m property
# Run integration tests
pytest -m integration
# Run benchmarks
pytest -m benchmark --benchmark-only
Test Profiles
Hypothesis property tests support multiple profiles:
# CI profile: 100 examples, verbose output
pytest --hypothesis-profile=ci
# Dev profile: 20 examples, quick feedback
pytest --hypothesis-profile=dev
# Debug profile: 10 examples, maximum verbosity
pytest --hypothesis-profile=debug
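Hypothesis profiles are normally registered in a conftest.py, and the Hypothesis pytest plugin's --hypothesis-profile option then loads one by name. A sketch of registrations matching the three profiles described above; the repository's actual conftest.py may set further options (deadlines, health checks) that are not listed here.

```python
# conftest.py (sketch)
from hypothesis import Verbosity, settings

# Profile names match the --hypothesis-profile values used in the commands above.
settings.register_profile("ci", max_examples=100, verbosity=Verbosity.verbose)
settings.register_profile("dev", max_examples=20)
settings.register_profile("debug", max_examples=10, verbosity=Verbosity.debug)
```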
Troubleshooting
"LEANN not available, using FAISS fallback"
LEANN is an optional primary backend. Install with:
pip install "ctx-packer[all]"
"igraph not available, using NetworkX fallback"
python-igraph wraps the compiled igraph C library; if no prebuilt wheel is available for your platform, it must be built from source. Install with:
pip install "ctx-packer[all]"
Or force NetworkX backend in config:
backends:
  graph: networkx
"Local embeddings OOM, falling back to API"
Reduce batch size or use API embeddings:
embeddings:
  batch_size: 16 # Reduce from default 32
# Or use API
backends:
  embeddings: api
Set OPENAI_API_KEY environment variable for API access.
"Index is stale, rebuilding"
Files have changed since the last index was built, so the index is rebuilt automatically. To force a full rebuild:
rm -rf .ctx-pack/
ctx-packer index /path/to/repo
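Content hashing is one common way to implement this kind of staleness check: record a digest per file at index time, then compare on the next run. The sketch below assumes that approach; the schema of .ctx-pack/metadata.json is not documented, so `file_digest`, `stale_files`, and the idea of storing the digests there (for example with `json.dump`) are hypothetical. Comparing modification times is a cheaper alternative that misses edits which preserve mtime.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's bytes; any content change produces a different digest."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_files(current, recorded):
    """Paths added, modified, or deleted since `recorded` was saved.

    Both arguments map relative paths to digests, e.g. {p: file_digest(p) for p in paths}.
    """
    modified_or_added = {path for path, digest in current.items() if recorded.get(path) != digest}
    deleted = set(recorded) - set(current)
    return sorted(modified_or_added | deleted)
```

An incremental index would re-embed and re-parse only the paths returned, rather than the whole repository.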
License
GPL-3.0-or-later - see LICENSE file for details.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
In practice, if you distribute a modified version of this program or a work based on it, you must release it under the GPL (version 3 or later) as well.
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
AI Agents
See AI_AGENTS.md for guidelines on how AI agents should use this tool.
Citation
If you use Context Packer in research, please cite:
@software{context_packer,
title = {Context Packer: Intelligent Codebase Packaging for LLMs},
author = {zamery},
year = {2024},
url = {https://github.com/maemreyo/zmr-ctx-paker}
}
Download files
Source Distribution: ctx_packer-0.1.7.tar.gz
Built Distribution: ctx_packer-0.1.7-py3-none-any.whl
File details
Details for the file ctx_packer-0.1.7.tar.gz.
File metadata
- Download URL: ctx_packer-0.1.7.tar.gz
- Upload date:
- Size: 90.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47606cbd9e1a60da762eff35cd6e5400fecd9d8a18cf8a881b87b3cca00ea2c7 |
| MD5 | b5482ffe8ae2c7f82afc0dc2e77c6ac3 |
| BLAKE2b-256 | fae4a16f4c99b39d694fab1158d4a14104c34879aa54304e00fa39223cc6056a |
File details
Details for the file ctx_packer-0.1.7-py3-none-any.whl.
File metadata
- Download URL: ctx_packer-0.1.7-py3-none-any.whl
- Upload date:
- Size: 90.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b198e4ee2b25231c2111008d89247c65b26b2bfb8fd682b6c02f3e17de26bb57 |
| MD5 | 785115ffc5fa19b3928afb9a9a1778d2 |
| BLAKE2b-256 | 837a18f9c2b376cff3aa28641ddec24c762440c86b728ee41f030d62ecaca4fe |