contextpack

Bundle a code repository into a single LLM-ready text file.

contextpack walks a directory, drops binaries / lockfiles / build output, respects .gitignore, prioritizes the files an LLM actually wants to see, and emits a single text artifact you can paste into any chat window.

Zero API calls. Zero ML dependencies. The only runtime dependencies are click and pathspec.

Why this exists

Pasting a whole repo into a chat is annoying. Existing tools tend to do one of the following:

  • Dump every file (including node_modules and PNGs) and blow your context window
  • Need an API key or a model to summarize
  • Make you hand-curate the file list every time

contextpack does the boring middle layer: pick the right files, keep them inside a token budget, format the result so an LLM can navigate it on a single read. It runs offline, finishes in seconds, and produces deterministic output you can diff.

Install

pip install ctxbundle

Or from source:

git clone https://github.com/pranavviswanathan/contextpack
cd contextpack
pip install -e .

Requires Python 3.8+.

Usage

contextpack .                        # pack current directory to stdout
contextpack ./myrepo                 # pack a specific path
contextpack . --limit 100k           # token limit (default 200k)
contextpack . --out context.txt      # write to file instead of stdout
contextpack . --ignore tests/ --ignore docs/  # additional ignore patterns
contextpack . --summarize            # summarize large files instead of truncating

--limit accepts a plain integer or a number with a k (thousands) or m (millions) suffix: 50000, 100k, 1.5m.
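
As a rough illustration of the suffix rule, a parser for these values could look like the sketch below. The function name parse_limit and the rounding behavior are assumptions made for this example; contextpack's own parsing may differ.

```python
def parse_limit(text: str) -> int:
    """Convert '50000', '100k', or '1.5m' into a token count.

    Hypothetical helper for illustration, not contextpack's actual parser.
    """
    multipliers = {"k": 1_000, "m": 1_000_000}
    value = text.strip().lower()
    if not value:
        raise ValueError("empty limit")
    suffix = value[-1]
    if suffix in multipliers:
        # Fractions such as 1.5m are allowed; round to a whole token count.
        return round(float(value[:-1]) * multipliers[suffix])
    return int(value)


print(parse_limit("50000"), parse_limit("100k"), parse_limit("1.5m"))
```

Anything that is neither an integer nor a number with a k/m suffix raises ValueError in this sketch, which is one reasonable way to surface a bad flag value early.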

--ignore is repeatable and accepts gitignore-style globs:

contextpack . --ignore "*.test.ts" --ignore "fixtures/"

What gets included

contextpack always skips:

  • VCS / build / cache directories: .git, node_modules, __pycache__, build/, dist/, .venv, target, .next, ...
  • Lockfiles: package-lock.json, yarn.lock, poetry.lock, Cargo.lock, *.lock
  • Binaries and media: *.png, *.jpg, *.pdf, *.zip, *.so, *.dll, fonts, audio, video
  • Generated noise: *.log, *.pyc, *.min.js, *.map
  • Secrets: .env, .env.*

On top of that, it honors your .gitignore and any --ignore patterns you pass.
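
To make the always-skip rules concrete, here is a minimal sketch using only the standard library. It is a simplified stand-in: contextpack itself depends on pathspec for gitignore-style matching, the pattern lists below are only a subset of the ones above, and the name is_always_skipped is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative subsets of the lists above, not the tool's full tables.
SKIP_DIRS = {".git", "node_modules", "__pycache__", "build", "dist",
             ".venv", "target", ".next"}
SKIP_GLOBS = ["package-lock.json", "*.lock", "*.png", "*.jpg", "*.pdf",
              "*.zip", "*.so", "*.dll", "*.log", "*.pyc", "*.min.js",
              "*.map", ".env", ".env.*"]


def is_always_skipped(rel_path: str) -> bool:
    """Return True if a repo-relative path falls under the built-in skips."""
    path = PurePosixPath(rel_path)
    # Skip anything that lives inside a VCS / build / cache directory.
    if any(part in SKIP_DIRS for part in path.parts[:-1]):
        return True
    # Otherwise match the file name against the glob list.
    return any(fnmatchcase(path.name, pattern) for pattern in SKIP_GLOBS)


for candidate in ["src/main.py", "node_modules/react/index.js", ".env.local"]:
    print(candidate, is_always_skipped(candidate))
```

Plain fnmatch globs do not cover full gitignore semantics (negation, anchoring, `**`), which is presumably why the tool relies on pathspec for .gitignore and --ignore patterns.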

How budget allocation works

When the repo fits inside --limit, every text file is included verbatim.

When it doesn't, files are ranked and packed in priority order:

  1. Entry points (main.py, index.js, app.py, server.js, main.go, ...)
  2. README files
  3. Source code (.py, .ts, .go, .rs, .java, ...)
  4. Configs (pyproject.toml, package.json, Dockerfile, ...)
  5. Tests
  6. Everything else
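
The priority order above can be sketched as a small classifier plus a sort. The name sets are limited to the examples listed, and the test-detection rule and the path tie-break are assumptions for this example rather than contextpack's documented behavior.

```python
from pathlib import PurePosixPath

# Only the examples named in the list above; the real tables are longer.
ENTRY_POINTS = {"main.py", "index.js", "app.py", "server.js", "main.go"}
SOURCE_EXTS = {".py", ".ts", ".go", ".rs", ".java"}
CONFIG_NAMES = {"pyproject.toml", "package.json", "Dockerfile"}
TEST_DIRS = {"test", "tests"}  # assumed convention for spotting tests


def is_test(path: PurePosixPath) -> bool:
    """Assumed heuristic: a test directory or a test-like file name."""
    name = path.name
    in_test_dir = any(part in TEST_DIRS for part in path.parts[:-1])
    return in_test_dir or name.startswith("test_") or ".test." in name


def priority(rel_path: str) -> int:
    """Return 1 (packed first) through 6 (packed last)."""
    path = PurePosixPath(rel_path)
    name = path.name
    if name in ENTRY_POINTS:
        return 1
    if name.lower().startswith("readme"):
        return 2
    # Check tests before source, since test files share source extensions.
    if is_test(path):
        return 5
    if path.suffix in SOURCE_EXTS:
        return 3
    if name in CONFIG_NAMES:
        return 4
    return 6


def rank_paths(paths):
    """Sort by priority, then by path so the order is reproducible."""
    return sorted(paths, key=lambda p: (priority(p), p))


print(rank_paths(["docs/notes.txt", "tests/test_walker.py", "src/main.py"]))
```

Sorting on a (priority, path) tuple keeps the result stable across runs, which fits the goal of output you can diff.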

A single file is capped at roughly 10% of the total budget. Files exceeding that cap are either truncated (with a [FILE TRUNCATED - N lines omitted] marker) or, with --summarize, replaced by a heuristic summary: first 20 lines, last 10 lines, plus a list of function/class names found via regex.
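
The two fallbacks for oversized files might look roughly like this. Only the marker text, the 20/10 line counts, and the use of a regex come from the description above; the function names, the regex itself, and the layout of the summary are assumptions.

```python
import re

# Assumed pattern: common definition keywords at the start of a line.
DEF_PATTERN = re.compile(
    r"^[ \t]*(?:def|class|func|function|fn)\s+([A-Za-z_]\w*)", re.MULTILINE
)


def truncate(text: str, max_chars: int) -> str:
    """Keep whole lines up to max_chars, then append the omission marker."""
    lines = text.splitlines()
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > max_chars:  # +1 accounts for the newline
            break
        kept.append(line)
        used += len(line) + 1
    omitted = len(lines) - len(kept)
    if omitted == 0:
        return text
    kept.append(f"[FILE TRUNCATED - {omitted} lines omitted]")
    return "\n".join(kept)


def summarize(text: str, head: int = 20, tail: int = 10) -> str:
    """First `head` lines, last `tail` lines, plus definition names."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    names = DEF_PATTERN.findall(text)
    parts = lines[:head] + ["..."] + lines[-tail:]
    parts.append("Definitions: " + ", ".join(names))
    return "\n".join(parts)


sample = "\n".join(f"line {i}" for i in range(100))
print(truncate(sample, 50))
```

With the default 200k limit, a 10% cap is about 20,000 tokens, or roughly 80,000 characters under the chars / 4 heuristic, so max_chars would be on that order.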

Output format

=== CONTEXTPACK ===
Repo: myrepo
Files included: 23 of 31
Estimated tokens: 94,200 / 200,000
Skipped (too large): migrations.sql, package-lock.json
Generated: 2025-01-15 14:32

=== FILE: src/main.py ===
[file contents]

=== FILE: src/utils/helpers.py ===
[file contents]
...

The === FILE: <path> === delimiter is unambiguous and easy for an LLM to parse on a single pass.
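
As a quick check of how easy the delimiter is to parse, the sketch below splits a pack back into a path-to-content mapping. The helper split_pack is hypothetical and not part of contextpack; it assumes no file body contains a line that itself looks like a delimiter.

```python
import re
from typing import Dict

# Matches a full delimiter line such as "=== FILE: src/main.py ===".
FILE_HEADER = re.compile(r"^=== FILE: (.+) ===$", re.MULTILINE)


def split_pack(pack: str) -> Dict[str, str]:
    """Map each file path in a pack to the text that follows its delimiter."""
    matches = list(FILE_HEADER.finditer(pack))
    files = {}
    for i, match in enumerate(matches):
        # A file's body runs until the next delimiter or the end of the pack.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(pack)
        files[match.group(1)] = pack[match.end():end].strip("\n")
    return files


pack = (
    "=== CONTEXTPACK ===\n"
    "Repo: myrepo\n"
    "\n"
    "=== FILE: src/main.py ===\n"
    "print('hi')\n"
    "\n"
    "=== FILE: src/utils/helpers.py ===\n"
    "def helper():\n"
    "    return 1\n"
)
for path, body in split_pack(pack).items():
    print(path, len(body))
```

The header block is ignored because `=== CONTEXTPACK ===` does not match the FILE pattern.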

Token estimation

Token counts use a chars / 4 heuristic. On typical source code it lands within ~10% of BPE tokenizers, which is close enough for budgeting and avoids any ML dependency.
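
The heuristic is small enough to show in full. This is a sketch, not the library's estimate_tokens; the names approx_tokens and fits_budget and the choice to round down are assumptions.

```python
def approx_tokens(text: str) -> int:
    """Estimate tokens as characters divided by four, rounded down."""
    return len(text) // 4


def fits_budget(texts, limit: int) -> bool:
    """Check whether the combined estimate stays within the token limit."""
    return sum(approx_tokens(t) for t in texts) <= limit


# 376,800 characters comes out at 94,200 estimated tokens.
print(approx_tokens("x" * 376_800))
print(fits_budget(["x" * 376_800], 200_000))
```

Because the estimate depends only on string length, it is deterministic and costs effectively nothing to compute per file.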

Library use

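from pathlib import Path
from contextpack.walker import walk
from contextpack.tokenizer import rank, estimate_tokens

# Collect candidate files, adding docs/ to the ignore patterns.
entries = walk(Path("./myrepo"), extra_ignores=["docs/"])
# Order the entries by packing priority.
ranked = rank(entries)
# Print priority, token estimate, and path for the first ten ranked files.
for rf in ranked[:10]:
    print(rf.priority, rf.tokens, rf.entry.path)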

Publishing (maintainers)

pip install --upgrade build twine
python -m build
python -m twine upload dist/*

License

MIT.
