Distill any code repo into a compact, secret-redacted LLM context pack — and fit it to a token budget.

robinctx 🐦

Every LLM works better when it knows your repo's purpose, stack, conventions, and API surface. robinctx extracts exactly that — heuristically, locally, with zero dependencies — and packages it as a single Markdown document (plus a machine-readable JSON sidecar) sized for a prompt. The sidekick that preps the briefing so the hero can fight crime.

30-second quickstart

# no install needed:
uvx robinctx distill .
# or
pipx run robinctx distill .

# then build a task-focused prompt from the pack:
uvx robinctx pack myrepo_context.json --task "add rate limiting to the API" --budget 8000
# or do both at once:
uvx robinctx distill . --pack --task "add rate limiting to the API"

Runs anywhere Python 3.9+ runs — including locked-down environments — because the core uses only the standard library. Git and gitleaks are used when present, never required.

The disclaimer philosophy

Every artifact robinctx generates starts with this block, on purpose:

⚠️ AUTO-GENERATED — USE AT YOUR OWN RISK This context pack was produced by robinctx by heuristic analysis. It may contain errors, omissions, or — despite redaction — sensitive data. Review before sharing outside your trust boundary, and verify any claims (especially refactor suggestions) against the actual source.

Heuristics are honest about being heuristics. The disclaimer carries provenance (tool version, timestamp, repo commit SHA, scan settings) and redaction counts, so anyone downstream — human or LLM — knows exactly what they're holding and how much to trust it.

What gets captured

| Section | How | Notes |
| --- | --- | --- |
| Overview & docs | README / ARCHITECTURE / CLAUDE.md excerpts | redacted, capped |
| Tech stack | manifests (package.json, pyproject, go.mod, Cargo.toml, Gemfile, …) | framework inference from deps |
| Conventions | statistical style inference (indentation, quotes, naming, semicolons, type-hint ratio) | from up to 60 source samples |
| Layout | rendered directory tree + entry points | depth/size capped |
| API surface | Python: ast (precise, incl. async/decorators/methods) • JS/TS/Go/Rust/Ruby/Java: regex (best effort) | public symbols only |
| Git intel | branch, recent commits, churn hotspots, contributor count | optional, if git present |
| TODO/FIXME markers | comment scan | redacted, capped at 300 |
| Refactor signals | large files, long functions, churn×size hotspots, TODO clusters, missing tests | heuristic; verify before acting |

Security model

The output of this tool is destined to be pasted into LLM prompts and shared. Three independent layers stand between your secrets and that output:

  1. File exclusion — credential-like files (.env*, *.pem, secrets.*, id_rsa, .ssh/, .aws/, …) are never read and never listed (names alone can leak). Inside a git repo, files are enumerated via git ls-files --exclude-standard, so anything .gitignore'd — where local secrets usually live — is never touched. A .robinctxignore file adds your own exclusions.
  2. Inline redaction — every embedded excerpt is scrubbed for known token formats (AWS, GitHub, Slack, Google, OpenAI/Anthropic-style, JWTs, private-key blocks, credentialed URLs), secret-keyed assignments (password = …), and high-entropy values (>4.5 bits/char, ≥20 chars, assigned to a variable — hex digests and shaNNN- SRI hashes are exempt).
  3. Output scanning — after generation, the artifacts themselves are scanned with gitleaks (or trufflehog) if installed, falling back to the built-in detectors with a notice. Findings print as file:line [rule], the output is quarantined (renamed *.quarantined), and the run exits 3 — CI-friendly.
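
To make the high-entropy rule in layer 2 concrete, here is a minimal sketch of such a check. It is not robinctx's actual detector: the function names are hypothetical, only the hex-digest exemption is modeled (not the SRI one), and the "assigned to a variable" condition is left out. The 4.5 bits/char and 20-character thresholds come from the description above.

```python
import math
import re
from collections import Counter

ENTROPY_THRESHOLD = 4.5  # bits per character, as stated above
MIN_LENGTH = 20          # minimum value length, as stated above
HEX_DIGEST = re.compile(r"[0-9a-fA-F]+")  # assumption: how a hex digest is recognized


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of value in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str) -> bool:
    """Flag long, random-looking values; plain hex digests are exempt."""
    if len(value) < MIN_LENGTH or HEX_DIGEST.fullmatch(value):
        return False
    return shannon_entropy(value) > ENTROPY_THRESHOLD
```

A value made of 32 distinct characters scores exactly 5 bits/char and is flagged, while a 64-character SHA-256 hex digest is skipped by the exemption.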

Flags: --no-secret-scan opts out entirely; --strict also fails on built-in-scanner findings (recommended in CI). Exit codes: 0 ok • 1 error • 2 usage • 3 leaks found.
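
In a CI script the exit codes can be turned into readable messages. The wrapper below is a sketch, assuming robinctx is on PATH; `describe_exit` and `run_distill` are hypothetical helper names, and only the `distill` subcommand, the `--strict` flag, and the four exit codes are taken from the text above.

```python
import subprocess
import sys

# Exit codes as documented above.
EXIT_MEANINGS = {0: "ok", 1: "error", 2: "usage", 3: "leaks found"}


def describe_exit(code: int) -> str:
    """Translate a robinctx exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_distill(repo: str = ".") -> int:
    """Run `robinctx distill <repo> --strict` and report the outcome on stderr."""
    proc = subprocess.run(["robinctx", "distill", repo, "--strict"])
    print(f"robinctx: {describe_exit(proc.returncode)}", file=sys.stderr)
    return proc.returncode
```

A CI job could call `sys.exit(run_distill("."))` so that a quarantined output (exit 3) fails the build.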

Found a leak that survived all three layers? That's a vulnerability — see SECURITY.md.

Library API

pip install robinctx and build on the same engine (fully typed, py.typed shipped):

from robinctx import distill, pack, to_markdown

context = distill("path/to/repo")        # dict — the JSON-sidecar structure
print(context["style"], context["frameworks"])

markdown = to_markdown(context)          # the .md artifact, disclaimer included

result = pack(context, task="refactor the auth module", budget=8000)
print(result.prompt)                     # budget-fitted prompt
print(result.sections, result.est_tokens)

distill() is a pure function over the filesystem (writes nothing); the CLI owns file output and scanning. The sidecar dict carries schema_version with a documented compatibility contract.

CLI reference

robinctx distill <repo> [-o NAME] [--max-file-kb N]
                         [--format md|json|both|claude-md|agents-md|cursorrules]
                         [--no-secret-scan] [--strict]
                         [--pack --task "..." [--mode M] [--budget N] [--sections ...]]
robinctx pack <context.json> [--task "..."] [--mode task|onboard|refactor]
                              [--budget N] [--sections overview,style,api,...]
                              [--since REF] [-o FILE]
robinctx update <context.json> [--strict] [--no-secret-scan]
robinctx serve  <context.json>            # requires robinctx[mcp]

Pack modes prioritize differently when trimming to budget: task leads with conventions and relevant APIs, onboard with overview and layout, refactor with signals and git hotspots. With a --task, API entries / TODOs / refactor signals are relevance-ranked so the most useful detail survives trimming. --since <ref> prepends a redacted "Recent Changes" section (git log + diff stat) — useful for LLMs working on actively evolving repos.
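
The idea of mode-dependent trimming can be sketched as a greedy fit over sections in priority order. This illustrates the concept only and is not robinctx's algorithm: the priority lists are guesses based on the mode descriptions above, the section names `layout`, `signals`, and `git` are hypothetical, and token cost uses the rough len/4 estimate.

```python
from typing import Dict, List

# Hypothetical priority orders, loosely following the mode descriptions above.
MODE_PRIORITY: Dict[str, List[str]] = {
    "task": ["style", "api", "overview", "layout"],
    "onboard": ["overview", "layout", "style", "api"],
    "refactor": ["signals", "git", "api", "overview"],
}


def estimate_tokens(text: str) -> int:
    """Rough len/4 token estimate."""
    return len(text) // 4


def fit_to_budget(sections: Dict[str, str], mode: str, budget: int) -> List[str]:
    """Greedily keep sections in priority order while they still fit the budget."""
    kept: List[str] = []
    used = 0
    for name in MODE_PRIORITY[mode]:
        body = sections.get(name)
        if body is None:
            continue  # section not present in this pack
        cost = estimate_tokens(body)
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept
```

With the same sections and budget, `task` and `onboard` keep different sections first, so an oversized section is dropped at a different point in each mode.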

Agent files: --format claude-md emits a ready-to-commit CLAUDE.md (likewise agents-md → AGENTS.md, cursorrules → .cursorrules), a condensed, imperative version of the pack for coding agents that re-read it on every task. The secret-scan gate applies to these too.

Staying fresh: robinctx update ctx.json is a no-op when the repo hasn't changed since the recorded commit SHA, and re-distills when it has — cheap enough for a pre-commit hook or CI step. See docs/recipes.md for ready-made GitHub Action and pre-commit configs.
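
The freshness check amounts to comparing the recorded commit SHA with the repo's current HEAD. The sketch below shows that comparison under stated assumptions: `current_head` and `needs_redistill` are hypothetical names, and the text does not say how robinctx stores the SHA or whether it also inspects uncommitted changes.

```python
import subprocess
from typing import Optional


def current_head(repo: str) -> Optional[str]:
    """Return the repo's HEAD commit SHA, or None if git is missing or fails."""
    try:
        out = subprocess.run(
            ["git", "-C", repo, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def needs_redistill(recorded_sha: Optional[str], head_sha: Optional[str]) -> bool:
    """Re-distill unless both SHAs are known and identical."""
    if recorded_sha is None or head_sha is None:
        return True  # without a reliable comparison, refresh to be safe
    return recorded_sha != head_sha
```

Treating an unknown SHA as "changed" errs on the side of a fresh pack, which suits a pre-commit hook where a stale pack is worse than a redundant run.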

.robinctxignore

Drop a .robinctxignore (or .repoctxignore) file at the repo root to exclude more files. It uses a gitignore-flavored subset: fnmatch wildcards, dir/ for directories, and a leading / to anchor a pattern to the root. Negation with ! and git-style ** are not supported; a plain * already matches across /.
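
One way to read these rules is sketched below with the standard-library fnmatch module. It is an interpretation and not robinctx's matcher: `is_ignored` is a hypothetical helper, and the handling of unanchored patterns (tried at every path component) is an assumption.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, pattern: str) -> bool:
    """Match a repo-relative POSIX path against a single ignore pattern."""
    anchored = pattern.startswith("/")   # leading / anchors to the repo root
    dir_only = pattern.endswith("/")     # trailing / targets directories
    core = pattern.strip("/")
    parts = path.split("/")
    if dir_only:
        # Candidates are the directories that contain the file.
        candidates = ["/".join(parts[:i]) for i in range(1, len(parts))]
    else:
        candidates = [path]
    if not anchored:
        # Assumption: unanchored patterns may also match from any path component.
        candidates += [
            "/".join(c.split("/")[i:])
            for c in list(candidates)
            for i in range(1, len(c.split("/")))
        ]
    # fnmatch's * has no special treatment of "/", so it matches across directories.
    return any(fnmatchcase(c, core) for c in candidates)
```

For example, `is_ignored("src/gen/deep/x.py", "src/*.py")` is true here because `*` crosses `/`, whereas git itself would not match that path.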

Limitations (read this)

  • Non-Python extraction is regex-based. It catches conventional declarations and misses clever ones; interfaces may be labeled class. Python uses ast and is precise.
  • Refactor signals are heuristics — line counts, churn, TODO density. They're prompts for investigation, not findings. The output says so.
  • Redaction is pattern-based. A password that looks like an English word in prose will not be caught. The entropy detector can't see secrets shorter than ~23 characters (Shannon entropy of a string is bounded by log2 of its length), and may rarely flag random-looking identifiers. Run with gitleaks installed; review output before sharing.
  • Token counts are estimates (len/4) unless you install robinctx[tokens].
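
The ~23-character figure in the redaction bullet follows from the entropy bound: a string of length n has at most log2(n) bits/char, so clearing a 4.5 bits/char threshold needs n > 2^4.5 ≈ 22.6. A small sketch of that arithmetic, with hypothetical helper names:

```python
import math


def max_entropy(length: int) -> float:
    """Upper bound on per-character Shannon entropy for a string of this length."""
    return math.log2(length) if length > 0 else 0.0


def min_detectable_length(threshold_bits: float) -> int:
    """Smallest length n for which log2(n) exceeds threshold_bits."""
    return math.floor(2 ** threshold_bits) + 1
```

Even a string of 22 all-distinct characters tops out near 4.46 bits/char, so it can never exceed 4.5, while 23 distinct characters reach about 4.52.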

Extras

| Install | Adds |
| --- | --- |
| pip install robinctx | everything above, stdlib-only |
| pip install robinctx[tokens] | exact token counts via tiktoken |
| pip install robinctx[mcp] | robinctx serve: MCP server exposing the pack as queryable tools |

Contributing

See CONTRIBUTING.md. Security-relevant changes require tests, no exceptions.

License

MIT
