Local-first MCP-compatible persistent knowledge base and media contextualization system.

ConTXT BOX

A local-first external context box for coding agents.


What Is It?

ConTXT BOX is a strict, local-first knowledge layer that sits beside any project or document folder. It gives coding agents such as Claude Code, Codex, Cursor, and other MCP clients a fast external memory: indexed filenames, folders, neighbors, summaries, cached document/image context, and durable chat preservation.

The design is intentionally narrow. Documents and images are the core path because they cover most real user context. Heavy extraction uses exactly one configured engine: MarkItDown or Docling. No multi-tool fallback chain is used in core extraction.

Features

  • Lazy indexing with rel_path, filename, folder, mtime, size, type, neighbors, folder summaries, and cheap file summaries.
  • On-demand extraction only through MarkItDown or Docling.
  • Permanent Markdown sidecars under .contextbox/history/media/.
  • MCP tools for coding agents.
  • Watchdog-based watch command for continuous index updates.
  • Preview-only smart reorganization.
  • Auto preservation into .contextbox/CONTEXT.md plus JSONL history.

Quick Start

uv sync
uv run contxtbox --help
uv run contxtbox init --root "S:\Papers"
uv run contxtbox config-show --root "S:\Papers"
uv run contxtbox index --root "S:\Papers"
uv run contxtbox health --root "S:\Papers"
uv run contxtbox search "computer vision" --root "S:\Papers"

Install the document/image engines:

uv sync --extra media

Extract one file with the strict default engine:

uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers"

Use Docling explicitly:

uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers" --engine docling

Watch a folder:

uv run contxtbox watch --root "S:\Papers"

Run production readiness checks:

uv run contxtbox health --root "S:\Papers" --fail-on-error

Show the effective workspace config:

uv run contxtbox config-show --root "S:\Papers"

Production and MCP setup guides:

How It Works

workspace/
`-- .contextbox/
    |-- index.json
    |-- config.toml
    |-- CONTEXT.md
    |-- preservation.jsonl
    `-- history/
        `-- media/
            `-- sanitized__file__path.context.md
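The sidecar name in the tree above suggests that a relative path is flattened into a single filename. The sketch below shows one way that could work; the double-underscore separator is read off the example name, while the character whitelist and the `sidecar_name` and `sidecar_path` helpers are assumptions rather than the tool's real sanitization rules.

```python
from pathlib import PurePosixPath, PureWindowsPath


def sidecar_name(rel_path: str) -> str:
    """Flatten a workspace-relative path into one filename (assumed scheme)."""
    # PureWindowsPath splits on both "\\" and "/", so either style works.
    parts = PureWindowsPath(rel_path).parts
    safe_parts = []
    for part in parts:
        # Keep letters, digits, and a few safe symbols; replace the rest.
        safe = "".join(ch if ch.isalnum() or ch in "-_." else "_" for ch in part)
        safe_parts.append(safe)
    return "__".join(safe_parts) + ".context.md"


def sidecar_path(rel_path: str) -> PurePosixPath:
    """Location of the sidecar inside the workspace, per the layout above."""
    return PurePosixPath(".contextbox/history/media") / sidecar_name(rel_path)


print(sidecar_path("Computer Vision\\paper.pdf"))
```

A flat, deterministic name means the sidecar for a file can be located from its relative path alone, without a lookup table.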

Indexing Rules

index, update_index, and watch always record:

  • rel_path
  • filename
  • folder_path
  • mtime
  • size
  • file_type
  • neighbors
  • parent_folder_summary
  • last_indexed
  • context_summary

The default summary is cheap and deterministic. It uses filename, folder name, and 5-7 nearby files. It does not open PDFs or images during indexing.
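As an illustration of a cheap, deterministic entry, the sketch below builds the recorded fields from filesystem metadata only and never reads file contents. The field names come from the list above; the `build_entry` helper, the summary wording, and the neighbor ordering are assumptions, and `parent_folder_summary` is left out because its format is not described here.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_entry(root: Path, path: Path, max_neighbors: int = 7) -> dict:
    """Build one index entry from stat() and directory listing only."""
    stat = path.stat()  # metadata only; the file is never opened
    neighbors = sorted(
        p.name for p in path.parent.iterdir() if p.is_file() and p != path
    )[:max_neighbors]
    file_type = path.suffix.lstrip(".").lower() or "unknown"
    folder_name = path.parent.name or root.name
    summary = f"{file_type.upper()} '{path.stem}' in folder '{folder_name}'"
    if neighbors:
        summary += " near " + ", ".join(neighbors)
    return {
        "rel_path": path.relative_to(root).as_posix(),
        "filename": path.name,
        "folder_path": path.parent.relative_to(root).as_posix(),
        "mtime": stat.st_mtime,
        "size": stat.st_size,
        "file_type": file_type,
        "neighbors": neighbors,
        "last_indexed": datetime.now(timezone.utc).isoformat(),
        "context_summary": summary,
    }


# Demo against a temporary folder with three one-byte placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    folder = root / "Computer Vision"
    folder.mkdir()
    for name in ("paper.pdf", "notes.md", "figure.png"):
        (folder / name).write_bytes(b"x")
    entry = build_entry(root, folder / "paper.pdf")

print(entry["context_summary"])
```

Because the summary depends only on names and metadata, re-indexing an unchanged folder yields the same summary every time.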

Configuration

init creates .contextbox/config.toml:

extraction_engine = "markitdown"
max_inline_bytes = 512000
large_file_bytes = 50000000
max_neighbors = 10
debounce_seconds = 2.0

ignored_dirs = [
  ".git",
  ".venv",
  "node_modules",
]

priority_folders = [
  "codebases/",
  "research/",
  "specs/",
  "decisions/",
  "assets/images/",
]

Use "docling" when you want Docling as the strict extraction engine.

Extraction Rules

Heavy extraction only happens when:

  • the extract-media command is run on a path, or
  • an MCP client calls get_file(path, depth="full").

The result is cached as Markdown in .contextbox/history/media/, and index.json receives:

  • extracted_at
  • context_ref
  • extraction_method
  • extraction_status
  • extraction_warnings
  • extraction_duration_seconds

Sidecars include the same audit header before extracted content. Status values are conservative: success, partial, metadata-only, or cached.
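The extract-then-cache flow can be sketched as follows. This is an assumption-laden illustration: a plain dict stands in for the sidecar directory, `run_engine` is a hypothetical callable standing in for MarkItDown or Docling, and the header layout and the rules for choosing a status are guesses. Only the audit field names and status values are taken from the text above.

```python
import time
from datetime import datetime, timezone

AUDIT_FIELDS = (
    "extracted_at",
    "context_ref",
    "extraction_method",
    "extraction_status",
    "extraction_warnings",
    "extraction_duration_seconds",
)


def extract_once(entry, sidecars, engine_name, run_engine, force=False):
    """Run the engine at most once per file unless force is set."""
    ref = entry["rel_path"].replace("/", "__") + ".context.md"
    if not force and ref in sidecars:
        # A sidecar already exists: reuse it and skip heavy extraction.
        return {**entry, "context_ref": ref, "extraction_status": "cached"}
    started = time.perf_counter()
    text, warnings = run_engine(entry["rel_path"])
    if not text:
        status = "metadata-only"
    elif warnings:
        status = "partial"
    else:
        status = "success"
    audit = {
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "context_ref": ref,
        "extraction_method": engine_name,
        "extraction_status": status,
        "extraction_warnings": warnings,
        "extraction_duration_seconds": round(time.perf_counter() - started, 3),
    }
    # The sidecar carries the same audit fields ahead of the content.
    header = "\n".join(f"{key}: {audit[key]}" for key in AUDIT_FIELDS)
    sidecars[ref] = header + "\n\n" + text
    return {**entry, **audit}


calls = []


def fake_engine(rel_path):
    """Stand-in extractor that records how often it is invoked."""
    calls.append(rel_path)
    return "# Extracted text", []


sidecars = {}
first = extract_once({"rel_path": "Computer Vision/paper.pdf"}, sidecars, "markitdown", fake_engine)
second = extract_once({"rel_path": "Computer Vision/paper.pdf"}, sidecars, "markitdown", fake_engine)
print(first["extraction_status"], second["extraction_status"], len(calls))
```

Passing `force=True` in this sketch bypasses the cache check, mirroring the `force` parameter listed for `extract_media`.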

MCP Tools

  • update_index()
  • health()
  • search(query, limit=10)
  • get_file(path, depth="metadata" | "full")
  • pull_context(task, limit=5)
  • extract_media(path, force=false)
  • reorganize(instruction)
  • auto_preserve_context(summary, metadata=null)

Start the MCP server:

uv run contxtbox mcp --root "S:\Papers"
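To give a feel for what a tool like `search(query, limit=10)` might do over indexed metadata, here is a simple keyword-overlap sketch. The scoring rule, the in-memory entry list, and the standalone `search` function are assumptions for illustration. The actual server's ranking and return shape are not described in this document.

```python
def search(entries, query, limit=10):
    """Rank entries by how often query terms appear in path and summary."""
    terms = query.lower().split()
    scored = []
    for entry in entries:
        haystack = " ".join(
            [entry["rel_path"], entry.get("context_summary", "")]
        ).lower()
        score = sum(haystack.count(term) for term in terms)
        if score:
            scored.append((score, entry["rel_path"]))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [rel_path for _, rel_path in scored[:limit]]


entries = [
    {
        "rel_path": "Computer Vision/paper.pdf",
        "context_summary": "PDF 'paper' in folder 'Computer Vision'",
    },
    {"rel_path": "NLP/survey.pdf", "context_summary": "PDF 'survey' in folder 'NLP'"},
    {"rel_path": "Computer Vision/notes.md", "context_summary": ""},
]

results = search(entries, "computer vision")
print(results)
```

Matching on paths and cheap summaries alone keeps the query fast, since no extracted document text has to be loaded.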

Attribution

Roadmap

  • Stronger semantic search over sidecars.
  • Reorganization scoring based on folder summaries and neighbor cues.
  • MCP client recipes for Claude Code, Codex, Cursor, and others.
  • Safe apply/undo flow for reorganization.
  • Configurable ignore rules and extraction engine policy.

License

MIT. See LICENSE.

Release

PyPI publishing is configured for Trusted Publishing through GitHub Actions. See Production readiness.
