Local-first, MCP-compatible persistent knowledge base and media contextualization system.

ConTXT BOX

A local-first external context box for coding agents.


What Is It?

ConTXT BOX is a strict, local-first knowledge layer that sits beside any project or document folder. It gives coding agents such as Claude Code, Codex, Cursor, and other MCP clients a fast external memory: indexed filenames, folders, neighbors, summaries, cached document/image context, and durable chat preservation.

The design is intentionally narrow. Documents and images are the core path because they cover most real user context. Heavy extraction uses exactly one configured engine, either MarkItDown or Docling; core extraction never falls back through a chain of other tools.

Features

  • Lazy indexing with rel_path, filename, folder, mtime, size, type, neighbors, folder summaries, and cheap file summaries.
  • On-demand extraction only through MarkItDown or Docling.
  • Permanent Markdown sidecars under .contextbox/history/media/.
  • MCP tools for coding agents.
  • Watchdog-based watch command for continuous index updates.
  • Preview-only smart reorganization.
  • Auto preservation into .contextbox/CONTEXT.md plus JSONL history.

Quick Start

uv sync
uv run contxtbox --help
uv run contxtbox init --root "S:\Papers"
uv run contxtbox config-show --root "S:\Papers"
uv run contxtbox index --root "S:\Papers"
uv run contxtbox health --root "S:\Papers"
uv run contxtbox search "computer vision" --root "S:\Papers"

When commands are run from inside the target workspace, --root can be omitted.

Install the document/image engines:

uv sync --extra media

Extract one file with the strict default engine:

uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers"

Use Docling explicitly:

uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers" --engine docling

Watch a folder:

uv run contxtbox watch --root "S:\Papers"
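
The debounce_seconds setting in the config suggests that bursts of file events are coalesced before the index is updated. A minimal sketch of that idea, assuming a hypothetical Debouncer class that is not part of ConTXT BOX and a caller that supplies timestamps (for example from a watchdog event handler):

```python
class Debouncer:
    """Hypothetical helper: hold changed paths until they have been quiet long enough."""

    def __init__(self, quiet_seconds: float = 2.0) -> None:
        self.quiet_seconds = quiet_seconds
        self._last_seen: dict[str, float] = {}

    def touch(self, path: str, now: float) -> None:
        # Every new event for a path restarts its quiet period.
        self._last_seen[path] = now

    def ready(self, now: float) -> list[str]:
        # Release paths whose last event is at least quiet_seconds old.
        due = sorted(
            path
            for path, seen in self._last_seen.items()
            if now - seen >= self.quiet_seconds
        )
        for path in due:
            del self._last_seen[path]
        return due
```

An editor that saves the same file several times in a second would then trigger one index update instead of several.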

Run production readiness checks:

uv run contxtbox health --root "S:\Papers" --fail-on-error

Show the effective workspace config:

uv run contxtbox config-show --root "S:\Papers"

How It Works

workspace/
`-- .contextbox/
    |-- index.json
    |-- config.toml
    |-- CONTEXT.md
    |-- preservation.jsonl
    `-- history/
        `-- media/
            `-- sanitized__file__path.context.md
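
The sidecar name in the tree above implies that a workspace-relative path is flattened into a single file name. The exact sanitization rule is not documented here, so the following is only a sketch under assumptions: the sidecar_name and sidecar_path helpers are hypothetical, as is the choice to replace unsafe characters with an underscore and to join path parts with a double underscore.

```python
import re
from pathlib import Path, PureWindowsPath

# Sidecar directory as shown in the workspace layout.
SIDECAR_DIR = Path(".contextbox") / "history" / "media"


def sidecar_name(rel_path: str) -> str:
    """Hypothetical: flatten a relative path into one sidecar file name."""
    # PureWindowsPath splits on both "\\" and "/", so either style works.
    parts = PureWindowsPath(rel_path).parts
    # Assumption: keep letters, digits, dot, underscore, hyphen; replace the rest.
    safe = [re.sub(r"[^A-Za-z0-9._-]", "_", part) for part in parts]
    return "__".join(safe) + ".context.md"


def sidecar_path(root: Path, rel_path: str) -> Path:
    """Hypothetical: full location of the cached Markdown sidecar."""
    return root / SIDECAR_DIR / sidecar_name(rel_path)
```

With these assumptions, "Computer Vision\paper.pdf" maps to Computer_Vision__paper.pdf.context.md.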

Indexing Rules

index, update_index, and watch always record:

  • rel_path
  • filename
  • folder_path
  • mtime
  • size
  • file_type
  • neighbors
  • parent_folder_summary
  • last_indexed
  • context_summary

The default summary is cheap and deterministic. It is built from the filename, the folder name, and 5 to 7 nearby files. Indexing never opens PDFs or images.
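
To make the record shape concrete, here is a minimal sketch of building one index entry from file metadata alone. The field names come from the list above; the IndexEntry dataclass, the build_entry and cheap_summary helpers, and the summary wording are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class IndexEntry:
    rel_path: str
    filename: str
    folder_path: str
    mtime: float
    size: int
    file_type: str
    neighbors: list[str]
    parent_folder_summary: str
    last_indexed: str
    context_summary: str


def cheap_summary(filename: str, folder_name: str, neighbors: list[str], limit: int = 7) -> str:
    """Hypothetical deterministic summary from names only."""
    shown = ", ".join(neighbors[:limit]) or "none"
    return f"{filename} in {folder_name or '.'}; nearby: {shown}"


def build_entry(root: Path, path: Path, max_neighbors: int = 10) -> IndexEntry:
    stat = path.stat()  # metadata only; the file content is never read
    rel = path.relative_to(root)
    # Neighbors are sibling files in the same folder, sorted for determinism.
    siblings = sorted(p.name for p in path.parent.iterdir() if p.is_file() and p != path)
    neighbors = siblings[:max_neighbors]
    return IndexEntry(
        rel_path=rel.as_posix(),
        filename=path.name,
        folder_path=rel.parent.as_posix(),
        mtime=stat.st_mtime,
        size=stat.st_size,
        file_type=path.suffix.lower().lstrip(".") or "unknown",
        neighbors=neighbors,
        parent_folder_summary=f"{path.parent.name}: {len(siblings) + 1} files",
        last_indexed=datetime.now(timezone.utc).isoformat(),
        context_summary=cheap_summary(path.name, path.parent.name, neighbors),
    )
```

Because only stat() and directory listings are used, the cost per file stays small even for large PDFs.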

Configuration

init creates .contextbox/config.toml:

extraction_engine = "markitdown"
max_inline_bytes = 512000
large_file_bytes = 50000000
max_neighbors = 10
debounce_seconds = 2.0
auto_watch = true

ignored_dirs = [
  ".git",
  ".venv",
  "node_modules",
]

priority_folders = [
  "codebases/",
  "research/",
  "specs/",
  "decisions/",
  "assets/images/",
]

Set extraction_engine = "docling" in config.toml when you want Docling, rather than MarkItDown, as the strict extraction engine.

Extraction Rules

Heavy extraction happens only when:

  • extract-media is called on a path, or
  • an MCP client calls get_file(path, depth="full").

The result is cached as Markdown in .contextbox/history/media/, and index.json receives:

  • extracted_at
  • context_ref
  • extraction_method
  • extraction_status
  • extraction_warnings
  • extraction_duration_seconds

Each sidecar starts with an audit header carrying the same fields, followed by the extracted content. Status values are conservative: success, partial, metadata-only, or cached.

MCP Tools

  • update_index()
  • server_info()
  • set_root(root, index=true)
  • health()
  • search(query, limit=10)
  • get_file(path, depth="metadata" | "full")
  • pull_context(task, limit=5)
  • extract_media(path, force=false)
  • reorganize(instruction)
  • auto_preserve_context(summary, metadata=null)

Start the MCP server:

uv run contxtbox mcp --root "S:\Papers"
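
Several MCP clients register a server through a JSON entry with a command and an argument list. The sketch below generates such an entry for the command above. The mcpServers key, the contxtbox server label, and the mcp_server_entry helper are assumptions; check your client's documentation for the exact file and schema, and note that uv run may need to be started from the project directory.

```python
import json


def mcp_server_entry(root: str, name: str = "contxtbox") -> dict:
    """Hypothetical: build a client config entry that launches the MCP server."""
    return {
        "mcpServers": {
            name: {
                "command": "uv",
                # Same arguments as the command line shown above.
                "args": ["run", "contxtbox", "mcp", "--root", root],
            }
        }
    }


print(json.dumps(mcp_server_entry(r"S:\Papers"), indent=2))
```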

Attribution

Roadmap

  • Stronger semantic search over sidecars.
  • Reorganization scoring based on folder summaries and neighbor cues.
  • MCP client recipes for Claude Code, Codex, Cursor, and others.
  • Safe apply/undo flow for reorganization.
  • Configurable ignore rules and extraction engine policy.

License

MIT. See LICENSE.

Release

PyPI publishing is configured for Trusted Publishing through GitHub Actions. See Production readiness.
