Local-first MCP-compatible persistent knowledge base and media contextualization system.
What Is It?
ConTXT BOX is a strict, local-first knowledge layer that sits beside any project or document folder. It gives coding agents such as Claude Code, Codex, Cursor, and other MCP clients a fast external memory: indexed filenames, folders, neighbors, summaries, cached document/image context, and durable chat preservation.
The design is intentionally narrow. Documents and images are the core path because they cover most real user context. Heavy extraction uses exactly one configured engine: MarkItDown or Docling. No multi-tool fallback chain is used in core extraction.
Features
- Lazy indexing with rel_path, filename, folder, mtime, size, type, neighbors, folder summaries, and cheap file summaries.
- On-demand extraction only through MarkItDown or Docling.
- Permanent Markdown sidecars under .contextbox/history/media/.
- MCP tools for coding agents.
- Watchdog-based watch command for continuous index updates.
- Preview-only smart reorganization.
- Auto preservation into .contextbox/CONTEXT.md plus JSONL history.
Quick Start
uv sync
uv run contxtbox --help
uv run contxtbox init --root "S:\Papers"
uv run contxtbox config-show --root "S:\Papers"
uv run contxtbox index --root "S:\Papers"
uv run contxtbox health --root "S:\Papers"
uv run contxtbox search "computer vision" --root "S:\Papers"
Install the document/image engines:
uv sync --extra media
Extract one file with the strict default engine:
uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers"
Use Docling explicitly:
uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers" --engine docling
Watch a folder:
uv run contxtbox watch --root "S:\Papers"
Run production readiness checks:
uv run contxtbox health --root "S:\Papers" --fail-on-error
Show the effective workspace config:
uv run contxtbox config-show --root "S:\Papers"
Production and MCP setup guides:
How It Works
workspace/
`-- .contextbox/
    |-- index.json
    |-- config.toml
    |-- CONTEXT.md
    |-- preservation.jsonl
    `-- history/
        `-- media/
            `-- sanitized__file__path.context.md
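The layout above suggests that each source file maps to one Markdown sidecar named after its relative path. The following is a minimal sketch of such a mapping, not the project's actual code. It assumes that path separators become double underscores (inferred only from the placeholder name sanitized__file__path.context.md) and that other unsafe characters become single underscores. The function name sidecar_path is hypothetical.

```python
import re
from pathlib import Path, PureWindowsPath


def sidecar_path(root: Path, rel_path: str) -> Path:
    """Map a workspace-relative path to a hypothetical sidecar location."""
    # PureWindowsPath splits on both "\\" and "/", so either style works.
    parts = PureWindowsPath(rel_path).parts
    # Assumption: keep a conservative safe character set, replace the rest.
    safe = [re.sub(r"[^A-Za-z0-9._-]", "_", part) for part in parts]
    # Assumption: path components are joined with a double underscore.
    name = "__".join(safe) + ".context.md"
    return root / ".contextbox" / "history" / "media" / name
```

Under these assumptions, "Computer Vision\paper.pdf" would map to a file named Computer_Vision__paper.pdf.context.md inside .contextbox/history/media/.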
Indexing Rules
index, update_index, and watch always record:
- rel_path
- filename
- folder_path
- mtime
- size
- file_type
- neighbors
- parent_folder_summary
- last_indexed
- context_summary
The default summary is cheap and deterministic. It uses filename, folder name, and 5-7 nearby files. It does not open PDFs or images during indexing.
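As an illustration of what a cheap, deterministic record could look like, here is a minimal sketch that builds one entry from filesystem metadata only. The field names come from the list above. The function name build_record, the wording of the summaries, and the default of 7 neighbors are assumptions rather than the project's actual behavior.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_record(root: Path, path: Path, max_neighbors: int = 7) -> dict:
    """Build one index entry without ever opening the file's contents."""
    stat = path.stat()
    # Neighbors: other files in the same folder, sorted so output is stable.
    siblings = sorted(
        p.name for p in path.parent.iterdir() if p.is_file() and p != path
    )
    neighbors = siblings[:max_neighbors]
    folder = path.parent.name or root.name
    summary = f"{path.name} in '{folder}'"
    if neighbors:
        summary += " near " + ", ".join(neighbors)
    return {
        "rel_path": path.relative_to(root).as_posix(),
        "filename": path.name,
        "folder_path": path.parent.relative_to(root).as_posix(),
        "mtime": stat.st_mtime,
        "size": stat.st_size,
        "file_type": path.suffix.lower().lstrip(".") or "unknown",
        "neighbors": neighbors,
        "parent_folder_summary": f"Folder '{folder}' with {len(siblings) + 1} file(s)",
        "last_indexed": datetime.now(timezone.utc).isoformat(),
        "context_summary": summary,
    }
```

Because the sketch relies only on stat() and a directory listing, its cost does not depend on file size.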
Configuration
init creates .contextbox/config.toml:
extraction_engine = "markitdown"
max_inline_bytes = 512000
large_file_bytes = 50000000
max_neighbors = 10
debounce_seconds = 2.0
ignored_dirs = [
".git",
".venv",
"node_modules",
]
priority_folders = [
"codebases/",
"research/",
"specs/",
"decisions/",
"assets/images/",
]
Use "docling" when you want Docling as the strict extraction engine.
Extraction Rules
Heavy extraction only happens when:
- extract-media path is called, or
- an MCP client calls get_file(path, depth="full").
The result is cached as Markdown in .contextbox/history/media/, and index.json receives:
- extracted_at
- context_ref
- extraction_method
- extraction_status
- extraction_warnings
- extraction_duration_seconds
Sidecars include the same audit header before extracted content. Status values are conservative: success, partial, metadata-only, or cached.
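The audit header can be pictured with a short sketch. The real header format is not documented here, so the "key: value" layout below is an assumption, as is the function name render_sidecar. The field names and the four status values are taken from this section.

```python
from datetime import datetime, timezone

# Status values listed in this section.
STATUSES = {"success", "partial", "metadata-only", "cached"}


def render_sidecar(body: str, *, context_ref: str, method: str, status: str,
                   warnings: list[str], duration_seconds: float) -> str:
    """Prefix extracted Markdown with a plain-text audit header (assumed layout)."""
    if status not in STATUSES:
        raise ValueError(f"unknown extraction_status: {status!r}")
    header = {
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "context_ref": context_ref,
        "extraction_method": method,
        "extraction_status": status,
        "extraction_warnings": "; ".join(warnings) or "none",
        "extraction_duration_seconds": f"{duration_seconds:.3f}",
    }
    lines = [f"{key}: {value}" for key, value in header.items()]
    # A blank line separates the header from the extracted content.
    return "\n".join(lines) + "\n\n" + body
```

Writing the same fields into both index.json and the sidecar keeps a cached sidecar self-describing even when it is read without the index.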
MCP Tools
- update_index()
- health()
- search(query, limit=10)
- get_file(path, depth="metadata" | "full")
- pull_context(task, limit=5)
- extract_media(path, force=false)
- reorganize(instruction)
- auto_preserve_context(summary, metadata=null)
Start the MCP server:
uv run contxtbox mcp --root "S:\Papers"
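MCP clients normally build tool calls for you, but it can help to see the message shape. The sketch below builds the JSON-RPC 2.0 request that MCP uses for tool invocation (method "tools/call"). The tool name and argument names come from the list above. The helper tool_call and the example path are hypothetical.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(payload)


# Requesting full depth is one of the two triggers for heavy extraction.
print(tool_call(1, "get_file", {"path": "Computer Vision/paper.pdf", "depth": "full"}))
```

A call with depth="metadata" would use the same envelope and should return only the indexed fields, without running the extraction engine.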
Attribution
- Model Context Protocol Python SDK, MIT.
- MarkItDown, MIT.
- Docling, MIT.
- watchdog, Apache-2.0.
- sentence-transformers, Apache-2.0 library with model-specific licenses.
- ChromaDB, Apache-2.0.
- gstack, MIT, as workflow inspiration.
- Ponytail, MIT, as minimal-agent behavior inspiration.
Roadmap
- Stronger semantic search over sidecars.
- Reorganization scoring based on folder summaries and neighbor cues.
- MCP client recipes for Claude Code, Codex, Cursor, and others.
- Safe apply/undo flow for reorganization.
- Configurable ignore rules and extraction engine policy.
License
MIT. See LICENSE.
Release
PyPI publishing is configured for Trusted Publishing through GitHub Actions. See Production readiness.