MCP server for read-only sensitive data: tools return metadata and aggregates only
Glovebox
Glovebox is a Model Context Protocol (MCP) server that exposes read-only access to sensitive files mounted at a fixed path inside a container. Tools return metadata and aggregates only (directory listings, file stats, regex match counts, line numbers, CSV dimensions, line counts)—not raw file contents.
The design follows the “glovebox / cleanroom” idea: data stays in a bounded environment; the model receives structured summaries through tool results, not a dump of secrets.
Full documentation: install the docs extra and run mkdocs serve (see docs/index.md), or browse the markdown under docs/.
Quick start (pip)
pip install mcp-glovebox
Set the root and start the MCP server on stdio:
GLOVEBOX_ROOT=/path/to/sensitive glovebox
Pre-flight check before wiring a client:
GLOVEBOX_ROOT=/path/to/sensitive glovebox --doctor
glovebox --print-config # resolved JSON snapshot for automation
Configure your MCP client to run glovebox with GLOVEBOX_ROOT set. Copy-paste JSON templates for Claude Desktop and Cursor are on the MCP client examples page.
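As a rough illustration of what such a client entry might contain, the sketch below generates one in Python. It assumes the Claude Desktop-style mcpServers layout (a server name mapped to command and env); the function name client_config and the server name "glovebox" are hypothetical, and the MCP client examples page remains the authoritative template.

```python
import json


def client_config(root: str, server_name: str = "glovebox") -> dict:
    """Build a client entry that launches `glovebox` with GLOVEBOX_ROOT set.

    Assumption: the client reads an `mcpServers` mapping of
    name -> {command, env}. Check your client's docs for the exact shape.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "glovebox",           # console script installed by pip
                "env": {"GLOVEBOX_ROOT": root},  # directory the tools are confined to
            }
        }
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the client's config file.
    print(json.dumps(client_config("/path/to/sensitive"), indent=2))
```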
Quick start (Docker)
Architecture note: the published image targets linux/arm64 (Apple Silicon, AWS Graviton). It runs natively on Mac M-series and arm64 Linux. For x86-64 hosts, build locally (see "Build it yourself" under Releases and versioning).
Pull the published image:
docker pull touchthesun/glovebox:0.1.1
Or build locally:
docker build -t glovebox:local .
Run the MCP server on stdio (required for most MCP clients). Mount your sensitive directory read-only at /glovebox/data. Use the hardened form for any deployment against real sensitive data:
docker run --rm -i \
--read-only \
--tmpfs /tmp \
--cap-drop ALL \
--security-opt no-new-privileges \
-v /path/to/sensitive:/glovebox/data:ro \
touchthesun/glovebox:0.1.1
-i keeps stdin open so the client can speak MCP over stdio. See Security defaults for a full audit of what the flags do and what a naive user gets without them.
Configure your MCP client to launch this command. Copy-paste JSON templates are in MCP client examples; broader integration notes live in Integration.
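For clients or scripts that need the hardened command as an argument list, the following sketch assembles it and annotates each flag. The helper name hardened_docker_argv is hypothetical; the flags and image tag are the ones shown above, and the comments describe standard Docker behaviour rather than anything specific to Glovebox.

```python
def hardened_docker_argv(host_dir: str,
                         image: str = "touchthesun/glovebox:0.1.1") -> list:
    """Return the hardened `docker run` command as an argv list."""
    return [
        "docker", "run",
        "--rm",                                 # remove the container when it exits
        "-i",                                   # keep stdin open for MCP over stdio
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp",                      # in-memory scratch space at /tmp
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block privilege gain via setuid binaries
        "-v", f"{host_dir}:/glovebox/data:ro",  # read-only bind mount of the data
        image,
    ]


if __name__ == "__main__":
    argv = hardened_docker_argv("/path/to/sensitive")
    print(" ".join(argv))
```

In a client config that separates the executable from its arguments, argv[0] would typically go in the command field and argv[1:] in the argument list; that split is an assumption about the client, not something Glovebox dictates.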
Before attaching a client locally, sanity-check your mount directory:
GLOVEBOX_ROOT=/path/to/sensitive glovebox --doctor
Environment variables
| Variable | Default | Meaning |
|---|---|---|
| GLOVEBOX_ROOT | /glovebox/data | Directory all tool paths are relative to |
| GLOVEBOX_MAX_OUTPUT_BYTES | 256000 | Upper bound on JSON size for a tool result |
| GLOVEBOX_MAX_SEARCH_FILE_BYTES | 1048576 | Files larger than this are rejected for search/aggregate |
| GLOVEBOX_SEARCH_BUDGET | 100 | Per-session ceiling on search calls (<=0 disables); bounds oracle-style reconstruction |
| GLOVEBOX_MIN_CELL | 5 | Small-cell suppression threshold; counts below this are returned as "<k" to reduce re-identification risk |
| GLOVEBOX_MIN_FILE_ROWS | 0 (off) | Refuse search/aggregate on files below N rows/lines |
| GLOVEBOX_REDACT_FILENAMES | 0 (off) | Hash name fields in glovebox_list responses instead of returning real filenames. Enable when filenames in the mount are themselves sensitive (e.g. patient_HIV_positive.pdf). Trade-off: directory-listing navigation is disabled; directed-analysis workflows (explicit paths) are unaffected. |
| GLOVEBOX_AUDIT_LOG | (stderr only) | Append JSONL audit records to this file path in addition to stderr |
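To make the defaults concrete, here is a minimal sketch of how these variables could be resolved into a settings object. It is not Glovebox's actual loader: the Settings class, load_settings function, and the rule that only the string "1" enables filename redaction are assumptions for illustration. For the real resolved values, glovebox --print-config is the authoritative source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of Glovebox's API
    root: str
    max_output_bytes: int
    max_search_file_bytes: int
    search_budget: int
    min_cell: int
    min_file_rows: int
    redact_filenames: bool
    audit_log: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from an environment mapping using the documented defaults."""
    def as_int(name: str, default: int) -> int:
        return int(env.get(name, default))

    return Settings(
        root=env.get("GLOVEBOX_ROOT", "/glovebox/data"),
        max_output_bytes=as_int("GLOVEBOX_MAX_OUTPUT_BYTES", 256000),
        max_search_file_bytes=as_int("GLOVEBOX_MAX_SEARCH_FILE_BYTES", 1048576),
        search_budget=as_int("GLOVEBOX_SEARCH_BUDGET", 100),
        min_cell=as_int("GLOVEBOX_MIN_CELL", 5),
        min_file_rows=as_int("GLOVEBOX_MIN_FILE_ROWS", 0),
        # Assumption: "1" turns redaction on; anything else leaves it off.
        redact_filenames=env.get("GLOVEBOX_REDACT_FILENAMES", "0") == "1",
        # Unset or empty means audit records go to stderr only.
        audit_log=env.get("GLOVEBOX_AUDIT_LOG") or None,
    )


if __name__ == "__main__":
    print(load_settings())
```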
Built-in tools
- glovebox_list: List directory entries (name, type, size, mtime). No file contents.
- glovebox_stat: Metadata for one path. No file contents.
- glovebox_search: Regex search in one of two modes, count_matches or line_numbers_only. Never returns matching line text.
- glovebox_aggregate: For csv, row and column counts; for text, line count only. Never returns cell or line contents.
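The two search modes can be pictured with a small sketch. This is an illustration of the contract, not the tool's implementation: the function name search_text, the result keys, 1-based line numbering, and counting every match (rather than every matching line) are all assumptions.

```python
import re


def search_text(text: str, pattern: str, mode: str = "count_matches") -> dict:
    """Answer a regex query without ever echoing the matched text."""
    regex = re.compile(pattern)
    lines = text.splitlines()

    if mode == "count_matches":
        # Total number of matches across all lines.
        count = sum(1 for line in lines for _ in regex.finditer(line))
        return {"mode": mode, "count": count}

    if mode == "line_numbers_only":
        # 1-based numbers of lines with at least one match.
        numbers = [i for i, line in enumerate(lines, start=1) if regex.search(line)]
        return {"mode": mode, "line_numbers": numbers}

    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    sample = "user=alice\ntoken=abc123\nuser=bob token=zzz\n"
    print(search_text(sample, r"token="))
    print(search_text(sample, r"token=", mode="line_numbers_only"))
```

Either result carries only integers, so the secret values on the matching lines never reach the caller.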
Paths are relative to GLOVEBOX_ROOT; absolute paths and escapes outside the mount are rejected.
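One common way to enforce this kind of containment is to resolve the requested path and check that it still sits under the root. The sketch below shows that technique under stated assumptions; resolve_under_root is a hypothetical helper, and Glovebox's own checks may differ. It needs Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path


def resolve_under_root(root: str, user_path: str) -> Path:
    """Resolve a caller-supplied relative path, refusing anything outside root."""
    if Path(user_path).is_absolute():
        raise ValueError("absolute paths are rejected")

    root_path = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks that exist,
    # so the check below sees the real destination.
    candidate = (root_path / user_path).resolve()

    if not candidate.is_relative_to(root_path):
        raise ValueError("path escapes the root")
    return candidate


if __name__ == "__main__":
    print(resolve_under_root("/glovebox/data", "reports/q1.csv"))
    for bad in ("/etc/passwd", "../outside"):
        try:
            resolve_under_root("/glovebox/data", bad)
        except ValueError as exc:
            print(f"{bad!r}: {exc}")
```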
Threat model (summary)
Primary defence — keep secrets out of filenames. Tool responses return metadata verbatim (filenames, directory structure, sizes, mtimes). Do not encode sensitive information in file or directory names; Glovebox protects file contents, not metadata. If you cannot control filenames, set GLOVEBOX_REDACT_FILENAMES=1 — see the env vars table above for the trade-off.
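To show what hashed name fields could look like, here is a small sketch. The hashing scheme is an assumption: this README does not say which algorithm Glovebox uses, and the helper redact_entry, the 16-character truncation, and the keyed HMAC-SHA256 are illustrative choices. A keyed hash is used here because a plain hash of a guessable filename can be confirmed by hashing the guess.

```python
import hashlib
import hmac


def redact_entry(entry: dict, key: bytes) -> dict:
    """Return a copy of a listing entry with its name replaced by a keyed hash."""
    digest = hmac.new(key, entry["name"].encode("utf-8"), hashlib.sha256).hexdigest()
    redacted = dict(entry)          # leave the original entry untouched
    redacted["name"] = digest[:16]  # shortened token; other metadata is unchanged
    return redacted


if __name__ == "__main__":
    entry = {"name": "patient_HIV_positive.pdf", "type": "file", "size": 48213}
    print(redact_entry(entry, key=b"per-deployment-secret"))
```

The hashed name cannot be turned back into a path, which matches the stated trade-off: listings stop being useful for navigation, while calls that pass an explicit path are unaffected.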
Glovebox reliably answers count and frequency questions (how many rows match this pattern? which lines contain this credential?) without field values entering model context. Segmented analysis over known categories is possible with multiple search calls. Statistical aggregates, value discovery, and open-ended exploration require the constrained-computation roadmap tier. See the use-case boundary analysis for a full ✓/≈/✗ breakdown across PII and code-audit scenarios.
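A segmented analysis over known categories amounts to one count-style search per category, with small counts suppressed. The sketch below imitates that flow locally; count_matching_lines stands in for a search call, and all function names, the key=value line format, and rendering the suppressed value as "<5" (the threshold substituted for k) are assumptions.

```python
import re
from typing import Dict, Iterable, Union


def suppress_small(count: int, k: int = 5) -> Union[int, str]:
    """Return the count, or "<k" (for example "<5") when it is below the threshold."""
    return f"<{k}" if count < k else count


def count_matching_lines(text: str, pattern: str) -> int:
    """Stand-in for one count-style search call: number of lines that match."""
    regex = re.compile(pattern)
    return sum(1 for line in text.splitlines() if regex.search(line))


def segmented_counts(text: str, field: str, categories: Iterable[str],
                     k: int = 5) -> Dict[str, Union[int, str]]:
    """Issue one search per known category and suppress small cells."""
    results = {}
    for category in categories:
        pattern = rf"\b{re.escape(field)}={re.escape(category)}\b"
        results[category] = suppress_small(count_matching_lines(text, pattern), k)
    return results


if __name__ == "__main__":
    rows = ["id=1 status=active"] * 6 + ["id=2 status=closed"] * 2
    print(segmented_counts("\n".join(rows), "status", ["active", "closed"]))
```

The categories have to be known up front. Discovering which values exist is the value-discovery case that the paragraph above assigns to the constrained-computation tier.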
Glovebox is one control in a larger compliance story: it minimizes what crosses into the model context but does not govern LLM providers, compromised hosts, or malicious MCP clients. See the full threat model and harness non-goals.
Development
python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
pytest # CI also runs scripts/export_tool_manifest.py --check
Run the MCP server locally (stdio):
GLOVEBOX_ROOT=/path/to/data python -m glovebox
Pre-flight diagnostics:
GLOVEBOX_ROOT=/path/to/data python -m glovebox --doctor
python -m glovebox --print-config # resolved JSON snapshot for automation
Documentation site locally:
pip install -e '.[docs]'
mkdocs serve
Releases and versioning
Releases follow Semantic Versioning. See CHANGELOG.md for the full history.
Tagging convention: a git tag v0.1.0 produces Docker images tagged 0.1.0 and latest. Users should pin to the versioned tag, not latest.
docker pull touchthesun/glovebox:0.1.1 # pinned — recommended
docker pull touchthesun/glovebox:latest # floating — only for local dev
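The tagging convention can be expressed as a tiny mapping from git tag to image tags. This is only an illustration of the rule stated above, not the CI job itself; docker_tags_for is a hypothetical helper and it accepts plain vMAJOR.MINOR.PATCH tags only.

```python
import re

# Plain semver tags such as v0.1.0; pre-release suffixes are not handled here.
SEMVER_TAG = re.compile(r"^v(\d+\.\d+\.\d+)$")


def docker_tags_for(git_tag: str, repo: str = "touchthesun/glovebox") -> list:
    """Map a git tag like v0.1.0 to the image references it produces."""
    match = SEMVER_TAG.match(git_tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {git_tag!r}")
    version = match.group(1)
    return [f"{repo}:{version}", f"{repo}:latest"]


if __name__ == "__main__":
    print(docker_tags_for("v0.1.0"))
```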
Architecture: published images target linux/arm64 (Apple Silicon, AWS Graviton). x86-64 users should build locally.
To cut a release:
- Update version in pyproject.toml and src/glovebox/_version.py to match.
- Add a release entry to CHANGELOG.md.
- Commit, then push a semver tag:
git tag v0.1.0 && git push origin v0.1.0
- In CI, manually trigger docker_push_hub (and docker_push for the GitLab registry). Both jobs require DOCKERHUB_USER / DOCKERHUB_TOKEN CI variables to be set.
Build it yourself (required for x86-64; always available as a fallback):
docker build -t glovebox:local .
docker tag glovebox:local touchthesun/glovebox:0.1.0
docker push touchthesun/glovebox:0.1.0
Adding tools
Use the glovebox-tool Cursor skill and the templates/tool template. New tools must preserve the no-leak contract and add contract tests. Run python scripts/validate_tools.py before committing, then python scripts/export_tool_manifest.py --write when tool schemas change.
Contributing and security
See CONTRIBUTING.md for development setup, the no-leak contract, and the PR checklist. To report a security vulnerability, follow the process in SECURITY.md — do not open a public issue.
Evaluation harness
The harness/ directory runs four layers of scenarios (tool surface, LLM behavior, inference, evidence). See Harness overview, harness roadmap, and CI semantics.
Optional Falco-sidecar notes: Hardening.