MCP server providing a durable, fault-tolerant write surface for coding agents

resilient-write

An MCP server that gives coding agents a durable, fault-tolerant write surface so they can keep making forward progress when a tool call is blocked by a content filter, a size cap, or an opaque transport error.

This repo is design and spec first. Code lands after the spec is frozen.

Why this exists

Coding agents today write through a single tool like Write or edit_file. When that tool rejects a payload (for any reason), the typical failure modes are:

  1. The agent loses the content it was trying to write (it only exists in model memory).
  2. There is no structured error, so the agent can't reason about what to change.
  3. Retries thrash because the agent re-sends the exact same rejected content.
  4. Any downstream step that depended on the file is now operating on half-broken state.
  5. No handoff mechanism exists, so a fresh agent or sibling agent has to re-derive the context.

This project addresses those five failure modes with orthogonal layers, numbered L0 through L5.

Layered architecture (sixteen MCP tools, one convention file)

| Layer | Tool name | What it does | Catches |
|-------|-----------|--------------|---------|
| L0 | rw.risk_score | Static pre-flight classifier over draft content (regexes + length + binary heuristics). Returns a numeric risk score plus a list of detected patterns. | Secret-shaped strings, long lines, encoding issues before they hit the sink. |
| L1 | rw.safe_write | Transactional write: temp file + hash verify + atomic rename + journal append. | Half-written files, lost prior content, no audit trail. |
| L2 | rw.chunk_write | Write one numbered chunk to a session directory via safe_write; idempotent retries. | Single-chunk failures without losing prior chunks. |
| L2 | rw.chunk_append | Auto-incrementing chunk write: detects the highest index and writes index+1. | Misnumbered chunks; lets the agent stream sections without tracking indices. |
| L2 | rw.chunk_compose | Concatenate a session's chunks in order and write the result through safe_write. | Any single-call write that is too large or too risky; allows incremental progress with rollback. |
| L2 | rw.chunk_reset | Destructively wipe an in-progress chunk session. | Stale session state after an abandoned compose. |
| L2 | rw.chunk_status | Report which chunk indices are present and what total was declared. | Missing or duplicate chunks before compose. |
| L2 | rw.chunk_preview | Dry-run compose: returns concatenated content without writing to disk. | Pre-write validation; pair with rw.validate to catch errors before commit. |
| L3 | rw.typed_error (schema) | Specification for structured tool errors {reason_hint, detected_patterns, suggested_action, retry_budget}. Wraps safe_write and chunk_compose. | Opaque tool-harness errors that agents cannot reason about. |
| L4 | rw.scratch_put | Store raw material out-of-band, content-addressed by SHA-256. | Cases where the content legitimately does not belong in the workspace. |
| L4 | rw.scratch_ref | Look up a scratchpad entry by hash or label without returning content. | Verify what is stored before deciding to surface it. |
| L4 | rw.scratch_get | Retrieve raw content by hash (disableable via $RW_SCRATCH_DISABLE_GET). | Controlled retrieval; supports write-only mode. |
| L5 | rw.handoff_write | Write a HANDOFF.md continuity envelope (YAML front-matter + body). Reports drift warnings. | Cross-agent and cross-session continuity when a task is interrupted. |
| L5 | rw.handoff_read | Parse a HANDOFF.md envelope and return structured front-matter plus body. | Picking up where another agent left off. |
| - | rw.journal_tail | Inspection helper: last N rows of the L1 write journal, with optional filters. | Debugging write history and audit. |
| - | rw.validate | Format-aware syntax validator (LaTeX, JSON, Python, YAML). | Structural errors (unbalanced braces, bad parses) before they reach disk. |
| - | rw.analytics | Journal analytics: write counts, timing, hot paths, session summaries. | Understanding agent write patterns and diagnosing performance issues. |
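
The L1 row packs four steps into one line: temp file, hash verify, atomic rename, journal append. The sketch below shows one way that sequence could look in Python. It is an illustration under assumptions, not the project's implementation: the function signature, the JSON-lines journal format, and the journal field names are all hypothetical.

```python
import hashlib
import json
import os
import tempfile
import time
from pathlib import Path


def safe_write(path: str, content: str, journal: str = "journal.jsonl") -> dict:
    """Hypothetical sketch of an L1-style transactional write."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    data = content.encode("utf-8")
    expected = hashlib.sha256(data).hexdigest()

    # 1. Temp file in the same directory, so the final rename never
    #    crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())

        # 2. Hash verify: re-read what landed on disk before committing.
        actual = hashlib.sha256(Path(tmp_name).read_bytes()).hexdigest()
        if actual != expected:
            raise OSError("hash mismatch after temp write")

        # 3. Atomic rename: readers see the old file or the new one,
        #    never a half-written one.
        os.replace(tmp_name, target)
    except BaseException:
        # Leave the previous target content untouched on any failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise

    # 4. Journal append (assumed JSON-lines format, one row per write).
    row = {"ts": time.time(), "path": str(target),
           "sha256": expected, "bytes": len(data)}
    with open(journal, "a", encoding="utf-8") as j:
        j.write(json.dumps(row) + "\n")
    return row
```

Because the target is only touched by the rename in step 3, a failure in steps 1 or 2 leaves any prior content of the file intact.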

Layers can be adopted independently. The minimum useful install is L1 + L5.
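
As an illustration of how two layers might fit together while staying separable, the sketch below pairs an L0-style pre-flight check with an L3-style typed error. Only the four field names (reason_hint, detected_patterns, suggested_action, retry_budget) come from the table above. The regex patterns, thresholds, scoring rule, and function names are assumptions made for the example and are not taken from docs/POLICY.md.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical patterns and thresholds, for illustration only.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
MAX_LINE_LENGTH = 2000  # assumed long-line threshold


def risk_score(content: str) -> dict:
    """Return a numeric score plus the list of detected pattern names."""
    detected = [name for name, rx in PATTERNS.items() if rx.search(content)]
    if any(len(line) > MAX_LINE_LENGTH for line in content.splitlines()):
        detected.append("long_line")
    if "\x00" in content:  # crude binary heuristic
        detected.append("binary_content")
    # Assumed scoring rule: 0.4 per finding, capped at 1.0.
    return {"score": min(1.0, 0.4 * len(detected)),
            "detected_patterns": detected}


@dataclass
class TypedError:
    """Field names follow the L3 row; the types are assumptions."""
    reason_hint: str
    detected_patterns: List[str] = field(default_factory=list)
    suggested_action: str = ""
    retry_budget: int = 0


def check_before_write(content: str, threshold: float = 0.3) -> Optional[TypedError]:
    """Return None if the draft looks safe, else a structured error."""
    result = risk_score(content)
    if result["score"] < threshold:
        return None
    return TypedError(
        reason_hint="risk_score_exceeded",
        detected_patterns=result["detected_patterns"],
        suggested_action="redact the flagged spans or store them via the scratchpad",
        retry_budget=2,
    )
```

An agent receiving a TypedError like this has something concrete to change before retrying, instead of re-sending the same rejected payload.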

Repo layout (planned)

resilient-write/
├── README.md                    # this file
├── docs/
│   ├── ARCHITECTURE.md          # deep dive on each layer
│   ├── API.md                   # MCP tool schemas (input/output)
│   ├── POLICY.md                # default L0 classifier patterns + thresholds
│   ├── HANDOFF_SCHEMA.md        # envelope format for L5
│   └── SCENARIOS.md             # walk-through of real failure modes
├── spec/
│   ├── tools.schema.json        # JSON Schema for the sixteen tools
│   └── handoff.schema.json      # JSON Schema for HANDOFF.md front-matter
├── src/                         # Python implementation (post-spec)
│   └── resilient_write/
│       ├── __init__.py
│       ├── server.py            # MCP entrypoint
│       ├── safe_write.py        # L1
│       ├── risk_score.py        # L0
│       ├── chunk_compose.py     # L2
│       ├── scratchpad.py        # L4
│       ├── handoff.py           # L5
│       └── errors.py            # L3 typed error envelope
└── tests/
    └── scenarios/               # replay fixtures for SCENARIOS.md

Install (planned)

uvx resilient-write              # run the MCP server
# or
pipx install resilient-write

MCP config for Claude Code / Cursor / Codex / Copilot clients lives in docs/INSTALL.md.

Status

  • Architecture document
  • Per-layer specs
  • JSON schemas (spec/errors.schema.json)
  • Reference Python implementation (all six layers, 16 tools)
  • Test fixtures (186 tests, all green)
  • Published MCP config snippets (docs/INSTALL.md)
  • Published to PyPI

Origin

This project was spun out of a concrete failure observed while producing an LLM-CLI telemetry analysis report. A Write tool call was silently rejected when the draft contained redacted-looking credential strings; the agent recovered only after five retries and a hand-written chunked-append workaround. The layers here correspond to the things that would have caught that failure before it wasted cycles. See docs/SCENARIOS.md for the full postmortem.
