Agent-first CLI for structured document editing

docweave

The document editing CLI that makes AI agents 16x cheaper with 27x fewer tool calls.

Docweave gives AI agents structured, surgical access to Markdown and Word documents. Instead of dumping an entire file into context, agents call inspect to see a document's structure with hidden metadata, then drill into exactly the sections they need. The result: fewer tokens, fewer tool calls, better edits.

Without docweave:  "Here's the entire 40KB document. Find the security section and update it."
With docweave:     inspect --tag security → view --section "Token Management" → apply --patch edit.yaml

Why Annotations Matter: The Numbers

We benchmarked six realistic agent tasks against a 55-heading architecture document — with and without docweave annotations:

| Metric | Without Annotations | With Annotations | Improvement |
|---|---|---|---|
| Tokens consumed | 323,152 | 20,282 | 93.7% fewer |
| Tool calls | 189 | 7 | 96.3% fewer |
| Cost per run | $0.97 | $0.06 | 16x cheaper |

Task-by-task breakdown

| Task | Plain (tokens) | Annotated (tokens) | Saved | Calls: Plain → Annotated |
|---|---|---|---|---|
| Find performance sections | 99,613 | 1,119 | 98.9% | 56 → 1 |
| Find draft sections | 99,613 | 4,442 | 95.5% | 56 → 1 |
| Edit token management | 7,154 | 1,502 | 79.0% | 4 → 2 |
| Find ops-audience sections | 99,613 | 4,442 | 95.5% | 56 → 1 |
| Dependency analysis | 9,524 | 4,442 | 53.4% | 11 → 1 |
| View compliance content | 7,635 | 4,335 | 43.2% | 6 → 1 |
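
As a quick consistency check, the headline totals can be recomputed from the per-task rows. This is only a sketch over the numbers printed in the tables above, not the benchmark code in benchmarks/; the names `TASKS`, `totals`, and `percent_saved` are made up for the illustration.

```python
# Per-task figures copied from the breakdown table:
# (plain tokens, annotated tokens, plain calls, annotated calls)
TASKS = {
    "Find performance sections": (99_613, 1_119, 56, 1),
    "Find draft sections": (99_613, 4_442, 56, 1),
    "Edit token management": (7_154, 1_502, 4, 2),
    "Find ops-audience sections": (99_613, 4_442, 56, 1),
    "Dependency analysis": (9_524, 4_442, 11, 1),
    "View compliance content": (7_635, 4_335, 6, 1),
}


def totals(tasks):
    """Sum each column across all tasks."""
    columns = zip(*tasks.values())
    return tuple(sum(column) for column in columns)


def percent_saved(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return round(100 * (1 - after / before), 1)


plain_tokens, annotated_tokens, plain_calls, annotated_calls = totals(TASKS)
print(plain_tokens, annotated_tokens)                  # 323152 20282
print(percent_saved(plain_tokens, annotated_tokens))   # 93.7
print(plain_calls, annotated_calls)                    # 189 7
print(percent_saved(plain_calls, annotated_calls))     # 96.3
```

The ratio 189 / 7 is 27, which is where the "27x" figure in the tagline comes from.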

It's not just speed — it's accuracy

Without annotations, agents must guess. With annotations, they know.

  • Status detection: A plain agent guessed 19 draft sections using a content-length heuristic. The annotated agent found exactly 20 — the correct answer. You can't infer status from content.
  • Audience filtering: Keyword matching flagged 46 of 55 sections (84% of the document) as "ops-relevant". Annotations identified exactly 20, so at least 26 of the 46 keyword hits were false positives.
  • Dependency analysis: Text search found 8 sections mentioning "auth." Annotations captured exactly 6 with explicit dependency declarations — no noise, no false positives.

What this means at scale

At $3/MTok (Claude Sonnet input pricing):

| Scale | Savings |
|---|---|
| 100 agent runs/day | $91/day |
| 1,000 agent runs/day | $909/day |
| 10,000 agent runs/day | $9,086/day |
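
The savings figures follow from the token totals and the $3/MTok price. The sketch below reproduces that arithmetic; the function names are illustrative, and it counts input tokens only, as the table does.

```python
PRICE_PER_MTOK = 3.00  # USD per million input tokens, as stated above


def run_cost(tokens, price_per_mtok=PRICE_PER_MTOK):
    """Cost in USD of sending `tokens` input tokens."""
    return tokens * price_per_mtok / 1_000_000


def daily_savings(runs_per_day, plain_tokens=323_152, annotated_tokens=20_282):
    """USD saved per day when every run uses annotations."""
    saved_tokens = plain_tokens - annotated_tokens
    return runs_per_day * run_cost(saved_tokens)


for runs in (100, 1_000, 10_000):
    print(f"{runs:>6} runs/day -> ${daily_savings(runs):,.0f}/day")
```

Each run saves 302,870 tokens, or about $0.91, which scales linearly to the table rows.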

The benchmark code is in benchmarks/. Run it yourself: python benchmarks/generate_agentic_doc.py && python benchmarks/agentic_benchmark.py

How It Works

Docweave parses documents into a normalized block model, resolves structural anchors, and applies targeted edits through a declarative YAML patch format. Every command returns a stable JSON envelope on stdout.

The agent workflow

1. inspect doc.md              → headings + annotations (tags, status, audience, summaries)
2. inspect doc.md --tag X      → filter to sections you care about
3. view doc.md --tag X         → read only the content of those sections
4. apply ... --patch ...       → make targeted edits
5. apply ... --patch ...       → set_context to update annotations after edits

One inspect call on a 55-heading document costs ~4,400 tokens. That single call gives the agent the summary, status, audience, tags, and dependencies for every section — enough to decide exactly where to look next without reading any content.
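
An agent harness might drive the read-only steps of this workflow from Python roughly as follows. This is a minimal sketch, assuming docweave is installed and on PATH; `docweave_argv` and `run_docweave` are hypothetical helper names, and only the commands and flags shown in this README are used.

```python
import json
import subprocess


def docweave_argv(command, path, tag=None, section=None):
    """Build an argv list for a read-only docweave command."""
    argv = ["docweave", command, path]
    if tag is not None:
        argv += ["--tag", tag]
    if section is not None:
        argv += ["--section", section]
    return argv


def run_docweave(argv):
    """Run docweave and return the parsed JSON envelope from stdout."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return json.loads(completed.stdout)


# Steps 1-3 of the workflow, printed rather than executed so the
# sketch runs even where docweave is not installed.
steps = [
    docweave_argv("inspect", "doc.md"),
    docweave_argv("inspect", "doc.md", tag="security"),
    docweave_argv("view", "doc.md", tag="security"),
]
for argv in steps:
    print(" ".join(argv))
```

Passing an argv list instead of a shell string avoids quoting problems with section names that contain spaces.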

Annotations: the key innovation

Add a single HTML comment before any heading:

<!-- docweave: {"summary": "OAuth2 flow with PKCE", "tags": ["security", "api"], "status": "draft"} -->
## Authentication

This comment is invisible when rendered but surfaced by inspect. The agent sees:

{
  "text": "Authentication",
  "level": 2,
  "block_id": "blk_003",
  "annotations": {
    "summary": "OAuth2 flow with PKCE",
    "tags": ["security", "api"],
    "status": "draft"
  }
}

Now it can filter by tag (--tag security), check status without reading content, and understand section relationships through dependencies — all in a single tool call.
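
The README shows a --tag filter; for other keys such as status, an agent could filter the heading entries itself. A sketch, assuming the agent has already extracted a list of heading entries shaped like the JSON above (how that list is nested inside the result payload is not shown here, and `select_headings` is a hypothetical helper):

```python
def select_headings(headings, tag=None, status=None):
    """Return heading entries whose annotations match the given tag and status."""
    selected = []
    for heading in headings:
        annotations = heading.get("annotations") or {}
        if tag is not None and tag not in annotations.get("tags", []):
            continue
        if status is not None and annotations.get("status") != status:
            continue
        selected.append(heading)
    return selected


headings = [
    {"text": "Authentication", "level": 2, "block_id": "blk_003",
     "annotations": {"summary": "OAuth2 flow with PKCE",
                     "tags": ["security", "api"], "status": "draft"}},
    # Hypothetical second entry, added only to show a non-match.
    {"text": "Changelog", "level": 2, "block_id": "blk_010", "annotations": {}},
]

print([h["text"] for h in select_headings(headings, tag="security", status="draft")])
# ['Authentication']
```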

Features

  • Structured JSON output — Every response is a Pydantic-validated Envelope with ok, errors, warnings, and metrics fields. Parse one schema regardless of success or failure.
  • Multi-format support — Native backends for Markdown and Word (.docx) with automatic detection.
  • Anchor-based editing — Target blocks by heading, content search, or contextual clues instead of fragile line numbers.
  • Progressive discovery — Embed hidden annotations (summaries, tags, status) in documents. Agents call inspect to see structure + context, then drill into sections with --tag or --section.
  • Atomic writes — Fingerprint-based conflict detection prevents lost updates. Optional backups on every mutation.
  • Semantic diffs — Compare documents at the block level, not just line-by-line.
  • Evidence bundles — Generate before/after snapshots, diffs, and validation reports for audit trails.
  • Transaction journal — Every apply is recorded with full provenance for rollback and review.
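
To make the "Atomic writes" bullet concrete, here is one common way to combine a content fingerprint with an atomic rename. It is a generic sketch of the technique, not docweave's applier; SHA-256 as the fingerprint and the names `fingerprint`, `atomic_write`, and `ConflictError` are assumptions.

```python
import hashlib
import os
import tempfile


class ConflictError(Exception):
    """Raised when the file changed after its fingerprint was taken."""


def fingerprint(path):
    """Hash the file's bytes; any edit changes the digest."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def atomic_write(path, new_text, expected_fingerprint):
    """Write new_text only if the file still matches expected_fingerprint."""
    if fingerprint(path) != expected_fingerprint:
        raise ConflictError(f"{path} changed since it was read")
    # Write to a temp file in the same directory, then rename over the
    # target so readers never observe a half-written document.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(new_text)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A caller records `fingerprint(path)` when it reads the document and passes it back at write time. A small window remains between the check and the rename, so this detects stale reads rather than providing a lock.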

Installation

Requires Python 3.12+.

With uv (after the PyPI release)

uv tool install docweave

From the repository

uv tool install git+https://github.com/ThomasRohde/docweave.git

Development install

git clone https://github.com/ThomasRohde/docweave.git
cd docweave
pip install -e ".[dev]"

Or with uv:

uv sync --extra dev

Verify the installation:

docweave --version
docweave guide

Quick Start

# Inspect a document's structure
docweave inspect README.md

# Inspect only sections tagged "security"
docweave inspect doc.md --tag security

# View all blocks as normalized JSON
docweave view README.md

# View blocks from tagged sections
docweave view doc.md --tag api

# Search for blocks containing text
docweave find README.md "installation"

# Resolve a specific anchor
docweave anchor README.md "heading:Quick Start"

# Preview a patch plan (no changes written)
docweave plan doc.md --patch edits.yaml

# Apply a patch with backup and evidence
docweave apply doc.md --patch edits.yaml --backup --evidence-dir ./evidence

# Dry-run an apply (shows plan, writes nothing)
docweave apply doc.md --patch edits.yaml --dry-run

# Compare two document versions
docweave diff before.md after.md

# Validate document structure
docweave validate doc.md

# Review transaction history
docweave journal --file doc.md

Commands

| Command | Description |
|---|---|
| guide | Show command catalog, error codes, and exit codes |
| inspect | Return structural metadata, headings with annotations |
| view | Return the full normalized block list |
| find | Search blocks for a text query |
| anchor | Resolve an anchor spec to a specific block |
| plan | Preview an execution plan from a YAML patch file |
| apply | Apply a patch to a document with conflict detection |
| diff | Compute raw and semantic diff between two documents |
| validate | Validate structural integrity of a document |
| journal | List or retrieve transaction journal entries |

Run docweave guide for the full machine-readable command reference.

JSON Envelope

Every command emits a JSON envelope to stdout:

{
  "ok": true,
  "request_id": "req_20260310_143022_a1b2",
  "command": "inspect",
  "target": "doc.md",
  "result": { /* command-specific payload */ },
  "errors": [],
  "warnings": [],
  "metrics": { "duration_ms": 12 },
  "version": "x.y.z"
}

On failure, ok is false and errors contains structured error details:

{
  "ok": false,
  "command": "inspect",
  "errors": [
    {
      "code": "ERR_IO_FILE_NOT_FOUND",
      "message": "File not found: missing.md"
    }
  ]
  // ...
}
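
Because success and failure share one schema, a caller can handle both with a single function. A minimal sketch, using only the ok, errors, and result fields shown above; `unwrap` and `DocweaveError` are hypothetical names, not part of docweave.

```python
import json


class DocweaveError(Exception):
    """Carries the structured errors from a failed envelope."""

    def __init__(self, errors):
        self.codes = [error.get("code") for error in errors]
        details = "; ".join(
            f"{error.get('code')}: {error.get('message')}" for error in errors
        )
        super().__init__(details)


def unwrap(stdout_text):
    """Return the result payload, or raise if the envelope reports failure."""
    envelope = json.loads(stdout_text)
    if not envelope.get("ok"):
        raise DocweaveError(envelope.get("errors", []))
    return envelope.get("result")


failure = (
    '{"ok": false, "command": "inspect", "errors": '
    '[{"code": "ERR_IO_FILE_NOT_FOUND", "message": "File not found: missing.md"}]}'
)
try:
    unwrap(failure)
except DocweaveError as exc:
    print(exc)  # ERR_IO_FILE_NOT_FOUND: File not found: missing.md
```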

Patch Format

Edits are described in YAML patch files:

version: 1
target:
  file: doc.md
  backend: auto
operations:
  - id: op_001
    op: insert_after
    anchor:
      by: heading
      value: Purpose
    content:
      kind: markdown
      value: |
        New paragraph inserted after the Purpose heading.

  - id: op_002
    op: replace
    anchor:
      by: heading
      value: Scope
    content:
      kind: markdown
      value: |
        Updated scope section content.

Supported Operations

| Operation | Description |
|---|---|
| insert_after | Insert content after the anchored block |
| insert_before | Insert content before the anchored block |
| replace | Replace the anchored block's content |
| delete | Remove the anchored block |
| set_heading | Change a heading's text |
| set_context | Set hidden annotations on a heading |

Anchor Types

| Anchor (by) | Description |
|---|---|
| heading | Match by heading text |
| content | Match by block content substring |
| index | Match by block index |
| hash | Match by stable content hash |
hash Match by stable content hash

Anchors can be refined with --section, --context-before, --context-after, and --occurrence for precise targeting.
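
An agent that generates patches might sanity-check them against the two tables above before calling plan. The sketch below works on a patch already loaded into a plain dict and checks only the op and anchor names listed in this README; `lint_patch` is a hypothetical helper, and docweave's own schema validation may be stricter.

```python
SUPPORTED_OPS = {
    "insert_after", "insert_before", "replace",
    "delete", "set_heading", "set_context",
}
ANCHOR_TYPES = {"heading", "content", "index", "hash"}


def lint_patch(patch):
    """Return a list of problems found in a patch given as a plain dict."""
    problems = []
    for position, operation in enumerate(patch.get("operations", [])):
        op_id = operation.get("id", f"#{position}")
        if operation.get("op") not in SUPPORTED_OPS:
            problems.append(f"{op_id}: unsupported op {operation.get('op')!r}")
        anchor_by = operation.get("anchor", {}).get("by")
        if anchor_by not in ANCHOR_TYPES:
            problems.append(f"{op_id}: unsupported anchor type {anchor_by!r}")
    return problems


patch = {
    "version": 1,
    "target": {"file": "doc.md", "backend": "auto"},
    "operations": [
        {"id": "op_001", "op": "insert_after",
         "anchor": {"by": "heading", "value": "Purpose"},
         "content": {"kind": "markdown", "value": "New paragraph.\n"}},
        # Deliberately invalid entry to show what the check reports.
        {"id": "op_002", "op": "rename", "anchor": {"by": "line", "value": 12}},
    ],
}
print(lint_patch(patch))
# ["op_002: unsupported op 'rename'", "op_002: unsupported anchor type 'line'"]
```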

Annotations Reference

Format by backend

Markdown — HTML comments placed before a heading:

<!-- docweave: {"summary": "Authentication flow overview", "tags": ["security", "api"], "status": "draft"} -->
## Authentication

Word (.docx) — Custom XML part inside the archive, invisible to Word users.
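
For the Markdown form, the comment payload is plain JSON, so it can be read back with the standard library. This regex-based sketch only illustrates the format; docweave itself parses Markdown with markdown-it-py, and `read_annotations` is a hypothetical helper that assumes the comment sits on a single line directly before the heading.

```python
import json
import re

# One-line docweave comment followed by an ATX heading (# to ######).
ANNOTATION = re.compile(r"<!--\s*docweave:\s*(\{.*\})\s*-->\s*\n(#{1,6})\s+(.+)")


def read_annotations(markdown_text):
    """Map heading text to the annotation dict in the comment before it."""
    found = {}
    for match in ANNOTATION.finditer(markdown_text):
        payload, _hashes, heading = match.groups()
        found[heading.strip()] = json.loads(payload)
    return found


sample = (
    '<!-- docweave: {"summary": "Authentication flow overview", '
    '"tags": ["security", "api"], "status": "draft"} -->\n'
    "## Authentication\n\nBody text.\n"
)
print(read_annotations(sample)["Authentication"]["tags"])  # ['security', 'api']
```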

Common annotation keys

| Key | Type | Purpose |
|---|---|---|
| summary | string | One-line description of the section |
| tags | string[] | Categorical labels for filtering |
| status | string | Editing status (draft, review, final) |
| audience | string | Target reader |
| dependencies | string[] | Sections this one depends on |

Setting annotations via patches

Use the set_context operation to add or merge annotations:

operations:
  - id: op_annotate
    op: set_context
    anchor:
      by: heading
      value: Authentication
    context:
      summary: "OAuth2 flow with PKCE"
      tags: ["security", "api"]
      status: "draft"

Merge semantics: new keys are added, existing keys are overwritten.
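
In Python terms, the stated merge rule behaves like a dict update. The sketch assumes a shallow merge, so a list value such as tags is replaced as a whole rather than unioned; the README does not say which, so treat that detail as an assumption.

```python
def merge_annotations(existing, incoming):
    """Shallow merge: keys in `incoming` are added or replace those in `existing`."""
    return {**existing, **incoming}


before = {"summary": "Authentication flow overview", "status": "draft"}
patch_context = {"summary": "OAuth2 flow with PKCE", "tags": ["security", "api"]}

print(merge_annotations(before, patch_context))
# {'summary': 'OAuth2 flow with PKCE', 'status': 'draft', 'tags': ['security', 'api']}
```

Here summary is overwritten, tags is added, and status is left untouched because the patch does not mention it.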

Querying by tag

# Show only headings tagged "security"
docweave inspect doc.md --tag security

# View blocks from all sections tagged "api"
docweave view doc.md --tag api

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 10 | Validation error |
| 20 | Permission error |
| 40 | Conflict error |
| 50 | I/O error |
| 90 | Internal error |
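
A wrapper script can branch on these codes without parsing stdout. The enum below mirrors the table; its member names are illustrative and need not match the ExitCode constants in docweave's config.py, and the retry rule is an assumption about how a caller might react.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    VALIDATION = 10
    PERMISSION = 20
    CONFLICT = 40
    IO = 50
    INTERNAL = 90


def describe_exit(returncode):
    """Name the exit code, or flag it as unknown."""
    try:
        return ExitCode(returncode).name
    except ValueError:
        return f"UNKNOWN({returncode})"


def should_retry(returncode):
    # Assumption: only a conflict is worth retrying, after re-reading the file.
    return returncode == ExitCode.CONFLICT


print(describe_exit(40), should_retry(40))  # CONFLICT True
print(describe_exit(10), should_retry(10))  # VALIDATION False
```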

Error Codes

| Code | Description |
|---|---|
| ERR_VALIDATION | Input failed validation (bad args, schema error) |
| ERR_PERMISSION | Insufficient permissions to read or write target |
| ERR_CONFLICT | Fingerprint mismatch (file changed since read) |
| ERR_IO | File-system I/O failure |
| ERR_INTERNAL_UNHANDLED | Unexpected internal error |

Architecture

src/docweave/
├── cli.py              # Typer app, all commands, entry point
├── envelope.py         # JSON envelope model & emit()
├── config.py           # ExitCode constants, RuntimeConfig
├── models.py           # Block, NormalizedDocument, SourceSpan
├── anchors.py          # Anchor parsing & resolution
├── validation.py       # Structural validation rules
├── journal.py          # Transaction journal (append-only log)
├── backends/
│   ├── base.py         # BackendAdapter ABC
│   ├── registry.py     # Backend auto-detection & registry
│   ├── markdown_native.py  # Markdown parser (markdown-it-py)
│   ├── docx_backend.py     # Word (.docx) backend (python-docx)
│   └── docx_annotations.py # Custom XML annotation storage
├── plan/
│   ├── schema.py       # PatchFile, OperationSpec (YAML → Pydantic)
│   ├── planner.py      # Anchor resolution → ExecutionPlan
│   ├── applier.py      # Atomic file writes with fingerprinting
│   └── applier_docx.py # Word-specific plan applier
├── diff/
│   ├── raw.py          # Line-level unified diff
│   └── semantic.py     # Block-level semantic diff
└── evidence/
    └── bundle.py       # Before/after snapshot bundles

Development

# Install dev dependencies
pip install -e ".[dev]"

# Or with uv
uv sync --extra dev

# Run tests
pytest tests/ -v

# Run linter
ruff check src/ tests/

# Run tests with coverage
pytest tests/ --cov=docweave --cov-report=term-missing

Tech Stack

| Component | Library |
|---|---|
| CLI framework | Typer |
| Data models | Pydantic v2 |
| JSON output | orjson |
| Markdown parse | markdown-it-py |
| Word docs | python-docx |
| Patch files | PyYAML |
| Terminal UI | Rich |
| Build system | Hatchling |

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/my-feature)
  3. Write tests for your changes
  4. Ensure pytest tests/ -v and ruff check src/ tests/ pass
  5. Submit a pull request

License

See LICENSE for details.
