lawdocx

Little tools for dealing with docx files, useful for lawyers and their LLMs.

Command-line micro-tools for surfacing the mechanical artifacts that linger inside .docx contracts. Each subcommand reads one or more files (or - for stdin) and emits a newline-terminated JSON envelope that is easy for humans, scripts, or LLM prompts to consume.

Quickstart

pip install lawdocx
lawdocx comments draft.docx --verbose

Every command accepts:

  • One or more PATH arguments (globs expanded, duplicates removed; - reads stdin).
  • --output/-o to write to a file instead of stdout.
  • --severity (info/warning/error) to drop lower-severity findings.
  • --fail-on-findings/-f to exit non-zero when any warning or error remains after filtering.
  • --verbose/-v for progress plus a severity summary.
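When driving the CLI from a script, the shared options above can be assembled programmatically. The following is a minimal sketch, not part of lawdocx: `build_command` and `run_tool` are hypothetical helper names, and the argument order (paths first, options after) simply mirrors the quickstart line.

```python
import subprocess
from typing import Optional, Sequence

VALID_SEVERITIES = ("info", "warning", "error")


def build_command(
    tool: str,
    paths: Sequence[str],
    *,
    output: Optional[str] = None,
    severity: Optional[str] = None,
    fail_on_findings: bool = False,
    verbose: bool = False,
) -> list:
    """Assemble an argv list using only the shared options documented above."""
    if severity is not None and severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    cmd = ["lawdocx", tool, *paths]
    if output:
        cmd += ["--output", output]
    if severity:
        cmd += ["--severity", severity]
    if fail_on_findings:
        cmd.append("--fail-on-findings")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_tool(cmd):
    """Hypothetical wrapper: run the command and return (exit code, stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout
```

With `fail_on_findings=True`, a non-zero exit code from `run_tool` would indicate that warnings or errors remained after filtering, as described for `--fail-on-findings`.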

JSON shape (stable across tools)

  • Envelope: { lawdocx_version, tool, generated_at, files: [...] }.
  • File entries: { path, sha256, items: [...] } where path is the CLI display name (or stdin).
  • Findings: { id, type, severity, location, context, details } with tool-specific details.
  • Context windows come from the triggering text span; location always includes story plus paragraph indices, with extra fields per tool.
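The shape above can be walked with plain dictionary access. This sketch parses a hand-written sample envelope; every value in the sample (ids, type names, timestamp format, digest placeholder) is illustrative rather than real lawdocx output, and the paragraph-index field is omitted from `location` because its exact key name is not stated here.

```python
import json

# Hand-written sample that follows the documented keys; values are made up.
SAMPLE = """
{
  "lawdocx_version": "0.2.0",
  "tool": "comments",
  "generated_at": "2025-01-01T00:00:00Z",
  "files": [
    {
      "path": "draft.docx",
      "sha256": "<hex digest>",
      "items": [
        {
          "id": "example-1",
          "type": "comment",
          "severity": "warning",
          "location": {"story": "body"},
          "context": {"before": "the ", "target": "Purchase Price", "after": " shall be"},
          "details": {}
        }
      ]
    }
  ]
}
"""


def iter_findings(envelope):
    """Yield (path, finding) pairs from a single-tool envelope."""
    for file_entry in envelope["files"]:
        for item in file_entry["items"]:
            yield file_entry["path"], item


envelope = json.loads(SAMPLE)
rows = [(path, f["severity"], f["type"]) for path, f in iter_findings(envelope)]
```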

Tools (what they actually scan)

  • lawdocx comments – DOCX comment threads with resolved/done markers and optional parent pointers.
  • lawdocx changes – Tracked insertions/deletions/moves across body, headers, footers, footnotes, and endnotes. Captures author/date attributes when present.
  • lawdocx brackets – Finds balanced square brackets by default. Pass -p/--pattern multiple times to supply custom regex (DOTALL + MULTILINE). Scans body, headers, footers, footnotes, and endnotes.
  • lawdocx todos – TODO/NTD/TBD/placeholder markers using a fixed regex list (e.g., TODO, FIXME, [?], "client to confirm") across body, headers, and footers.
  • lawdocx boilerplate – Header/footer boilerplate (draft legends, firm footers, page-number artifacts, placeholder dates, temp paths) using the built-in regex catalog. Records section number and header/footer type when available.
  • lawdocx metadata – Core, extended, and custom properties plus custom XML part references. Marks extraction failures as errors.
  • lawdocx footnotes – Footnote/endnote references wherever they appear (body, headers, footers, note stories). Includes rendered note text when present; flags missing text in details.status.
  • lawdocx highlights – Highlighted runs and their colors across body, headers, footers, footnotes, and endnotes.
  • lawdocx outline – Flags manual numbering (error) or suspicious numbering (warning) in body paragraphs that are not styled as headings. Uses simple pattern checks rather than rebuilding the full outline.
  • lawdocx audit – Runs a selected subset of the above (--only/--exclude). Produces an outer envelope with tools: [...] containing each subcommand’s filtered output and totals across all tools.
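For audit output, a per-tool tally can be derived from the nested entries. The sketch below assumes each entry in `tools` has the same shape as a single-tool envelope (`tool` plus `files` with `items`); that assumption, and the helper name `count_by_tool`, are not confirmed by the text above. The audit envelope also carries its own totals, whose key name is not given here, so this recomputes counts instead.

```python
def count_by_tool(audit_envelope):
    """Return {tool name: number of findings} for an audit envelope.

    Assumes each nested entry looks like a single-tool envelope.
    """
    counts = {}
    for tool_envelope in audit_envelope["tools"]:
        counts[tool_envelope["tool"]] = sum(
            len(file_entry["items"]) for file_entry in tool_envelope["files"]
        )
    return counts


# Illustrative input only; real findings carry the documented fields.
sample_audit = {
    "tools": [
        {"tool": "comments", "files": [{"path": "a.docx", "sha256": "x", "items": [{}, {}]}]},
        {"tool": "todos", "files": []},
    ]
}
tally = count_by_tool(sample_audit)
```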

How humans might use it

  • Pre-send scrub: Run lawdocx boilerplate and lawdocx comments before emailing a draft; add --fail-on-findings in a CI job so the run exits non-zero while warnings or errors remain.
  • Redline triage: Run lawdocx changes deal/*.docx > changes.json nightly to list insertions/deletions with context and authors for partner review.
  • Placeholder sweep: Run lawdocx brackets and lawdocx todos over a folder to collect unresolved variables and TODOs into a single JSON line per file.
  • Numbering sanity check: Run lawdocx outline on inbound paper to spot manual numbering in non-heading paragraphs that may break cross-references.
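As one way to act on a placeholder sweep, the outputs of several tools can be merged into a per-file list. This is a sketch under the assumption that the envelopes have already been parsed into dictionaries; `placeholders_by_file` is a hypothetical helper, and the sample inputs are hand-written, not real tool output.

```python
from collections import defaultdict


def placeholders_by_file(*envelopes):
    """Group (tool, matched text) pairs by file path across several envelopes."""
    merged = defaultdict(list)
    for envelope in envelopes:
        for file_entry in envelope["files"]:
            for item in file_entry["items"]:
                merged[file_entry["path"]].append(
                    (envelope["tool"], item["context"]["target"])
                )
    return dict(merged)


# Hand-written inputs trimmed to the fields this helper reads.
brackets = {"tool": "brackets", "files": [
    {"path": "spa.docx", "items": [{"context": {"target": "[Buyer]"}}]}]}
todos = {"tool": "todos", "files": [
    {"path": "spa.docx", "items": [{"context": {"target": "TBD"}}]}]}
sweep = placeholders_by_file(brackets, todos)
```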

How LLM workflows can consume it

  • Use severity to gate reasoning (e.g., summarize only warning/error).
  • Ground prompts with context.target for verbatim snippets and context.before/after for surrounding text; location.story + paragraph indices help build human-readable pointers.
  • When reading audit output, iterate envelope["tools"]; each nested tool's output is already filtered by the CLI --severity value.
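The points above might translate into a small prompt-building step like the following sketch. The helper names, the `[[...]]` marker around the target text, and the line layout are all illustrative choices; only the `severity`, `context.before/target/after`, and `location.story` fields come from the documented shape.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def describe(path, finding):
    """Render one finding as a single grounded line for a prompt."""
    ctx = finding["context"]
    story = finding["location"]["story"]
    snippet = f'{ctx.get("before", "")}[[{ctx["target"]}]]{ctx.get("after", "")}'
    return f'{finding["severity"].upper()} {path} ({story}): {snippet}'


def prompt_lines(envelope, minimum="warning"):
    """Keep only findings at or above `minimum` and render each one."""
    floor = SEVERITY_RANK[minimum]
    return [
        describe(file_entry["path"], item)
        for file_entry in envelope["files"]
        for item in file_entry["items"]
        if SEVERITY_RANK[item["severity"]] >= floor
    ]


# Hand-written sample; values are made up for illustration.
sample = {"tool": "todos", "files": [{"path": "draft.docx", "items": [
    {"severity": "info", "location": {"story": "body"},
     "context": {"before": "", "target": "note", "after": ""}},
    {"severity": "warning", "location": {"story": "body"},
     "context": {"before": "the ", "target": "Purchase Price", "after": " shall be"}},
]}]}
lines = prompt_lines(sample)
```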

Contributing

  • Follow the existing pattern: a small collect_* helper plus a run_* wrapper that hashes inputs, supports stdin, and returns a build_envelope(...) result.
  • Keep severities consistent with the tool’s intent (current tools mostly emit info for informational data and warning when action may be needed).
  • Add new commands to TOOL_RUNNERS in src/lawdocx/cli.py so audit mode can orchestrate them, and prefer deterministic outputs for easy downstream use.
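To make the collect/run pattern concrete, here is a self-contained sketch of what a new tool might look like. Everything in it is hypothetical: the `double_spaces` tool, both function names, and `build_envelope_standin`, which only imitates the documented envelope keys and is not the project's real `build_envelope` (whose signature is not shown here). Paragraph text is passed in as plain strings to keep the example free of docx parsing, and the `paragraph` key in `location` is an assumed name.

```python
import hashlib
import re


def collect_double_spaces(paragraphs):
    """Hypothetical collect_* helper: one finding per paragraph with a run of 2+ spaces."""
    findings = []
    for index, text in enumerate(paragraphs):
        match = re.search(r" {2,}", text)
        if match:
            findings.append({
                "id": f"double-space-{index}",
                "type": "double_space",
                "severity": "info",
                "location": {"story": "body", "paragraph": index},  # key name assumed
                "context": {"target": match.group(0)},
                "details": {"length": len(match.group(0))},
            })
    return findings


def build_envelope_standin(tool, files, generated_at):
    """Stand-in that mirrors the documented envelope keys only."""
    return {
        "lawdocx_version": "0.2.0",
        "tool": tool,
        "generated_at": generated_at,
        "files": files,
    }


def run_double_spaces(display_name, raw_bytes, paragraphs, generated_at):
    """Hypothetical run_* wrapper: hash the input, collect findings, wrap them."""
    entry = {
        "path": display_name,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "items": collect_double_spaces(paragraphs),
    }
    return build_envelope_standin("double_spaces", [entry], generated_at)


result = run_double_spaces("stdin", b"abc", ["one  two", "fine"], "2025-01-01T00:00:00Z")
```

Passing `generated_at` in as a parameter keeps the sketch deterministic for testing, in the spirit of the preference for deterministic outputs.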

Download files

Source Distribution

lawdocx-0.2.0.tar.gz (39.3 kB view details)

Uploaded Source

Built Distribution

lawdocx-0.2.0-py3-none-any.whl (36.3 kB view details)

Uploaded Python 3

File details

Details for the file lawdocx-0.2.0.tar.gz.

File metadata

  • Download URL: lawdocx-0.2.0.tar.gz
  • Size: 39.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lawdocx-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       7e13cc1782809219d11b907aef3aefdcfe8de351269d3b83db3a4a02146d15c8
MD5          9d02035b6927170e0b4e736cb61200f8
BLAKE2b-256  38b2740ef45bd5ef25675e9ea4018caa63ea90cf9add5b5a08dd5131b8063c29

Provenance

The following attestation bundles were made for lawdocx-0.2.0.tar.gz:

Publisher: publish-to-pypi.yml on splittist/lawdocx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lawdocx-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: lawdocx-0.2.0-py3-none-any.whl
  • Size: 36.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lawdocx-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       0b4257cd5a633b91b087d8a59275bbd2dfa6330061e345166a6836604fe72e52
MD5          841b1c0d3daa1abb14807d1dd10b97b2
BLAKE2b-256  978f0e265bd83bc9ecc4e3ece290546ae7ec734e0675a47ec7e436fa606d1b12

Provenance

The following attestation bundles were made for lawdocx-0.2.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on splittist/lawdocx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
