lawdocx
Little tools for dealing with docx files, useful for lawyers and their LLMs.
Command-line micro-tools for surfacing the mechanical artifacts that linger inside .docx contracts. Each subcommand reads one or more files (or - for stdin) and emits a newline-terminated JSON envelope that is easy for humans, scripts, or LLM prompts to consume.
Quickstart
pip install lawdocx
lawdocx comments draft.docx --verbose
Every command accepts:
- One or more `PATH` arguments (globs expanded, duplicates removed; `-` reads stdin).
- `--output`/`-o` to write to a file instead of stdout.
- `--severity` (`info`/`warning`/`error`) to drop lower-severity findings.
- `--fail-on-findings`/`-f` to exit non-zero when any warning or error remains after filtering.
- `--verbose`/`-v` for progress plus a severity summary.
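The interaction between `--severity` and `--fail-on-findings` can be pictured with a small sketch. This is not lawdocx's own code: the ranking table, the function names, and the exit value `1` are assumptions chosen for illustration (the text above only promises a non-zero exit).

```python
# Assumed ordering of the three documented severities, lowest first.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def filter_by_severity(items, minimum="info"):
    """Keep findings at or above `minimum`, mirroring what --severity is described to do."""
    floor = SEVERITY_RANK[minimum]
    return [item for item in items if SEVERITY_RANK[item["severity"]] >= floor]


def exit_code(items, fail_on_findings=False):
    """Return non-zero only if a warning or error survived filtering.

    The value 1 is an assumption; the README only says "non-zero".
    """
    if fail_on_findings and any(i["severity"] in ("warning", "error") for i in items):
        return 1
    return 0


findings = [{"severity": "info"}, {"severity": "warning"}, {"severity": "error"}]
kept = filter_by_severity(findings, minimum="warning")
```

Under these assumptions, filtering happens first, so info-level findings alone never cause a failing exit.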
JSON shape (stable across tools)
- Envelope: `{ lawdocx_version, tool, generated_at, files: [...] }`.
- File entries: `{ path, sha256, items: [...] }` where `path` is the CLI display name (or `stdin`).
- Findings: `{ id, type, severity, location, context, details }` with tool-specific `details`.
- Context windows come from the triggering text span; `location` always includes `story` plus paragraph indices, with extra fields per tool.
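To make the nesting concrete, here is a minimal sketch that parses one envelope line and walks files and findings. The key names come from the shape described above; every value in the sample (version, timestamp, hash, `type`, `id`) is made up, and `iter_findings` is a hypothetical helper, not part of lawdocx.

```python
import json

# Illustrative envelope only: the keys follow the documented shape, the values are invented.
SAMPLE_LINE = json.dumps({
    "lawdocx_version": "0.0.0",
    "tool": "comments",
    "generated_at": "2025-01-01T00:00:00+00:00",
    "files": [{
        "path": "draft.docx",
        "sha256": "0" * 64,
        "items": [{
            "id": "c1",
            "type": "comment",
            "severity": "warning",
            "location": {"story": "body"},
            "context": {},
            "details": {},
        }],
    }],
})


def iter_findings(envelope):
    """Yield (path, finding) pairs from a single-tool envelope."""
    for entry in envelope.get("files", []):
        for item in entry.get("items", []):
            yield entry["path"], item


envelope = json.loads(SAMPLE_LINE)
rows = [(path, f["severity"], f["type"], f["id"]) for path, f in iter_findings(envelope)]
```

Because each envelope is newline-terminated, reading a multi-envelope output line by line with `json.loads` should be enough; no streaming parser is assumed.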
Tools (what they actually scan)
- `lawdocx comments` – DOCX comment threads with resolved/done markers and optional parent pointers.
- `lawdocx changes` – Tracked insertions/deletions/moves across body, headers, footers, footnotes, and endnotes. Captures author/date attributes when present.
- `lawdocx brackets` – Finds balanced square brackets by default. Pass `-p`/`--pattern` multiple times to supply custom regex (DOTALL + MULTILINE). Scans body, headers, footers, footnotes, and endnotes.
- `lawdocx todos` – TODO/NTD/TBD/placeholder markers using a fixed regex list (e.g., `TODO`, `FIXME`, `[?]`, "client to confirm") across body, headers, and footers.
- `lawdocx boilerplate` – Header/footer boilerplate (draft legends, firm footers, page-number artifacts, placeholder dates, temp paths) using the built-in regex catalog. Records section number and header/footer type when available.
- `lawdocx metadata` – Core, extended, and custom properties plus custom XML part references. Marks extraction failures as errors.
- `lawdocx footnotes` – Footnote/endnote references wherever they appear (body, headers, footers, note stories). Includes rendered note text when present; flags missing text in `details.status`.
- `lawdocx highlights` – Highlighted runs and their colors across body, headers, footers, footnotes, and endnotes.
- `lawdocx outline` – Flags manual numbering (error) or suspicious numbering (warning) in body paragraphs that are not styled as headings. Uses simple pattern checks rather than rebuilding the full outline.
- `lawdocx audit` – Runs a selected subset of the above (`--only`/`--exclude`). Produces an outer envelope with `tools: [...]` containing each subcommand’s filtered output and totals across all tools.
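Since custom `brackets` patterns are compiled with DOTALL + MULTILINE, it helps to see how those flags change matching before writing a `--pattern` value. The sketch below uses only Python's `re` module; `DEFAULT_BRACKETS` is an assumed stand-in for a non-nested bracket pattern, not the pattern lawdocx actually ships, and `find_spans` is a hypothetical helper.

```python
import re

# Assumption: a simple pattern for non-nested [ ... ] pairs; lawdocx's default may differ.
DEFAULT_BRACKETS = r"\[[^\[\]]*\]"
FLAGS = re.DOTALL | re.MULTILINE


def find_spans(text, patterns=(DEFAULT_BRACKETS,)):
    """Return sorted (start, end, matched_text) tuples for every pattern."""
    spans = []
    for pattern in patterns:
        for match in re.finditer(pattern, text, FLAGS):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


text = "Price: [●]\nTBC by client\nSigned: [Buyer\nName]"

defaults = [s[2] for s in find_spans(text)]
# Lazy quantifier: stops at the first line end, because $ matches per line.
lazy = [s[2] for s in find_spans(text, [r"^TBC.*?$"])]
# Greedy quantifier: with DOTALL, "." also matches newlines and runs to the end.
greedy = [s[2] for s in find_spans(text, [r"^TBC.*$"])]
```

Here `defaults` contains both bracketed spans (including the one that crosses a line break), `lazy` contains only `TBC by client`, and `greedy` swallows everything from `TBC` to the end of the text. Preferring lazy quantifiers or negated character classes in custom patterns avoids that surprise.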
How humans might use it
- Pre-send scrub: Run `lawdocx boilerplate` and `lawdocx comments` before emailing a draft; combine with `--fail-on-findings` in a CI job to block warnings/errors.
- Redline triage: Nightly `lawdocx changes deal/*.docx > changes.json` to list insertions/deletions with context and authors for partner review.
- Placeholder sweep: `lawdocx brackets` and `lawdocx todos` over a folder to collect unresolved variables and TODOs into a single JSON line per file.
- Numbering sanity check: `lawdocx outline` on inbound paper to spot manual numbering in non-heading paragraphs that may break cross-references.
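A pre-send scrub in CI could look roughly like the following. It is a sketch, not a supported integration: `build_command` and `gate` are hypothetical names, and the only lawdocx features relied on are the subcommands and shared flags listed earlier. It assumes `lawdocx` is installed and on `PATH`.

```python
import subprocess


def build_command(subcommand, paths, severity="warning", output=None):
    """Assemble an argument list using only the documented shared options."""
    cmd = ["lawdocx", subcommand, *paths, "--severity", severity, "--fail-on-findings"]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def gate(paths, subcommands=("boilerplate", "comments")):
    """Run each scrub tool; return True only if every one exits with status 0."""
    clean = True
    for subcommand in subcommands:
        # check=False: a non-zero exit is the expected "findings remain" signal.
        result = subprocess.run(build_command(subcommand, paths), check=False)
        if result.returncode != 0:
            clean = False
    return clean
```

A CI step might then call `sys.exit(0 if gate(["deal/*.docx"]) else 1)`. Passing the glob as a plain argument relies on the CLI expanding globs itself, as stated in the options list.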
How LLM workflows can consume it
- Use `severity` to gate reasoning (e.g., summarize only `warning`/`error`).
- Ground prompts with `context.target` for verbatim snippets and `context.before`/`after` for surrounding text; `location.story` + paragraph indices help build human-readable pointers.
- When reading audit output, iterate `envelope["tools"]`; each nested tool is already filtered by the CLI `--severity` value.
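The points above can be combined into a small prompt-building sketch. It assumes each entry in `tools` has the same `{ tool, files: [...] }` shape as a single-tool envelope, and that `context` carries `before`, `target`, and `after` keys. The `paragraph_index` key in the sample is a placeholder name, since the exact paragraph-index fields are not spelled out here; the function simply prints whatever extra `location` keys it finds.

```python
def findings_for_prompt(audit, keep=("warning", "error")):
    """Flatten audit output into one-line pointers with verbatim snippets."""
    lines = []
    for tool_env in audit.get("tools", []):
        for entry in tool_env.get("files", []):
            for item in entry.get("items", []):
                if item.get("severity") not in keep:
                    continue
                loc = item.get("location", {})
                ctx = item.get("context", {})
                # Story first, then any other location fields in a stable order.
                parts = [str(loc.get("story", "?"))]
                parts += [f"{k}={v}" for k, v in sorted(loc.items()) if k != "story"]
                where = "; ".join(parts)
                snippet = f"{ctx.get('before', '')}>>{ctx.get('target', '')}<<{ctx.get('after', '')}"
                lines.append(f"{tool_env['tool']} {item['severity']}: {entry['path']} [{where}]: {snippet}")
    return lines


# Made-up sample data for illustration only.
sample_audit = {"tools": [{"tool": "todos", "files": [{
    "path": "draft.docx",
    "sha256": "0" * 64,
    "items": [
        {"id": "t1", "type": "todo", "severity": "warning",
         "location": {"story": "body", "paragraph_index": 2},
         "context": {"before": "Fee is ", "target": "TBD", "after": " per month"},
         "details": {}},
        {"id": "t2", "type": "todo", "severity": "info",
         "location": {"story": "body", "paragraph_index": 5},
         "context": {"before": "", "target": "note", "after": ""},
         "details": {}},
    ],
}]}]}

prompt_lines = findings_for_prompt(sample_audit)
```

Marking the target with `>>` and `<<` is just one convention for keeping the verbatim snippet distinguishable from its surroundings inside a prompt.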
Contributing
- Follow the existing pattern: a small `collect_*` helper plus a `run_*` wrapper that hashes inputs, supports stdin, and returns a `build_envelope(...)` result.
- Keep severities consistent with the tool’s intent (current tools mostly emit `info` for informational data and `warning` when action may be needed).
- Add new commands to `TOOL_RUNNERS` in `src/lawdocx/cli.py` so audit mode can orchestrate them, and prefer deterministic outputs for easy downstream use.
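For orientation, the `collect_*` / `run_*` split might be sketched as below. Everything here is hypothetical: the real `build_envelope` signature, the way runners receive input, and the `TOOL_RUNNERS` registration are not shown in this README, so the stand-ins only mirror the documented JSON shape. Check the existing tools in the repository before copying the structure.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_envelope(tool, files, version="0.0.0-sketch"):
    # Stand-in for the project's build_envelope; the real signature may differ.
    return {
        "lawdocx_version": version,
        "tool": tool,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }


def collect_example(data):
    # Hypothetical collector. A real one would parse the DOCX package;
    # this one only reports the payload size so the sketch stays runnable.
    return [{
        "id": "example-1",
        "type": "example",
        "severity": "info",
        "location": {"story": "body"},  # illustrative story value
        "context": {},
        "details": {"size_bytes": len(data)},
    }]


def run_example(inputs):
    # inputs: iterable of (display_name, raw_bytes); "stdin" names piped data.
    files = []
    for name, data in inputs:
        files.append({
            "path": name,
            "sha256": hashlib.sha256(data).hexdigest(),
            "items": collect_example(data),
        })
    return build_envelope("example", files)


envelope = run_example([("stdin", b"abc")])
line = json.dumps(envelope) + "\n"  # newline-terminated, like the CLI output
```

Keeping the collector free of I/O and hashing makes it easy to unit-test, while the wrapper owns everything that touches files or stdin.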