Scan a Word document for phrases.


📄 checkdoc

checkdoc is a lightweight Python utility for scanning .docx Word documents for specified regex patterns. It can be used as a command-line tool or imported as a module in larger projects.


🔍 Features

  • Search for multiple regex patterns (supports full Python regex syntax)
  • Scans both paragraphs and tables
  • Case-insensitive matching by default
  • Simple command-line interface

📦 Installation

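The recommended way to run checkdoc is with uv's uvx command, which runs the tool without a separate install step:

uvx checkdoc

To install it with pip instead:

pip install checkdoc

or add it to a uv-managed project:

uv add checkdoc

If you run checkdoc from a source checkout, install its dependency, python-docx, with pip:

pip install python-docx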

🚀 Usage

Command Line

uvx checkdoc path/to/document.docx pattern1 pattern2 ...

Examples:

Search for email addresses and US-style phone numbers (ddd-ddd-dddd):

uvx checkdoc report.docx "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" "\d{3}-\d{3}-\d{4}"

Search for a four-digit year between 1900 and 2099:

uvx checkdoc contract.docx "\b(19|20)\d{2}\b"

Search for the exact word "confidential" (not part of other words):

uvx checkdoc memo.docx "\bconfidential\b"
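Before scanning a real document, you can preview what these example patterns match using Python's re module. This is a minimal sketch that assumes checkdoc searches for each pattern anywhere in the text, case-insensitively, as the notes below describe; `preview` is a hypothetical helper, not part of checkdoc.

```python
import re
from typing import List

# Example patterns from the commands above. The email TLD class is written
# [A-Za-z]; a "|" inside [...] would match a literal pipe character.
EMAIL = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
PHONE = r"\d{3}-\d{3}-\d{4}"
YEAR = r"\b(19|20)\d{2}\b"
CONFIDENTIAL = r"\bconfidential\b"


def preview(pattern: str, samples: List[str]) -> List[str]:
    """Return the samples that contain a match (hypothetical helper)."""
    # Assumption: case-insensitive search anywhere in the text, like checkdoc's default
    rx = re.compile(pattern, re.IGNORECASE)
    return [s for s in samples if rx.search(s)]


print(preview(YEAR, ["Signed in 2024", "Founded 1850", "Ref 120245"]))
# ['Signed in 2024']
print(preview(CONFIDENTIAL, ["CONFIDENTIAL: do not share", "Confidentiality notice"]))
# ['CONFIDENTIAL: do not share']
```

The year pattern only covers 1900 to 2099, and the word boundaries stop it from matching inside longer digit runs such as reference numbers.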

As a Module

from checkdoc import find_phrases_in_docx

# Use raw strings (r"...") so backslashes reach the regex engine unchanged
matches = find_phrases_in_docx("test.docx", [r"\bemail\b", r"\d{4}"])

# Each match is the full text of a paragraph or table cell that matched
for line in matches:
    print(line)
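Because the patterns are regular expressions, a literal phrase containing characters such as ( ) . or ? will not match as written. One way to search for plain phrases is to escape them first. This sketch assumes find_phrases_in_docx accepts any pattern string, as in the example above; `literal_phrase_patterns` is a hypothetical helper, not part of checkdoc.

```python
import re
from typing import List


def literal_phrase_patterns(phrases: List[str]) -> List[str]:
    """Turn plain phrases into regex patterns (hypothetical helper)."""
    # re.escape neutralises metacharacters such as ( ) . + ?
    # (?<!\w) and (?!\w) behave like \b, but still work when a phrase
    # starts or ends with punctuation, where \b would fail to match
    return [rf"(?<!\w){re.escape(p)}(?!\w)" for p in phrases]


patterns = literal_phrase_patterns(["Q1 (draft)", "e.g."])
print(bool(re.search(patterns[0], "See the Q1 (draft) figures", re.IGNORECASE)))  # True
print(bool(re.search(patterns[0], "See the Q1 draft figures", re.IGNORECASE)))    # False
```

The result could then be passed straight to the module, e.g. find_phrases_in_docx("test.docx", literal_phrase_patterns(["Q1 (draft)"])).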

🧠 How It Works

  • Loads the .docx file using python-docx
  • Scans all paragraphs and table cells
  • Uses Python’s re module to match regex patterns case-insensitively
  • Returns matching text snippets (full paragraph/cell content)
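The steps above can be sketched with python-docx and re. This is an illustrative reconstruction under stated assumptions, not checkdoc's actual source: `iter_docx_text`, `match_texts` and `scan_docx` are hypothetical names, and the sketch assumes a paragraph or cell counts as a match when any pattern is found anywhere in its text.

```python
import re
from typing import Iterable, Iterator, List


def iter_docx_text(path: str) -> Iterator[str]:
    """Yield the text of every body paragraph, then every table cell."""
    # Imported lazily so match_texts() can be used without python-docx installed
    from docx import Document

    doc = Document(path)
    for paragraph in doc.paragraphs:
        yield paragraph.text
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell.text


def match_texts(texts: Iterable[str], patterns: List[str]) -> List[str]:
    """Return each text block in which at least one pattern matches."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [t for t in texts if any(rx.search(t) for rx in compiled)]


def scan_docx(path: str, patterns: List[str]) -> List[str]:
    """Hypothetical end-to-end scan: load the .docx, then filter its text blocks."""
    return match_texts(iter_docx_text(path), patterns)


print(match_texts(["Contact: jane@example.com", "Nothing here", "CONFIDENTIAL memo"],
                  [r"\bconfidential\b", r"@"]))
```

Two python-docx details affect a scan like this: doc.tables lists only top-level tables, so tables nested inside cells are skipped, and a merged cell is returned once per grid position it spans, so its text can appear more than once.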

📝 Regex Tips

  • Use raw strings (r"pattern") in Python code to avoid backslash escaping issues.
  • Patterns are automatically matched case-insensitively.
  • Use \b for word boundaries to avoid partial matches.
  • Use ^ and $ to anchor a pattern to the start and end of the paragraph or cell text being searched.
  • Test complex patterns with tools like regex101.com first.
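The raw-string and anchoring tips are easy to check directly in Python. This short demonstration uses only the standard re module and assumes, as above, that each pattern is searched with re.IGNORECASE against one paragraph's text at a time.

```python
import re

text = "The Confidential memo is not confidentially filed."

# Without the r prefix, "\b" is the backspace character (\x08), not a word boundary
plain = re.findall("\bconfidential\b", text, re.IGNORECASE)
raw = re.findall(r"\bconfidential\b", text, re.IGNORECASE)

# ^ and $ anchor to the start and end of the whole string being searched
starts_with_the = bool(re.search(r"^the\b", text, re.IGNORECASE))
whole_text = bool(re.search(r"^confidential$", text, re.IGNORECASE))

print(plain)            # []
print(raw)              # ['Confidential']
print(starts_with_the)  # True
print(whole_text)       # False
```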

🛠 Requirements

  • Python 3.13+
  • python-docx

📃 License

GNU GPLv3. See the LICENSE file for details.



Download files


Source Distribution

checkdoc-0.2.0.tar.gz (21.2 kB)

Uploaded Source

Built Distribution


checkdoc-0.2.0-py3-none-any.whl (3.7 kB)

Uploaded Python 3

File details

Details for the file checkdoc-0.2.0.tar.gz.

File metadata

  • Download URL: checkdoc-0.2.0.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.2

File hashes

Hashes for checkdoc-0.2.0.tar.gz
Algorithm Hash digest
SHA256 188d465395c58de3bde7ffbf4e1752c852b15a7bf8996d414be1589e91fc502a
MD5 c5a898c499c1b2a08503ad801701a88b
BLAKE2b-256 a64e0426a72327050a1bfaabf55d8b19de5e871116cfe0a70e02e1a0a9498618
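To check a downloaded archive against the SHA256 digest listed above, you can hash it locally with Python's standard hashlib module. The function names below are illustrative, and the path is whatever you saved the download as; pip's hash-checking mode can also enforce digests during installation.

```python
import hashlib

# SHA256 digest published above for checkdoc-0.2.0.tar.gz
EXPECTED_SDIST_SHA256 = "188d465395c58de3bde7ffbf4e1752c852b15a7bf8996d414be1589e91fc502a"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    # hexdigest() is lowercase; normalise the expected value before comparing
    return sha256_of(path) == expected.strip().lower()


# Example: matches_published_hash("checkdoc-0.2.0.tar.gz")
```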


File details

Details for the file checkdoc-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: checkdoc-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.2

File hashes

Hashes for checkdoc-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 173eb1f2213dce9112317838572cd887866b693ec39610786f5ac90f1a028200
MD5 d635e55b5ef35feb84183678dbddfe62
BLAKE2b-256 1e8f2fe13384890f1a3978c3d74fa2e0b29fc7ab6cd905c93b179f3526d9843f

