Scan a Word document for phrases.
📄 checkdoc
checkdoc is a lightweight Python utility for scanning .docx Word documents for specified regex patterns. It can be used as a command-line tool or imported as a module in larger projects.
🔍 Features
- Search for multiple regex patterns (supports full Python regex syntax)
- Scans both paragraphs and tables
- Case-insensitive matching by default
- Simple CLI interface
📦 Installation
The recommended way to run checkdoc is with uv, which needs no separate install step:
uvx checkdoc
To install only the required dependency with pip:
pip install python-docx
Or add checkdoc to a uv project:
uv add checkdoc
🚀 Usage
Command Line
uvx checkdoc path/to/document.docx pattern1 pattern2 ...
Examples:
Search for email addresses and phone numbers:
uvx checkdoc report.docx "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" "\d{3}-\d{3}-\d{4}"
Search for 4-digit years from 1900 to 2099:
uvx checkdoc contract.docx "\b(19|20)\d{2}\b"
Search for the exact word "confidential" (not part of other words):
uvx checkdoc memo.docx "\bconfidential\b"
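Before running the CLI on a large document, it can help to try the patterns on a few sample strings. The sketch below uses only Python's standard re module and assumes, per the feature list, that checkdoc matches case-insensitively. preview_matches is a hypothetical helper for illustration and is not part of checkdoc; the sample text is made up.

```python
import re

# Patterns modeled on the command-line examples, written as raw strings.
EMAIL = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
PHONE = r"\d{3}-\d{3}-\d{4}"
YEAR = r"\b(19|20)\d{2}\b"


def preview_matches(text, patterns):
    """Return the first matched substring for each pattern that hits.

    Hypothetical helper: re.IGNORECASE stands in for the tool's
    case-insensitive matching.
    """
    hits = {}
    for pattern in patterns:
        found = re.search(pattern, text, re.IGNORECASE)
        if found:
            hits[pattern] = found.group(0)
    return hits


sample = "Contact Jane.Doe@Example.COM or 555-123-4567 before 2024."
for pattern, matched in preview_matches(sample, [EMAIL, PHONE, YEAR]).items():
    print(f"{pattern!r} matched {matched!r}")
```

A pattern that produces no entry here is unlikely to match similar text in a document, so this is a cheap way to catch quoting or escaping mistakes early.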
As a Module
from checkdoc import find_phrases_in_docx

# Use raw strings (r"") to avoid escaping issues
matches = find_phrases_in_docx("test.docx", [r"\bemail\b", r"\d{4}"])
for line in matches:
    print(line)
🧠 How It Works
- Loads the .docx file using python-docx
- Scans all paragraphs and table cells
- Uses Python’s re module to match regex patterns case-insensitively
- Returns matching text snippets (full paragraph/cell content)
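The steps above can be sketched roughly as follows. This is not checkdoc's actual source: iter_docx_text and find_matches are hypothetical names, and the use of re.search with one result per matching block is an assumption. The python-docx calls (Document, paragraphs, tables, rows, cells, text) are that library's public API.

```python
import re


def iter_docx_text(path):
    """Yield the text of every body paragraph and table cell in a .docx file."""
    # Imported here so the matching logic below also works without python-docx.
    from docx import Document

    doc = Document(path)
    for paragraph in doc.paragraphs:
        yield paragraph.text
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                yield cell.text


def find_matches(texts, patterns):
    """Return each text block that matches at least one pattern, ignoring case."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [
        text
        for text in texts
        if any(regex.search(text) for regex in compiled)
    ]


if __name__ == "__main__":
    # Plain strings stand in for document content in this demo.
    blocks = ["Send an EMAIL to the team.", "Nothing here.", "Invoice 2024"]
    for block in find_matches(blocks, [r"\bemail\b", r"\d{4}"]):
        print(block)
```

With python-docx installed, the two pieces combine as find_matches(iter_docx_text("test.docx"), [r"\bemail\b"]). Keeping the text extraction separate from the matching makes the matching easy to test on plain strings.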
📝 Regex Tips
- Use raw strings (r"pattern") in Python code to avoid backslash escaping issues.
- Patterns are automatically matched case-insensitively.
- Use \b for word boundaries to avoid partial matches.
- Use ^ and $ to match entire lines if needed.
- Test complex patterns with tools like regex101.com first.
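A short illustration of these tips using Python's re module directly. It assumes checkdoc applies patterns with case-insensitive searching, as described above; the sample strings are made up.

```python
import re

text = "Confidential: the unconfidential draft"

# Without \b the pattern also matches inside "unconfidential".
loose = re.findall(r"confidential", text, re.IGNORECASE)

# With \b only the standalone word matches.
strict = re.findall(r"\bconfidential\b", text, re.IGNORECASE)

# In a non-raw string, "\b" is a backspace character, so the pattern
# silently fails to match ordinary text.
not_raw = re.findall("\bconfidential\b", text, re.IGNORECASE)

# ^ and $ anchor the pattern to the start and end of the searched text.
whole = re.search(r"^draft$", "Draft", re.IGNORECASE)
partial = re.search(r"^draft$", "Draft copy", re.IGNORECASE)

print(len(loose), len(strict), len(not_raw))  # prints: 2 1 0
print(whole is not None, partial is not None)  # prints: True False
```

The non-raw case is the one most likely to surprise: the pattern is valid, raises no error, and simply never matches.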
🛠 Requirements
- Python 3.13+
- python-docx
📃 License
GNU GPLv3 License. See LICENSE file for details.
File details
Details for the file checkdoc-0.2.0.tar.gz.
File metadata
- Download URL: checkdoc-0.2.0.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 188d465395c58de3bde7ffbf4e1752c852b15a7bf8996d414be1589e91fc502a |
| MD5 | c5a898c499c1b2a08503ad801701a88b |
| BLAKE2b-256 | a64e0426a72327050a1bfaabf55d8b19de5e871116cfe0a70e02e1a0a9498618 |
File details
Details for the file checkdoc-0.2.0-py3-none-any.whl.
File metadata
- Download URL: checkdoc-0.2.0-py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 173eb1f2213dce9112317838572cd887866b693ec39610786f5ac90f1a028200 |
| MD5 | d635e55b5ef35feb84183678dbddfe62 |
| BLAKE2b-256 | 1e8f2fe13384890f1a3978c3d74fa2e0b29fc7ab6cd905c93b179f3526d9843f |