GroupDocs.Redaction for Python via .NET - Redact sensitive content from documents
Product Page | Docs | Demos | API Reference | Blog | Free Support | Temporary License
GroupDocs.Redaction for Python via .NET is a document-sanitization API for permanently removing sensitive information. Redact text by exact phrase or regular expression, scrub or replace metadata, remove or rewrite annotations, black out image regions, delete whole pages, and rasterize the result so nothing redacted can be recovered — across Word, Excel, PowerPoint, PDF, images, and text formats through one unified API, with no MS Office, OpenOffice, or other external software required.
Get Started
pip install groupdocs-redaction-net
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions
with Redactor("document.docx") as redactor:
    redactor.apply(ExactPhraseRedaction("confidential", ReplacementOptions("[REDACTED]")))
    redactor.save()
How It Works
The package is a self-contained Python wheel that bundles the embedded .NET runtime and every native dependency (SkiaSharp, Aspose.Drawing) needed to load, redact, and save documents. No external software installation is required — just pip install and start redacting. The wheel works across Python 3.5 – 3.14 on Windows, Linux, and macOS (Intel + Apple Silicon).
Features
- Text redaction — remove or replace text by exact phrase (case-sensitive or not) or by regular expression; draw a colored box over the match instead of replacing it.
- Metadata redaction — erase or rewrite document metadata by filter (author, company, comments, …) or by key/value pattern.
- Annotation redaction — replace annotation text by pattern or delete matching annotations entirely.
- Image-area redaction — black out a rectangular region of a page; combine text + image redaction for page areas.
- Page redaction — remove pages from the start or end of a document.
- Rasterization — flatten the output to a PDF (optionally with tilt / noise / border / grayscale) so redacted content cannot be extracted.
- Custom rules & callbacks — plug in a custom redaction handler or an IRedactionCallback to accept/reject each match.
- Document introspection — read format, page count, size, and per-page dimensions before processing.
- Cross-Platform — Windows x64/x86, Linux x64, macOS x64/ARM64.
Common Tasks
- Strip personal data (names, SSNs, emails, phone numbers) from a contract before sharing
- Black out an exact phrase or a regex match across every page of a document
- Erase all metadata (or just author/company) from a DOCX, XLSX, or PDF
- Remove or rewrite annotations/comments left in a reviewed document
- Redact a fixed image region (a logo, a signature, a photo) on a page
- Rasterize a redacted document to PDF so nothing can be copied back out
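For the personal-data task above, it can help to preview what a set of patterns would match before handing them to RegexRedaction. The following is a minimal sketch that uses only Python's standard `re` module on plain text; the patterns, the labels, and the `preview_redaction` helper are illustrative assumptions, not part of the package, and the package's own regular-expression engine may treat some constructs differently from Python's.

```python
import re

# Illustrative patterns only (assumptions, not shipped with the package).
# Tune them for your own data before using them in a real redaction.
PII_PATTERNS = {
    "[SSN]": r"\b\d{3}-\d{2}-\d{4}\b",
    "[EMAIL]": r"[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}",
    "[PHONE]": r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b",
}


def preview_redaction(text):
    """Replace every pattern match with its label, in dictionary order."""
    for label, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, label, text)
    return text


sample = "Reach Jane at jane.roe@example.com or 555-123-4567; SSN 123-45-6789."
print(preview_redaction(sample))
```

Once the preview looks right, each pattern can be passed to RegexRedaction together with a ReplacementOptions label, as in the regular-expression example below.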
Supported File Formats
For a complete list, see supported formats.
| Category | Formats |
|---|---|
| Word Processing | DOC, DOCX, DOCM, DOT, DOTX, DOTM, RTF, TXT, ODT, OTT |
| Spreadsheets | XLS, XLSX, XLSM, XLSB, CSV, TSV, ODS, OTS, NUMBERS |
| Presentations | PPT, PPTX, ODP |
| Fixed-Layout | PDF |
| Images | BMP, JPG, JPEG, PNG, GIF, TIF, TIFF, JP2 |
| Web & Markdown | HTM, HTML, MD |
Examples
Redact an exact phrase
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions
with Redactor("document.docx") as redactor:
    # case-insensitive by default; pass True for case-sensitive
    redactor.apply(ExactPhraseRedaction("John Doe", True, ReplacementOptions("[CUSTOMER]")))
    redactor.save()
Redact by regular expression
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import RegexRedaction, ReplacementOptions
with Redactor("document.docx") as redactor:
    # numbers (e.g. SSNs, IDs)
    redactor.apply(RegexRedaction(r"\d{2,}", ReplacementOptions("[NUMBER]")))
    # email addresses
    redactor.apply(RegexRedaction(r"[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}", ReplacementOptions("[EMAIL]")))
    redactor.save()
Draw a black box instead of replacing text
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions
from groupdocs.pydrawing import Color
with Redactor("document.pdf") as redactor:
    redactor.apply(ExactPhraseRedaction("Top Secret", ReplacementOptions(Color.BLACK)))
    redactor.save()
Scrub metadata
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import EraseMetadataRedaction, MetadataSearchRedaction, MetadataFilters
with Redactor("document.docx") as redactor:
    redactor.apply(EraseMetadataRedaction(MetadataFilters.ALL))  # wipe everything
    redactor.apply(MetadataSearchRedaction(".*@acme\\.com", "[EMAIL]"))  # rewrite by value pattern
    redactor.save()
Black out an image region
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ImageAreaRedaction, RegionReplacementOptions
from groupdocs.pydrawing import Point, Size, Color
with Redactor("scan.pdf") as redactor:
    redactor.apply(ImageAreaRedaction(Point(50, 60), RegionReplacementOptions(Color.BLACK, Size(200, 80))))
    redactor.save()
Save in the original format (no rasterization)
By default save() rasterizes the result to a PDF and appends a _Redacted suffix. To keep the source format instead:
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import RegexRedaction, ReplacementOptions
from groupdocs.redaction.options import SaveOptions
with Redactor("document.docx") as redactor:
    redactor.apply(RegexRedaction(r"\d+", ReplacementOptions("[NUM]")))
    redactor.save(SaveOptions(rasterize_to_pdf=False))  # writes document_Redacted.docx
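If a script needs to locate the saved file afterwards, the naming rule described above can be sketched with `pathlib`. This is a rough helper under stated assumptions: `expected_output_path` is a hypothetical name, the `_Redacted` suffix placement follows the `document_Redacted.docx` comment above, and the `.pdf` extension for the rasterized case is inferred from the description rather than confirmed.

```python
from pathlib import Path


def expected_output_path(source, rasterize_to_pdf=True):
    """Guess the file name save() would write for a given source path.

    Assumption: the suffix goes between the stem and the extension, and a
    rasterized result gets a .pdf extension.
    """
    src = Path(source)
    extension = ".pdf" if rasterize_to_pdf else src.suffix
    return src.with_name(src.stem + "_Redacted" + extension)


print(expected_output_path("document.docx", rasterize_to_pdf=False))
```

Checking that the returned path exists after save() is a cheap way to confirm the assumption holds for your version of the package.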
Get document info
from groupdocs.redaction import Redactor
with Redactor("document.pdf") as redactor:
    info = redactor.get_document_info()
    print("Type:", info.file_type.file_format)
    print("Pages:", info.page_count)
    print("Size:", info.size, "bytes")
Redact from / to binary streams
import io
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions
from groupdocs.redaction.options import RasterizationOptions
with open("document.docx", "rb") as src:
    with Redactor(src) as redactor:
        redactor.apply(ExactPhraseRedaction("secret", ReplacementOptions("[X]")))
        ro = RasterizationOptions()
        ro.enabled = False  # keep original format
        buffer = io.BytesIO()
        redactor.save(buffer, ro)  # BytesIO is updated after save
        data = buffer.getvalue()
AI Agent & LLM Friendly
This package is designed for seamless integration with AI agents, LLMs, and automated code generation tools.
- AGENTS.md in the package — AI coding assistants (Claude Code, Cursor, GitHub Copilot) auto-discover the API surface, usage patterns, and troubleshooting tips from the installed package
- MCP server — connect your AI tool to GroupDocs documentation for on-demand API lookups:
{ "mcpServers": { "groupdocs-docs": { "url": "https://docs.groupdocs.com/mcp" } } }
- Machine-readable docs — full documentation available as plain text for RAG and LLM context:
  - Single file: https://docs.groupdocs.com/redaction/python-net/llms-full.txt
  - Per page: append .md to any docs URL
Evaluation Mode
The API works without a license in evaluation mode, with these limitations:
- Only one document may be opened per process.
- Output is restricted: PDF output carries an evaluation watermark and other formats show an equivalent evaluation mark.
To remove these limitations, apply a license or request a temporary license:
from groupdocs.redaction import License
License().set_license("path/to/license.lic")
Or set the environment variable (auto-applied at import):
export GROUPDOCS_LIC_PATH="path/to/license.lic"
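Because a wrong path in the environment variable is easy to miss, a small preflight check before importing the package may be useful. The sketch below uses only the standard library; `license_path_from_env` is a hypothetical helper, and it assumes the variable holds a path to a single license file.

```python
import os
from pathlib import Path


def license_path_from_env(env=None):
    """Return the GROUPDOCS_LIC_PATH value as a Path if it points to a file.

    Returns None when the variable is unset, empty, or names a missing file.
    """
    env = os.environ if env is None else env
    raw = env.get("GROUPDOCS_LIC_PATH", "").strip()
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_file() else None


if __name__ == "__main__":
    found = license_path_from_env()
    print("License file:", found if found else "not configured (evaluation mode)")
```

Running this at the start of a batch job makes it obvious when the process is about to fall back to evaluation mode.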
Troubleshooting
| Issue | Platform | Fix |
|---|---|---|
| Trial mode allows only 1 document to open | All | Apply a license — License().set_license(...) or set GROUPDOCS_LIC_PATH |
| System.Drawing.Common is not supported | Linux/macOS | apt-get install libgdiplus (Linux) or brew install mono-libgdiplus (macOS) |
| The type initializer for 'Gdip' threw an exception | macOS | brew install mono-libgdiplus |
| Garbled text / missing fonts in output | Linux | apt-get install ttf-mscorefonts-installer fontconfig && fc-cache -f |
| DOTNET_SYSTEM_GLOBALIZATION_INVARIANT errors | Linux | Do NOT set this variable. ICU must be available. |
| IncorrectPasswordException / PasswordRequiredException | All | Open with Redactor(path, LoadOptions(password="...")) |
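In an automated pipeline, the table above can be turned into a small lookup that attaches a suggested fix to a logged error message. This is only a sketch: `KNOWN_FIXES` and `suggest_fix` are hypothetical names, the matching is a plain case-insensitive substring test, and the exact wording of real error messages may differ from the table.

```python
# (substring of the error message, suggested fix), copied from the table above.
KNOWN_FIXES = [
    ("Trial mode allows only 1 document",
     "Apply a license: License().set_license(...) or set GROUPDOCS_LIC_PATH"),
    ("System.Drawing.Common is not supported",
     "apt-get install libgdiplus (Linux) or brew install mono-libgdiplus (macOS)"),
    ("type initializer for 'Gdip'",
     "brew install mono-libgdiplus"),
    ("DOTNET_SYSTEM_GLOBALIZATION_INVARIANT",
     "Do NOT set this variable. ICU must be available."),
    ("IncorrectPasswordException",
     'Open with Redactor(path, LoadOptions(password="..."))'),
    ("PasswordRequiredException",
     'Open with Redactor(path, LoadOptions(password="..."))'),
]


def suggest_fix(error_message):
    """Return the first matching fix for an error message, or None."""
    lowered = error_message.lower()
    for needle, fix in KNOWN_FIXES:
        if needle.lower() in lowered:
            return fix
    return None


print(suggest_fix("The type initializer for 'Gdip' threw an exception"))
```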
System Requirements
- Python 3.5 - 3.14
- Windows x64/x86, Linux x64, macOS x64/ARM64
- No additional software required
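The requirements above can be checked programmatically before attempting an install. The sketch below encodes the listed Python range and platforms using the standard `sys` and `platform` modules; `is_supported` and the architecture alias table are assumptions made for illustration, and `platform.machine()` reports the hardware rather than the interpreter's bitness, so a 32-bit Python on 64-bit Windows may be classified as x64.

```python
import platform
import sys

# Platform/architecture pairs taken from the list above.
SUPPORTED = {
    ("Windows", "x64"), ("Windows", "x86"),
    ("Linux", "x64"),
    ("Darwin", "x64"), ("Darwin", "arm64"),
}

# Common platform.machine() values mapped to the names used above (assumed).
ARCH_ALIASES = {
    "amd64": "x64", "x86_64": "x64",
    "x86": "x86", "i386": "x86", "i686": "x86",
    "arm64": "arm64", "aarch64": "arm64",
}


def is_supported(version=None, system=None, machine=None):
    """Return True if the interpreter and platform fall inside the stated range."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    arch = ARCH_ALIASES.get(machine.lower())
    return (3, 5) <= version <= (3, 14) and (system, arch) in SUPPORTED


print("Supported environment:", is_supported())
```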
More Resources
Also available for other platforms: .NET | Java
Download files
Built Distributions
File details
Details for the file groupdocs_redaction_net-26.6.0-py3-none-win_amd64.whl.
File metadata
- Download URL: groupdocs_redaction_net-26.6.0-py3-none-win_amd64.whl
- Upload date:
- Size: 119.3 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2db68fd04e17ad6cd56bc318171291d9c66cfed0911d6f4662163025b40f401 |
| MD5 | 571694a46627a454b205bae9bd932b84 |
| BLAKE2b-256 | c1b6f167d81bf9c554d7ce80d70afb9a399709dcb2cad7e2147f60053ba81b74 |
File details
Details for the file groupdocs_redaction_net-26.6.0-py3-none-manylinux1_x86_64.whl.
File metadata
- Download URL: groupdocs_redaction_net-26.6.0-py3-none-manylinux1_x86_64.whl
- Upload date:
- Size: 118.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6546caa833157800f82c224d5c1c14392460fd393a3e422bb35c29997df1b1d7 |
| MD5 | 3f5691ce041ad062aae98b11192f9a74 |
| BLAKE2b-256 | 78d488fe1d8ffd4358f3ec056f7b095abdc5b2a9c1d538a0cd37e1b3313a90ae |
File details
Details for the file groupdocs_redaction_net-26.6.0-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: groupdocs_redaction_net-26.6.0-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 118.2 MB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5bc00ff3165ee6ebce067431c07b007bf4a10fd0529c44c53a94ffd742b197c8 |
| MD5 | 863e678d77aa5f60df741a2b8e032793 |
| BLAKE2b-256 | 3497331c166f12a1f03568be9c96ed6f31804cac7dd1963e3b441a87d47ba64a |
File details
Details for the file groupdocs_redaction_net-26.6.0-py3-none-macosx_10_14_x86_64.whl.
File metadata
- Download URL: groupdocs_redaction_net-26.6.0-py3-none-macosx_10_14_x86_64.whl
- Upload date:
- Size: 120.4 MB
- Tags: Python 3, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 341f40aae6b07a41cdabc23c3b59e6055f9a2696cbc737e1795eb181fa76bc9b |
| MD5 | b7641467a8167a31a06f6ef20f4a3e33 |
| BLAKE2b-256 | 19a309a6160b971a98b5659946b938f10d7f31231bcec171f6ce817cb3a44b63 |
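A manually downloaded wheel can be checked against the SHA256 digests listed above before installing it. The sketch below uses Python's standard `hashlib`; the helper names are hypothetical, and the file is read in chunks because the wheels are over 100 MB.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("groupdocs_redaction_net-26.6.0-py3-none-win_amd64.whl", "e2db68fd04e17ad6cd56bc318171291d9c66cfed0911d6f4662163025b40f401")` should return True for an intact copy of the Windows wheel.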