GroupDocs.Redaction for Python via .NET - Redact sensitive content from documents

Product Page | Docs | Demos | API Reference | Blog | Free Support | Temporary License

GroupDocs.Redaction for Python via .NET is a document-sanitization API for permanently removing sensitive information. Redact text by exact phrase or regular expression, scrub or replace metadata, remove or rewrite annotations, black out image regions, delete whole pages, and rasterize the result so nothing redacted can be recovered — across Word, Excel, PowerPoint, PDF, images, and text formats through one unified API, with no MS Office, OpenOffice, or other external software required.

Get Started

pip install groupdocs-redaction-net

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions

with Redactor("document.docx") as redactor:
    redactor.apply(ExactPhraseRedaction("confidential", ReplacementOptions("[REDACTED]")))
    redactor.save()

How It Works

The package is a self-contained Python wheel that bundles the embedded .NET runtime and every native dependency (SkiaSharp, Aspose.Drawing) needed to load, redact, and save documents. No external software installation is required — just pip install and start redacting. The wheel works across Python 3.5 – 3.14 on Windows, Linux, and macOS (Intel + Apple Silicon).

Features

  • Text redaction — remove or replace text by exact phrase (case-sensitive or not) or by regular expression; draw a colored box over the match instead of replacing it.
  • Metadata redaction — erase or rewrite document metadata by filter (author, company, comments, …) or by key/value pattern.
  • Annotation redaction — replace annotation text by pattern or delete matching annotations entirely.
  • Image-area redaction — black out a rectangular region of a page; combine text + image redaction for page areas.
  • Page redaction — remove pages from the start or end of a document.
  • Rasterization — flatten the output to a PDF (optionally with tilt / noise / border / grayscale) so redacted content cannot be extracted.
  • Custom rules & callbacks — plug in a custom redaction handler or an IRedactionCallback to accept/reject each match.
  • Document introspection — read format, page count, size, and per-page dimensions before processing.
  • Cross-Platform — Windows x64/x86, Linux x64, macOS x64/ARM64.

Common Tasks

  • Strip personal data (names, SSNs, emails, phone numbers) from a contract before sharing
  • Black out an exact phrase or a regex match across every page of a document
  • Erase all metadata (or just author/company) from a DOCX, XLSX, or PDF
  • Remove or rewrite annotations/comments left in a reviewed document
  • Redact a fixed image region (a logo, a signature, a photo) on a page
  • Rasterize a redacted document to PDF so nothing can be copied back out

Supported File Formats

For a complete list, see supported formats.

Category | Formats
Word Processing | DOC, DOCX, DOCM, DOT, DOTX, DOTM, RTF, TXT, ODT, OTT
Spreadsheets | XLS, XLSX, XLSM, XLSB, CSV, TSV, ODS, OTS, NUMBERS
Presentations | PPT, PPTX, ODP
Fixed-Layout | PDF
Images | BMP, JPG, JPEG, PNG, GIF, TIF, TIFF, JP2
Web & Markdown | HTM, HTML, MD
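
When working through a folder of mixed files, it can help to filter by extension before opening a Redactor. The following is a minimal sketch using only the standard library. The extension set is copied from the table above, and `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, not part of the package; the linked list of supported formats remains the authoritative source.

```python
from pathlib import Path

# Extensions copied from the table above; not an official constant of the package.
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "docm", "dot", "dotx", "dotm", "rtf", "txt", "odt", "ott",
    "xls", "xlsx", "xlsm", "xlsb", "csv", "tsv", "ods", "ots", "numbers",
    "ppt", "pptx", "odp",
    "pdf",
    "bmp", "jpg", "jpeg", "png", "gif", "tif", "tiff", "jp2",
    "htm", "html", "md",
}

def is_supported(path):
    """Return True if the file extension appears in the table (case-insensitive)."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS

candidates = ["contract.DOCX", "scan.tiff", "archive.zip", "notes"]
print([name for name in candidates if is_supported(name)])
# prints ['contract.DOCX', 'scan.tiff']
```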

Examples

Redact an exact phrase

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions

with Redactor("document.docx") as redactor:
    # case-insensitive by default; pass True for case-sensitive
    redactor.apply(ExactPhraseRedaction("John Doe", True, ReplacementOptions("[CUSTOMER]")))
    redactor.save()

Redact by regular expression

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import RegexRedaction, ReplacementOptions

with Redactor("document.docx") as redactor:
    # numbers (e.g. SSNs, IDs)
    redactor.apply(RegexRedaction(r"\d{2,}", ReplacementOptions("[NUMBER]")))
    # email addresses
    redactor.apply(RegexRedaction(r"[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}", ReplacementOptions("[EMAIL]")))
    redactor.save()
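
A pattern that matches too much or too little is hard to spot after the fact, so it can be worth dry-running the expressions with Python's own `re` module first. The sketch below is an illustration only. It assumes the patterns behave the same way in the library's regex engine (the package runs on .NET, whose regex dialect differs from Python's in some details), and `preview` is a hypothetical helper, not part of the package.

```python
import re

# The two patterns from the example above, paired with their replacement text.
RULES = [
    (r"\d{2,}", "[NUMBER]"),
    (r"[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}", "[EMAIL]"),
]

def preview(text, rules=RULES):
    """Apply each rule in order with re.sub and return the rewritten text."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

sample = "Call 555-0199 or write to jane.doe@example.com about order 7."
print(preview(sample))
# prints: Call [NUMBER]-[NUMBER] or write to [EMAIL] about order 7.
```

Rule order matters in this sketch: an address such as user42@example.com has its digits replaced first and then no longer matches the email pattern, so putting the more specific rule first may be preferable.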

Draw a black box instead of replacing text

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions
from groupdocs.pydrawing import Color

with Redactor("document.pdf") as redactor:
    redactor.apply(ExactPhraseRedaction("Top Secret", ReplacementOptions(Color.BLACK)))
    redactor.save()

Scrub metadata

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import EraseMetadataRedaction, MetadataSearchRedaction, MetadataFilters

with Redactor("document.docx") as redactor:
    redactor.apply(EraseMetadataRedaction(MetadataFilters.ALL))           # wipe everything
    redactor.apply(MetadataSearchRedaction(".*@acme\\.com", "[EMAIL]"))   # rewrite by value pattern
    redactor.save()

Black out an image region

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ImageAreaRedaction, RegionReplacementOptions
from groupdocs.pydrawing import Point, Size, Color

with Redactor("scan.pdf") as redactor:
    redactor.apply(ImageAreaRedaction(Point(50, 60), RegionReplacementOptions(Color.BLACK, Size(200, 80))))
    redactor.save()

Save in the original format (no rasterization)

By default save() rasterizes the result to a PDF and appends a _Redacted suffix. To keep the source format instead:

from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import RegexRedaction, ReplacementOptions
from groupdocs.redaction.options import SaveOptions

with Redactor("document.docx") as redactor:
    redactor.apply(RegexRedaction(r"\d+", ReplacementOptions("[NUM]")))
    redactor.save(SaveOptions(rasterize_to_pdf=False))   # writes document_Redacted.docx
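
The naming rule described in this section can be made explicit. The helper below is a hypothetical sketch built only from what the text states: a `_Redacted` suffix, PDF output when rasterizing, and the source extension otherwise. It also assumes the output is written next to the source file, so check the path that `save()` actually produces before relying on it.

```python
from pathlib import Path

def expected_output_path(source, rasterize_to_pdf=True):
    """Predict the output file name from the rule described in this section."""
    src = Path(source)
    # Assumption: rasterized output takes a .pdf extension; otherwise the
    # source extension is kept. The _Redacted suffix is applied in both cases.
    suffix = ".pdf" if rasterize_to_pdf else src.suffix
    return src.with_name(src.stem + "_Redacted" + suffix)

print(expected_output_path("document.docx"))
# prints document_Redacted.pdf
print(expected_output_path("document.docx", rasterize_to_pdf=False))
# prints document_Redacted.docx
```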

Get document info

from groupdocs.redaction import Redactor

with Redactor("document.pdf") as redactor:
    info = redactor.get_document_info()
    print("Type:", info.file_type.file_format)
    print("Pages:", info.page_count)
    print("Size:", info.size, "bytes")

Redact from / to binary streams

import io
from groupdocs.redaction import Redactor
from groupdocs.redaction.redactions import ExactPhraseRedaction, ReplacementOptions
from groupdocs.redaction.options import RasterizationOptions

with open("document.docx", "rb") as src:
    with Redactor(src) as redactor:
        redactor.apply(ExactPhraseRedaction("secret", ReplacementOptions("[X]")))
        ro = RasterizationOptions()
        ro.enabled = False                                   # keep original format
        buffer = io.BytesIO()
        redactor.save(buffer, ro)                            # BytesIO is updated after save
        data = buffer.getvalue()

AI Agent & LLM Friendly

This package is designed for seamless integration with AI agents, LLMs, and automated code generation tools.

  • AGENTS.md in the package — AI coding assistants (Claude Code, Cursor, GitHub Copilot) auto-discover the API surface, usage patterns, and troubleshooting tips from the installed package
  • MCP server — connect your AI tool to GroupDocs documentation for on-demand API lookups:
    { "mcpServers": { "groupdocs-docs": { "url": "https://docs.groupdocs.com/mcp" } } }
    
  • Machine-readable docs — full documentation available as plain text for RAG and LLM context:
    • Single file: https://docs.groupdocs.com/redaction/python-net/llms-full.txt
    • Per page: append .md to any docs URL

Evaluation Mode

The API works without a license in evaluation mode, with these limitations:

  • Only one document may be opened per process.
  • Output is restricted: PDF output carries an evaluation watermark and other formats show an equivalent evaluation mark.

To remove these limitations, apply a license or request a temporary license:

from groupdocs.redaction import License
License().set_license("path/to/license.lic")

Or set the environment variable (auto-applied at import):

export GROUPDOCS_LIC_PATH="path/to/license.lic"
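
If the license is configured through the environment, a quick check before importing the package can surface a wrong path early. This is a sketch: only the variable name `GROUPDOCS_LIC_PATH` comes from the text above. `find_license_file` is a hypothetical helper, and how the package itself reacts to a missing file is not covered here.

```python
import os
from pathlib import Path

def find_license_file(env=None):
    """Return the path from GROUPDOCS_LIC_PATH if it points to a file, else None."""
    env = os.environ if env is None else env
    value = env.get("GROUPDOCS_LIC_PATH", "").strip()
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_file() else None

license_file = find_license_file()
if license_file is None:
    print("No usable license file found via GROUPDOCS_LIC_PATH.")
else:
    print("License file:", license_file)
```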

Troubleshooting

Issue | Platform | Fix
Trial mode allows only 1 document to open | All | Apply a license: License().set_license(...) or set GROUPDOCS_LIC_PATH
System.Drawing.Common is not supported | Linux/macOS | apt-get install libgdiplus (Linux) or brew install mono-libgdiplus (macOS)
The type initializer for 'Gdip' threw an exception | macOS | brew install mono-libgdiplus
Garbled text / missing fonts in output | Linux | apt-get install ttf-mscorefonts-installer fontconfig && fc-cache -f
DOTNET_SYSTEM_GLOBALIZATION_INVARIANT errors | Linux | Do NOT set this variable. ICU must be available.
IncorrectPasswordException / PasswordRequiredException | All | Open with Redactor(path, LoadOptions(password="..."))
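
Some of the environment issues in the table can be checked up front. The sketch below is a hypothetical preflight using only the standard library: it flags `DOTNET_SYSTEM_GLOBALIZATION_INVARIANT` if it is set and looks for libgdiplus on Linux and macOS. The library name passed to `ctypes.util.find_library` (`gdiplus`) is an assumption about how the shared library is registered on your system, so treat a miss as a hint rather than proof.

```python
import ctypes.util
import os
import sys

def preflight(env=None, platform=None, find_library=ctypes.util.find_library):
    """Return a list of warnings for the environment issues listed in the table."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    warnings = []
    if "DOTNET_SYSTEM_GLOBALIZATION_INVARIANT" in env:
        warnings.append("DOTNET_SYSTEM_GLOBALIZATION_INVARIANT is set; unset it.")
    # Assumption: libgdiplus is discoverable under the short name "gdiplus".
    if platform.startswith(("linux", "darwin")) and find_library("gdiplus") is None:
        warnings.append("libgdiplus not found; install it (see the table above).")
    return warnings

for message in preflight():
    print("warning:", message)
```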

System Requirements

  • Python 3.5 - 3.14
  • Windows x64/x86, Linux x64, macOS x64/ARM64
  • No additional software required
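
The requirements above can also be checked programmatically, for example at the top of a deployment script. This is a sketch under assumptions: the version bounds and platform list are taken from this section, the machine-name aliases are what `platform.machine()` commonly reports and may not be exhaustive, and `check_requirements` is a hypothetical helper.

```python
import platform
import sys

# System and architecture pairs taken from the list above; aliases are assumptions.
SUPPORTED = {
    "Windows": {"amd64", "x86_64", "x86", "i386", "i686"},
    "Linux": {"x86_64", "amd64"},
    "Darwin": {"x86_64", "arm64"},
}

def check_requirements(version=None, system=None, machine=None):
    """Return True if the interpreter and platform fall within the stated requirements."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    machine = (platform.machine() if machine is None else machine).lower()
    version_ok = (3, 5) <= version <= (3, 14)
    return version_ok and machine in SUPPORTED.get(system, set())

print("Requirements met:", check_requirements())
```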

More Resources

Also available for other platforms: .NET | Java



Built Distributions

groupdocs_redaction_net-26.6.0-py3-none-win_amd64.whl (119.3 MB): Python 3, Windows x86-64
groupdocs_redaction_net-26.6.0-py3-none-macosx_11_0_arm64.whl (118.2 MB): Python 3, macOS 11.0+ ARM64
groupdocs_redaction_net-26.6.0-py3-none-macosx_10_14_x86_64.whl (120.4 MB): Python 3, macOS 10.14+ x86-64

File hashes

Hashes for groupdocs_redaction_net-26.6.0-py3-none-win_amd64.whl
Algorithm | Hash digest
SHA256 | e2db68fd04e17ad6cd56bc318171291d9c66cfed0911d6f4662163025b40f401
MD5 | 571694a46627a454b205bae9bd932b84
BLAKE2b-256 | c1b6f167d81bf9c554d7ce80d70afb9a399709dcb2cad7e2147f60053ba81b74

Hashes for groupdocs_redaction_net-26.6.0-py3-none-manylinux1_x86_64.whl
Algorithm | Hash digest
SHA256 | 6546caa833157800f82c224d5c1c14392460fd393a3e422bb35c29997df1b1d7
MD5 | 3f5691ce041ad062aae98b11192f9a74
BLAKE2b-256 | 78d488fe1d8ffd4358f3ec056f7b095abdc5b2a9c1d538a0cd37e1b3313a90ae

Hashes for groupdocs_redaction_net-26.6.0-py3-none-macosx_11_0_arm64.whl
Algorithm | Hash digest
SHA256 | 5bc00ff3165ee6ebce067431c07b007bf4a10fd0529c44c53a94ffd742b197c8
MD5 | 863e678d77aa5f60df741a2b8e032793
BLAKE2b-256 | 3497331c166f12a1f03568be9c96ed6f31804cac7dd1963e3b441a87d47ba64a

Hashes for groupdocs_redaction_net-26.6.0-py3-none-macosx_10_14_x86_64.whl
Algorithm | Hash digest
SHA256 | 341f40aae6b07a41cdabc23c3b59e6055f9a2696cbc737e1795eb181fa76bc9b
MD5 | b7641467a8167a31a06f6ef20f4a3e33
BLAKE2b-256 | 19a309a6160b971a98b5659946b938f10d7f31231bcec171f6ce817cb3a44b63
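
The SHA256 digests listed above can be used to verify a downloaded wheel before installing it. The following is a minimal sketch with `hashlib`. `sha256_of` and `verify` are hypothetical helper names, and the commented usage assumes the wheel has already been downloaded to the current directory.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()

# Usage (assumes the wheel was downloaded to the current directory):
# verify("groupdocs_redaction_net-26.6.0-py3-none-win_amd64.whl",
#        "e2db68fd04e17ad6cd56bc318171291d9c66cfed0911d6f4662163025b40f401")
```

pip can also enforce digests itself through its hash-checking mode (`--require-hashes`) with a requirements file, which may be the simpler option in automated installs.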
