Offline document anonymizer for legal teams
anonymizer
Replaces personally identifiable information (PII) in documents with structured tokens before you send them to external AI services.
Status: MVP-0 release candidate.
What it does
Drag a file (docx / pdf with text layer / xlsx) into the local web UI and get an anonymized document where:
- Names, companies, financial details, addresses, emails, phones are replaced with structured tokens like [Person_1], [Company_1], [ADDRESS_1], ...
- Document metadata is cleared
- No network calls during processing — runs entirely on your machine
Then send the result to your AI tool of choice.
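As an illustration of the token replacement described above, here is a minimal sketch in which repeated values map to the same token. It is not the project's implementation: `TokenMap`, `anonymize_emails`, the `EMAIL` label, and the simplified email regex are all hypothetical names chosen for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately simplified detector: email addresses only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TokenMap:
    """Assigns one stable token, e.g. [EMAIL_1], to each distinct value."""

    def __init__(self) -> None:
        self._tokens: dict[tuple[str, str], str] = {}
        self._counters: defaultdict[str, int] = defaultdict(int)

    def token_for(self, label: str, value: str) -> str:
        # Case-insensitive key so "Anna@Example.com" reuses the same token.
        key = (label, value.lower())
        if key not in self._tokens:
            self._counters[label] += 1
            self._tokens[key] = f"[{label}_{self._counters[label]}]"
        return self._tokens[key]


def anonymize_emails(text: str, tokens: TokenMap) -> str:
    """Replace every email match with its token from the shared map."""
    return EMAIL_RE.sub(lambda m: tokens.token_for("EMAIL", m.group(0)), text)


tokens = TokenMap()
sample = "Write to anna@example.com or bob@example.org; anna@example.com replies faster."
result = anonymize_emails(sample, tokens)
print(result)  # Write to [EMAIL_1] or [EMAIL_2]; [EMAIL_1] replies faster.
```

Reusing one map across a whole document keeps references consistent, so the downstream AI tool can still tell that two mentions refer to the same entity.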
MVP-0 scope
- Formats: docx, pdf with text layer, xlsx
- Languages: Russian, English (NER); language-agnostic detectors for emails, phones, IBAN, cards, IP/MAC/URL, dates, geocoordinates
- Platforms: Windows + macOS
- UI: local web app at 127.0.0.1 in your browser
- Install: single curl one-liner → uv tool install docs-anonymizer
OCR for scanned PDFs, password-protected files, and additional languages are planned for later iterations (MVP-1+).
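Language-agnostic detectors for cards and IBANs are commonly built as a regex candidate search followed by a checksum test, which cuts false positives on arbitrary digit runs. The sketch below assumes that approach; the source does not say how the project's detectors work, and `luhn_valid`, `iban_valid`, `find_cards`, and `CARD_RE` are hypothetical names.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97 check; does not verify country-specific lengths."""
    s = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    # Letters map to 10..35, digits map to themselves.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_cards(text: str) -> list[str]:
    """Return only the candidates that pass the Luhn check."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]


print(find_cards("Card 4111 1111 1111 1111 on file, ref 1234 5678 9012 3456"))
print(iban_valid("GB82 WEST 1234 5698 7654 32"))
```

The card number and IBAN used here are widely published test values, not real accounts.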
Installation
# macOS / Linux
curl -fsSL https://anonymizer.site/install.sh | sh
# Windows (PowerShell)
iwr -useb https://anonymizer.site/install.ps1 | iex
Then run anonymize — your browser will open at http://127.0.0.1:<port>.
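The `<port>` placeholder suggests the port is chosen at startup. One common way to do that with the standard library is to bind to port 0 on the loopback interface and let the operating system pick a free port. This is a sketch under that assumption, not the tool's actual startup code; `pick_loopback_port` and `local_url` are hypothetical helpers.

```python
import socket


def pick_loopback_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently free TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]


def local_url(port: int, host: str = "127.0.0.1") -> str:
    """Build the URL the browser would be pointed at."""
    return f"http://{host}:{port}"


port = pick_loopback_port()
url = local_url(port)
print(url)
# A launcher could then start the server on this port and call
# webbrowser.open(url) to open the UI in the default browser.
```

Binding to 127.0.0.1 rather than 0.0.0.0 keeps the UI unreachable from other machines. The port is released before a server would claim it, so a real launcher should be prepared to retry if another process takes it in between.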
Stack
Python 3.11+, FastAPI + htmx, spaCy + Natasha, PyMuPDF, python-docx, openpyxl, lxml. Full details in the technical spec.
Architecture
Three-layer design: core (headless Python library), cli, and webapp (FastAPI on loopback), plus testkit for synthetic test corpus generation and feedback loop tooling. Detectors are pluggable; language packs are drop-in. Manual masking is supported, and audit logging is designed not to leak PII.
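To make "pluggable detectors" concrete, here is one way such an interface could look. It is a hypothetical sketch, not the project's core API: `Span`, `Detector`, `RegexDetector`, `run_detectors`, `mask`, and `audit_record` are names invented for this example. The audit record stores only labels and counts, never matched text, which is one simple way to log without leaking PII.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


class Detector(Protocol):
    """Anything with a detect() method can be plugged in."""

    def detect(self, text: str) -> list[Span]: ...


class RegexDetector:
    def __init__(self, label: str, pattern: str) -> None:
        self.label = label
        self._re = re.compile(pattern)

    def detect(self, text: str) -> list[Span]:
        return [Span(m.start(), m.end(), self.label) for m in self._re.finditer(text)]


def run_detectors(text: str, detectors: list[Detector]) -> list[Span]:
    """Collect spans from all detectors; on overlap keep the earliest, longest one."""
    spans = [s for d in detectors for s in d.detect(text)]
    spans.sort(key=lambda s: (s.start, -(s.end - s.start)))
    kept: list[Span] = []
    last_end = 0
    for s in spans:
        if s.start >= last_end:
            kept.append(s)
            last_end = s.end
    return kept


def mask(text: str, spans: list[Span]) -> str:
    """Replace each span with its label; spans must be sorted and non-overlapping."""
    out, cursor = [], 0
    for s in spans:
        out.append(text[cursor:s.start])
        out.append(f"[{s.label}]")
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


def audit_record(spans: list[Span]) -> dict[str, int]:
    """Counts per label only, so the log contains no original values."""
    return dict(Counter(s.label for s in spans))


detectors: list[Detector] = [
    RegexDetector("EMAIL", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    RegexDetector("IP", r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
]
text = "Ping 10.0.0.1 or mail ops@example.com"
spans = run_detectors(text, detectors)
masked = mask(text, spans)
print(masked)               # Ping [IP] or mail [EMAIL]
print(audit_record(spans))  # {'IP': 1, 'EMAIL': 1}
```

With a structural interface like this, a new detector or language pack only needs to provide `detect`; the pipeline code does not change.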
Licenses
The project is released under AGPL-3.0 because it depends on PyMuPDF (AGPL). All other dependencies are permissive open-source (MIT / Apache 2.0 / BSD / MPL). The source distribution published with each release contains the project source needed to satisfy AGPL source-availability obligations.
A page in the application UI will list all bundled libraries and models with their individual licenses.
Download files
File details
Details for the file docs_anonymizer-0.2.13.tar.gz.
File metadata
- Download URL: docs_anonymizer-0.2.13.tar.gz
- Upload date:
- Size: 503.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e767515c5ecee02b72964238f9a54dc8a12169f8e7dcc2bcc001f936ee21b18c |
| MD5 | e7c8ec8d4d1dca9ef12f45fdf8a56304 |
| BLAKE2b-256 | b1bad6f20d49a049a6973c6b5fd581c23520f030c33a6bbf9203d4406d6d61b7 |
File details
Details for the file docs_anonymizer-0.2.13-py3-none-any.whl.
File metadata
- Download URL: docs_anonymizer-0.2.13-py3-none-any.whl
- Upload date:
- Size: 266.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 107c622b68807da688e1370b7686d23c93166299eb26b4ae2d89b3aaeb9fa6c4 |
| MD5 | 3c4c333a81f782579a12aa8aa5df9f93 |
| BLAKE2b-256 | 71dcf1c0ed84555ed878804c6ed7230533f26b9f23806c5e7755adf10cfd097d |
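A downloaded file can be checked against the SHA256 digests listed above using only the Python standard library. The helper names `sha256_of` and `verify` are hypothetical, and the demo hashes a small temporary file so the block runs without the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file containing b"abc", whose SHA256 is well known.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"abc")
    demo_ok = verify(
        demo, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    )
print(demo_ok)

# For the real files, pass the downloaded path and the digest from the table, e.g.
# verify(Path("docs_anonymizer-0.2.13.tar.gz"), "<SHA256 from the table above>")
```

SHA256 is the digest to rely on here; MD5 is listed for completeness but is not collision-resistant.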