CLI for a tool to anonymize PDF, Markdown, and plain text files using LLMs.
🦉🫥 PDF Anonymizer CLI
A command-line interface for anonymizing PDF, Markdown, and plain text files using LLMs.
- High-Quality Anonymization: Leverages LLMs to identify and replace Personally Identifiable Information (PII) with high accuracy.
- Large File Support: Consistently anonymizes large files (tested up to 1GB).
- Multi-Provider & Cost-Effective: Free to use with local Ollama models. It also supports major providers like OpenAI, Anthropic, Google, Hugging Face, and OpenRouter.
- Reversible: Supports deanonymization to recover original data when needed.
- Multi-Format: Works with PDF, Markdown, and plain text files.
Installation
Install the CLI with your favorite package manager. To use a specific LLM provider, you must install the corresponding extra.
- Google: pip install "pdf-anonymizer-cli[google]"
- Ollama: pip install "pdf-anonymizer-cli[ollama]"
- Hugging Face: pip install "pdf-anonymizer-cli[huggingface]"
- OpenRouter: pip install "pdf-anonymizer-cli[openrouter]"
- OpenAI: pip install "pdf-anonymizer-cli[openai]"
- Anthropic: pip install "pdf-anonymizer-cli[anthropic]"
You can also install multiple extras at once:
pip install "pdf-anonymizer-cli[google,openrouter]"
This installs the pdf-anonymizer executable.
Environment Variables
The CLI will automatically load a .env file from the current directory or any parent directory. For consistency, it's recommended to place a single .env file at the root of the repository.
- GOOGLE_API_KEY: Required when using Google models.
- HUGGING_FACE_TOKEN: Required when using Hugging Face models. A token can be created in your Hugging Face account settings.
- OPENROUTER_API_KEY: Required when using OpenRouter models.
- OPENAI_API_KEY: Required when using OpenAI models.
- ANTHROPIC_API_KEY: Required when using Anthropic models.
- OLLAMA_HOST: Optional when using Ollama models; defaults to http://localhost:11434.
Example .env file:
GOOGLE_API_KEY="YOUR_API_KEY_HERE"
HUGGING_FACE_TOKEN="YOUR_HF_TOKEN_HERE"
OPENROUTER_API_KEY="YOUR_OPENROUTER_KEY"
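The lookup described above (current directory first, then each parent) can be sketched in a few lines of Python. This is a minimal illustration, not the CLI's actual implementation: the function names find_dotenv and parse_dotenv are hypothetical, and the parser only handles plain KEY="value" lines like those in the example file.

```python
from pathlib import Path
from typing import Dict, Optional


def find_dotenv(start: Path) -> Optional[Path]:
    """Return the nearest .env file at or above `start`, or None if there is none."""
    start = start.resolve()
    # Check the starting directory first, then walk up through every parent.
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Because the nearest file wins in this sketch, a stray .env in a subdirectory would shadow the one at the repository root, which is one reason to keep a single file at the root.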
Usage
Anonymize
The run command anonymizes one or more files.
pdf-anonymizer run FILE_PATH [FILE_PATH ...] \
[--characters-to-anonymize INTEGER] \
[--prompt-name {simple|detailed}] \
[--model-name TEXT] \
[--anonymized-entities PATH]
Arguments:
FILE_PATH: Path to one or several PDF, Markdown, or text files for anonymization.
Options:
- --characters-to-anonymize INTEGER: Number of characters to process in each chunk (default: 100000).
- --prompt-name [simple|detailed]: The prompt template to use (default: detailed).
- --model-name TEXT: The language model to use.
- --anonymized-entities PATH: Path to a file with a list of entities to anonymize.
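To make the effect of --characters-to-anonymize concrete, here is a rough Python sketch of fixed-size chunking. It assumes the text is cut at exact character offsets; the real tool may split on other boundaries (pages, paragraphs), and iter_chunks is a hypothetical name used only for illustration.

```python
from typing import Iterator


def iter_chunks(text: str, characters_to_anonymize: int = 100_000) -> Iterator[str]:
    """Yield consecutive slices of `text`, each at most `characters_to_anonymize` long."""
    if characters_to_anonymize <= 0:
        raise ValueError("characters_to_anonymize must be a positive integer")
    for start in range(0, len(text), characters_to_anonymize):
        yield text[start:start + characters_to_anonymize]


# Example: a 10-character string split into chunks of 4 characters.
chunks = list(iter_chunks("abcdefghij", 4))
```

Under this assumption, a smaller value means more, shorter requests to the model, while a larger value means fewer requests that each carry more context.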
Models:
You can use any of the predefined models below, or specify a new model using the format "provider/model-name".
For example: --model-name "google/gemini-flash-latest".
- Google: gemini-2.5-pro, gemini-2.5-flash (default), gemini-2.5-flash-lite.
- Ollama: gemma:7b, phi4-mini.
- Hugging Face: openai/gpt-oss-20b, mistralai/Mistral-7B-Instruct-v0.1, HuggingFaceH4/zephyr-7b-beta.
- OpenRouter: openai/gpt-4o, google/gemini-pro.
- OpenAI: gpt-4o, gpt-5.
- Anthropic: claude-4-sonet, claude-4.5-sonet.
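Since several predefined names already contain a slash (for example openai/gpt-4o under OpenRouter), it can help to see one way a model name might be resolved to a provider and its required environment variable. The sketch below is an assumption about the resolution order, not the CLI's documented behavior: it checks the predefined list first and only then treats the part before the first slash as a provider. The provider keys are assumed to match the install extras, and resolve_provider and missing_env_var are hypothetical helpers.

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable each provider requires, as listed under "Environment Variables".
# Ollama needs none: OLLAMA_HOST is optional.
REQUIRED_ENV = {
    "google": "GOOGLE_API_KEY",
    "huggingface": "HUGGING_FACE_TOKEN",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,
}

# Predefined model names from the list above, mapped to their provider.
PREDEFINED = {
    "gemini-2.5-pro": "google",
    "gemini-2.5-flash": "google",
    "gemini-2.5-flash-lite": "google",
    "gemma:7b": "ollama",
    "phi4-mini": "ollama",
    "openai/gpt-oss-20b": "huggingface",
    "mistralai/Mistral-7B-Instruct-v0.1": "huggingface",
    "HuggingFaceH4/zephyr-7b-beta": "huggingface",
    "openai/gpt-4o": "openrouter",
    "google/gemini-pro": "openrouter",
    "gpt-4o": "openai",
    "gpt-5": "openai",
    "claude-4-sonet": "anthropic",
    "claude-4.5-sonet": "anthropic",
}


def resolve_provider(model_name: str) -> Tuple[str, str]:
    """Return (provider, model) for a predefined or "provider/model-name" value."""
    if model_name in PREDEFINED:
        return PREDEFINED[model_name], model_name
    provider, sep, model = model_name.partition("/")
    if sep and model and provider in REQUIRED_ENV:
        return provider, model
    raise ValueError(f"Unknown model or provider: {model_name!r}")


def missing_env_var(provider: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the required variable if it is unset, else None."""
    variable = REQUIRED_ENV.get(provider)
    if variable and not env.get(variable):
        return variable
    return None
```

With this ordering, "google/gemini-flash-latest" resolves to the google provider, while "openai/gpt-4o" resolves to openrouter because it appears in the predefined list.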
Examples
Basic anonymization with the default model (Google):
pdf-anonymizer run document.pdf
A Google model outside the predefined list, with the simple prompt:
pdf-anonymizer run notes.md --model-name "google/gemini-flash-latest" --prompt-name simple
Using an OpenRouter model:
pdf-anonymizer run report.pdf --model-name "openai/gpt-4o"
Deanonymize
The deanonymize command reverts anonymization using a mapping file.
pdf-anonymizer deanonymize ANONYMIZED_FILE MAPPING_FILE
Arguments:
- ANONYMIZED_FILE: Path to the anonymized text file.
- MAPPING_FILE: Path to the JSON mapping file.
Example:
pdf-anonymizer deanonymize \
data/anonymized/document.anonymized.md \
data/mappings/document.mapping.json
This will create a deanonymized version of the file at data/deanonymized/document.deanonymized.md.
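Conceptually, deanonymization is a lookup-and-replace driven by the mapping file. The Python sketch below illustrates the idea under a loud assumption: the mapping JSON is taken to be a flat object of the form {"placeholder": "original value"}, which this document does not confirm. The placeholder names (PERSON_1, PERSON_10) and both function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Dict, Mapping


def load_mapping(path: Path) -> Dict[str, str]:
    """Load a mapping file, ASSUMED to be a flat {"placeholder": "original"} JSON object."""
    return json.loads(path.read_text(encoding="utf-8"))


def deanonymize_text(text: str, mapping: Mapping[str, str]) -> str:
    """Replace every placeholder in `text` with its original value in one pass."""
    if not mapping:
        return text
    # Longest placeholders first, so PERSON_10 is never mistaken for PERSON_1 plus "0".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


# Example with an in-memory mapping instead of a file.
example_mapping = {"PERSON_1": "Alice", "PERSON_10": "Bob"}
restored = deanonymize_text("PERSON_10 met PERSON_1.", example_mapping)
```

Doing the replacement in a single regex pass also avoids re-substituting text that was just restored, which chained str.replace calls could do if an original value happened to contain another placeholder.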