🦉🫥 PDF Anonymizer CLI
A command-line interface for anonymizing PDF, Markdown, and plain text files using LLMs.
Installation
This project uses uv and is structured as a monorepo. The dependencies for the CLI and its core library are managed at the root of the project.
- Install uv: Follow the official installation instructions.
- Install dependencies from the repository root:
# From the repository root
uv sync
This installs the pdf-anonymizer executable.
Environment Variables
The CLI will automatically load a .env file from the current directory or any parent directory. For consistency, it's recommended to place a single .env file at the root of the repository.
- GOOGLE_API_KEY: Required when using Google's Gemini models.
- OLLAMA_HOST: Optional, defaults to http://localhost:11434 when using local Ollama models.
- HUGGING_FACE_TOKEN: Required when using Hugging Face models. You can get a token from the Hugging Face website.
Example .env file:
GOOGLE_API_KEY="YOUR_API_KEY_HERE"
HUGGING_FACE_TOKEN="YOUR_HF_TOKEN_HERE"
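The lookup described above (current directory first, then each parent) can be sketched with the standard library alone. This is an illustration only: the function names `find_env_file` and `parse_env_file` are hypothetical, and the CLI may well rely on a dotenv library with different parsing rules.

```python
from pathlib import Path
from typing import Dict, Optional


def find_env_file(start: Path) -> Optional[Path]:
    """Return the nearest .env at or above `start`, or None if there is none."""
    start = start.resolve()
    # Check the starting directory first, then walk up through its parents.
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: values are plain strings, optionally wrapped in quotes.
    """
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Because the nearest file wins, a single .env at the repository root is picked up from any subdirectory as long as no closer .env exists.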
Usage
Anonymize
The run command anonymizes one or more files.
pdf-anonymizer run FILE_PATH [FILE_PATH ...] \
[--characters-to-anonymize INTEGER] \
[--prompt-name {simple|detailed}] \
[--model-name TEXT] \
[--anonymized-entities PATH]
Arguments:
FILE_PATH: Path to one or several PDF, Markdown, or text files for anonymization.
Options:
- --characters-to-anonymize INTEGER: Number of characters to process in each chunk (default: 100000).
- --prompt-name [simple|detailed]: The prompt template to use (default: detailed).
- --model-name TEXT: The language model to use.
- --anonymized-entities PATH: Path to a file with a list of entities to anonymize.
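To illustrate what --characters-to-anonymize controls, here is a minimal sketch of fixed-size chunking. The helper `chunk_text` is hypothetical; the actual tool may split on paragraph or sentence boundaries instead of cutting at exact character offsets.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 100_000) -> List[str]:
    """Split `text` into consecutive chunks of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the text in strides of chunk_size; the last chunk may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

A smaller chunk size means more model calls per document, while a larger one requires a model whose context window can hold the whole chunk.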
Models:
- Google: gemini-2.5-pro, gemini-2.5-flash (default), gemini-2.5-flash-lite.
- Ollama: gemma:7b, phi4-mini.
- Hugging Face: openai/gpt-oss-20b, mistralai/Mistral-7B-Instruct-v0.1, HuggingFaceH4/zephyr-7b-beta.
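The model list and the environment variables fit together: each provider needs a different credential. The following sketch shows one way to check this before a run. The table and the functions `provider_for` and `missing_credential` are hypothetical and only restate what this README lists; they are not the CLI's internal API.

```python
from typing import Dict, List, Mapping, Optional

# Models as listed in this README, grouped by provider.
MODELS_BY_PROVIDER: Dict[str, List[str]] = {
    "google": ["gemini-2.5-pro", "gemini-2.5-flash", "gemini-2.5-flash-lite"],
    "ollama": ["gemma:7b", "phi4-mini"],
    "huggingface": [
        "openai/gpt-oss-20b",
        "mistralai/Mistral-7B-Instruct-v0.1",
        "HuggingFaceH4/zephyr-7b-beta",
    ],
}

# Required variable per provider; Ollama only has the optional OLLAMA_HOST.
REQUIRED_ENV: Dict[str, Optional[str]] = {
    "google": "GOOGLE_API_KEY",
    "ollama": None,
    "huggingface": "HUGGING_FACE_TOKEN",
}


def provider_for(model_name: str) -> str:
    """Return the provider that serves `model_name`."""
    for provider, models in MODELS_BY_PROVIDER.items():
        if model_name in models:
            return provider
    raise ValueError(f"Unknown model: {model_name}")


def missing_credential(model_name: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the required variable if it is unset, else None."""
    variable = REQUIRED_ENV[provider_for(model_name)]
    if variable is not None and not env.get(variable):
        return variable
    return None
```

Passing `os.environ` as `env` checks the live environment; passing a plain dict keeps the check easy to test.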
Examples
Basic anonymization:
pdf-anonymizer run document.pdf
Custom model and prompt:
pdf-anonymizer run notes.md --model-name phi4-mini --prompt-name simple
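When the CLI is driven from a script, the same invocations can be assembled as an argument list. This sketch uses only the flags documented above; the helper name `build_run_command` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def build_run_command(
    files: Sequence[str],
    characters: Optional[int] = None,
    prompt_name: Optional[str] = None,
    model_name: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `pdf-anonymizer run` from documented options."""
    if not files:
        raise ValueError("at least one file path is required")
    command = ["pdf-anonymizer", "run", *files]
    # Only add options that were set, so the CLI defaults apply otherwise.
    if characters is not None:
        command += ["--characters-to-anonymize", str(characters)]
    if prompt_name is not None:
        command += ["--prompt-name", prompt_name]
    if model_name is not None:
        command += ["--model-name", model_name]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with file names that contain spaces.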
Deanonymize
The deanonymize command reverts anonymization using a mapping file.
pdf-anonymizer deanonymize ANONYMIZED_FILE MAPPING_FILE
Arguments:
- ANONYMIZED_FILE: Path to the anonymized text file.
- MAPPING_FILE: Path to the JSON mapping file.
Example:
pdf-anonymizer deanonymize \
data/anonymized/document.anonymized.md \
data/mappings/document.mapping.json
This will create a deanonymized version of the file at data/deanonymized/document.deanonymized.md.
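Conceptually, deanonymization is a lookup-and-replace over the anonymized text. The sketch below assumes the mapping file is a flat JSON object from placeholder to original value (for example `{"PERSON_1": "Alice"}`); the actual mapping format is not documented here, so treat both that layout and the function name `deanonymize_text` as assumptions.

```python
import re
from typing import Dict


def deanonymize_text(text: str, mapping: Dict[str, str]) -> str:
    """Replace each placeholder in `text` with its original value.

    Assumption: `mapping` maps placeholder -> original string.
    """
    if not mapping:
        return text
    # Try longer placeholders first so PERSON_10 is not matched as PERSON_1 + "0".
    placeholders = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in placeholders))
    # A single pass avoids re-replacing text that was already restored.
    return pattern.sub(lambda match: mapping[match.group(0)], text)
```

Under the same assumption, the mapping could be loaded with `json.loads(Path(mapping_file).read_text(encoding="utf-8"))` before calling the function.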