🦉🫥 PDF Anonymizer CLI

A command-line interface for anonymizing PDF, Markdown, and plain text files using LLMs.

Installation

This project uses uv and is structured as a monorepo. The dependencies for the CLI and its core library are managed at the root of the project.

  1. Install uv: Follow the official installation instructions.
  2. Install dependencies from the repository root:
    # From the repository root
    uv sync
    
    This installs the pdf-anonymizer executable.

Environment Variables

The CLI will automatically load a .env file from the current directory or any parent directory. For consistency, it's recommended to place a single .env file at the root of the repository.

  • GOOGLE_API_KEY: Required when using Google's Gemini models.
  • OLLAMA_HOST: Optional; used with local Ollama models. Defaults to http://localhost:11434.
  • HUGGING_FACE_TOKEN: Required when using Hugging Face models. You can create a token in your Hugging Face account settings.

Example .env file:

GOOGLE_API_KEY="YOUR_API_KEY_HERE"
HUGGING_FACE_TOKEN="YOUR_HF_TOKEN_HERE"
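
The lookup described above (current directory first, then each parent) can be sketched in a few lines of Python. This is only an illustration of the search order, not the CLI's actual loader; the helper name find_env_file is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, name: str = ".env") -> Optional[Path]:
    """Return the nearest `name` file in `start` or one of its parents.

    Hypothetical helper: it mirrors the documented search order only.
    """
    start = start.resolve()
    # Check the starting directory first, then walk up towards the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # no .env file anywhere up the tree


if __name__ == "__main__":
    print(find_env_file(Path.cwd()))
```

With a single .env at the repository root, any working directory inside the repository resolves to that same file, which is why the single-file layout is recommended.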

Usage

Anonymize

The run command anonymizes one or more files.

pdf-anonymizer run FILE_PATH [FILE_PATH ...] \
  [--characters-to-anonymize INTEGER] \
  [--prompt-name {simple|detailed}] \
  [--model-name TEXT] \
  [--anonymized-entities PATH]

Arguments:

  • FILE_PATH: One or more paths to PDF, Markdown, or plain text files to anonymize.

Options:

  • --characters-to-anonymize INTEGER: Number of characters to process in each chunk (default: 100000).
  • --prompt-name {simple|detailed}: The prompt template to use (default: detailed).
  • --model-name TEXT: The language model to use (default: gemini-2.5-flash).
  • --anonymized-entities PATH: Path to a file with a list of entities to anonymize.
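
To make the effect of --characters-to-anonymize concrete, here is a minimal sketch of fixed-size character chunking. It assumes chunks are cut purely by character count; the real tool may split on different boundaries (pages, paragraphs), and split_into_chunks is a hypothetical name.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 100_000) -> List[str]:
    """Split `text` into consecutive pieces of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    chunks = split_into_chunks("a" * 250_000)
    print([len(c) for c in chunks])  # prints [100000, 100000, 50000]
```

A smaller chunk size means more, shorter requests to the model; a larger one means fewer requests but requires a model that accepts longer inputs.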

Models:

  • Google: gemini-2.5-pro, gemini-2.5-flash (default), gemini-2.5-flash-lite.
  • Ollama: gemma:7b, phi4-mini.
  • Hugging Face: openai/gpt-oss-20b, mistralai/Mistral-7B-Instruct-v0.1, HuggingFaceH4/zephyr-7b-beta.
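
Since each provider has its own environment requirements, a quick pre-flight check can catch a missing key before a run starts. The sketch below only encodes the model list and the variables documented above; the table and the missing_env_vars helper are hypothetical and not part of the CLI.

```python
import os
from typing import Dict, List, Mapping

# Model names and providers as listed in this README.
MODEL_PROVIDERS: Dict[str, str] = {
    "gemini-2.5-pro": "google",
    "gemini-2.5-flash": "google",
    "gemini-2.5-flash-lite": "google",
    "gemma:7b": "ollama",
    "phi4-mini": "ollama",
    "openai/gpt-oss-20b": "huggingface",
    "mistralai/Mistral-7B-Instruct-v0.1": "huggingface",
    "HuggingFaceH4/zephyr-7b-beta": "huggingface",
}

# Required variables per provider; OLLAMA_HOST is optional, so it is not listed.
REQUIRED_ENV: Dict[str, List[str]] = {
    "google": ["GOOGLE_API_KEY"],
    "ollama": [],
    "huggingface": ["HUGGING_FACE_TOKEN"],
}


def missing_env_vars(model_name: str, env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty for `model_name`."""
    provider = MODEL_PROVIDERS.get(model_name)
    if provider is None:
        raise ValueError(f"unknown model: {model_name}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


if __name__ == "__main__":
    print(missing_env_vars("gemini-2.5-flash", os.environ))
```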

Examples

Basic anonymization:

pdf-anonymizer run document.pdf

Custom model and prompt:

pdf-anonymizer run notes.md --model-name phi4-mini --prompt-name simple
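
When the CLI is driven from a script, it can help to assemble the argument list programmatically. The following sketch uses only the flags documented above; build_run_command is a hypothetical helper, and the resulting list could be handed to subprocess.run(cmd, check=True).

```python
import shlex
from typing import List, Optional, Sequence


def build_run_command(
    files: Sequence[str],
    characters_to_anonymize: Optional[int] = None,
    model_name: Optional[str] = None,
    prompt_name: Optional[str] = None,
    anonymized_entities: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `pdf-anonymizer run` from the documented options."""
    if not files:
        raise ValueError("at least one FILE_PATH is required")
    cmd = ["pdf-anonymizer", "run", *files]
    # Only append options that were explicitly set, so CLI defaults still apply.
    if characters_to_anonymize is not None:
        cmd += ["--characters-to-anonymize", str(characters_to_anonymize)]
    if model_name is not None:
        cmd += ["--model-name", model_name]
    if prompt_name is not None:
        cmd += ["--prompt-name", prompt_name]
    if anonymized_entities is not None:
        cmd += ["--anonymized-entities", anonymized_entities]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(["notes.md"], model_name="phi4-mini", prompt_name="simple")
    print(shlex.join(cmd))
    # prints: pdf-anonymizer run notes.md --model-name phi4-mini --prompt-name simple
```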

Deanonymize

The deanonymize command reverts anonymization using a mapping file.

pdf-anonymizer deanonymize ANONYMIZED_FILE MAPPING_FILE

Arguments:

  • ANONYMIZED_FILE: Path to the anonymized text file.
  • MAPPING_FILE: Path to the JSON mapping file.

Example:

pdf-anonymizer deanonymize \
    data/anonymized/document.anonymized.md \
    data/mappings/document.mapping.json

This creates a deanonymized version of the file at data/deanonymized/document.deanonymized.md.
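
Conceptually, deanonymization is a reverse substitution driven by the mapping file. The sketch below assumes the mapping is a flat JSON object from placeholder to original value (for example {"PERSON_1": "Alice"}); the actual mapping format and placeholder style are not specified here, so treat both, and the function names, as hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def deanonymize_text(text: str, mapping: Dict[str, str]) -> str:
    """Replace every placeholder in `text` with its original value."""
    # Longest placeholders first, so "PERSON_1" does not clobber "PERSON_10".
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


def deanonymize_file(anonymized_file: Path, mapping_file: Path) -> str:
    """Load the assumed placeholder-to-original JSON mapping and apply it."""
    mapping = json.loads(mapping_file.read_text(encoding="utf-8"))
    return deanonymize_text(anonymized_file.read_text(encoding="utf-8"), mapping)


if __name__ == "__main__":
    sample = {"PERSON_1": "Alice", "PERSON_10": "Bob"}
    print(deanonymize_text("PERSON_10 met PERSON_1.", sample))  # prints Bob met Alice.
```

Because the substitution depends entirely on the mapping file, that file is as sensitive as the original document and should be stored accordingly.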
