PDF Anonymizer CLI

A command-line interface for anonymizing PDF, Markdown, and plain text files using LLMs.

Installation

This project uses uv and is structured as a monorepo. The dependencies for the CLI and its core library are managed at the root of the project.

  1. Install uv: Follow the official installation instructions.
  2. Install dependencies from the repository root:
    # From the repository root
    uv sync
    
    This installs the pdf-anonymizer executable.

Environment Variables

The CLI automatically loads a .env file from the current directory or any parent directory. Keeping a single .env file at the root of the repository ensures every invocation reads the same settings.

  • GOOGLE_API_KEY: Required when using Google's Gemini models.
  • OLLAMA_HOST: Optional. Address of the Ollama server used for local models (default: http://localhost:11434).

Example .env file:

GOOGLE_API_KEY="YOUR_API_KEY_HERE"
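The lookup described above (current directory first, then each parent) can be illustrated with a short standard-library sketch. This is not the CLI's actual loader, which this README does not show; the helper names find_dotenv, parse_dotenv, and ollama_host are hypothetical, and the parser handles only simple KEY=VALUE lines.

```python
from pathlib import Path
from typing import Dict, Optional


def find_dotenv(start: Path) -> Optional[Path]:
    """Return the nearest .env file in `start` or any of its parents (hypothetical helper)."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in the example .env above.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def ollama_host(env: Dict[str, str]) -> str:
    """Fall back to the documented default when OLLAMA_HOST is not set."""
    return env.get("OLLAMA_HOST", "http://localhost:11434")
```

Calling find_dotenv(Path.cwd()) from a nested project directory would return the .env file at the repository root when no closer one exists.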

Usage

Anonymize

The run command anonymizes one or more files.

pdf-anonymizer run FILE_PATH [FILE_PATH ...] \
  [--characters-to-anonymize INTEGER] \
  [--prompt-name {simple|detailed}] \
  [--model-name TEXT] \
  [--anonymized-entities PATH]

Arguments:

  • FILE_PATH: One or more paths to PDF, Markdown, or plain text files to anonymize.

Options:

  • --characters-to-anonymize INTEGER: Number of characters to process in each chunk (default: 100000).
  • --prompt-name {simple|detailed}: The prompt template to use (default: detailed).
  • --model-name TEXT: The language model to use (default: gemini-2.5-flash).
  • --anonymized-entities PATH: Path to a file with a list of entities to anonymize.

Models:

  • Google: gemini-2.5-pro, gemini-2.5-flash (default), gemini-2.5-flash-lite.
  • Ollama: gemma:7b, phi4-mini.

Examples

Basic anonymization:

pdf-anonymizer run document.pdf

Custom model and prompt:

pdf-anonymizer run notes.md --model-name phi4-mini --prompt-name simple
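When the CLI is driven from a script, the documented options can be assembled into an argument list. The sketch below uses only the flags listed above; build_run_command is a hypothetical helper and is not part of the package.

```python
from typing import List, Optional, Sequence


def build_run_command(
    files: Sequence[str],
    model_name: Optional[str] = None,
    prompt_name: Optional[str] = None,
    characters_to_anonymize: Optional[int] = None,
) -> List[str]:
    """Build the argv list for `pdf-anonymizer run` from the documented options."""
    if not files:
        raise ValueError("at least one file path is required")
    if prompt_name is not None and prompt_name not in ("simple", "detailed"):
        raise ValueError("prompt_name must be 'simple' or 'detailed'")
    command = ["pdf-anonymizer", "run", *files]
    if characters_to_anonymize is not None:
        command += ["--characters-to-anonymize", str(characters_to_anonymize)]
    if prompt_name is not None:
        command += ["--prompt-name", prompt_name]
    if model_name is not None:
        command += ["--model-name", model_name]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True) once the pdf-anonymizer executable is installed.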

Deanonymize

The deanonymize command restores the original text of an anonymized file using its mapping file.

pdf-anonymizer deanonymize ANONYMIZED_FILE MAPPING_FILE

Arguments:

  • ANONYMIZED_FILE: Path to the anonymized text file.
  • MAPPING_FILE: Path to the JSON mapping file.

Example:

pdf-anonymizer deanonymize \
    data/anonymized/document.anonymized.md \
    data/mappings/document.mapping.json

This writes the deanonymized file to data/deanonymized/document.deanonymized.md.
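Conceptually, deanonymization is a lookup-and-replace over the anonymized text. The sketch below assumes the mapping file is a JSON object whose keys are placeholders and whose values are the original strings; the README does not document the actual schema, and the placeholder names (PERSON_1, PERSON_10) are invented for illustration.

```python
import json
import re
from typing import Dict


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Replace each placeholder in `text` with its original value from `mapping`."""
    if not mapping:
        return text
    # Longest placeholders first, so "PERSON_10" is not matched as "PERSON_1" plus "0".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # A single pass means restored values are never re-scanned for placeholders.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


# Assumed mapping layout: {"<placeholder>": "<original>"}.
example_mapping = json.loads('{"PERSON_1": "Alice", "PERSON_10": "Bob"}')
restored = deanonymize("PERSON_10 met PERSON_1.", example_mapping)
```

If the real mapping file stores the pairs the other way around (original to placeholder), the dictionary would need to be inverted before the replacement.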
