A privacy-focused CLI tool that removes sensitive metadata from image files

🔒 Metadata Scrubber Tool

A privacy-focused CLI tool that removes sensitive metadata (EXIF, GPS, author info) from image files. Perfect for protecting your privacy before sharing photos online.

✨ Features

  • Multi-format support - JPEG, PNG (with PDF/Office planned)
  • Concurrent processing - Process 1000+ files efficiently with ThreadPoolExecutor
  • Dry-run mode - Preview what would be scrubbed without making changes
  • Smart format detection - Uses Pillow's format detection, not just file extensions
  • Beautiful CLI - Rich progress bars and formatted output
  • Privacy-first - Removes GPS coordinates, camera info, timestamps, author data

📚 Educational Value

This project demonstrates:

  • Factory pattern for extensible file type handling
  • Abstract base classes for consistent handler interfaces
  • Concurrent processing with thread-safe operations
  • CLI development with Typer and Rich
  • Image metadata handling with Pillow and piexif
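
The first two patterns can be sketched together as follows. This is a minimal illustration, assuming handlers operate on an in-memory dict rather than real image files. The class names echo the modules listed under Architecture, but the method bodies and the `register`/`create` signatures are assumptions, not the project's actual interface.

```python
from abc import ABC, abstractmethod


class MetadataHandler(ABC):
    """Interface that every file-type handler implements (illustrative)."""

    def __init__(self, metadata: dict[str, str]) -> None:
        self.metadata = dict(metadata)

    @abstractmethod
    def read(self) -> dict[str, str]:
        """Return the metadata currently held."""

    @abstractmethod
    def wipe(self) -> None:
        """Remove all metadata entries."""

    @abstractmethod
    def save(self) -> dict[str, str]:
        """Return what would be written to the output file."""


class ImageHandler(MetadataHandler):
    def read(self) -> dict[str, str]:
        return dict(self.metadata)

    def wipe(self) -> None:
        self.metadata.clear()

    def save(self) -> dict[str, str]:
        return dict(self.metadata)


class MetadataFactory:
    """Maps a detected format name to the handler class that supports it."""

    _registry: dict[str, type[MetadataHandler]] = {
        "JPEG": ImageHandler,
        "PNG": ImageHandler,
    }

    @classmethod
    def register(cls, fmt: str, handler_cls: type[MetadataHandler]) -> None:
        cls._registry[fmt.upper()] = handler_cls

    @classmethod
    def create(cls, fmt: str, metadata: dict[str, str]) -> MetadataHandler:
        try:
            handler_cls = cls._registry[fmt.upper()]
        except KeyError:
            raise ValueError(f"Unsupported format: {fmt}") from None
        return handler_cls(metadata)
```

Supporting a new file type then means writing one handler subclass and registering it; the calling code does not change.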

📋 Prerequisites

  • Python 3.10+
  • uv (recommended) or pip

🚀 Installation

# Clone the repository
git clone https://github.com/Heritage-XioN/metadata-scrubber-tool.git
cd metadata-scrubber-tool

# Create virtual environment and install dependencies
uv venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/Mac

uv pip install -r requirements.txt

📖 Usage

Read Metadata

# Single file
python -m src.main read photo.jpg

# Recursive directory scan
python -m src.main read ./photos/ -r -ext jpg

Scrub Metadata

# Single file
python -m src.main scrub photo.jpg --output ./cleaned

# Batch process with 8 workers
python -m src.main scrub ./photos/ -r -ext jpg --output ./cleaned --workers 8

# Preview without changes
python -m src.main scrub ./photos/ -r -ext jpg --dry-run
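
Conceptually, --workers and --dry-run could combine as in the sketch below. It is only an illustration of the ThreadPoolExecutor approach: `scrub_one` and `scrub_batch` are hypothetical names, and the per-file step merely copies the file where the real tool would strip metadata.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import shutil


def scrub_one(src: Path, out_dir: Path, dry_run: bool) -> tuple[Path, str]:
    """Placeholder per-file task: copies the file instead of scrubbing it."""
    if dry_run:
        # Dry run: report the intended action without touching the filesystem.
        return src, "would scrub"
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, out_dir / src.name)  # stand-in for the real scrub
    return src, "scrubbed"


def scrub_batch(
    files: list[Path], out_dir: Path, workers: int = 4, dry_run: bool = False
) -> list[tuple[Path, str]]:
    """Run scrub_one over all files using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order tasks finish in.
        return list(pool.map(lambda f: scrub_one(f, out_dir, dry_run), files))
```

Each task writes to its own output path, so the workers share no mutable state and need no locking in this simplified form.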

CLI Options

Command   Options
read      -r / --recursive, -ext / --extension
scrub     -r, -ext, -o / --output, -d / --dry-run, -w / --workers
Global    -V / --verbose, -v / --version

🏗️ Architecture

src/
├── main.py                 # CLI entry point (Typer app)
├── commands/
│   ├── read.py             # Read metadata command
│   └── scrub.py            # Scrub metadata command (batch processing)
├── services/
│   ├── metadata_factory.py # Factory for creating handlers
│   ├── metadata_handler.py # Abstract base class
│   ├── image_handler.py    # JPEG/PNG handler
│   └── batch_processor.py  # Concurrent batch processing
├── core/
│   ├── jpeg_metadata.py    # JPEG EXIF processor (piexif)
│   └── png_metadata.py     # PNG metadata processor (PIL)
└── utils/
    ├── display.py          # Rich output formatting
    ├── formatter.py        # Value formatting helpers
    ├── exceptions.py       # Custom exceptions
    └── logger.py           # Logging configuration

Data Flow:

CLI Command → MetadataFactory → Handler (read→wipe→save) → Output
                    ↓
              Format Detection
                    ↓
           JpegProcessor / PngProcessor

⚠️ Security Considerations

  • Always back up files before scrubbing in production
  • Use --dry-run to preview changes before committing
  • GPS coordinates are completely stripped for privacy
  • Original files are not modified - processed copies are created
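
One way to spot-check a scrubbed JPEG independently of the tool is to walk its header segments and look for an APP1 segment carrying the EXIF signature. The sketch below is such a check, not part of the project. `has_exif_segment` is a hypothetical name, and it handles only the common layout where every header segment before the scan data carries a length field. It does not look for other metadata containers such as XMP.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 segment holding EXIF data."""
    if not data.startswith(b"\xff\xd8"):  # SOI marker opens every JPEG
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost marker sync; treat the header as ended
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed scan data follows, headers are over
            break
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        payload = data[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

Running the check on both the original and the file written to the output directory gives a quick before-and-after comparison.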

📄 License

MIT License - See LICENSE for details.


Made with ❤️ for privacy
