A privacy-focused CLI tool that removes sensitive metadata from images, PDFs, and Office documents

🔒 Metadata Scrubber

A privacy-focused CLI tool that removes sensitive metadata from files. Supports images, PDFs, and Microsoft Office documents. Perfect for protecting your privacy before sharing files online.

Tests Python 3.10+ License: MIT

✨ Features

  • Multi-format support - Images (JPEG, PNG), PDFs, and Office docs (Word, Excel, PowerPoint)
  • Concurrent processing - Process 1000+ files efficiently with ThreadPoolExecutor
  • Dry-run mode - Preview what would be scrubbed without making changes
  • Smart format detection - Uses library-level format detection, not just file extensions
  • Beautiful CLI - Rich progress bars and formatted output
  • Privacy-first - Removes GPS coordinates, author info, timestamps, camera data
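
To illustrate why content-based detection is more reliable than trusting a file extension, here is a minimal sketch that classifies a file by its leading bytes. It is a stand-in technique, not the tool's own code: the README says detection happens at the library level, and the names `SIGNATURES`, `sniff_format`, and `sniff_file` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Well-known leading "magic" bytes for the supported categories.
# Office Open XML files (.docx, .xlsx, .pptx) are ZIP archives, so they
# only match the generic ZIP signature and would need a further look inside.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-based (possibly Office)",
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format label based on the first bytes of a file, or None."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return None


def sniff_file(path: Path) -> Optional[str]:
    """Read a small header from disk and classify it (hypothetical helper)."""
    with path.open("rb") as fh:
        return sniff_format(fh.read(8))
```

A file named `photo.png` that actually starts with the JPEG signature would be reported as `jpeg` by this check, which is the kind of mismatch extension-only detection misses.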

๐Ÿ“ Supported Formats

| Category   | Extensions                 | Metadata Removed                   |
|------------|----------------------------|------------------------------------|
| Images     | .jpg, .jpeg, .png          | EXIF, GPS, camera info, timestamps |
| PDF        | .pdf                       | Author, creator, producer, dates   |
| Word       | .docx                      | Author, title, comments, keywords  |
| Excel      | .xlsx, .xlsm, .xltx, .xltm | Author, title, company, comments   |
| PowerPoint | .pptx, .pptm, .potx, .potm | Author, title, comments, keywords  |
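
For context on where Office metadata such as author and title lives: Office Open XML files are ZIP archives, and the core properties sit in `docProps/core.xml`. The sketch below reads a few of those fields using only the standard library. It is an illustration of the file format, not the tool's implementation, and `read_core_properties` is a hypothetical name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Friendly name -> namespaced element inside core.xml.
FIELDS = {"author": "dc:creator", "title": "dc:title", "keywords": "cp:keywords"}


def read_core_properties(data: bytes) -> dict:
    """Return the non-empty core properties found in an OOXML file's bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    found = {}
    for name, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[name] = node.text
    return found
```

Calling `read_core_properties(Path("report.docx").read_bytes())` on a real document would return whichever of these fields are populated; scrubbing amounts to rewriting that XML part without them.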

🚀 Quick Start

Installation

# Using uv (recommended)
uv pip install metadata-scrubber

# Or clone and install locally
git clone https://github.com/Heritage-XioN/metadata-scrubber-tool.git
cd metadata-scrubber-tool
uv sync

Basic Usage

# Read metadata from a file
mst read document.pdf

# Scrub metadata and save to output folder
mst scrub photo.jpg --output ./cleaned

# Batch process entire folder
mst scrub ./documents -r -ext docx --output ./cleaned

📖 Commands

mst read - View Metadata

mst read photo.jpg                      # Single file
mst read report.pdf                     # PDF file
mst read ./docs -r -ext docx            # All Word docs recursively
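
The combination of `-r` and `-ext` in the examples above selects files by extension, optionally descending into subdirectories. A minimal sketch of that selection, assuming case-insensitive suffix matching (the function name `collect_files` and that matching rule are assumptions, not the tool's documented behavior):

```python
from pathlib import Path


def collect_files(root: Path, extension: str, recursive: bool = False) -> list:
    """Return files under root whose suffix matches extension, sorted by path."""
    if root.is_file():
        # A single file argument is passed through unchanged.
        return [root]
    wanted = "." + extension.lower().lstrip(".")
    walker = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in walker if p.is_file() and p.suffix.lower() == wanted)
```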

mst scrub - Remove Metadata

mst scrub photo.jpg --output ./out      # Single file
mst scrub ./photos -r -ext jpg -o ./out # All JPEGs in directory
mst scrub ./docs -r -ext pdf --dry-run  # Preview PDF scrubbing
mst scrub ./files -r -ext xlsx -w 8     # 8 concurrent workers
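
The `-w` option controls how many files are handled at once. The README names `ThreadPoolExecutor` as the mechanism; the sketch below shows one common way to use it so that a failure on one file does not abort the batch. `process_batch` and its return shape are hypothetical and may differ from the project's `batch_processor.py`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def process_batch(files: Iterable[Path], worker: Callable[[Path], str], max_workers: int = 4):
    """Run worker on each file concurrently; return (results, errors) keyed by path."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file it is processing.
        futures = {pool.submit(worker, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors
```

Threads suit this workload when most of the time is spent on file I/O rather than CPU-bound parsing.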

CLI Options

| Option            | Description                        |
|-------------------|------------------------------------|
| -r, --recursive   | Process directories recursively    |
| -ext, --extension | Filter by file extension           |
| -o, --output      | Output directory for cleaned files |
| -d, --dry-run     | Preview without making changes     |
| -w, --workers     | Number of concurrent workers       |
| -V, --verbose     | Show detailed debug logs           |
| -v, --version     | Show version                       |
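
To show how these flags combine on one command line, here is a stand-in parser built with the standard library's `argparse`. The real CLI is a Typer app, so this is not its implementation; the program name `mst-sketch` and the default of 4 workers are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="mst-sketch")
    parser.add_argument("path", help="file or directory to process")
    parser.add_argument("-r", "--recursive", action="store_true")
    parser.add_argument("-ext", "--extension")
    parser.add_argument("-o", "--output")
    parser.add_argument("-d", "--dry-run", action="store_true")
    parser.add_argument("-w", "--workers", type=int, default=4)  # default is assumed
    parser.add_argument("-V", "--verbose", action="store_true")
    return parser


# Equivalent of: mst scrub ./files -r -ext xlsx -w 8
args = build_parser().parse_args(["./files", "-r", "-ext", "xlsx", "-w", "8"])
```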

🛠️ Development

Setup

git clone https://github.com/Heritage-XioN/metadata-scrubber-tool.git
cd metadata-scrubber-tool

# Install with dev dependencies
uv sync --all-extras

# Run tests
pytest

# Run linting
ruff check .

# Run type checking
mypy src

Project Structure

src/
├── main.py                 # CLI entry point (Typer app)
├── commands/
│   ├── read.py             # Read metadata command
│   └── scrub.py            # Scrub metadata command
├── services/
│   ├── metadata_factory.py # Factory for creating handlers
│   ├── metadata_handler.py # Abstract base class
│   ├── image_handler.py    # JPEG/PNG handler
│   ├── pdf_handler.py      # PDF handler
│   ├── excel_handler.py    # Excel handler
│   ├── powerpoint_handler.py # PowerPoint handler
│   ├── worddoc_handler.py  # Word document handler
│   └── batch_processor.py  # Concurrent batch processing
└── core/
    ├── jpeg_metadata.py    # JPEG EXIF processor
    └── png_metadata.py     # PNG metadata processor
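
The layout pairs an abstract base class with a factory that picks a handler per format. A minimal sketch of that pattern follows; all class and function names are hypothetical, the handler bodies are placeholders, and dispatch is by file suffix only for brevity, whereas the README says the tool detects formats at the library level.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class MetadataHandler(ABC):
    """Interface every format-specific handler would implement."""

    def __init__(self, path: Path) -> None:
        self.path = path

    @abstractmethod
    def read(self) -> dict:
        """Return the metadata found in the file."""

    @abstractmethod
    def scrub(self, output_dir: Path) -> Path:
        """Write a cleaned copy into output_dir and return its path."""


class PdfHandler(MetadataHandler):
    def read(self) -> dict:
        return {}  # placeholder: a real handler would parse the PDF

    def scrub(self, output_dir: Path) -> Path:
        return output_dir / self.path.name  # placeholder: no real scrubbing


class ImageHandler(PdfHandler):
    """Placeholder that reuses the stub behavior above."""


HANDLERS = {".pdf": PdfHandler, ".jpg": ImageHandler, ".jpeg": ImageHandler, ".png": ImageHandler}


def get_handler(path: Path) -> MetadataHandler:
    """Factory: return a handler instance for the file, or raise ValueError."""
    handler_cls = HANDLERS.get(path.suffix.lower())
    if handler_cls is None:
        raise ValueError(f"Unsupported file type: {path.suffix!r}")
    return handler_cls(path)
```

Adding a new format under this design means writing one handler class and registering it, without touching the commands.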

⚠️ Security Considerations

  • Original files are never modified - processed copies are created
  • Use --dry-run to preview changes before committing
  • GPS coordinates are completely stripped for privacy
  • Author information is removed from all supported formats
  • Always back up files before scrubbing in production
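
The first two points above (originals untouched, dry-run preview) can be sketched as a copy-then-clean flow. This is an illustration under assumptions, not the tool's code: `scrub_to_copy` is a hypothetical name, and the actual metadata stripping is left as a comment.

```python
import shutil
from pathlib import Path
from typing import Optional


def scrub_to_copy(source: Path, output_dir: Path, dry_run: bool = False) -> Optional[Path]:
    """Copy source into output_dir and return the copy; never write to source."""
    target = output_dir / source.name
    if dry_run:
        # Report what would happen without creating any files or directories.
        print(f"[dry-run] would write {target}")
        return None
    output_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)  # copies file contents only
    # A real implementation would now strip metadata from `target`.
    return target
```

Because only the copy is ever opened for writing, a failed or interrupted run cannot damage the input file.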

📄 License

MIT License - See LICENSE for details.


Made with ❤️ for privacy
