A privacy-focused CLI tool that removes sensitive metadata from image files
Metadata Scrubber Tool
A privacy-focused CLI tool that removes sensitive metadata (EXIF, GPS, author info) from image files. Perfect for protecting your privacy before sharing photos online.
Features
- Multi-format support - JPEG, PNG (with PDF/Office planned)
- Concurrent processing - Process 1000+ files efficiently with ThreadPoolExecutor
- Dry-run mode - Preview what would be scrubbed without making changes
- Smart format detection - Uses Pillow's format detection, not just file extensions
- Beautiful CLI - Rich progress bars and formatted output
- Privacy-first - Removes GPS coordinates, camera info, timestamps, author data
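To make the concurrent-processing and dry-run features concrete, here is a minimal sketch built on `ThreadPoolExecutor` from the standard library. The names `scrub_one` and `scrub_batch` are hypothetical, and `scrub_one` only copies bytes as a placeholder for real metadata removal, so this should be read as an illustration rather than the tool's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import shutil


def scrub_one(src: Path, out_dir: Path, dry_run: bool) -> str:
    """Hypothetical per-file worker: a plain copy stands in for real scrubbing."""
    dest = out_dir / src.name
    if dry_run:
        # Dry run: report what would happen without touching the disk.
        return f"would write {dest.name}"
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)  # placeholder: a real worker would strip metadata here
    return f"wrote {dest.name}"


def scrub_batch(files: list[Path], out_dir: Path, workers: int = 4,
                dry_run: bool = False) -> dict[str, str]:
    """Run scrub_one across a thread pool and collect one result per file."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(scrub_one, f, out_dir, dry_run): f for f in files}
        for future in as_completed(futures):
            src = futures[future]
            try:
                results[src.name] = future.result()
            except OSError as exc:  # one bad file should not abort the batch
                results[src.name] = f"failed: {exc}"
    return results
```

Results are gathered in the main thread as futures complete, so the shared `results` dictionary is never written from a worker.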
Educational Value
This project demonstrates:
- Factory pattern for extensible file type handling
- Abstract base classes for consistent handler interfaces
- Concurrent processing with thread-safe operations
- CLI development with Typer and Rich
- Image metadata handling with Pillow and piexif
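The factory and abstract-base-class points above can be sketched roughly as follows. All class and function names here are illustrative and not taken from the project's source. The sketch also detects formats from file signatures using only the standard library, as a stand-in for Pillow's format detection, so that it runs without dependencies.

```python
from abc import ABC, abstractmethod
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"        # start of every JPEG stream
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"    # fixed 8-byte PNG signature


class MetadataHandler(ABC):
    """Interface every handler implements (hypothetical method names)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    @abstractmethod
    def format_name(self) -> str:
        """Return the name of the format this handler processes."""


class JpegHandler(MetadataHandler):
    def format_name(self) -> str:
        return "JPEG"


class PngHandler(MetadataHandler):
    def format_name(self) -> str:
        return "PNG"


# Registry mapping a leading byte signature to a handler class.
# Supporting a new format means adding one entry here.
_HANDLERS: dict[bytes, type[MetadataHandler]] = {
    JPEG_MAGIC: JpegHandler,
    PNG_MAGIC: PngHandler,
}


def create_handler(path: Path) -> MetadataHandler:
    """Pick a handler from the file's content, ignoring its extension."""
    with path.open("rb") as fh:
        header = fh.read(8)
    for magic, handler_cls in _HANDLERS.items():
        if header.startswith(magic):
            return handler_cls(path)
    raise ValueError(f"unsupported file format: {path.name}")
```

Because the lookup is driven by content, a PNG saved with a `.jpg` extension still gets the PNG handler.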
Prerequisites
- Python 3.10+
- uv (recommended) or pip
Installation
# Clone the repository
git clone https://github.com/Heritage-XioN/metadata-scrubber-tool.git
cd metadata-scrubber-tool
# Create virtual environment and install dependencies
uv venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
uv pip install -r requirements.txt
Usage
Read Metadata
# Single file
python -m src.main read photo.jpg
# Recursive directory scan
python -m src.main read ./photos/ -r -ext jpg
Scrub Metadata
# Single file
python -m src.main scrub photo.jpg --output ./cleaned
# Batch process with 8 workers
python -m src.main scrub ./photos/ -r -ext jpg --output ./cleaned --workers 8
# Preview without changes
python -m src.main scrub ./photos/ -r -ext jpg --dry-run
CLI Options
| Command | Options |
|---|---|
| read | -r / --recursive, -ext / --extension |
| scrub | -r, -ext, -o / --output, -d / --dry-run, -w / --workers |
| Global | -V / --verbose, -v / --version |
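As a rough picture of how the `-r` and `-ext` options might turn an input path into a list of files, consider the sketch below. `collect_files` is a hypothetical helper, not the tool's actual code, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path
from typing import Optional


def collect_files(target: Path, recursive: bool = False,
                  extension: Optional[str] = None) -> list[Path]:
    """Return the files under target, optionally filtered by extension."""
    if target.is_file():
        candidates = [target]
    elif recursive:
        # Walk the whole tree below target.
        candidates = [p for p in target.rglob("*") if p.is_file()]
    else:
        # Only look at the directory's immediate children.
        candidates = [p for p in target.iterdir() if p.is_file()]
    if extension is not None:
        # Accept "jpg" or ".jpg" and compare case-insensitively.
        wanted = "." + extension.lower().lstrip(".")
        candidates = [p for p in candidates if p.suffix.lower() == wanted]
    return sorted(candidates)
```

For example, `collect_files(Path("./photos"), recursive=True, extension="jpg")` would correspond to `./photos/ -r -ext jpg` in the commands above.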
Architecture
src/
├── main.py                  # CLI entry point (Typer app)
├── commands/
│   ├── read.py              # Read metadata command
│   └── scrub.py             # Scrub metadata command (batch processing)
├── services/
│   ├── metadata_factory.py  # Factory for creating handlers
│   ├── metadata_handler.py  # Abstract base class
│   ├── image_handler.py     # JPEG/PNG handler
│   └── batch_processor.py   # Concurrent batch processing
├── core/
│   ├── jpeg_metadata.py     # JPEG EXIF processor (piexif)
│   └── png_metadata.py      # PNG metadata processor (PIL)
└── utils/
    ├── display.py           # Rich output formatting
    ├── formatter.py         # Value formatting helpers
    ├── exceptions.py        # Custom exceptions
    └── logger.py            # Logging configuration
Data Flow:
CLI Command → MetadataFactory → Handler (read → wipe → save) → Output
                    ↓
             Format Detection
                    ↓
        JpegProcessor / PngProcessor
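The read, wipe, save sequence can be illustrated with PNG, where metadata lives in ancillary chunks. The sketch below works on raw bytes with the standard library only. It is a stand-in for the project's Pillow-based PNG processor, and the function names and the chosen set of chunk types are assumptions.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that commonly carry metadata (assumed selection).
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}

Chunk = tuple[bytes, bytes]  # (chunk type, payload)


def read_chunks(data: bytes) -> list[Chunk]:
    """Read: split a PNG byte string into (type, payload) chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks: list[Chunk] = []
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        chunks.append((ctype, payload))
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return chunks


def wipe(chunks: list[Chunk]) -> list[Chunk]:
    """Wipe: drop metadata chunks and keep the chunks that hold image data."""
    return [(t, p) for t, p in chunks if t not in METADATA_CHUNKS]


def save(chunks: list[Chunk]) -> bytes:
    """Save: reassemble the PNG, recomputing each chunk's CRC-32."""
    out = bytearray(PNG_SIGNATURE)
    for ctype, payload in chunks:
        out += struct.pack(">I", len(payload)) + ctype + payload
        out += struct.pack(">I", zlib.crc32(ctype + payload) & 0xFFFFFFFF)
    return bytes(out)
```

Calling `save(wipe(read_chunks(data)))` yields a PNG without the listed metadata chunks while leaving the pixel data untouched.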
Security Considerations
- Always back up files before scrubbing in production
- Use --dry-run to preview changes before committing
- GPS coordinates are completely stripped for privacy
- Original files are not modified; processed copies are created
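One way to back the guarantee that originals stay untouched is to refuse any output path that resolves to the source file. The helper below is a hypothetical sketch of that check, not the project's code, and the `_scrubbed` suffix is an invented naming convention.

```python
from pathlib import Path


def safe_output_path(src: Path, out_dir: Path) -> Path:
    """Return a destination for the processed copy that never overwrites src."""
    dest = out_dir / src.name
    if dest.resolve() == src.resolve():
        # Output directory is the source directory: pick a distinct name.
        dest = out_dir / f"{src.stem}_scrubbed{src.suffix}"
    return dest
```

Comparing resolved paths catches the case where the output directory is the source directory written in a different form, such as a relative path.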
License
MIT License - See LICENSE for details.
Made with ❤️ for privacy