A CLI tool to remove or selectively filter metadata from images, documents, audio, and video files.

📄 Metadata Cleaner 🔍

A powerful CLI tool to remove or selectively filter metadata from images, PDFs, DOCX, audio, and video files.


📌 Overview

Metadata Cleaner is a fast and efficient command-line tool designed for privacy protection, security compliance, and data sanitization. It supports removing metadata from various file formats including images, documents, audio, and video files, with options for selective filtering and parallel batch processing.

๐Ÿ” Why use Metadata Cleaner?

  • Protect your privacy: Strip hidden metadata from files.
  • Sanitize sensitive documents: Prepare files for sharing without revealing personal information.
  • Reduce file size: Remove unnecessary metadata.
  • Batch process: Clean metadata from individual files or entire folders (with recursive support).

🚀 Features

  • Selective Metadata Filtering:
    Configure which metadata fields to preserve or remove using a JSON configuration file.

  • Batch & Recursive Processing:
    Process a single file, an entire folder, or even subfolders recursively.

  • Parallel Processing:
    Accelerate batch operations using multi-file parallel execution.

  • Cross-Platform CLI:
    Works on Linux, macOS, and Windows.

  • Logging & Error Reporting:
    Detailed logs help troubleshoot issues easily.
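
The batch, recursive, and parallel features can be pictured with a short sketch. This is not the project's code: `collect_files`, `process_in_parallel`, and the `clean_one` callback are hypothetical names, and a thread pool is only one plausible way to run several files at once.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def collect_files(folder: Path, recursive: bool = False) -> list[Path]:
    """Return the regular files in `folder`, optionally descending into subfolders."""
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in folder.glob(pattern) if p.is_file())


def process_in_parallel(
    files: Iterable[Path],
    clean_one: Callable[[Path], Path],  # hypothetical per-file cleaner
    max_workers: int = 4,
) -> tuple[list[Path], list[tuple[Path, Exception]]]:
    """Run `clean_one` on every file concurrently; split successes from failures."""
    succeeded: list[Path] = []
    failed: list[tuple[Path, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its source file so failures can be reported.
        futures = {pool.submit(clean_one, f): f for f in files}
        for future, source in futures.items():
            try:
                succeeded.append(future.result())
            except Exception as exc:  # one bad file should not abort the batch
                failed.append((source, exc))
    return succeeded, failed
```

Collecting failures instead of raising on the first one is what makes a summary report possible at the end of a batch.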


๐Ÿ› ๏ธ Installation & Usage

1๏ธโƒฃ Using Poetry (Recommended)

If you use Poetry, simply clone the repository and install dependencies:

git clone https://github.com/sandy-sp/metadata-cleaner.git
cd metadata-cleaner
poetry install

To run Metadata Cleaner:

poetry run metadata-cleaner --help

2๏ธโƒฃ Install via PyPI

Once published to PyPI, you can install it with pip:

pip install metadata-cleaner

And run it:

metadata-cleaner --help

3๏ธโƒฃ Usage Examples

Remove Metadata from a Single File

metadata-cleaner --file path/to/file.jpg

Example Output:

Do you want to process file.jpg? [Y/n]: Y
✅ Metadata removed. Cleaned file saved at: path/to/file_cleaned.jpg
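
The example output suggests the cleaned copy is written next to the original with a `_cleaned` suffix before the extension. A minimal sketch of that naming rule follows. `cleaned_path` is a hypothetical helper, not a function from the package.

```python
from pathlib import Path


def cleaned_path(source: Path) -> Path:
    """Return a sibling path with `_cleaned` inserted before the extension."""
    return source.with_name(f"{source.stem}_cleaned{source.suffix}")


print(cleaned_path(Path("path/to/file.jpg")).as_posix())  # path/to/file_cleaned.jpg
```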

Remove Metadata from All Files in a Folder (Non-Recursive)

metadata-cleaner --folder test_folder

Example Output:

Do you want to process all files in test_folder? [Y/n]: Y
Processing Files: 100% |████████████████| 5/5 [00:10s]

📊 Summary Report:
✅ Successfully processed: 5 files
Cleaned files saved in: test_folder/cleaned

Batch Processing with Recursive Search & Custom Output

metadata-cleaner --folder my_folder --recursive --output sanitized_files --yes

Example Output:

Processing Files: 100% |████████████████| 20/20 [00:15s]

📊 Summary Report:
✅ Successfully processed: 20 files
Cleaned files saved in: sanitized_files
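
To run the same batch command from a Python script, one option is to assemble the argument list and hand it to `subprocess`. The sketch below uses only the flags shown in this README (`--folder`, `--recursive`, `--output`, `--yes`). `build_command` and `run_cleaner` are hypothetical helpers, and the code assumes `metadata-cleaner` is on your `PATH`.

```python
import subprocess
from typing import Optional


def build_command(
    folder: str,
    recursive: bool = False,
    output: Optional[str] = None,
    assume_yes: bool = False,
) -> list[str]:
    """Assemble a metadata-cleaner invocation from the flags shown in this README."""
    command = ["metadata-cleaner", "--folder", folder]
    if recursive:
        command.append("--recursive")
    if output is not None:
        command += ["--output", output]
    if assume_yes:
        # Judging by the example output, --yes skips the confirmation prompt.
        command.append("--yes")
    return command


def run_cleaner(command: list[str]) -> None:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)
```

For the example above, `build_command("my_folder", recursive=True, output="sanitized_files", assume_yes=True)` yields the same arguments as the command line shown.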

Using a Custom Configuration File

You can create a JSON configuration file (e.g., config.json) to specify selective metadata rules. Then run:

metadata-cleaner --file sample.jpg --config config.json
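
This README does not document the configuration schema, so the following sketch only illustrates the idea of selective rules. The `"preserve"` and `"remove"` keys, the tag names, and `apply_rules` are all hypothetical. Check the project's documentation for the real format before writing a `config.json`.

```python
import json


def apply_rules(metadata: dict, rules: dict) -> dict:
    """Filter a tag dictionary using hypothetical "preserve"/"remove" lists."""
    preserve = set(rules.get("preserve", []))
    remove = set(rules.get("remove", []))
    if preserve:
        # An allow-list takes priority: keep only the named tags.
        return {k: v for k, v in metadata.items() if k in preserve}
    # Otherwise drop just the tags on the deny-list.
    return {k: v for k, v in metadata.items() if k not in remove}


# Hypothetical rules, as they might be loaded from a JSON file.
rules = json.loads('{"remove": ["GPSInfo", "DateTimeOriginal"]}')
tags = {
    "Make": "ExampleCam",
    "GPSInfo": "(redacted)",
    "DateTimeOriginal": "2024:01:01 12:00:00",
}
print(apply_rules(tags, rules))  # {'Make': 'ExampleCam'}
```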

🔧 How It Works

  1. File Detection:
    The tool detects the file type and selects the appropriate handler.

  2. Selective Filtering:
    For image files, it uses a configuration file (if provided) to selectively remove or preserve EXIF metadata.

  3. Processing:
    Files are processed either individually or in batches, with parallel execution for efficiency.

  4. Output & Logging:
    Cleaned files are saved in a default or specified output folder, and detailed logs are generated for troubleshooting.
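
The file-detection step above can be sketched as a lookup from file extension to handler. This is an illustration under assumptions, not the package's implementation. The handler functions are stand-ins, the extension list is a guess beyond the `.jpg`, PDF, and DOCX types named in this README, and the real tool may detect types differently.

```python
from pathlib import Path
from typing import Callable

Handler = Callable[[Path], str]


# Stand-in handlers; real ones would strip metadata and write a cleaned copy.
def clean_image(path: Path) -> str:
    return f"image handler: {path.name}"


def clean_document(path: Path) -> str:
    return f"document handler: {path.name}"


def clean_media(path: Path) -> str:
    return f"audio/video handler: {path.name}"


# Assumed extension map; adjust to the formats the tool actually supports.
HANDLERS: dict[str, Handler] = {
    ".jpg": clean_image,
    ".jpeg": clean_image,
    ".png": clean_image,
    ".pdf": clean_document,
    ".docx": clean_document,
    ".mp3": clean_media,
    ".mp4": clean_media,
}


def select_handler(path: Path) -> Handler:
    """Pick a handler by lower-cased extension, rejecting unknown types."""
    try:
        return HANDLERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path.suffix or '(none)'}") from None


print(select_handler(Path("photo.JPG"))(Path("photo.JPG")))  # image handler: photo.JPG
```

Rejecting unknown extensions up front keeps a batch run from silently copying files whose metadata was never touched.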


💻 Project Structure

metadata-cleaner/
├── docs/                     # Documentation
├── metadata_cleaner/         # Python package source code
│   ├── cli.py                # CLI entry point
│   ├── remover.py            # Core metadata removal logic
│   ├── config/               # Configuration settings
│   ├── core/                 # Metadata filtering utilities
│   ├── file_handlers/        # File-specific metadata handlers
│   └── logs/                 # Logging configuration
├── tests/                    # Unit tests
├── scripts/                  # Setup and environment scripts (Poetry-based)
├── pyproject.toml            # Poetry configuration file
├── MANIFEST.in               # Manifest file for packaging
└── README.md                 # This file

💡 Contributing

Contributions are welcome! To contribute:

  1. Fork the repository
  2. Create a new branch for your feature:
    git checkout -b feature-name
    
  3. Make your changes and test using:
    poetry run pytest
    
  4. Commit and push your changes:
    git commit -m "Describe your feature"
    git push origin feature-name
    
  5. Submit a Pull Request

โค๏ธ Support

If you find this tool useful, please give it a โญ on GitHub!
For issues or questions, open an issue or contact sandeep.paidipati@gmail.com.


🔒 License

This project is licensed under the MIT License. See the LICENSE file for details.
