Removing microsoft office files' metadata

Overview

DMeta is an open source Python package that removes metadata of Microsoft Office files and image files.

Installation

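PyPI

  • Run pip install dmeta==0.5

Source code

  • Download the source for version 0.5 (or the latest source) and run pip install . from its root directory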

Usage

In Python

⚠️ Use in_place to apply the changes directly to the original file.

⚠️ The in_place flag is False by default.

Clear metadata for a .docx file in place

import os
from dmeta.functions import clear

DOCX_FILE_PATH = os.path.join(os.getcwd(), "sample.docx")
clear(DOCX_FILE_PATH, in_place=True)
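
To check what a .docx file carries before and after clearing, you can read its core properties directly. The sketch below is not part of DMeta: it relies only on the standard library and on the fact that Office Open XML files are ZIP archives that conventionally store core properties in docProps/core.xml. The helper names (read_core_properties, build_demo_package) are hypothetical, and the demo archive contains only the metadata part, so it is not a valid Word document.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Conventional location of the core properties part in .docx/.pptx/.xlsx files.
CORE_PROPS_PATH = "docProps/core.xml"


def read_core_properties(source):
    """Return {local_tag_name: text} for the core properties of an OOXML file.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        if CORE_PROPS_PATH not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PROPS_PATH))
    properties = {}
    for child in root:
        # Tags look like "{namespace}creator"; keep only the local name.
        local_name = child.tag.rsplit("}", 1)[-1]
        properties[local_name] = child.text or ""
    return properties


def build_demo_package():
    """Build an in-memory ZIP holding only a core properties part (demo data)."""
    core_xml = (
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>Jane Doe</dc:creator>"
        "<cp:lastModifiedBy>Jane Doe</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CORE_PROPS_PATH, core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(build_demo_package()))
```

Calling read_core_properties on a real file path before and after clearing should show which fields changed.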

Clear metadata for any supported file in place

import os
from dmeta.functions import clear_file

FILE_PATH = os.path.join(os.getcwd(), "photo.png")
clear_file(FILE_PATH, in_place=True)
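
For PNG files, embedded metadata usually lives in ancillary chunks such as tEXt, zTXt, iTXt, and eXIf. The following standard-library sketch lists the chunk types in a PNG byte stream so you can see whether such chunks are present. It is an independent illustration, not DMeta's implementation, and which chunks DMeta actually removes is not stated here. The helper names and the METADATA_CHUNK_TYPES set are assumptions made for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types that commonly carry metadata (an assumption for this sketch).
METADATA_CHUNK_TYPES = {"tEXt", "zTXt", "iTXt", "eXIf"}


def list_png_chunks(data):
    """Return the chunk type names of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    chunk_types = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        chunk_types.append(raw_type.decode("ascii"))
        offset += 8 + length + 4
    return chunk_types


def make_chunk(chunk_type, payload):
    """Serialize one PNG chunk, including its CRC."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def build_demo_png():
    """Build a 1x1 grayscale PNG that contains one tEXt chunk (demo data)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte followed by one pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", b"Author\x00Jane Doe")
        + make_chunk(b"IDAT", idat)
        + make_chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    chunks = list_png_chunks(build_demo_png())
    print([name for name in chunks if name in METADATA_CHUNK_TYPES])
```

To inspect a real file, pass the result of open(path, "rb").read() to list_png_chunks.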

Clear metadata for all existing supported files (.docx|.pptx|.xlsx|.png|.jpg|.jpeg|.gif) in the current directory

from dmeta.functions import clear_all
clear_all()

Update metadata for a .pptx file in place

import os
from dmeta.functions import update

CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json")
PPTX_FILE_PATH = os.path.join(os.getcwd(), "sample.pptx")
update(CONFIG_FILE_PATH, PPTX_FILE_PATH, in_place=True)

Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory

import os
from dmeta.functions import update_all

CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json") 
update_all(CONFIG_FILE_PATH)

CLI

⚠️ You can use dmeta or python -m dmeta to run this program

⚠️ Use --inplace to apply the changes directly to the original file.

Clear metadata for a .docx file in place

dmeta --clear "./test_a.docx" --inplace

Clear metadata for a .png file in place

dmeta --clear "./photo.png" --inplace

Clear metadata for all existing supported files (.docx|.pptx|.xlsx|.png|.jpg|.jpeg|.gif) in the current directory

dmeta --clear-all

Update metadata for a .xlsx file in place

dmeta --update "./test_a.xlsx" --config "./config.json" --inplace

Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory

dmeta --update-all --config "./config.json"

Version

dmeta -v
dmeta --version

Info

dmeta --info

DMeta as a pre-commit hook

To ensure that no Microsoft Office files ever enter your repo with embedded metadata, you can use DMeta’s built-in pre-commit hooks.

1. Install the pre-commit framework

If you don’t already have it:

pip install pre-commit

2. Add Dmeta to your project’s .pre-commit-config.yaml

In your project root, create or update .pre-commit-config.yaml:

repos:
  - repo: https://github.com/openscilab/dmeta.git
    rev: v0.5 # minimum v0.4 or commit SHA
    hooks:
      - id: clear-metadata
  • rev: must be either a tag that supports pre-commit hooks (v0.4 or later) or the SHA of a commit in which the targeted .pre-commit-hooks.yaml exists.

3. Install the hook

pre-commit install # or pre_commit install (on Windows)

Now, every time you run git commit, DMeta will automatically clear metadata from any supported files in place.

⚠️ Important: Clean Before You Commit

Do not stage or add Microsoft Office files before removing their metadata.

If you run git add on Office files that still contain embedded metadata, the pre-commit hook will clean them in place, which modifies the files after they have been staged. The commit is then blocked because the content changed mid-process, and the cleaned files must be staged again.

✅ Suggested Correct Workflow

  1. Let the hook run automatically on an earlier commit that does not add Office files, or run it manually with pre-commit run clear-metadata --all-files

  2. Then:

    git add <cleaned-files>
    git commit -m "Your message"
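
The same order of operations can be scripted. The sketch below is an illustration, not something DMeta ships: it assembles the three commands from the workflow above and runs them with subprocess. The function names are hypothetical. The only added assumption is that pre-commit exits with a non-zero status when a hook modifies files, so that step is not treated as a fatal error.

```python
import subprocess


def build_workflow_commands(cleaned_files, message):
    """Return the commands of the suggested workflow as argument lists."""
    return [
        ["pre-commit", "run", "clear-metadata", "--all-files"],
        ["git", "add", *cleaned_files],
        ["git", "commit", "-m", message],
    ]


def run_workflow(cleaned_files, message):
    """Clean first, then stage and commit the cleaned files."""
    hook_command, *git_commands = build_workflow_commands(cleaned_files, message)
    # A non-zero exit here usually means the hook modified files, which is expected.
    subprocess.run(hook_command, check=False)
    for command in git_commands:
        # Stop immediately if staging or committing fails.
        subprocess.run(command, check=True)
```

For example, run_workflow(["report.docx"], "Add report") cleans, stages, and commits a single file from inside a repository where the hook is installed.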
    

Supported files

File format support
Microsoft Word (.docx)
Microsoft PowerPoint (.pptx)
Microsoft Excel (.xlsx)
PNG (.png)
JPEG (.jpg, .jpeg)
GIF (.gif)
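
To preview which files a directory-wide run might touch, you can collect the supported extensions yourself. This is a standard-library sketch, not DMeta's own discovery logic. The function name is hypothetical, and the case-insensitive extension match is an assumption. The recursive search mirrors the behaviour that the changelog lists for clear_all and update_all since 0.4.

```python
from pathlib import Path

# Extensions taken from the supported-files list above.
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".xlsx", ".png", ".jpg", ".jpeg", ".gif"}


def find_supported_files(root="."):
    """Return a sorted list of supported files under `root`, searched recursively."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_supported_files():
        print(path)
```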

Issues & bug reports

Just file an issue and describe it. We'll check it ASAP! You can also send an email to dmeta@openscilab.com.

  • Please complete the issue template

You can also join our discord server

Acknowledgments

The Python Software Foundation (PSF) partially funded versions 0.4 and 0.5 of the DMeta library through a grant. The PSF is the organization behind Python. Its mission is to promote, protect, and advance the Python programming language and to support and facilitate the growth of a diverse and international community of Python programmers.

Show your support

Star this repo

Give a ⭐️ if this project helped you!

Donate to our project

If you like our project, and we hope that you do, please consider supporting us. Our project is not, and never will be, run for profit. We need the money just so we can continue doing what we do ;-)

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

0.5 - 2026-05-27

Added

  • GIF params in params.py
  • clear_gif_metadata function in functions.py
  • JPEG params in params.py
  • clear_jpeg_metadata function in functions.py
  • clear_png_metadata function in functions.py
  • extract_metadata function in functions.py
  • SUPPORTED_IMAGE_FORMATS and SUPPORTED_FORMATS in params.py
  • get_file_format function in util.py
  • CLEAR_HANDLERS dict in functions.py
  • clear_file function in functions.py

Changed

  • RELEASE.md
  • test.yml
  • clear function in functions.py
  • update function in functions.py
  • clear_all function in functions.py
  • update_all function in functions.py
  • run_dmeta function in functions.py
  • CLI help text in __main__.py
  • .pre-commit-hooks.yaml updated
  • Test system modified
  • README.md updated

Removed

  • get_microsoft_format function in util.py

0.4 - 2025-06-16

Added

  • Acknowledgments in README.md
  • .pre-commit-config.yaml
  • .pre-commit-hooks.yaml
  • DMeta pre-commit hook section in README.md
  • recursive search in clear_all and update_all
  • --verbose flag in CLI
  • modern issue template structure
  • --info flag in CLI

Changed

  • get_microsoft_format function in util.py
  • overwrite_metadata function in functions.py
  • clear_all function in functions.py
  • clear function in functions.py
  • update_all function in functions.py enhanced
  • update function in functions.py

Removed

  • Python 3.6 support
  • old issue template structure

0.3 - 2025-01-13

Removed

  • extract_namespaces function in util.py

Added

  • DMetaBaseError added to dmeta/__init__.py
  • overwrite_metadata function added to functions.py

Changed

  • update function in functions.py refactored
  • clear function in functions.py refactored
  • README.md updated
  • GitHub actions are limited to the dev and main branches
  • Python 3.13 added to test.yml

0.2 - 2024-08-14

Added

  • dmeta/errors.py
  • pptx and xlsx support
  • get_microsoft_format function in util.py
  • SECURITY.md
  • inplace parameter in the clear function in functions.py
  • inplace parameter in the clear_all function in functions.py
  • inplace parameter in the update function in functions.py
  • inplace parameter in the update_all function in functions.py
  • inplace parameter in CLI
  • inplace tests

Changed

  • run_dmeta in functions.py
  • read_json in util.py
  • get_microsoft_format in util.py
  • error messages in params.py
  • clear function in functions.py
  • extract function in util.py
  • remove_format function in util.py
  • clear function in functions.py
  • clear_all function in functions.py
  • update function in functions.py
  • update_all function in functions.py
  • extract_namespaces function in util.py
  • README.md updated

0.1 - 2024-06-19

Added

  • CLI handler
  • main function in __main__.py
  • README.md
  • clear function in functions.py
  • clear_all function in functions.py
  • update function in functions.py
  • update_all function in functions.py
  • run_dmeta function in functions.py
  • dmeta_help function in functions.py
  • extract_namespaces function in util.py
  • remove_format function in util.py
  • extract_docx function in util.py
  • read_json function in util.py
