
Removing Microsoft Office files' metadata


Overview

DMeta is an open-source Python package that removes metadata from Microsoft Office files.


Installation

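PyPI

Run pip install dmeta==0.4

Source code

Download the source from https://github.com/openscilab/dmeta, then run pip install . from the project root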

Usage

In Python

⚠️ Use in_place to apply the changes directly to the original file.

⚠️ The in_place flag is False by default.
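If you would rather keep the original file untouched and still end up with a cleaned file at a known path, one option is to copy the file first and clear the copy in place. The following is a minimal sketch, not part of DMeta: clear_copy and its suffix parameter are hypothetical names, and the function to call is passed in so that the sketch only relies on the clear(path, in_place=True) call shown in this README.

```python
import shutil
from pathlib import Path
from typing import Callable


def clear_copy(src: str, clear_fn: Callable[..., object], suffix: str = "_clean") -> Path:
    """Copy `src` next to itself and run `clear_fn` on the copy in place.

    `clear_fn` is expected to accept (path, in_place=True), like
    dmeta.functions.clear as used in this README.
    """
    source = Path(src)
    # e.g. sample.docx -> sample_clean.docx (the suffix is an illustrative choice)
    target = source.with_name(source.stem + suffix + source.suffix)
    shutil.copy2(source, target)          # the original file is left untouched
    clear_fn(str(target), in_place=True)  # only the copy is modified
    return target
```

With DMeta installed, this could be called as clear_copy("sample.docx", clear) after from dmeta.functions import clear.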

Clear metadata for a .docx file in place

import os
from dmeta.functions import clear

DOCX_FILE_PATH = os.path.join(os.getcwd(), "sample.docx")
clear(DOCX_FILE_PATH, in_place=True)
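To check what a file contains before and after clearing, you can inspect it with the standard library alone. Office Open XML files (.docx, .pptx, .xlsx) are ZIP packages, and core properties such as the author are typically stored in docProps/core.xml. The sketch below is independent of DMeta: read_core_properties is a hypothetical helper, and it makes no claim about which fields DMeta actually clears.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(path: str) -> dict:
    """Return {tag: text} for the non-empty entries of docProps/core.xml."""
    with zipfile.ZipFile(path) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    props = {}
    for child in root:
        tag = child.tag.split("}")[-1]  # drop the "{namespace}" prefix
        if child.text and child.text.strip():
            props[tag] = child.text.strip()
    return props
```

Calling read_core_properties("sample.docx") before and after clear(...) gives a quick view of which core properties remain populated.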

Clear metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory

from dmeta.functions import clear_all
clear_all()
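Before running a bulk operation, it can help to preview which files are candidates. The sketch below lists .docx, .pptx, and .xlsx files under a directory, searching recursively. It is not DMeta's own implementation: find_office_files is a hypothetical helper, and the case-insensitive extension match is an assumption, so the set of files clear_all actually processes may differ.

```python
from pathlib import Path

# Extensions listed in this README as supported formats.
OFFICE_SUFFIXES = {".docx", ".pptx", ".xlsx"}


def find_office_files(root: str = ".") -> list:
    """Return Office files under `root` (searched recursively), sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in OFFICE_SUFFIXES
    )
```

For example, printing each entry of find_office_files() shows the candidate files under the current directory.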

Update metadata for a .pptx file in place

import os
from dmeta.functions import update

CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json")
PPTX_FILE_PATH = os.path.join(os.getcwd(), "sample.pptx")
update(CONFIG_FILE_PATH, PPTX_FILE_PATH, in_place=True)

Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory

import os
from dmeta.functions import update_all

CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json") 
update_all(CONFIG_FILE_PATH)

CLI

⚠️ You can use dmeta or python -m dmeta to run this program

⚠️ Use --inplace to apply the changes directly to the original file.

Clear metadata for a .docx file in place

dmeta --clear "./test_a.docx" --inplace

Clear metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory

dmeta --clear-all

Update metadata for a .xlsx file in place

dmeta --update "./test_a.xlsx" --config "./config.json" --inplace

Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory

dmeta --update-all --config "./config.json"

Version

dmeta -v
dmeta --version

Info

dmeta --info
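If you want to drive the CLI from a script, the commands above can be assembled as argument lists and passed to subprocess. This is a sketch only: dmeta_command and run_cli are hypothetical helpers, they use only the flags shown in this README, and combining --inplace with the -all variants is an assumption rather than something demonstrated above.

```python
import subprocess
import sys
from typing import List, Optional


def dmeta_command(action: str, path: Optional[str] = None,
                  config: Optional[str] = None, inplace: bool = False) -> List[str]:
    """Build an argv list for `python -m dmeta` from the flags shown in this README."""
    if action not in {"clear", "clear-all", "update", "update-all"}:
        raise ValueError(f"unsupported action: {action}")
    argv = [sys.executable, "-m", "dmeta", "--" + action]
    if action in {"clear", "update"}:
        if path is None:
            raise ValueError(f"--{action} needs a file path")
        argv.append(path)
    if action in {"update", "update-all"}:
        if config is None:
            raise ValueError(f"--{action} needs a config file")
        argv += ["--config", config]
    if inplace:
        argv.append("--inplace")
    return argv


def run_cli(argv: List[str]) -> subprocess.CompletedProcess:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(argv, check=True)
```

For example, run_cli(dmeta_command("clear", "./test_a.docx", inplace=True)) corresponds to dmeta --clear "./test_a.docx" --inplace.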

DMeta as a pre-commit hook

To ensure that no Microsoft Office files ever enter your repo with embedded metadata, you can use DMeta’s built-in pre-commit hooks.

1. Install the pre-commit framework

If you don’t already have it:

pip install pre-commit

2. Add DMeta to your project’s .pre-commit-config.yaml

In your project root, create or update .pre-commit-config.yaml:

repos:
  - repo: https://github.com/openscilab/dmeta.git
    rev: v0.4 # minimum v0.4 or commit SHA
    hooks:
      - id: clear-metadata
  • rev: must point to a tag that supports pre-commit hooks (v0.4 or later) or to a commit SHA where the targeted .pre-commit-hooks.yaml exists.

3. Install the hook

pre-commit install # or pre_commit install (on Windows)

Now, every time you git commit, DMeta will automatically clear metadata from any Microsoft Office files in-place.

⚠️ Important: Clean Before You Commit

Do not stage or add Microsoft Office files before removing their metadata.

If you run git add on Office files that still contain embedded metadata, the pre-commit hook will attempt to clean them in-place, which modifies the files after they’ve been staged. As a result, Git will block the commit because the content has changed mid-process.

✅ Suggested Correct Workflow

  1. Let the hook run automatically on earlier commits that didn’t add Office files, or run it manually with pre-commit run clear-metadata --all-files

  2. Then:

    git add <cleaned-files>
    git commit -m "Your message"
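To see whether a commit is about to include Office files, and therefore whether cleaning should happen first, you can list the staged paths and filter them by extension. The sketch below is an illustration, not part of DMeta: office_files and staged_office_files are hypothetical helpers, and the second one assumes git is available on PATH and that it runs inside a repository.

```python
import subprocess
from typing import Iterable, List

# Extensions listed in this README as supported formats.
OFFICE_SUFFIXES = (".docx", ".pptx", ".xlsx")


def office_files(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose extension is one of the supported Office formats."""
    return [p for p in paths if p.lower().endswith(OFFICE_SUFFIXES)]


def staged_office_files() -> List[str]:
    """List staged (added, copied, or modified) Office files in the current repo."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return office_files(out.splitlines())
```

If staged_office_files() returns anything, unstage those files, clean them, and add them again before committing.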
    

Supported files

  • Microsoft Word (.docx)
  • Microsoft PowerPoint (.pptx)
  • Microsoft Excel (.xlsx)

Issues & bug reports

Just file an issue and describe it. We'll check it ASAP! You can also send an email to dmeta@openscilab.com.

  • Please complete the issue template

You can also join our Discord server.

Acknowledgments

The Python Software Foundation (PSF) awarded a grant that partially supported version 0.4 of the DMeta library. The PSF is the organization behind Python. Its mission is to promote, protect, and advance the Python programming language and to support and facilitate the growth of a diverse and international community of Python programmers.


Show your support

Star this repo

Give a ⭐️ if this project helped you!

Donate to our project

If you like our project, and we hope that you do, please consider supporting us. Our project is not, and never will be, run for profit. We need the money just so we can continue doing what we do ;-) .


Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

0.4 - 2025-06-16

Added

  • Acknowledgments in README.md
  • .pre-commit-config.yaml
  • .pre-commit-hooks.yaml
  • DMeta pre-commit hook section in README.md
  • recursive search in clear_all and update_all
  • --verbose flag in CLI
  • modern issue template structure
  • --info flag in CLI

Changed

  • get_microsoft_format function in util.py
  • overwrite_metadata function in functions.py
  • clear_all function in functions.py
  • clear function in functions.py
  • update_all function in functions.py enhanced
  • update function in functions.py

Removed

  • Python 3.6 support
  • old issue template structure

0.3 - 2025-01-13

Removed

  • extract_namespaces function in util.py

Added

  • DMetaBaseError added to dmeta/__init__.py
  • overwrite_metadata function added to functions.py

Changed

  • update function in functions.py refactored
  • clear function in functions.py refactored
  • README.md updated
  • GitHub actions are limited to the dev and main branches
  • Python 3.13 added to test.yml

0.2 - 2024-08-14

Added

  • dmeta/errors.py
  • pptx and xlsx support
  • get_microsoft_format function in util.py
  • SECURITY.md
  • inplace parameter in the clear function in functions.py
  • inplace parameter in the clear_all function in functions.py
  • inplace parameter in the update function in functions.py
  • inplace parameter in the update_all function in functions.py
  • inplace parameter in CLI
  • inplace tests

Changed

  • run_dmeta in functions.py
  • read_json in util.py
  • get_microsoft_format in util.py
  • error messages in params.py
  • clear function in functions.py
  • extract function in util.py
  • remove_format function in util.py
  • clear_all function in functions.py
  • update function in functions.py
  • update_all function in functions.py
  • extract_namespaces function in util.py
  • README.md updated

0.1 - 2024-06-19

Added

  • CLI handler
  • main function in __main__.py
  • README.md
  • clear function in functions.py
  • clear_all function in functions.py
  • update function in functions.py
  • update_all function in functions.py
  • run_dmeta function in functions.py
  • dmeta_help function in functions.py
  • extract_namespaces function in util.py
  • remove_format function in util.py
  • extract_docx function in util.py
  • read_json function in util.py

