Removing Microsoft Office files' metadata
Overview
DMeta is an open source Python package that removes metadata of Microsoft Office files.
Installation
PyPI
- Check Python Packaging User Guide
- Run
pip install dmeta==0.4
Source code
- Download Version 0.4 or Latest Source
- Run
pip install .
Usage
In Python
⚠️ Use in_place to apply the changes directly to the original file.
⚠️ The in_place flag is False by default.
Clear metadata for a .docx file in place
import os
from dmeta.functions import clear
DOCX_FILE_PATH = os.path.join(os.getcwd(), "sample.docx")
clear(DOCX_FILE_PATH, in_place=True)
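To see what a file contains before and after clearing, it can help to read its core properties directly. The following is a minimal sketch that does not use DMeta at all: it relies only on the fact that Office Open XML files are ZIP archives whose core properties normally live in docProps/core.xml. The helper name read_core_properties is hypothetical, and other metadata parts (such as docProps/app.xml) are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"


def read_core_properties(path):
    """Return {local_tag_name: text} for the core properties of an Office file.

    Hypothetical helper for inspection only; it is not part of DMeta.
    """
    with zipfile.ZipFile(path) as archive:
        # Some files may not contain a core properties part at all.
        if CORE_PART not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PART))

    properties = {}
    for child in root:
        # Tags look like "{namespace}creator"; keep only the local name.
        local_name = child.tag.rsplit("}", 1)[-1]
        properties[local_name] = child.text or ""
    return properties
```

Calling read_core_properties("sample.docx") before and after clear(...) gives a quick way to compare the two states of the file.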
Clear metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
from dmeta.functions import clear_all
clear_all()
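Because clear_all works on every matching file it finds, it can be useful to preview the candidates first. The sketch below lists .docx, .pptx, and .xlsx files under a directory using only the standard library. The helper name find_office_files is hypothetical, and the matching rules (recursive search, case-insensitive suffix) are assumptions of this sketch; DMeta's own search may differ.

```python
from pathlib import Path

# Suffixes taken from the supported formats listed in this README.
OFFICE_SUFFIXES = {".docx", ".pptx", ".xlsx"}


def find_office_files(root="."):
    """Return Office files under root, searched recursively, sorted by path.

    Hypothetical preview helper; it is not part of DMeta.
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in OFFICE_SUFFIXES
    )
```

For example, printing each entry of find_office_files() shows which files sit under the current directory before clear_all() is called.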
Update metadata for a .pptx file in place
import os
from dmeta.functions import update
CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json")
PPTX_FILE_PATH = os.path.join(os.getcwd(), "sample.pptx")
update(CONFIG_FILE_PATH, PPTX_FILE_PATH, in_place=True)
Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
import os
from dmeta.functions import update_all
CONFIG_FILE_PATH = os.path.join(os.getcwd(), "config.json")
update_all(CONFIG_FILE_PATH)
CLI
⚠️ You can use dmeta or python -m dmeta to run this program
⚠️ Use --inplace to apply the changes directly to the original file.
Clear metadata for a .docx file in place
dmeta --clear "./test_a.docx" --inplace
Clear metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
dmeta --clear-all
Update metadata for a .xlsx file in place
dmeta --update "./test_a.xlsx" --config "./config.json" --inplace
Update metadata for all existing Microsoft Office files (.docx|.pptx|.xlsx) in the current directory
dmeta --update-all --config "./config.json"
Version
dmeta -v
dmeta --version
Info
dmeta --info
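The CLI commands above can also be driven from a Python script, for example in a build step. The following is a minimal sketch that assembles the python -m dmeta --clear command shown above and runs it with subprocess. The helper names build_clear_command and clear_with_cli are hypothetical; only the flags already listed in this section are used.

```python
import subprocess
import sys


def build_clear_command(path, inplace=False):
    """Build the argument list for `python -m dmeta --clear PATH [--inplace]`."""
    command = [sys.executable, "-m", "dmeta", "--clear", str(path)]
    if inplace:
        command.append("--inplace")
    return command


def clear_with_cli(path, inplace=False):
    """Run the DMeta CLI for one file.

    check=True makes subprocess raise CalledProcessError if the process
    exits with a non-zero code.
    """
    return subprocess.run(build_clear_command(path, inplace), check=True)
```

Using sys.executable keeps the call inside the interpreter (and virtual environment) that is running the script, which avoids depending on dmeta being on PATH.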
DMeta as a pre-commit hook
To ensure that no Microsoft Office files ever enter your repo with embedded metadata, you can use DMeta’s built-in pre-commit hooks.
1. Install the pre-commit framework
If you don’t already have it:
pip install pre-commit
2. Add DMeta to your project’s .pre-commit-config.yaml
In your project root, create or update .pre-commit-config.yaml:
repos:
  - repo: https://github.com/openscilab/dmeta.git
    rev: v0.4  # minimum v0.4 or commit SHA
    hooks:
      - id: clear-metadata
rev must be either a tag that supports pre-commit hooks (v0.4 is the minimum) or the SHA of a commit in which the targeted .pre-commit-hooks.yaml exists.
3. Install the hook
pre-commit install # or pre_commit install (on Windows)
Now, every time you git commit, DMeta will automatically clear metadata from any Microsoft Office files in place.
⚠️ Important: Clean Before You Commit
Do not stage or add Microsoft Office files before removing their metadata.
If you run git add on Office files that still contain embedded metadata, the pre-commit hook will attempt to clean them in-place, which modifies the files after they’ve been staged. As a result, Git will block the commit because the content has changed mid-process.
✅ Suggested Correct Workflow
- Let the hook run automatically on earlier commits that didn’t add Office files, or run it manually. To run it manually:
pre-commit run clear-metadata --all-files
- Then:
git add <cleaned-files>
git commit -m "Your message"
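To catch the situation described above before committing, one option is to check whether any Office files are currently staged. The sketch below is an illustration only: it parses the output of git diff --cached --name-only, and the helper names office_paths and staged_office_files are hypothetical and not part of DMeta or pre-commit.

```python
import subprocess

# Suffixes taken from the supported formats listed in this README.
OFFICE_SUFFIXES = (".docx", ".pptx", ".xlsx")


def office_paths(names):
    """Keep only the paths that end with a supported Office suffix."""
    return [name for name in names if name.lower().endswith(OFFICE_SUFFIXES)]


def staged_office_files():
    """Return the staged Office files in the current Git repository."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return office_paths(result.stdout.splitlines())
```

If staged_office_files() returns a non-empty list, those files can be cleaned and re-added before running git commit.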
Supported files
| File format | support |
|---|---|
| Microsoft Word (.docx) | ✅ |
| Microsoft PowerPoint (.pptx) | ✅ |
| Microsoft Excel (.xlsx) | ✅ |
Issues & bug reports
Just file an issue and describe the problem. We'll check it ASAP! You can also send an email to dmeta@openscilab.com.
- Please complete the issue template
You can also join our Discord server
Acknowledgments
The Python Software Foundation (PSF) partially supported the DMeta library with a grant for version 0.4. PSF is the organization behind Python. Their mission is to promote, protect, and advance the Python programming language and to support and facilitate the growth of a diverse and international community of Python programmers.
Show your support
Star this repo
Give a ⭐️ if this project helped you!
Donate to our project
If you like our project, and we hope that you do, please consider supporting us. Our project is not and never will be run for profit. We need the money just so we can continue doing what we do ;-) .
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
0.4 - 2025-06-16
Added
- Acknowledgments in README.md
- .pre-commit-config.yaml
- .pre-commit-hooks.yaml
- DMeta pre-commit hook section in README.md
- recursive search in clear_all and update_all
- --verbose flag in CLI
- modern issue template structure
- --info flag in CLI
Changed
- get_microsoft_format function in util.py
- overwrite_metadata function in functions.py
- clear_all function in functions.py
- clear function in functions.py
- update_all function in functions.py enhanced
- update function in functions.py
Removed
- Python 3.6 support
- old issue template structure
0.3 - 2025-01-13
Removed
- extract_namespaces function in util.py
Added
- DMetaBaseError added to dmeta/__init__.py
- overwrite_metadata function added to functions.py
Changed
- update function in functions.py refactored
- clear function in functions.py refactored
- README.md updated
- GitHub actions are limited to the dev and main branches
- Python 3.13 added to test.yml
0.2 - 2024-08-14
Added
- dmeta/errors.py
- pptx and xlsx support
- get_microsoft_format function in util.py
- SECURITY.md
- inplace parameter in the clear function in functions.py
- inplace parameter in the clear_all function in functions.py
- inplace parameter in the update function in functions.py
- inplace parameter in the update_all function in functions.py
- inplace parameter in CLI
- inplace tests
Changed
- run_dmeta in functions.py
- read_json in util.py
- get_microsoft_format in util.py
- error messages in params.py
- clear function in functions.py
- extract function in util.py
- remove_format function in util.py
- clear function in functions.py
- clear_all function in functions.py
- update function in functions.py
- update_all function in functions.py
- extract_namespaces function in util.py
- README.md updated
0.1 - 2024-06-19
Added
- CLI handler
- main function in __main__.py
- README.md
- clear function in functions.py
- clear_all function in functions.py
- update function in functions.py
- update_all function in functions.py
- run_dmeta function in functions.py
- dmeta_help function in functions.py
- extract_namespaces function in util.py
- remove_format function in util.py
- extract_docx function in util.py
- read_json function in util.py
File details
Details for the file dmeta-0.4.tar.gz.
File metadata
- Download URL: dmeta-0.4.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c66d46a9f0b4128fee3d9918dccc75a66e349c503e9dc73e672b2bf4b3f28af5 |
| MD5 | 6d8c2f8e1d1cc085d25617425ed1d970 |
| BLAKE2b-256 | 66627783fcf1a553d7c6331f6db67a89a3117c3300880eabd22e898422e2b7bc |
File details
Details for the file dmeta-0.4-py3-none-any.whl.
File metadata
- Download URL: dmeta-0.4-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a85cda9709f384d1ac225bdb7873362d68b449881af0a6f6edeaf6bb05e585f9 |
| MD5 | 4e0d7c32aca6322b50cf5f41245d2696 |
| BLAKE2b-256 | 5dcf40b78cbd8bc77f4bf63adea0a7347511ae9bef01d84e794cc7cd6de56bda |
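The published digests can be used to check a downloaded file. The following is a minimal sketch using only hashlib from the standard library. The helper names sha256_of and matches_published_hash are hypothetical, and the expected value has to be copied from the SHA256 entry listed for the file in question.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("dmeta-0.4.tar.gz", "<SHA256 from the table>") should return True for an unmodified download of that file.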