GroupDocs.Metadata for Python via .NET - Read, write and remove metadata from documents and images

Product Page | Docs | Demos | API Reference | Blog | Free Support | Temporary License

GroupDocs.Metadata for Python via .NET is a metadata management API that reads, edits, and removes metadata from documents, spreadsheets, presentations, PDFs, images, audio, and video — 70+ file formats. It works with the major metadata standards (XMP, EXIF, IPTC, Image Resource Blocks, ID3, document properties) through one unified, format-independent API.

Get Started

pip install groupdocs-metadata-net

from groupdocs.metadata import Metadata

with Metadata("document.docx") as metadata:
    root = metadata.get_root_package()
    print("Format:", root.file_type.file_format)

How It Works

The package is a self-contained Python wheel (~160 MB) that bundles an embedded .NET runtime and everything else needed to process metadata. No external software has to be installed: run pip install and start reading metadata. The wheel supports Python 3.5 through 3.14 on Windows, Linux, and macOS (Intel and Apple Silicon).

Features

  • Read, edit, and remove metadata from 70+ formats with one unified API.
  • Metadata standards: XMP, EXIF, IPTC IIM, Image Resource Blocks, ID3 (ID3v1/ID3v2), Lyrics3, APE.
  • Search engine: find, update, add, and remove properties with simple Python predicates and predefined tags.
  • One-call sanitize: strip every detected property before sharing a file.
  • Document inspection: detect format/MIME type, page count, encryption, digital signatures, comments, and hidden pages.
  • Export: dump the metadata tree to CSV, XLSX, JSON, or XML.
  • Cross-Platform: Windows x64/x86, Linux x64, macOS x64/ARM64.

Common Tasks

  • Detect a file's real format and MIME type by its internal structure
  • Read EXIF/XMP/IPTC properties from photos
  • Read and edit ID3/APE/Lyrics3 tags in audio files
  • Strip author, comments, and revision history from Office documents before publishing
  • Find and remove properties that match a condition (by tag, name, type, or value)
  • Export a document's full metadata tree to a spreadsheet for auditing
  • Feed extracted metadata into a search, compliance, or DAM pipeline
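
To illustrate what detecting a format "by its internal structure" means in the first task above, here is a minimal standard-library sketch that matches a file's leading bytes against a few well-known signatures. It is not how the library works internally and does not use its API; `SIGNATURES` and `sniff_mime` are hypothetical names introduced only for this illustration.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(header):
    """Return a MIME type guessed from the first bytes of a file, or None."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


detected = sniff_mime(b"%PDF-1.7\n")
print(detected)
```

DOCX, XLSX, and PPTX files are ZIP containers, so a check this shallow reports them as `application/zip`. Telling them apart requires looking inside the container, which is the kind of work a dedicated detector takes on.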

Supported File Formats

For a complete list, see supported formats.

  • Microsoft Office (Word, Excel, PowerPoint, Visio, Project, OneNote)
  • PDF
  • OpenDocument (ODT, ODS, ODP)
  • Images (JPEG, PNG, GIF, BMP, TIFF, WebP, DICOM, JPEG 2000, PSD, HEIF/HEIC, CR2/DNG and other RAW)
  • Audio (MP3, WAV, OGG)
  • Video (AVI, MOV, FLV, ASF, Matroska/MKV)
  • Email (EML, MSG)
  • eBook (EPUB)
  • Archives (ZIP, RAR, 7Z, TAR)
  • CAD (DWG, DXF)
  • 3D (FBX, STL, 3DS, DAE)
  • Fonts (TTF, OTF) and other formats (vCard, torrent)

Examples

Read metadata

from groupdocs.metadata import Metadata

with Metadata("input.docx") as metadata:
    for prop in metadata.find_properties(lambda p: True):
        print(f"{prop.name} = {prop.value}")
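
The name/value pairs printed above are often easier to pass around as a dictionary. The helper below is a minimal sketch and not part of the library: `properties_to_dict` is a hypothetical name, and it assumes only that each property exposes `name` and `value`, as in the example above. Values are collected into lists on the assumption that the same name may occur more than once in a file.

```python
from collections import defaultdict
from types import SimpleNamespace


def properties_to_dict(properties):
    """Group property values by property name (hypothetical helper)."""
    grouped = defaultdict(list)
    for prop in properties:
        grouped[prop.name].append(prop.value)
    return dict(grouped)


# Stand-in objects so the sketch runs without the library installed.
sample = [
    SimpleNamespace(name="Author", value="Alice"),
    SimpleNamespace(name="Title", value="Report"),
    SimpleNamespace(name="Author", value="Bob"),
]
result = properties_to_dict(sample)
print(result)
```

With the library installed, the same helper could presumably be fed the result of `metadata.find_properties(lambda p: True)` inside the `with` block.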

Get document info

from groupdocs.metadata import Metadata

with Metadata("input.xlsx") as metadata:
    info = metadata.get_document_info()
    print("Format:", info.file_type.file_format)
    print("MIME type:", info.file_type.mime_type)
    print("Pages:", info.page_count)
    print("Size:", info.size, "bytes")
    print("Encrypted:", info.is_encrypted)

Find properties by tag

from groupdocs.metadata import Metadata
from groupdocs.metadata.tagging import Tags

with Metadata("input.docx") as metadata:
    authors = metadata.find_properties(lambda p: Tags.person.creator in list(p.tags))
    for prop in authors:
        print(prop.name, "=", prop.value)
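
When several searches share the same `tag in list(p.tags)` test, the predicate can be built by small helper functions instead of repeated lambdas. The sketch below is an assumption-laden illustration: `has_tag` and `any_of` are hypothetical names, plain strings stand in for `Tags.*` values, and the only thing taken from the example above is that a property exposes an iterable `tags` attribute.

```python
from types import SimpleNamespace


def has_tag(tag):
    """Build a predicate that is true when `tag` is among a property's tags."""
    return lambda prop: tag in list(prop.tags)


def any_of(*predicates):
    """Combine predicates: true when at least one of them matches."""
    return lambda prop: any(pred(prop) for pred in predicates)


# Stand-in data: strings play the role of Tags.* values here.
props = [
    SimpleNamespace(name="Author", tags=["creator"]),
    SimpleNamespace(name="Created", tags=["created"]),
    SimpleNamespace(name="Pages", tags=[]),
]
wanted = any_of(has_tag("creator"), has_tag("created"))
matched = [p.name for p in props if wanted(p)]
print(matched)
```

Assuming `find_properties` accepts any callable the way it accepts a lambda, a combined predicate such as `any_of(has_tag(Tags.person.creator), has_tag(Tags.time.created))` could be passed to it directly.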

Set / update properties and save

from datetime import datetime
from groupdocs.metadata import Metadata
from groupdocs.metadata.common import PropertyValue
from groupdocs.metadata.tagging import Tags

with Metadata("input.docx") as metadata:
    affected = metadata.set_properties(
        lambda p: Tags.time.created in list(p.tags),
        PropertyValue(datetime.now()),
    )
    print("Updated:", affected)
    metadata.save("output.docx")

Remove all metadata (sanitize)

from groupdocs.metadata import Metadata

with Metadata("input.pdf") as metadata:
    removed = metadata.sanitize()
    print("Removed:", removed)
    metadata.save("clean.pdf")

Export the metadata tree

from groupdocs.metadata import Metadata
from groupdocs.metadata.export import ExportManager, ExportFormat

with Metadata("input.pdf") as metadata:
    properties = list(metadata.find_properties(lambda p: True))
    ExportManager(properties).export("metadata.xlsx", ExportFormat.XLSX)

Load from a stream

import io
from groupdocs.metadata import Metadata

with open("input.docx", "rb") as stream:
    with Metadata(stream) as metadata:
        print("Format:", metadata.file_format)

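# downloaded_bytes stands for file content already in memory (for example, an HTTP response body);
# it is read from disk here only to keep the example runnable.
with open("input.docx", "rb") as source:
    downloaded_bytes = source.read()

buf = io.BytesIO(downloaded_bytes)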
with Metadata(buf) as metadata:
    print(metadata.get_document_info().file_type.file_format)

AI Agent & LLM Friendly

This package is designed for seamless integration with AI agents, LLMs, and automated code generation tools.

  • AGENTS.md in the package — AI coding assistants (Claude Code, Cursor, GitHub Copilot) auto-discover the API surface, usage patterns, and troubleshooting tips from the installed package
  • MCP server — connect your AI tool to GroupDocs documentation for on-demand API lookups:
    { "mcpServers": { "groupdocs-docs": { "url": "https://docs.groupdocs.com/mcp" } } }
    
  • Machine-readable docs — full documentation available as plain text for RAG and LLM context:
    • Single file: https://docs.groupdocs.com/metadata/python-net/llms-full.txt
    • Per page: append .md to any docs URL

Evaluation Mode

The API works without a license in evaluation mode, with these limitations:

  • Only the first few properties of each metadata package are read.
  • Saving files is disabled: save() raises an "Evaluation only" exception.

To remove these limitations, apply a license or request a temporary license:

from groupdocs.metadata import License
License().set_license("path/to/license.lic")

Or set the environment variable (auto-applied at import):

export GROUPDOCS_LIC_PATH="path/to/license.lic"
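
Because the variable is applied at import time, a mistyped path is easy to miss. The check below is a minimal standard-library sketch that can run before the import. `resolve_license_path` is a hypothetical helper rather than a library function, and only the variable name `GROUPDOCS_LIC_PATH` comes from the text above.

```python
import os


def resolve_license_path(environ=None):
    """Return the path in GROUPDOCS_LIC_PATH if it points to a file, else None."""
    environ = os.environ if environ is None else environ
    path = environ.get("GROUPDOCS_LIC_PATH", "").strip()
    if path and os.path.isfile(path):
        return path
    return None


# Example with an explicit mapping instead of the real environment.
status = resolve_license_path({"GROUPDOCS_LIC_PATH": "/no/such/license.lic"})
print("License file found:", status is not None)
```

Calling `resolve_license_path()` with no argument inspects the real environment. A `None` result suggests the package would start in evaluation mode.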

Troubleshooting

Issue | Platform | Fix
System.Drawing.Common is not supported | Linux/macOS | apt-get install libgdiplus (Linux) or brew install mono-libgdiplus (macOS)
The type initializer for 'Gdip' threw an exception | macOS | brew install mono-libgdiplus
Errors processing images that need fonts | Linux | apt-get install ttf-mscorefonts-installer fontconfig && fc-cache -f
DOTNET_SYSTEM_GLOBALIZATION_INVARIANT errors | Linux | Do NOT set this variable. ICU must be available.
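
Two of the conditions listed above can be checked from Python before any file is processed. The following is a rough sketch using only the standard library; `diagnose` is a hypothetical helper, and `ctypes.util.find_library` may fail to see a library installed in a non-standard location (some Homebrew prefixes, for instance), so a "not found" result is a hint rather than proof.

```python
import ctypes.util
import os
import sys


def diagnose(environ=None, platform=None, find_library=ctypes.util.find_library):
    """Return a list of hints for the libgdiplus and globalization issues."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    is_linux = platform.startswith("linux")
    hints = []
    # libgdiplus matters on Linux and macOS only.
    if (is_linux or platform == "darwin") and find_library("gdiplus") is None:
        hints.append("libgdiplus was not found; install it as described above")
    # The invariant-globalization switch should not be set on Linux.
    if is_linux and "DOTNET_SYSTEM_GLOBALIZATION_INVARIANT" in environ:
        hints.append("unset DOTNET_SYSTEM_GLOBALIZATION_INVARIANT; ICU must be available")
    return hints


for hint in diagnose():
    print("hint:", hint)
```

The parameters exist so the function can be exercised with a fake environment. In normal use `diagnose()` is called without arguments.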

System Requirements

  • Python 3.5 - 3.14
  • Windows x64/x86, Linux x64, macOS x64/ARM64
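
The requirements above can be turned into a quick pre-install check. This is a minimal sketch with assumptions: `is_supported` and `SUPPORTED` are hypothetical names, and the machine-name spellings are the values `platform.machine()` commonly reports, not something the package documents. Note that `platform.machine()` describes the operating system, so a 32-bit interpreter on a 64-bit system is not distinguished.

```python
import platform
import sys

# Architectures per operating system, following the list above.
# The spellings are assumptions about what platform.machine() returns.
SUPPORTED = {
    "Windows": {"amd64", "x86_64", "x86", "i386", "i686"},
    "Linux": {"x86_64", "amd64"},
    "Darwin": {"x86_64", "arm64"},
}


def is_supported(version=None, system=None, machine=None):
    """Check a Python version and platform against the stated requirements."""
    version = sys.version_info[:2] if version is None else tuple(version)
    system = platform.system() if system is None else system
    machine = (platform.machine() if machine is None else machine).lower()
    python_ok = (3, 5) <= version <= (3, 14)
    return python_ok and machine in SUPPORTED.get(system, set())


print("Current environment supported:", is_supported())
```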

More Resources

Also available for other platforms: .NET | Java | Node.js


