Library for reading and writing ECMA 376-2 (Open Packaging Conventions) files

PyECMA376-2

A Python implementation of the Open Packaging Conventions (OPC).

ECMA 376 Part 2 defines the “Open Packaging Conventions”, the packaging format used by the Office Open XML file formats. It specifies how to represent multiple logical files (“Parts”) within a physical Package (a ZIP container), how to express semantic relationships between those Parts (using accompanying XML Parts), and how to add meta data and cryptographic signatures to the Package. The format is defined in two steps: an abstract logical package model with Parts, Content Types and Relationships, and a physical mapping of this package model to PKZIP files.

This Python package aims to implement both the logical model and the physical mapping of OPC package files, to allow reading and writing such files. However, it does not provide functionality to deal with the packages' payload, i.e. there is no functionality included to parse MS Word documents from .docx files, etc.
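
To make the ZIP mapping described above more concrete, the following sketch builds a tiny OPC-style container in memory and reads its package-level Relationships back. It uses only the Python standard library, not PyECMA376-2. The function names `build_demo_package` and `read_root_relationships` are made up for this illustration, and the sketch ignores most rules of the standard (Part name validation, interleaving, Override entries).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces defined by the OPC standard
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Content Types stream: maps file extensions to Content Types
CONTENT_TYPES = (
    f'<Types xmlns="{CT_NS}">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="txt" ContentType="text/plain"/>'
    '</Types>'
)

# Package-level Relationships Part, stored as "_rels/.rels"
ROOT_RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    '<Relationship Id="r1" Type="http://example.com/my-document-rel" '
    'Target="example/document.txt"/>'
    '</Relationships>'
)


def build_demo_package() -> bytes:
    """Hypothetical helper: write a minimal OPC-like ZIP into memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("example/document.txt", "Lorem ipsum dolor sit amet.")
    return buf.getvalue()


def read_root_relationships(data: bytes) -> list:
    """Hypothetical helper: return (Id, Type, Target) of each root Relationship."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
    ]
```

Calling `read_root_relationships(build_demo_package())` yields one tuple for the `r1` Relationship. The library's reader and writer classes take care of these XML Parts themselves.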

Features of PyECMA376-2

  • reading OPC package files

    • listing contained Parts (incl. Content Type)
    • reading Parts as file-like objects (incl. interleaved Parts)
    • parsing and following Relationships
    • parsing package meta data (“Core Properties”)
  • writing OPC package files

    • creating and writing Parts (via writable file-like objects, incl. interleaved Parts)
    • adding Relationships (as simple Python objects)
    • adding Content Type information
    • composing and writing package meta data (“Core Properties”)

Modifying packages in-place is not supported.

Currently Missing Features

  • reading/verifying/creating cryptographic signatures

Dependencies

This package requires lxml for XML reading and writing (with proper XML namespaces support). Apart from that only the Python standard library is required.

Python 3.6 or higher is required.

Usage

Short example of reading an OPC package file:

import pyecma376_2

with pyecma376_2.ZipPackageReader("document.docx") as reader:
    # List parts in package
    for part_name, content_type in reader.list_parts():
        print(part_name)
    
    # Get Relationship of type "…/officeDocument" from package-level Relationships
    document_part_name = reader.get_related_parts_by_type("/")[
        'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument'][0]

    # Read core properties (package meta data)
    core_props = reader.get_core_properties()
    print(core_props.creator)

    # Open part as (binary) file-like object
    with reader.open_part(document_part_name) as part:
        # XML parsing and document interpretation goes here
        print(part.read().decode())

Short example of creating and writing into an OPC package file:

import pyecma376_2
import datetime

with pyecma376_2.ZipPackageWriter("new_document.myx") as writer:
    # Add a part
    with writer.open_part("/example/document.txt", "text/plain") as part:
        part.write("Lorem ipsum dolor sit amet.".encode())

    # Write core properties (meta data)
    # To make those work, we need to add the RELATIONSHIP_TYPE_CORE_PROPERTIES relationship below. 
    cp = pyecma376_2.OPCCoreProperties()
    cp.created = datetime.datetime.now()
    with writer.open_part(pyecma376_2.DEFAULT_CORE_PROPERTIES_NAME, "application/xml") as part:
        cp.write_xml(part)
    
    # Write the package's root relationships
    writer.write_relationships([
        pyecma376_2.OPCRelationship("r1", "http://example.com/my-package-relationship-id", "http://example.com",
                                    pyecma376_2.OPCTargetMode.EXTERNAL),
        pyecma376_2.OPCRelationship("r2", "http://example.com/my-document-rel", "example/document.txt",
                                    pyecma376_2.OPCTargetMode.INTERNAL),
        pyecma376_2.OPCRelationship("r3", pyecma376_2.RELATIONSHIP_TYPE_CORE_PROPERTIES,
                                    pyecma376_2.DEFAULT_CORE_PROPERTIES_NAME,
                                    pyecma376_2.OPCTargetMode.INTERNAL),
    ])
    
    # The Content Types Stream with all parts' ContentTypes is automatically added when closing the package
    # Modify `writer.content_types` to change Content Types representation and use `writer.write_content_types_stream()`
    # for premature serialization/output.

Package Architecture

The architecture of this package follows the logical concept of the ECMA standard: The package_model module defines abstract OPCPackageReader and OPCPackageWriter classes that implement all the logical package model functionality, but omit the physical mapping to ZIP files. This mapping is reflected in the abstract methods list_items(), open_item() and create_item() which are then implemented by the ZipPackageReader and ZipPackageWriter classes from the zip_package module.
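
The split between logical model and physical mapping can be pictured with a small, self-contained sketch. It is not the library's code: the class names `SketchPackageReader` and `SketchZipReader`, the method signatures, and the simplified rule for what counts as a Part are all assumptions made for illustration. Only the idea of abstract `list_items()` and `open_item()` methods is taken from the description above.

```python
import abc
import zipfile
from typing import BinaryIO, Iterable, List


class SketchPackageReader(abc.ABC):
    """Logical layer (illustrative): knows about Parts, not about ZIP files."""

    @abc.abstractmethod
    def list_items(self) -> Iterable[str]:
        """Return the names of all physical items in the container."""

    @abc.abstractmethod
    def open_item(self, name: str) -> BinaryIO:
        """Open one physical item for binary reading."""

    def list_part_names(self) -> List[str]:
        # Simplified logical rule: every item except the Content Types
        # stream is a Part, and Part names start with a slash.
        return sorted(
            "/" + name
            for name in self.list_items()
            if name != "[Content_Types].xml"
        )

    def read_part(self, part_name: str) -> bytes:
        # Map the logical Part name back to a physical item name.
        with self.open_item(part_name.lstrip("/")) as item:
            return item.read()


class SketchZipReader(SketchPackageReader):
    """Physical layer (illustrative): maps items to ZIP archive members."""

    def __init__(self, file) -> None:
        self._zip = zipfile.ZipFile(file)

    def list_items(self) -> Iterable[str]:
        return self._zip.namelist()

    def open_item(self, name: str) -> BinaryIO:
        return self._zip.open(name)
```

With this layout, all Part handling lives in the base class, so a different container format would only need to supply the two abstract methods.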

Auxiliary classes and functions like OPCRelationship, part_realpath and normalize_part_name are also contained in the package_model module.
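
The exact behaviour of these helpers is not shown here. As a rough illustration of the kind of Part name arithmetic the OPC model involves, the sketch below derives the name of a Relationships Part from its source Part and resolves a relative Relationship target. The functions `relationships_part_name` and `resolve_target` are hypothetical and are not the library's `part_realpath` or `normalize_part_name`; percent-encoding and case-insensitive comparison are ignored.

```python
import posixpath


def relationships_part_name(source_part: str) -> str:
    """Name of the Relationships Part that belongs to source_part.

    "/" stands for the package itself, as in the reading example above.
    """
    if source_part == "/":
        return "/_rels/.rels"
    directory, base = posixpath.split(source_part)
    return posixpath.join(directory, "_rels", base + ".rels")


def resolve_target(source_part: str, target: str) -> str:
    """Resolve an internal Relationship target against its source Part."""
    if source_part == "/":
        base_dir = "/"
    else:
        base_dir = posixpath.dirname(source_part)
    # join() discards base_dir when target is already absolute;
    # normpath() collapses "." and ".." segments.
    return posixpath.normpath(posixpath.join(base_dir, target))
```

For example, the Relationships of `/word/document.xml` are stored in `/word/_rels/document.xml.rels`, and a target of `../media/image1.png` on that Part resolves to `/media/image1.png`.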

License

This package has been developed by Michael Thies at the Chair of Information and Automation Systems for Process and Material Technology (PLT) at RWTH Aachen University.

It is published under the terms of Apache License v2. See LICENSE and NOTICE files for details.
