
Provides resources to handle OpenXML documents from Python.

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

openxmllib

openxmllib is a set of tools for working with the new ECMA-376 office file formats, known as OpenXML.

http://www.ecma-international.org/publications/standards/Ecma-376.htm

The OpenXML format is currently used by Microsoft Office 2007. Apple iWork ’08 and OpenOffice 2.2 also have filters for this format.

Features

Tested features

  • Extract words from a document for indexing purposes

  • Get metadata from a document

Planned features

  • Transform a document to HTML

Public API

>>> import openxmllib
>>> doc = openxmllib.openXmlDocument('office.docx')
>>> # Raises a ValueError for unsupported office files.
>>> doc.mimeType
'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
>>> doc.coreProperties # Keys may depend on application
{'title': u'blah...', u'creator': u'John Doe', ...}
>>> doc.extendedProperties # Keys may depend on application
{'Words': u'312', 'Application': u'Your favorite word processor', ...}
>>> doc.customProperties # May return an empty mapping
{'My property': u'My value', ...}
>>> doc.allProperties # Merges core+extended+custom properties (see above)
{...}
>>> doc.indexableText(include_properties=False)
u'all the words of that document body'
>>> doc.indexableText(include_properties=True)
u'all the words of that document body and all properties values'
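For context on where the property mappings above come from: an OpenXML file is a ZIP package, and the core properties are conventionally stored in the part named docProps/core.xml. The following is a rough, standard-library-only sketch of reading that part. It is not openxmllib's implementation; the function name read_core_properties is hypothetical, and the fixed part path is an assumption (a strict reader would resolve it through the package relationships).

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(source):
    """Return {local tag name: text} for the core properties part.

    `source` is a path or a file-like object. Returns an empty dict when
    the package has no docProps/core.xml part (assumed conventional path).
    """
    with zipfile.ZipFile(source) as package:
        try:
            raw = package.read("docProps/core.xml")
        except KeyError:
            # The part is optional, so a missing entry is not an error.
            return {}
    root = ET.fromstring(raw)
    properties = {}
    for child in root:
        # Tags look like "{namespace}title"; keep only the local name.
        local_name = child.tag.rsplit("}", 1)[-1]
        properties[local_name] = child.text or ""
    return properties
```

Calling read_core_properties('office.docx') on a typical word processor document should give a mapping comparable to doc.coreProperties, although the exact keys depend on the application that wrote the file.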

Copying and License

Copyright (c) 2008 Gilles Lenfant

This software is subject to the provisions of the GNU General Public License, Version 2.0 (GPL). A copy of the GPL should accompany this distribution. THIS SOFTWARE IS PROVIDED “AS IS” AND ANY AND ALL EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE

More details in the COPYING file included in this package.

Status

This software is alpha quality and has been tested only on Mac OS X with Python 2.4 and lxml 1.3.6.

It should work on other platforms, with Python 2.5, and perhaps with other versions of lxml.

Requirements

  • lxml 1.3.6: get lxml with easy_install, e.g.:

    $ easy_install lxml==1.3.6

Warning: openxmllib is untested with the new lxml 2 (in alpha state at the time of writing). It may or may not work with lxml 2, but please don’t report bugs found in that situation until lxml 2 is officially required here.

Installation

$ python setup.py install

From now on you can “import openxmllib” in your Python apps and use the “openxmlinfo.py” command line utility.

Gotchas

Be aware that most text data returned by the various openxmllib services may be either us-ascii strings or Unicode strings. This is a side effect of lxml (bug or feature?). It’s up to your application to convert this text to the appropriate charset.
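One way an application might normalize such mixed values is sketched below. The helper names to_charset and encode_properties are hypothetical and not part of openxmllib, and the sketch assumes that any byte string it receives is pure us-ascii, as described above. It is written for Python 3, where the two cases are bytes and str.

```python
def to_charset(value, charset="utf-8"):
    """Return `value` as bytes encoded in `charset`.

    `value` may be a us-ascii byte string or a Unicode string.
    """
    if isinstance(value, bytes):
        # Assumption: byte strings coming from the library are us-ascii.
        value = value.decode("ascii")
    return value.encode(charset)


def encode_properties(properties, charset="utf-8"):
    """Encode every key and value of a property mapping to `charset`."""
    return {
        to_charset(key, charset): to_charset(val, charset)
        for key, val in properties.items()
    }


# Example with one Unicode value and one us-ascii byte string.
mixed = {"title": u"R\u00e9sum\u00e9", b"creator": b"John Doe"}
encoded = encode_properties(mixed, "latin-1")
```

Decoding first and encoding once at the end keeps the conversion in a single place, so the rest of the application only ever sees one string type.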

TODO: File this to lxml tracker or ML

We do not currently handle exceptions caused by malformed XML or other unexpected structures. You should handle these (potential) problems in a try ... except block in your application.
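A minimal sketch of such a guard follows. The function safe_indexable_text is hypothetical, and the document opener is passed in as a parameter so that the sketch does not depend on openxmllib being installed. Only ValueError is a documented failure (see the Public API section); the exception types raised for malformed XML are not documented here, so the broad second handler is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def safe_indexable_text(open_document, path):
    """Return the indexable text of `path`, or None if it cannot be read.

    `open_document` is any callable that takes a path and returns an object
    with an indexableText() method, e.g. openxmllib.openXmlDocument.
    """
    try:
        doc = open_document(path)
        return doc.indexableText(include_properties=True)
    except ValueError:
        # Documented case: the file is not a supported office document.
        logger.warning("unsupported file: %s", path)
        return None
    except Exception as exc:
        # Assumption: malformed XML or unexpected structure ends up here.
        logger.warning("could not index %s: %s", path, exc)
        return None
```

In an application this would be called as safe_indexable_text(openxmllib.openXmlDocument, 'office.docx'). Narrow the second handler once you know which exceptions your documents actually trigger.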

Testing

Note that running the tests does not require installing the package:

$ cd tests
$ python runalltests.py

Credits

Gilles Lenfant <gilles dot lenfant at gmail dot com>

Future features and bugfixes

Features

Support for standard mimetypes module

Add our MIME types to the standard Python mimetypes module.
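Until this is built in, an application could register the types itself. The sketch below uses only the standard mimetypes module. The OPENXML_TYPES table and the register_openxml_types function are hypothetical names, and the table lists only the three most common extensions. Recent Python versions may already know these types, in which case the registration is harmless.

```python
import mimetypes

# Extension to MIME type mapping for the three main OpenXML document kinds.
OPENXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument"
             ".spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument"
             ".presentationml.presentation",
}


def register_openxml_types():
    """Register the OpenXML MIME types with the mimetypes module."""
    for extension, mime_type in OPENXML_TYPES.items():
        mimetypes.add_type(mime_type, extension)


register_openxml_types()
docx_type, _encoding = mimetypes.guess_type("office.docx")
```

After registration, mimetypes.guess_type('office.docx') reports the same MIME type string that doc.mimeType returns for a word processor document.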

Support for URLs

>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument('http://www.mydomain.com/mydoc.docx')
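Until URLs are supported directly, a possible workaround is to copy the remote file to a temporary file and open that path instead. The sketch below uses only the Python 3 standard library. The function name download_to_tempfile is hypothetical, and the sketch assumes that openXmlDocument accepts a local file path, as in the Public API example.

```python
import shutil
import tempfile
from urllib.request import urlopen


def download_to_tempfile(url, suffix=".docx"):
    """Copy the resource at `url` to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed.
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as target:
            shutil.copyfileobj(response, target)
            return target.name
```

The returned path can then be passed to openXmlDocument like any other local file. The suffix argument matters if the library relies on the file extension to recognize the document type, which is an assumption worth checking for your files.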

Human readable plain text conversion

>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument(...)
>>> doc.textDocument(target_directory)

(This may not be possible for spreadsheets.)

HTML conversions

>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument(...)
>>> doc.htmlDocument(target_directory)

This requires finding open source XSLT stylesheets.

Document generation

FIXME: more to say here

Packaging

Installation

Turn this into an egg (“easy_install openxmllib”).

Documentation

Add epydoc generated API documentation in doc/api.

Utility

Install “openxmlinfo.py” on Windows.

Bugfixes

…Waiting for feedback ;o)

History

1.0.1

  • Egg-ification. [kev_AT_coolcavemen.com]

1.0.0

  • First public version. [gilles.lenfant]
