Provides resources to handle OpenXML documents from Python.

openxmllib

openxmllib is a set of tools that deal with the new ECMA-376 office file formats, known as OpenXML.

http://www.ecma-international.org/publications/standards/Ecma-376.htm

The OpenXML format is currently used by Microsoft Office 2007. Apple iWork ’08 and OpenOffice 2.2 also have filters for this format.

Features

Tested features

  • Extract words from a document for indexing purposes.
  • Get metadata from a document

Planned features

  • Transform a document to HTML

Public API

>>> import openxmllib
>>> doc = openxmllib.openXmlDocument('office.docx')
>>> # Raises a ValueError on unsupported office files.
>>> doc.mimeType
'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
>>> doc.coreProperties # Keys may depend on application
{'title': u'blah...', u'creator': u'John Doe', ...}
>>> doc.extendedProperties # Keys may depend on application
{'Words': u'312', 'Application': u'Your favorite word processor', ...}
>>> doc.customProperties # May return an empty mapping
{'My property': u'My value', ...}
>>> doc.allProperties # Merges core+extended+custom properties (see above)
{...}
>>> doc.indexableText(include_properties=False)
u'all the words of that document body'
>>> doc.indexableText(include_properties=True)
u'all the words of that document body and all properties values'
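To give a feel for where values such as those in coreProperties come from, here is a minimal standard-library sketch. It is not openxmllib's implementation (which relies on lxml), and it is written for a current Python 3 interpreter rather than the Python 2.4 this release was tested with. It assumes the core properties part sits at the conventional location docProps/core.xml inside the package; strictly speaking that location is declared in the package relationships. The function name read_core_properties is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(source):
    """Return the core properties of an OpenXML package as a dict.

    `source` is a path or a file-like object. Assumption: the part is
    stored at the conventional 'docProps/core.xml' location.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read('docProps/core.xml'))
    properties = {}
    for child in root:
        # Tags look like '{namespace}title'; keep only the local name.
        local_name = child.tag.rsplit('}', 1)[-1]
        properties[local_name] = child.text or ''
    return properties
```

As with doc.coreProperties, the keys returned depend on the application that wrote the file.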

Copying and License

Copyright (c) 2008 Gilles Lenfant

This software is subject to the provisions of the GNU General Public License, Version 2.0 (GPL). A copy of the GPL should accompany this distribution. THIS SOFTWARE IS PROVIDED “AS IS” AND ANY AND ALL EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.

More details in the COPYING file included in this package.

Status

This software is of alpha quality and has been tested only on Mac OS X with Python 2.4 and lxml 1.3.6.

It should work on other platforms, with Python 2.5, and perhaps with other versions of lxml.

Requirements

  • lxml 1.3.6: get lxml with easy_install, e.g.:

    $ easy_install lxml==1.3.6
    

Warning: openxmllib is untested with the new lxml 2 (in alpha state at the time of writing). It may or may not work with lxml 2, but please don’t report bugs found in that situation until lxml 2 is officially required here.

Installation

$ python setup.py install

From now on you can “import openxmllib” in your Python apps and use the “openxmlinfo.py” command line utility.

Gotchas

Be aware that most text data returned by the various openxmllib services may be either a plain us-ascii string or a Unicode string. This is a side effect of lxml (bug or feature?). It’s up to your application to convert this text to the appropriate charset.

TODO: File this to lxml tracker or ML
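One way an application might cope with the mixed string types is to normalize everything to Unicode before further processing. The helpers below are a minimal sketch and are not part of openxmllib. The names to_text and normalize_properties are hypothetical, and the sketch assumes that any byte string handed back is pure us-ascii, as described above.

```python
def to_text(value, encoding='ascii'):
    """Return `value` as a Unicode string, decoding byte strings."""
    if isinstance(value, bytes):
        # Assumption: byte strings contain only us-ascii characters.
        return value.decode(encoding)
    return value


def normalize_properties(properties, encoding='ascii'):
    """Return a copy of a properties mapping with Unicode keys and values."""
    return dict(
        (to_text(key, encoding), to_text(value, encoding))
        for key, value in properties.items()
    )
```

An application could then call, for example, normalize_properties(doc.allProperties) before indexing or storing the values.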

We do not currently handle exceptions caused by malformed XML or unexpected structures. You should handle these potential problems with a try (…) except (…) block in your application.
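A defensive wrapper along these lines could look as follows. This is only a sketch: the function name extract_indexable_text is hypothetical, and the opener is passed in as a parameter so the example does not depend on openxmllib being installed. ValueError is the exception documented in the Public API for unsupported files. Which exception types malformed documents actually raise is not documented here, so the sketch falls back to a broad except clause.

```python
import logging

logger = logging.getLogger(__name__)


def extract_indexable_text(path, open_document):
    """Return the indexable text of `path`, or None if it cannot be read.

    `open_document` is a callable such as openxmllib.openXmlDocument.
    """
    try:
        doc = open_document(path)
        return doc.indexableText(include_properties=True)
    except ValueError:
        # Documented behaviour: unsupported office files raise ValueError.
        logger.warning('Unsupported office file: %s', path)
    except Exception:
        # Assumption: malformed XML or unexpected structures may raise
        # various other exception types, so catch broadly and log them.
        logger.exception('Could not process %s', path)
    return None
```

With openxmllib installed, a call would be extract_indexable_text('office.docx', openxmllib.openXmlDocument).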

Testing

Note that testing does not require installation:

$ cd tests
$ python runalltests.py

Credits

Gilles Lenfant <gilles dot lenfant at gmail dot com>

Future features and bugfixes

Features

Support for standard mimetypes module

Register our MIME types with the standard Python mimetypes module.
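Until this is built in, an application can do the registration itself with the standard library. The sketch below uses mimetypes.add_type. The .docx type is the one shown in the Public API example; the .xlsx and .pptx entries are the standard OpenXML types, and whether openxmllib handles those formats is an assumption. The names OPENXML_MIME_TYPES and register_openxml_mime_types are hypothetical.

```python
import mimetypes

# Extension to MIME type mapping for common OpenXML documents.
OPENXML_MIME_TYPES = {
    '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    '.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    '.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
}


def register_openxml_mime_types():
    """Make the standard mimetypes module aware of OpenXML extensions."""
    for extension, mime_type in OPENXML_MIME_TYPES.items():
        mimetypes.add_type(mime_type, extension)
```

After calling register_openxml_mime_types(), mimetypes.guess_type('office.docx') reports the word processing type even on systems whose MIME tables predate OpenXML.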

Support for URLs

>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument('http://www.mydomain.com/mydoc.docx')
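URL support is only planned. In the meantime an application can download the document to a temporary file and open that file by path, as in the Public API example. The helper below is a sketch for a current Python 3 interpreter (it uses urllib.request, which does not exist on the Python 2.4 this release was tested with). The name download_to_tempfile is hypothetical.

```python
import shutil
import tempfile
from urllib.request import urlopen


def download_to_tempfile(url, suffix='.docx'):
    """Copy the resource at `url` to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with urlopen(url) as response:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
            shutil.copyfileobj(response, handle)
            return handle.name
```

The returned path can then be passed to openXmlDocument in place of a local file name.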

Human readable plain text conversion

>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument(...)
>>> doc.textDocument(target_directory)

(This may not be possible for spreadsheets.)

HTML conversions

>>> from openxmllib import openXmlDocument
>>> doc = openXmlDocument(...)
>>> doc.htmlDocument(target_directory)

This requires finding open source XSLT stylesheets.

Document generation

FIXME: more to say here

Packaging

Installation

Turn this into an egg (“easy_install openxmllib”).

Documentation

Add epydoc generated API documentation in doc/api.

Utility

Install “openxmlinfo.py” on Windows.

Bugfixes

…Waiting for feedback ;o)

History

1.0.2

  • Fix bad “egging”. [kev_AT_coolcavemen_DOT_com]

1.0.1

  • Egg-ification. [kev_AT_coolcavemen_DOT_com]

1.0.0

  • First public version. [gilles.lenfant]
