
OOoPy: Modify OpenOffice.org documents in Python


OpenOffice.org (OOo) documents are ZIP archives containing several XML files, which makes it easy to inspect, create, or modify them. OOoPy is a Python library for these tasks. To avoid reinventing the wheel, OOoPy uses an existing XML library, ElementTree by Fredrik Lundh: OOoPy is a thin wrapper around ElementTree that uses Python’s zipfile module to read and write OOo documents.
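Because an OOo document is just a ZIP archive of XML files, the basic idea can be illustrated with the standard library alone. The following is a minimal sketch, assuming Python 3; it does not use OOoPy's own API. The hypothetical helper make_sample_odt builds a tiny in-memory archive (not a complete .odt, only a mimetype member and a content.xml), and the hypothetical helper read_part parses one XML member with xml.etree.ElementTree.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_sample_odt():
    """Build a tiny ODF-style archive in memory (illustrative, not a full .odt)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        '<office:body><office:text>'
        '<text:p>Hello from OOo</text:p>'
        '</office:text></office:body>'
        '</office:document-content>' % (OFFICE_NS, TEXT_NS)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF expects "mimetype" as the first member, stored uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content, compress_type=zipfile.ZIP_DEFLATED)
    buf.seek(0)
    return buf


def read_part(archive, name):
    """Parse one XML member of the archive into an ElementTree element."""
    with zipfile.ZipFile(archive) as zf:
        return ET.fromstring(zf.read(name))


doc = make_sample_odt()
root = read_part(doc, "content.xml")
paragraphs = [p.text for p in root.iter("{%s}p" % TEXT_NS)]
print(paragraphs)  # ['Hello from OOo']
```

Writing a modified document back is the reverse: serialize the tree with ET.tostring and write it into a new archive with ZipFile.writestr.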

In addition to wrapping ElementTree, OOoPy contains a framework for applying XML transforms to OOo documents. Several transforms for OOo documents exist, e.g., for changing OOo fields (the Insert-Fields menu in OOo) or for using OOo fields in a mail merge application. Other transforms, for modifying OOo settings and meta information, are included as examples.
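To illustrate the idea of a transform framework, here is a minimal sketch assuming Python 3. It is not OOoPy's Transformer or Field_Replace: apply_transforms and replace_user_fields are hypothetical names. Each transform is simply a callable applied in turn to the same parsed content.xml tree; the example transform sets the value of user-defined field declarations (text:user-field-decl) by name. A real field transform would also have to update the cached text shown by the fields in the body, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def replace_user_fields(values):
    """Return a hypothetical transform that sets user-defined field values."""
    def transform(root):
        for decl in root.iter("{%s}user-field-decl" % TEXT_NS):
            name = decl.get("{%s}name" % TEXT_NS)
            if name in values:
                decl.set("{%s}string-value" % OFFICE_NS, values[name])
    return transform


def apply_transforms(root, *transforms):
    """Apply each transform to the same tree, in the given order."""
    for transform in transforms:
        transform(root)
    return root


content = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    '<office:body><office:text>'
    '<text:user-field-decls>'
    '<text:user-field-decl office:value-type="string" '
    'office:string-value="old" text:name="customer"/>'
    '</text:user-field-decls>'
    '</office:text></office:body>'
    '</office:document-content>' % (OFFICE_NS, TEXT_NS)
)

root = apply_transforms(ET.fromstring(content),
                        replace_user_fields({"customer": "ACME Corp."}))
decl = next(root.iter("{%s}user-field-decl" % TEXT_NS))
print(decl.get("{%s}string-value" % OFFICE_NS))  # ACME Corp.
```

Keeping transforms as independent callables means they can be chained freely, e.g., a field replacement followed by a meta-information update.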

A library like this comes in handy where calling native OOo is not an option, e.g., in server-side Web applications.

Don’t be alarmed by the alpha status of the software: reading and writing OOo documents is stable, as are most transforms.

The only problematic transform is mailmerge. The OOo format is well documented, but there are ordering constraints in the body of an OOo document, and I haven’t yet figured out all the tags and their order in the OOo body. Another known shortcoming of OOoPy’s mailmerge is the renumbering of body parts of an OOo document: individual parts (e.g., frames, sections, tables) need unique names, and after a mailmerge some items have duplicate names. So far I’m renumbering only frames, sections, and tables; see the renumber objects at the end of ooopy/Transforms.py. So if parts of the mailmerged document are missing, check whether some renumbering is missing, or send me a bug report.
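The renumbering problem can be pictured with a small sketch, assuming Python 3. It is a hypothetical stand-in, not the renumber objects from ooopy/Transforms.py: the function renumber walks all tables, sections, and frames and appends a counter to any name already seen for that kind of part. It does not update references elsewhere in the document that might point to the old names.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"

# (element tag, name attribute) for each kind of part renamed here.
NAMED_PARTS = [
    ("{%s}table" % TABLE_NS, "{%s}name" % TABLE_NS),
    ("{%s}section" % TEXT_NS, "{%s}name" % TEXT_NS),
    ("{%s}frame" % DRAW_NS, "{%s}name" % DRAW_NS),
]


def renumber(root):
    """Give every table, section, and frame a unique name (hypothetical helper)."""
    for tag, attr in NAMED_PARTS:
        seen = set()
        for elem in root.iter(tag):
            name = elem.get(attr)
            if name is None:
                continue
            candidate, n = name, 1
            # Append _2, _3, ... until the name is unused for this kind of part.
            while candidate in seen:
                n += 1
                candidate = "%s_%d" % (name, n)
            elem.set(attr, candidate)
            seen.add(candidate)
    return root


# Two tables with the same name, as after concatenating two copies of a body.
doc = ET.fromstring(
    '<body xmlns:table="%s">'
    '<table:table table:name="Table1"/>'
    '<table:table table:name="Table1"/>'
    '</body>' % TABLE_NS
)
renumber(doc)
print([t.get("{%s}name" % TABLE_NS) for t in doc.iter("{%s}table" % TABLE_NS)])
# ['Table1', 'Table1_2']
```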

Another reason for the alpha status is the stability of the API: I may still change it slightly. There were already some slight API changes when adding support for the OpenDocument format introduced with OOo 2.0.

There is currently not much documentation apart from the Python doctests in OOoPy.py and Transformer.py and the command-line utilities. To run these tests after installing ooopy (assuming here that you installed it using python2.4 into /usr/local):

cd /usr/local/share/ooopy
python2.4 run_doctest.py /usr/local/lib/python2.4/site-packages/ooopy/Transformer.py
python2.4 run_doctest.py /usr/local/lib/python2.4/site-packages/ooopy/OOoPy.py

Both should report no failed tests. To run the doctests on Python 2.3, where doctest has a bug triggered by the metaclass trickery of autosuper, see the file run_doctest.py. In later Python versions this doctest bug is already fixed.

Usage

See the online documentation, e.g.:

% python
>>> from ooopy.OOoPy import OOoPy
>>> help (OOoPy)
>>> from ooopy.Transformer import Transformer
>>> help (Transformer)

Help, I’m getting an AssertionError traceback from Transformer, e.g.:

Traceback (most recent call last):
  File "./replace.py", line 17, in ?
    t = Transformer(Field_Replace(replace = replace_dictionary))
  File "/usr/local/lib/python2.4/site-packages/ooopy/Transformer.py", line 1226, in __init__
    assert (mimetype in mimetypes)
AssertionError

The API changed slightly when handling of different versions of OOo files was implemented. The first parameter of the Transformer constructor is now the mimetype of the OpenOffice.org document you intend to transform. The mimetype can be taken from an opened OOo document, e.g.:

ooo = OOoPy (infile = 'test.odt', outfile = 'test_out.odt')
t = Transformer (ooo.mimetype, ...)
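The assertion in the traceback checks that the first constructor argument is a known mimetype, so passing a transform object (such as Field_Replace) in that position fails. An OOo or ODF archive records its mimetype as plain text in a member named mimetype. The sketch below, assuming Python 3, reads that member with zipfile and performs the same kind of membership check; read_mimetype is a hypothetical helper and KNOWN_MIMETYPES is an illustrative subset, not OOoPy's actual list.

```python
import io
import zipfile

# Illustrative subset: OOo 1.x Writer and OpenDocument text mimetypes.
KNOWN_MIMETYPES = {
    "application/vnd.sun.xml.writer",
    "application/vnd.oasis.opendocument.text",
}


def read_mimetype(archive):
    """Return the document's mimetype from the 'mimetype' ZIP member."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read("mimetype").decode("ascii").strip()


# A stand-in archive containing only the mimetype member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")

mimetype = read_mimetype(buf)
assert mimetype in KNOWN_MIMETYPES  # the kind of check that fails in the traceback
print(mimetype)  # application/vnd.oasis.opendocument.text
```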

Ah, well, there are also command-line utilities:

  • ooo_cat for concatenating several OOo files into one
  • ooo_fieldreplace for replacing fields in an OOo document
  • ooo_mailmerge for doing a mailmerge from a template OOo document and a CSV (comma-separated values) input
  • ooo_as_text for getting the text from an OOo file (e.g., for doing a “grep” on the output).
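As a rough idea of what text extraction involves, the following sketch, assuming Python 3, emits one line per paragraph (text:p) or heading (text:h) found in content.xml. It is a hypothetical as_text function, not the implementation of ooo_as_text: it does not expand space, tab, or line-break elements, and a paragraph nested inside another (e.g., in a frame) would be printed twice.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = ("{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS)


def as_text(content_xml):
    """Return one line per paragraph or heading in content.xml (rough sketch)."""
    root = ET.fromstring(content_xml)
    lines = []
    # iter() visits elements in document order, so lines keep their order.
    for elem in root.iter():
        if elem.tag in PARAGRAPH_TAGS:
            # itertext() joins the text of nested spans, links, etc.
            lines.append("".join(elem.itertext()))
    return "\n".join(lines)


sample = (
    '<office:document-content '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="%s"><office:body><office:text>'
    '<text:h>Title</text:h>'
    '<text:p>Hello <text:span>world</text:span></text:p>'
    '</office:text></office:body></office:document-content>' % TEXT_NS
)
print(as_text(sample))
# Title
# Hello world
```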
