Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
Project description
lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and convenient access to these libraries using the ElementTree API.
It extends the ElementTree API significantly to offer support for XPath, RelaxNG, XML Schema, XSLT, C14N and much more.
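As a rough illustration of two of these extensions, the sketch below runs an XPath query and a small XSLT transformation on an in-memory document. The sample document and stylesheet are invented for this example, and the calls are shown as they behave in a current lxml release under Python 3, which may differ in detail from this beta.

```python
from lxml import etree

# Hypothetical sample document, made up for illustration.
root = etree.XML("<root><item id='a'>one</item><item id='b'>two</item></root>")

# XPath: select text nodes and attribute values.
texts = root.xpath("//item/text()")
ids = root.xpath("//item/@id")

# XSLT: a tiny stylesheet that emits each item's text followed by ';'.
stylesheet = etree.XML(
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:output method="text"/>'
    '<xsl:template match="/">'
    '<xsl:for-each select="//item"><xsl:value-of select="."/>;</xsl:for-each>'
    '</xsl:template>'
    '</xsl:stylesheet>'
)
transform = etree.XSLT(stylesheet)
result = transform(root)

print(texts, ids, str(result))
```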
To contact the project, go to the project home page or see our bug tracker at https://launchpad.net/lxml
If you want to use the current in-development version of lxml, you can get it from the Subversion repository at http://codespeak.net/svn/lxml/trunk. Running easy_install lxml==dev will install it from http://codespeak.net/svn/lxml/trunk#egg=lxml-dev
2.1beta3 (2008-06-19)
Features added
Major overhaul of tools/xpathgrep.py script.
Pickling ElementTree objects in lxml.objectify.
Support for parsing from file-like objects that return unicode strings.
New function etree.cleanup_namespaces(el) that removes unused namespace declarations from a (sub)tree (experimental).
XSLT results support the buffer protocol in Python 3.
Polymorphic functions in lxml.html that accept either a tree or a parsable string will return either a UTF-8 encoded byte string, a unicode string or a tree, based on the type of the input. Previously, the result was always a byte string or a tree.
Support for Python 2.6 and 3.0 beta.
File name handling now uses a heuristic to convert between byte strings (usually filenames) and unicode strings (usually URLs).
Parsing from a plain file object frees the GIL under Python 2.x.
Running iterparse() on a plain file (or filename) frees the GIL on reading under Python 2.x.
Conversion functions html_to_xhtml() and xhtml_to_html() in lxml.html (experimental).
Most features in lxml.html work for XHTML namespaced tag names (experimental).
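Two of the features listed above can be sketched together: parsing from a file-like object that returns unicode strings, and removing unused namespace declarations with etree.cleanup_namespaces(). The document and the namespace URI are made up for this example, and io.StringIO stands in for any file-like object that yields unicode text. This assumes a current lxml under Python 3.

```python
import io
from lxml import etree

# Hypothetical document with a namespace declaration that nothing uses.
xml_text = '<root xmlns:unused="http://example.com/unused"><child/></root>'

# io.StringIO returns unicode strings from read(), not bytes.
tree = etree.parse(io.StringIO(xml_text))
root = tree.getroot()

before = etree.tostring(root)
etree.cleanup_namespaces(root)  # drops declarations not used in the subtree
after = etree.tostring(root)

print(before)
print(after)
```

Declarations that are still referenced by an element or attribute in the subtree should be left in place.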
Bugs fixed
ElementTree.parse() did not handle the result of a target parser.
Crash in Element class lookup classes when the __init__() method of the super class is not called from Python subclasses.
A number of problems related to unicode/byte string conversion of filenames and error messages were fixed.
Building on Mac OS X now passes the “flat_namespace” option to the C compiler, which reportedly prevents build quirks and crashes on this platform.
Windows build was broken.
Rare crash when serialising to a file object with certain encodings.
Other changes
Non-ASCII characters in attribute values are no longer escaped on serialisation.
Passing non-ASCII byte strings or invalid unicode strings as .tag, namespaces, etc. will result in a ValueError instead of an AssertionError (just like the tag well-formedness check).
Up to several times faster attribute access (i.e. tree traversal) in lxml.objectify.
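The first two changes above can be observed with a short sketch. The element and attribute names are invented for illustration, and the behaviour shown is that of a current lxml under Python 3; the exact output in this beta may differ.

```python
from lxml import etree

# Non-ASCII attribute values are written as-is when the target
# encoding can represent them.
el = etree.Element("person", name="Zoë")
as_text = etree.tostring(el, encoding="unicode")
as_utf8 = etree.tostring(el, encoding="UTF-8")

# A string that is not XML compatible (here: it contains a NUL
# character) is rejected with ValueError, not AssertionError.
try:
    el.tag = "bad\x00tag"
    error = None
except ValueError as exc:
    error = exc

print(as_text)
print(type(error).__name__)
```

With the default ASCII output encoding, characters outside ASCII still have to be written as character references, so the difference is visible only with encodings such as UTF-8.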