Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and convenient access to these libraries using the ElementTree API.
It extends the ElementTree API significantly to offer support for XPath, RelaxNG, XML Schema, XSLT, C14N and much more.
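As a rough illustration of how the ElementTree-style interface and the XPath extension sit side by side, here is a minimal sketch. The sample document and variable names are made up for this example and do not come from the lxml documentation.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
xml_source = "<root><item id='a'>one</item><item id='b'>two</item></root>"
root = etree.fromstring(xml_source)

# Plain ElementTree-style access: find the first matching child.
first_item = root.find("item")

# lxml extension: evaluate a full XPath expression on the element.
second_text = root.xpath("//item[@id='b']/text()")

print(first_item.text)
print(second_text)
```

The `find()` call works the same way as in the standard library's ElementTree, while `xpath()` is one of the additions that lxml provides on top of it.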
If you want to use the current in-development version of lxml, you can get it from the GitHub repository at https://github.com/lxml/lxml . Note that building the sources requires Cython; see the build instructions on the project home page. Alternatively, running easy_install lxml==dev will install lxml from https://github.com/lxml/lxml/tarball/master#egg=lxml-dev if you have an appropriate version of Cython installed.
After an official release of a new stable series, bug fixes may become available at https://github.com/lxml/lxml/tree/lxml-2.3 . Running easy_install lxml==2.3bugfix will install the unreleased branch state from https://github.com/lxml/lxml/tarball/lxml-2.3#egg=lxml-2.3bugfix as soon as a maintenance branch has been established. Note that this requires Cython to be installed at an appropriate version for the build.
- lxml.objectify.deannotate() has a new boolean option cleanup_namespaces to remove the objectify namespace declarations (and generally clean up the namespace declarations) after removing the type annotations.
- lxml.objectify gained its own SubElement() function as a copy of etree.SubElement to avoid an otherwise redundant import of lxml.etree on the user side.
- Fixed the “descendant” bug in cssselect a second time (after a first fix in lxml 2.3.1). The previous change resulted in a serious performance regression for the XPath based evaluation of the translated expression. Note that this breaks the usage of some of the generated XPath expressions as XSLT location paths that previously worked in 2.3.1.
- Fixed parsing of some selectors in cssselect. Whitespace after the combinators “>”, “+” and “~” is now correctly ignored. Previously it was parsed as a descendant combinator. For example, “div> .foo” was parsed the same as “div>* .foo” instead of “div>.foo”.
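The two lxml.objectify changes listed above could be exercised roughly as follows. This is only a sketch: the element names are invented for the example, and the serialized output may differ in detail between lxml versions.

```python
from lxml import etree, objectify

# Build a small tree using only the objectify module.
# objectify.SubElement() avoids importing etree just for tree construction
# (etree is imported here only to serialize the result).
root = objectify.Element("root")
item = objectify.SubElement(root, "item")  # "item" is a hypothetical tag name
item.count = 3  # objectify annotates this child with a Python type hint

annotated = etree.tostring(root).decode()

# Remove the type annotations and, via cleanup_namespaces, the namespace
# declarations that are no longer used afterwards.
objectify.deannotate(root, cleanup_namespaces=True)
cleaned = etree.tostring(root).decode()

print(annotated)
print(cleaned)
```

Without cleanup_namespaces=True, the annotations are removed but the objectify namespace declarations stay on the root element, which is the leftover that the new option is meant to address.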