Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.

Project description

lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and convenient access to these libraries using the ElementTree API.

It extends the ElementTree API significantly to offer support for XPath, RelaxNG, XML Schema, XSLT, C14N and much more.
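
As a rough illustration of how the ElementTree-style API and the XPath extension sit side by side, the sketch below parses a small document and queries it both ways. The document content, the element names and the XPath variable name bid are made up for this example; only find() and xpath() themselves come from lxml.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
root = etree.fromstring(
    "<catalog>"
    "<book id='b1'><title>First</title></book>"
    "<book id='b2'><title>Second</title></book>"
    "</catalog>"
)

# Plain ElementTree-style navigation.
first_title = root.find("book/title").text

# XPath adds functions, attribute selection and variables on top of that.
book_count = root.xpath("count(//book)")   # XPath numbers come back as floats
ids = root.xpath("//book/@id")             # attribute values come back as strings
second_title = root.xpath("string(//book[@id=$bid]/title)", bid="b2")

print(first_title, book_count, ids, second_title)
```

Keyword arguments to xpath() are exposed to the expression as XPath variables, which avoids building expressions by string concatenation.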

To contact the project, go to the project home page or see our bug tracker at https://launchpad.net/lxml

If you want to use the current in-development version of lxml, you can get it from the Subversion repository at http://codespeak.net/svn/lxml/trunk . Running easy_install lxml==dev will install it from http://codespeak.net/svn/lxml/trunk#egg=lxml-dev

2.3alpha1 (2010-06-19)

Features added

  • Keyword argument namespaces in lxml.cssselect.CSSSelector() to pass a prefix-to-namespace mapping for the selector.
  • New function lxml.etree.register_namespace(prefix, uri) that globally registers a prefix for a namespace URI. Newly created Elements in that namespace use the prefix automatically. Follows ElementTree 1.3.
  • Support ‘unicode’ string name as encoding parameter in tostring(), following ElementTree 1.3.
  • Support ‘c14n’ serialisation method in ElementTree.write() and tostring(), following ElementTree 1.3.
  • The ElementPath expression syntax (el.find*()) was extended to match the upcoming ElementTree 1.3 that will ship in the standard library of Python 3.2/2.7. This includes extended support for predicates as well as namespace prefixes (as known from XPath).
  • During regular XPath evaluation, various EXSLT functions are available within their namespace when using libxslt 1.1.26 or later.
  • Support passing a readily configured logger instance into PyErrorLog, instead of a logger name.
  • On serialisation, the new doctype parameter can be used to override the DOCTYPE (internal subset) of the document.
  • New parameter output_parent to XSLTExtension.apply_templates() to append the resulting content directly to an output element.
  • New method XSLTExtension.process_children() to process the content of the XSLT extension element itself.
  • ISO-Schematron support based on the de-facto Schematron reference ‘skeleton implementation’.
  • XSLT objects now accept XPath objects as stylesheet parameters in __call__.
  • Enable path caching in ElementPath (el.find*()) to avoid parsing overhead.
  • Setting the value of a namespaced attribute always uses a prefixed namespace instead of the default namespace even if both declare the same namespace URI. This avoids serialisation problems when an attribute from a default namespace is set on an element from a different namespace.
  • XSLT extension elements: support for XSLT context nodes other than elements: document root, comments, processing instructions.
  • Support for strings (in addition to Elements) in node-sets returned by extension functions.
  • Forms that lack an action attribute default to the base URL of the document on submit.
  • XPath attribute result strings have an attrname property.
  • Namespace URIs get validated against RFC 3986 at the API level (required by the XML namespace specification).
  • Target parsers show their target object in the .target property (compatible with ElementTree).
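
Several of the items above follow ElementTree 1.3, so they can be tried with the same code against either library. The following is a minimal sketch, assuming a made-up namespace URI and the prefix "demo"; the fallback import is only there so that the snippet also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # ElementTree 1.3 in the standard library offers the same three features.
    import xml.etree.ElementTree as etree

# Hypothetical namespace and prefix, chosen for this example only.
NS = "http://example.com/ns/demo"
etree.register_namespace("demo", NS)

root = etree.Element("{%s}root" % NS)
first = etree.SubElement(root, "{%s}item" % NS, {"id": "a"})
first.text = "first"
second = etree.SubElement(root, "{%s}item" % NS, {"id": "b"})
second.text = "second"

# encoding="unicode" returns a text string instead of encoded bytes,
# and the registered prefix is used instead of a generated one.
xml_text = etree.tostring(root, encoding="unicode")

# ElementPath with a namespace prefix and an attribute predicate.
match = root.find("demo:item[@id='b']", namespaces={"demo": NS})

print(xml_text)
print(match.text)
```

Without the register_namespace() call, the serialiser would have to invent a prefix such as ns0 for the namespace.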

Bugs fixed

  • API is hardened against invalid proxy instances to prevent crashes due to incorrectly instantiated Element instances.
  • Prevent crash when instantiating CommentBase and friends.
  • Export ElementTree compatible XML parser class as XMLTreeBuilder, as it is called in ET 1.2.
  • ObjectifiedDataElements in lxml.objectify were not hashable. They now use the hash value of the underlying Python value (string, number, etc.) to which they compare equal.
  • Parsing broken fragments in lxml.html could fail if the fragment contained an orphaned closing ‘</div>’ tag.
  • Using XSLT extension elements around the root of the output document crashed.
  • lxml.cssselect did not distinguish between x[attr="val"] and x [attr="val"] (with a space). The latter now matches the attribute independent of the element.
  • Rewriting multiple links inside HTML text content could replace unrelated content, because each replacement could shift the reported position of subsequent matches. The iterlinks() generator in lxml.html now returns links in reversed order if they appear inside the same text node, so replacements and link-internal modifications no longer change the position of links reported afterwards.
  • The .value attribute of textarea elements in lxml.html did not represent the complete raw value (including child tags etc.). It now serialises the complete content on read and replaces the complete content by a string on write.
  • Target parser didn’t call .close() on the target object if parsing failed. Now it is guaranteed that .close() will be called after parsing, regardless of the outcome.
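
The target parser fix in the last item can be observed with a small target class that records whether close() ran. This is only a sketch: the class CollectingTarget and the malformed input are invented for the example, while XMLParser(target=...), the .target property and XMLSyntaxError are part of lxml.etree.

```python
from lxml import etree


class CollectingTarget:
    """Hypothetical parser target that records events and whether close() ran."""

    def __init__(self):
        self.events = []
        self.closed = False

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, text):
        pass  # character data is ignored in this sketch

    def close(self):
        self.closed = True
        return self.events


target = CollectingTarget()
parser = etree.XMLParser(target=target)

try:
    # Deliberately malformed: <child> is never closed.
    etree.fromstring("<root><child></root>", parser)
except etree.XMLSyntaxError:
    parse_failed = True
else:
    parse_failed = False

print(parse_failed, target.closed, parser.target is target)
```

A target that holds resources such as open files can therefore release them in close() without needing its own error handling around the parse call.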

Other changes

  • Official support for Python 3.1.2 and later.
  • Static MS Windows builds can now download their dependencies themselves.
  • Element.attrib no longer uses a cyclic reference back to its Element object. It therefore no longer requires the garbage collector to clean up.
  • Static builds include libiconv, in addition to libxml2 and libxslt.
