Utilities for building and manipulating ElementTrees

Klon is a collection of Python utilities for manipulating ElementTrees. It's a thin-ish, transparent wrapper around the lxml.etree module.

klon.build_etree

Source code: klon/build.py

A utility for building element trees using list and string literals.

>>> from klon import build_etree
>>> etree = build_etree(
...     'html',
...     [
...         'head',
...         ['title', 'Test Document'],
...     ],
...     [
...         'body',
...         ['h1#title', 'This is a test'],
...         ['a', {'href': '/page'}, ['img', {'src': 'image.jpg'}]],
...         [
...             'p.text',
...             'This is a text',
...             ['br'],
...             'This is a tail',
...         ],
...     ],
... )

Nested lists are translated to nested elements:

  • the first element in the list must be a string, and becomes the tag name
  • optionally, the second element can be a dict, specifying tag attributes
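  • any remaining items become the tag's children: nested lists become child elements, and strings become text (or the tail of the preceding child element)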
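The translation rules above can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration of the list-to-element mapping, not klon's implementation: build_sketch is a hypothetical name, it builds stdlib elements rather than lxml ones, and it treats the tag string as a plain tag name.

```python
import xml.etree.ElementTree as ET


def build_sketch(tag, *items):
    """Hypothetical helper: build an element from a tag name and child items."""
    element = ET.Element(tag)
    items = list(items)
    # An optional dict in second position supplies the attributes.
    if items and isinstance(items[0], dict):
        element.attrib.update(items.pop(0))
    last_child = None
    for item in items:
        if isinstance(item, str):
            if last_child is None:
                # A string before any child element is the element's text.
                element.text = (element.text or "") + item
            else:
                # A string after a child element is that child's tail.
                last_child.tail = (last_child.tail or "") + item
        else:
            # A nested list is translated recursively into a child element.
            last_child = build_sketch(*item)
            element.append(last_child)
    return element


p = build_sketch("p", {"class": "text"}, "This is a text", ["br"], "This is a tail")
```

Here p.text holds the first string, while the second string ends up as the tail of the br child, which matches how ElementTree models mixed content.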

As a convenience, the id and class attributes can be set directly from the tag name string, using CSS-like syntax: tag#id and tag.class.
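One way such a tag string could be split into a tag name and attributes is sketched below. The helper name split_tag is hypothetical, and accepting several classes or a combined tag#id.class form is an assumption; only tag#id and tag.class are documented above.

```python
import re

# Tag name first, then any number of "#id" or ".class" suffixes.
_TAG_SPEC = re.compile(r"^(?P<tag>[^#.]+)(?P<rest>(?:[#.][^#.]+)*)$")


def split_tag(spec):
    """Hypothetical helper: split 'tag#id' or 'tag.class' into (tag, attributes)."""
    match = _TAG_SPEC.match(spec)
    if match is None:
        raise ValueError(f"invalid tag specification: {spec!r}")
    attributes = {}
    classes = []
    for marker, value in re.findall(r"([#.])([^#.]+)", match.group("rest")):
        if marker == "#":
            attributes["id"] = value
        else:
            classes.append(value)
    if classes:
        # Assumption: several classes are joined with spaces, as in HTML.
        attributes["class"] = " ".join(classes)
    return match.group("tag"), attributes
```

For example, split_tag('h1#title') yields the tag name 'h1' together with an id attribute, and a plain name such as 'br' yields an empty attribute dict.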

klon.tostring

Source code: klon/utils.py

A thin wrapper around lxml.etree.tostring.

Defaults to HTML rules for rendering the element tree to a string. In these examples we use method="xml" to achieve nicer pretty-printing.

>>> from klon import tostring
>>> print(tostring(etree, pretty_print=True, method="xml"))
<html>
  <head>
    <title>Test Document</title>
  </head>
  <body>
    <h1 id="title">This is a test</h1>
    <a href="/page">
      <img src="image.jpg"/>
    </a>
    <p class="text">This is a text<br/>This is a tail</p>
  </body>
</html>

The main difference from the underlying lxml function is that encoding=str is the default, i.e. it returns a str rather than bytes unless told otherwise.
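The same bytes-versus-text split exists in the standard library's xml.etree.ElementTree, which makes the idea easy to try without lxml. The sketch below is an analogy only: tostring_text is a hypothetical wrapper around the stdlib function, where encoding="unicode" plays the role that encoding=str plays in lxml.

```python
import xml.etree.ElementTree as ET


def tostring_text(element, **kwargs):
    """Hypothetical wrapper: serialize to str unless the caller picks an encoding."""
    kwargs.setdefault("encoding", "unicode")
    return ET.tostring(element, **kwargs)


element = ET.fromstring("<p>hi</p>")
as_bytes = ET.tostring(element)     # stdlib default: bytes
as_text = tostring_text(element)    # wrapper default: str
explicit = tostring_text(element, encoding="utf-8")  # caller can still ask for bytes
```

Using setdefault keeps the wrapper thin: every other keyword argument is passed through untouched, and an explicit encoding from the caller still wins.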

klon.extract_text

Source code: klon/text.py

Extracts all text from the given node and its descendants.

By default, all contiguous whitespace is normalized to a single ASCII space, so the output is always a single line of text. If multiline=True is specified, the line breaks implied by paragraph-breaking tags are preserved, much as a web browser would render them. Other whitespace is still normalized, but the output now contains both ASCII spaces and ASCII newlines.

>>> from klon import extract_text

>>> body = etree.find('body')  # using the same example etree defined above
>>> print(tostring(body, pretty_print=True, method="xml"))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a text<br/>This is a tail</p>
</body>

>>> extract_text(body)
'This is a test This is a text This is a tail'

>>> extract_text(body, multiline=True)
'This is a test\n\nThis is a text\nThis is a tail'

Note that the <p> tag translates to a double newline, while the <br> tag translates to a single \n, mimicking how a browser renders them.
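The two modes can be approximated with a short tree walk. The sketch below uses the standard library's xml.etree.ElementTree and is not klon's implementation: extract_text_sketch and BLOCK_TAGS are hypothetical names, and the set of tags treated as paragraph-breaking is an assumption chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a small, illustrative set of paragraph-breaking tags.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}
_WHITESPACE = re.compile(r"\s+")


def _collect(node, out):
    """Append text chunks and tag-implied newlines to out, in document order."""
    is_block = node.tag in BLOCK_TAGS
    if is_block:
        out.append("\n\n")
    elif node.tag == "br":
        out.append("\n")
    if node.text:
        # Newlines in the source text are ordinary whitespace, not line breaks.
        out.append(_WHITESPACE.sub(" ", node.text))
    for child in node:
        _collect(child, out)
        if child.tail:
            out.append(_WHITESPACE.sub(" ", child.tail))
    if is_block:
        out.append("\n\n")


def extract_text_sketch(node, multiline=False):
    out = []
    _collect(node, out)
    raw = "".join(out)
    if not multiline:
        # Single-line mode: every run of whitespace becomes one space.
        return " ".join(raw.split())
    # Multiline mode: tidy each line, then cap blank runs at one empty line.
    lines = [" ".join(line.split()) for line in raw.split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


body = ET.fromstring(
    '<body><h1 id="title">This is a test</h1>'
    '<a href="/page"><img src="image.jpg"/></a>'
    '<p class="text">This is a text<br/>This is a tail</p></body>'
)
single = extract_text_sketch(body)
multi = extract_text_sketch(body, multiline=True)
```

On this input the sketch reproduces the two outputs shown above; on other documents it may differ from klon, since the real tag handling is not described here.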

klon.detach

Source code: klon/utils.py

Takes a single node as its argument and removes it from its tree, preserving the node's tail text by reattaching it at the correct position in the tree.

>>> from klon import detach

>>> print(tostring(body, pretty_print=True, method="xml"))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a text<br/>This is a tail</p>
</body>

>>> br = detach(body.xpath('.//br')[0])

>>> print(tostring(body, pretty_print=True, method="xml"))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a textThis is a tail</p>
</body>

Note that the text "This is a tail", which was the tail of the <br> node, has been preserved, in this case by appending it to the text of its parent node.
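The tail-preserving logic can be sketched as follows. This is an illustration under stated assumptions, not klon's code: detach_sketch is a hypothetical name, and because the standard library's xml.etree.ElementTree has no parent pointers, the sketch takes the tree root as an extra argument, whereas klon.detach takes only the node.

```python
import xml.etree.ElementTree as ET


def detach_sketch(root, node):
    """Hypothetical helper: remove node from the tree under root, keeping its tail."""
    # Stdlib elements do not know their parent, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    parent = parents[node]
    index = list(parent).index(node)
    if node.tail:
        if index == 0:
            # No previous sibling: the tail continues the parent's text.
            parent.text = (parent.text or "") + node.tail
        else:
            # Otherwise the tail continues the previous sibling's tail.
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + node.tail
        node.tail = None
    parent.remove(node)
    return node


p = ET.fromstring("<p>This is a text<br/>This is a tail</p>")
br = detach_sketch(p, p.find("br"))
```

After the call, p.text holds both strings joined together and the detached br no longer carries a tail.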

klon.make_all_urls_absolute

Source code: klon/html.py

Takes a base URL and a document etree, and modifies the etree in place to convert all relative URLs to absolute ones, using the given URL as the base. All standard tag attributes that specify a URL (e.g. <a href="...">, <img src="...">, <form action="...">, etc.) are converted.

>>> from klon import make_all_urls_absolute

>>> print(tostring(body.find('a')))
<a href="/page"><img src="image.jpg"></a>

>>> make_all_urls_absolute('https://site.com/path/', etree)

>>> print(tostring(body.find('a')))
<a href="https://site.com/page"><img src="https://site.com/path/image.jpg"></a>
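The resolution shown above follows the usual relative-URL rules, which the standard library exposes as urllib.parse.urljoin. The sketch below applies it to a stdlib ElementTree; absolutize_sketch and URL_ATTRIBUTES are hypothetical names, and the attribute list is a small illustrative subset rather than the full set klon handles.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Assumption: only a few URL-carrying attributes, for illustration.
URL_ATTRIBUTES = ("href", "src", "action")


def absolutize_sketch(base_url, root):
    """Hypothetical helper: rewrite URL attributes in place against base_url."""
    for element in root.iter():
        for name in URL_ATTRIBUTES:
            if name in element.attrib:
                element.set(name, urljoin(base_url, element.get(name)))


link = ET.fromstring('<a href="/page"><img src="image.jpg"/></a>')
absolutize_sketch("https://site.com/path/", link)
```

A path starting with a slash replaces the whole path of the base URL, while a bare file name is resolved relative to the base's last directory, which is why the two attributes end up under different prefixes.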

klon.parse_form

Source code: klon/forms.py

Takes an ElementTree whose root is a <form> node, and returns a requests.Request corresponding to the request a browser would send if the form were submitted.

>>> from klon import parse_form

>>> form = build_etree(
...     'form',
...     {'method': 'POST', 'action': '/publish'},
...     [
...         'div',
...         ['input', {'name': 'title', 'value': 'Some title'}],
...     ],
...     [
...         'select',
...         {'name': 'kind'},
...         ['option', {'value': 'comment'}, 'Comment'],
...         ['option', {'value': 'question', 'selected': 'yes'}, 'Question'],
...     ],
... )

>>> request = parse_form(form, base_url='https://web.site/')
>>> request.url
'https://web.site/publish'
>>> request.method
'POST'
>>> request.data
{'title': 'Some title', 'kind': 'question'}
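To make the field-collection step concrete, here is a simplified sketch using only the standard library. It is not klon's implementation: form_parts_sketch is a hypothetical name, it returns a plain (method, url, data) tuple instead of a requests.Request, and it assumes only text-like inputs and single-choice selects, ignoring checkboxes, radio buttons, textareas and disabled fields.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def form_parts_sketch(form, base_url):
    """Hypothetical helper: derive (method, url, data) from a <form> element."""
    method = form.get("method", "GET").upper()
    url = urljoin(base_url, form.get("action", ""))
    data = {}
    for element in form.iter():
        name = element.get("name")
        if not name:
            continue
        if element.tag == "input":
            data[name] = element.get("value", "")
        elif element.tag == "select":
            options = list(element.iter("option"))
            selected = [o for o in options if o.get("selected") is not None]
            # Assumption: fall back to the first option when none is selected.
            chosen = selected[-1] if selected else (options[0] if options else None)
            if chosen is not None:
                # An option without a value attribute submits its text.
                data[name] = chosen.get("value", chosen.text or "")
    return method, url, data


form = ET.fromstring(
    '<form method="POST" action="/publish">'
    '<div><input name="title" value="Some title"/></div>'
    '<select name="kind">'
    '<option value="comment">Comment</option>'
    '<option value="question" selected="yes">Question</option>'
    '</select></form>'
)
method, url, data = form_parts_sketch(form, "https://web.site/")
```

The three values correspond to the method, url and data attributes of the request shown above.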
