Utilities for building and manipulating ElementTrees
Klon is a collection of Python utilities for manipulating ElementTrees. It's a thin-ish, transparent wrapper around the lxml.etree module.
klon.build_etree
Source code: klon/build.py
A utility for building element trees using list and string literals.
>>> from klon import build_etree
>>> etree = build_etree(
...     'html',
...     [
...         'head',
...         ['title', 'Test Document'],
...     ],
...     [
...         'body',
...         ['h1#title', 'This is a test'],
...         ['a', {'href': '/page'}, ['img', {'src': 'image.jpg'}]],
...         [
...             'p.text',
...             'This is a text',
...             ['br'],
...             'This is a tail',
...         ],
...     ],
... )
Nested lists are translated to nested elements:
- the first element in the list must be a string, and becomes the tag name
- optionally, the second element can be a dict, specifying tag attributes
- any other elements become the tag's children
As a convenience, the id and class attributes can be set directly from the
tag name string, using CSS-like syntax: tag#id and tag.class.
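The translation rules above can be sketched with the standard library's xml.etree.ElementTree. This is not klon's implementation: build_sketch and parse_tag are hypothetical names, and the sketch assumes at most one id and one class per tag string, written in that order.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for 'tag', 'tag#id', 'tag.class' or 'tag#id.class'.
_TAG_RE = re.compile(r"^(?P<tag>[^#.]+)(?:#(?P<id>[^#.]+))?(?:\.(?P<cls>[^#.]+))?$")


def parse_tag(spec):
    """Split a CSS-like tag string into (tag name, attribute dict)."""
    match = _TAG_RE.match(spec)
    if match is None:
        raise ValueError(f"unsupported tag spec: {spec!r}")
    attrs = {}
    if match.group("id"):
        attrs["id"] = match.group("id")
    if match.group("cls"):
        attrs["class"] = match.group("cls")
    return match.group("tag"), attrs


def build_sketch(spec, *rest):
    """Build an element from a tag string, an optional dict, and children."""
    tag, attrs = parse_tag(spec)
    if rest and isinstance(rest[0], dict):
        attrs.update(rest[0])
        rest = rest[1:]
    element = ET.Element(tag, attrs)
    last_child = None
    for item in rest:
        if isinstance(item, str):
            # Text before the first child is the element's text;
            # text after a child is stored as that child's tail.
            if last_child is None:
                element.text = (element.text or "") + item
            else:
                last_child.tail = (last_child.tail or "") + item
        else:
            last_child = build_sketch(*item)
            element.append(last_child)
    return element


p = build_sketch("p.text", "This is a text", ["br"], "This is a tail")
```

Here p.text holds 'This is a text', while 'This is a tail' ends up in the tail of the br child, which is how ElementTree-style APIs represent mixed content.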
klon.tostring
Source code: klon/utils.py
A thin wrapper around lxml.etree.tostring.
>>> from klon import tostring
>>> print(tostring(etree, pretty_print=True))
<html>
  <head>
    <title>Test Document</title>
  </head>
  <body>
    <h1 id="title">This is a test</h1>
    <a href="/page">
      <img src="image.jpg"/>
    </a>
    <p class="text">This is a text<br/>This is a tail</p>
  </body>
</html>
The main difference from the underlying lxml function is that encoding
defaults to str, so it returns strings rather than bytes unless told otherwise.
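The idea of such a wrapper can be illustrated with the standard library, where the equivalent of lxml's encoding=str is encoding="unicode". The name tostring_sketch is hypothetical and this is only a sketch of the default-switching pattern, not klon's code.

```python
import xml.etree.ElementTree as ET


def tostring_sketch(node, encoding="unicode", **kwargs):
    """Serialize to str by default; pass a real encoding name to get bytes."""
    return ET.tostring(node, encoding=encoding, **kwargs)


title = ET.Element("title")
title.text = "Test Document"

as_text = tostring_sketch(title)                     # a str
as_bytes = tostring_sketch(title, encoding="utf-8")  # bytes
```

Callers that still need bytes only have to pass an explicit encoding, so nothing from the underlying function is lost.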
klon.extract_text
Source code: klon/text.py
Extracts all text from the given node and its descendants.
By default, all contiguous whitespace is normalized to a single ASCII space,
so the output is always a single line of text. However, if multiline=True is
specified, the line breaks implied by paragraph-breaking tags are preserved,
much as a web browser would render them. Other whitespace is still normalized,
but the output can now contain both ASCII spaces and ASCII newlines.
>>> from klon import extract_text
>>> body = etree.find('body') # using the same example etree defined above
>>> print(tostring(body, pretty_print=True))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a text<br/>This is a tail</p>
</body>
>>> extract_text(body)
'This is a test This is a text This is a tail'
>>> extract_text(body, multiline=True)
'This is a test\n\nThis is a text\nThis is a tail'
Note that the <p> tag translates to a double newline, while the <br> tag
translates to a single \n, mimicking how a browser renders them.
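One way to get this behaviour is to mark breaks with sentinel characters while walking the tree, then collapse whitespace and sentinels together. The sketch below uses the standard library; extract_text_sketch, BLOCK_TAGS and the sentinel scheme are assumptions made for illustration, and the real klon implementation may work quite differently.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a small illustrative subset of paragraph-breaking tags.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li"}

# Sentinels for "line break" and "paragraph break". XML 1.0 text cannot
# contain these control characters, so they cannot clash with real content.
LINE, PARA = "\x00", "\x01"


def _fragments(node):
    """Yield text and break sentinels in document order."""
    if node.tag == "br":
        yield LINE
    is_block = node.tag in BLOCK_TAGS
    if is_block:
        yield PARA
    yield node.text or ""
    for child in node:
        yield from _fragments(child)
        yield child.tail or ""
    if is_block:
        yield PARA


def _render_break(match):
    """Turn a run of spaces and sentinels into its strongest separator."""
    run = match.group(0)
    if PARA in run:
        return "\n\n"
    if LINE in run:
        return "\n"
    return " "


def extract_text_sketch(node, multiline=False):
    raw = "".join(_fragments(node))
    if not multiline:
        # Single-line mode: every break is just another space.
        raw = raw.replace(LINE, " ").replace(PARA, " ")
    raw = re.sub(r"\s+", " ", raw)  # source whitespace never creates a break
    return re.sub(r"[ \x00\x01]+", _render_break, raw).strip()


body = ET.fromstring(
    '<body><h1 id="title">This is a test</h1>'
    '<a href="/page"><img src="image.jpg"/></a>'
    '<p class="text">This is a text<br/>This is a tail</p></body>'
)
single = extract_text_sketch(body)
multi = extract_text_sketch(body, multiline=True)
```

Because newlines in the source markup are normalized before the sentinels are rendered, only tags can introduce line breaks in the output.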
klon.detach
Source code: klon/utils.py
Takes a single node as its argument and removes it from its tree, taking care
to preserve the node's tail text by reattaching it at the correct position in
the tree.
>>> from klon import detach
>>> print(tostring(body, pretty_print=True))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a text<br/>This is a tail</p>
</body>
>>> br = detach(body.xpath('.//br')[0])
>>> print(tostring(body, pretty_print=True))
<body>
  <h1 id="title">This is a test</h1>
  <a href="/page">
    <img src="image.jpg"/>
  </a>
  <p class="text">This is a textThis is a tail</p>
</body>
Note that This is a tail, which was the tail of the <br> node, has been
preserved, in this case by appending it to the text of its parent node.
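The tail-preserving rule can be sketched as follows. Standard-library elements do not know their parent, so this hypothetical detach_from takes the parent explicitly, unlike klon's single-argument detach. Clearing the tail on the returned node is also a choice made for this sketch, not something the description above confirms.

```python
import xml.etree.ElementTree as ET


def detach_from(parent, node):
    """Remove node from parent, keeping node.tail in the remaining tree."""
    index = list(parent).index(node)
    tail = node.tail or ""
    if tail:
        if index == 0:
            # No previous sibling: the tail joins the parent's text.
            parent.text = (parent.text or "") + tail
        else:
            # Otherwise it is appended to the previous sibling's tail.
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + tail
    parent.remove(node)
    node.tail = None  # the tail now lives in the tree, not on the node
    return node


p = ET.fromstring('<p class="text">This is a text<br/>This is a tail</p>')
br = detach_from(p, p[0])
```

After the call, p.text is 'This is a textThis is a tail', matching the output shown above. A plain parent.remove(node) would have dropped the tail together with the node.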
klon.make_all_urls_absolute
Source code: klon/html.py
Takes a URL and a document etree, and modifies the etree in place to convert
all relative URLs to absolute ones, using the given URL as a base. All standard
tag attributes that specify a URL (e.g. <a href="...">, <img src="...">,
<form action="...">, etc.) are converted.
>>> from klon import make_all_urls_absolute
>>> print(tostring(body.find('a')))
<a href="/page"><img src="image.jpg"/></a>
>>> make_all_urls_absolute('https://site.com/path/', etree)
>>> print(tostring(body.find('a')))
<a href="https://site.com/page"><img src="https://site.com/path/image.jpg"/></a>
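The resolution shown in this example follows the usual relative-URL rules, which the standard library exposes as urllib.parse.urljoin. The sketch below is an illustration only: absolutize and the three-entry URL_ATTRIBUTES table are assumptions, and klon covers more attributes than this.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Assumption: a deliberately short table of tag -> URL-bearing attribute.
URL_ATTRIBUTES = {"a": "href", "img": "src", "form": "action"}


def absolutize(base_url, root):
    """Rewrite known URL attributes in place, resolving them against base_url."""
    for node in root.iter():  # iter() includes root itself
        attr = URL_ATTRIBUTES.get(node.tag)
        if attr is not None and node.get(attr) is not None:
            node.set(attr, urljoin(base_url, node.get(attr)))


link = ET.fromstring('<a href="/page"><img src="image.jpg"/></a>')
absolutize("https://site.com/path/", link)
```

A path starting with / replaces the whole path of the base URL, while a bare file name is resolved relative to the base's directory. That is why /page and image.jpg end up under different prefixes.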
klon.parse_form
Source code: klon/forms.py
Takes an ElementTree whose root is a <form> node, and returns a
requests.Request that corresponds to the request a browser would send if the
form were submitted.
>>> from klon import parse_form
>>> form = build_etree(
...     'form',
...     {'method': 'POST', 'action': '/publish'},
...     [
...         'div',
...         ['input', {'name': 'title', 'value': 'Some title'}],
...     ],
...     [
...         'select',
...         {'name': 'kind'},
...         ['option', {'value': 'comment'}, 'Comment'],
...         ['option', {'value': 'question', 'selected': 'yes'}, 'Question'],
...     ],
... )
>>> request = parse_form(form, base_url='https://web.site/')
>>> request.url
'https://web.site/publish'
>>> request.method
'POST'
>>> request.data
{'title': 'Some title', 'kind': 'question'}
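To make the mapping from form markup to request fields more concrete, here is a simplified standard-library sketch. The names form_fields and form_request are hypothetical, and the sketch returns a plain (method, url, data) tuple instead of a requests.Request. It only handles input and single-choice select elements, ignoring checkboxes, radio buttons, textareas and disabled controls.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


def form_fields(form):
    """Collect name/value pairs from <input> and <select> elements."""
    data = {}
    for node in form.iter():
        name = node.get("name")
        if name is None:
            continue
        if node.tag == "input":
            data[name] = node.get("value", "")
        elif node.tag == "select":
            options = node.findall(".//option")
            chosen = [o for o in options if o.get("selected") is not None]
            # With nothing marked as selected, fall back to the first option,
            # as browsers typically do for a single-choice select.
            option = chosen[0] if chosen else (options[0] if options else None)
            if option is not None:
                data[name] = option.get("value", option.text or "")
    return data


def form_request(form, base_url):
    """Return (method, absolute URL, data) for the given <form> element."""
    method = form.get("method", "GET").upper()
    url = urljoin(base_url, form.get("action", ""))
    return method, url, form_fields(form)


form = ET.fromstring(
    '<form method="POST" action="/publish">'
    '<div><input name="title" value="Some title"/></div>'
    '<select name="kind">'
    '<option value="comment">Comment</option>'
    '<option value="question" selected="yes">Question</option>'
    '</select></form>'
)
method, url, data = form_request(form, "https://web.site/")
```

For this form the sketch yields the same method, URL and data as the parse_form example above.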