Fast HTML5 parser with CSS selectors.
A fast HTML5 parser with CSS selectors, built on the Modest engine.
Installation
From PyPI using pip:
pip install selectolax
Development version from GitHub:
git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install
To compile selectolax while developing:
make clean
make dev
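To confirm which selectolax release is actually installed (for example, after switching between the PyPI package and a development build), the standard library's importlib.metadata can report the installed distribution version. This is a minimal sketch assuming Python 3.8 or newer; the helper name `installed_version` is hypothetical, and it returns None when the package is absent instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "selectolax") -> Optional[str]:
    """Return the installed version string of `distribution`, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "selectolax is not installed")
```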
Basic examples
In [1]: from selectolax.parser import HTMLParser
...:
...: html = """
...: <h1 id="title" data-updated="20201101">Hi there</h1>
...: <div class="post">Lorem Ipsum is simply dummy text of the printing and typesetting industry. </div>
...: <div class="post">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>
...: """
...: tree = HTMLParser(html)
In [2]: tree.css_first('h1#title').text()
Out[2]: 'Hi there'
In [3]: tree.css_first('h1#title').attributes
Out[3]: {'id': 'title', 'data-updated': '20201101'}
In [4]: [node.text() for node in tree.css('.post')]
Out[4]:
['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.']
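As Out[3] shows, `.attributes` gives back a plain dict of string values, so typed values such as `data-updated` have to be converted by the caller. The sketch below assumes the attribute uses a YYYYMMDD layout (the example suggests this but does not state it) and uses a hypothetical helper, `parse_updated`. It works on the dict from Out[3] directly, so it needs only the standard library.

```python
from datetime import date, datetime
from typing import Dict, Optional


def parse_updated(attributes: Dict[str, Optional[str]], key: str = "data-updated") -> Optional[date]:
    """Convert a YYYYMMDD attribute value into a date; return None if missing or malformed."""
    raw = attributes.get(key)
    if not raw:  # key absent, empty string, or None
        return None
    try:
        return datetime.strptime(raw, "%Y%m%d").date()
    except ValueError:
        return None


# The dict returned by tree.css_first('h1#title').attributes in Out[3].
title_attributes = {"id": "title", "data-updated": "20201101"}
print(parse_updated(title_attributes))  # 2020-11-01
```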
In [1]: html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
...: selector = "div > :nth-child(2n+1):not(:has(a))"
In [2]: for node in HTMLParser(html).css(selector):
...:     print(node.attributes, node.text(), node.tag)
...:     print(node.parent.tag)
...:     print(node.html)
...:
{'id': 'p1'} p
div
<p id="p1"></p>
{'id': 'p5'} text p
div
<p id="p5">text</p>
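The selector `div > :nth-child(2n+1):not(:has(a))` keeps the odd-numbered children of the div (positions 1, 3 and 5) and then drops any child that contains an `<a>`, which is why p3 is missing from the output. Because the HTML5 parser closes each `<p>` implicitly, p1 to p6 end up as siblings. The plain-Python sketch below models only that matching rule on a hand-written list of (id, contains_link) pairs; the function names are hypothetical, and this is an illustration of the CSS An+B logic, not how Modest evaluates selectors internally.

```python
from typing import List, Tuple


def matches_nth(position: int, a: int, b: int) -> bool:
    """True if 1-based `position` equals a*n + b for some integer n >= 0 (CSS :nth-child)."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


def select(children: List[Tuple[str, bool]], a: int, b: int) -> List[str]:
    """Mimic `:nth-child(an+b):not(:has(a))` over (id, contains_link) pairs."""
    return [
        child_id
        for position, (child_id, contains_link) in enumerate(children, start=1)
        if matches_nth(position, a, b) and not contains_link
    ]


# p1..p6 are siblings inside the div; only p3 holds an <a>.
children = [("p1", False), ("p2", False), ("p3", True), ("p4", False), ("p5", False), ("p6", False)]
print(select(children, 2, 1))  # ['p1', 'p5']
```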
Simple Benchmark
Average of 10 experiments to parse and retrieve URLs from 800 Google SERP pages.
Package | Time | Memory (peak) |
---|---|---|
selectolax | 2.38 sec. | 768.11 MB |
lxml | 18.67 sec. | 769.21 MB |
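From these averages, selectolax finishes in roughly one-eighth of lxml's time (18.67 / 2.38 ≈ 7.8×), while peak memory is essentially the same. The sketch below shows one way to produce that kind of averaged timing with the standard library. The `extract` callable and the page list are hypothetical placeholders, since the original benchmark script and SERP pages are not included here. Peak memory is not measured, because tracemalloc, for instance, only sees allocations made through Python's allocator, not those made by C extensions.

```python
import statistics
import time
from typing import Callable, Iterable, Sequence


def average_runtime(
    extract: Callable[[str], Iterable[str]],
    pages: Sequence[str],
    runs: int = 10,
) -> float:
    """Mean wall-clock seconds to run `extract` over every page, averaged across `runs` passes."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for page in pages:
            list(extract(page))  # force lazy results so the work is actually timed
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def speedup(baseline_sec: float, candidate_sec: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_sec / candidate_sec


# Figures from the table: lxml 18.67 s vs. selectolax 2.38 s.
print(round(speedup(18.67, 2.38), 2))  # 7.84
```

With selectolax, a hypothetical `extract` could be `lambda page: [a.attributes.get('href') for a in HTMLParser(page).css('a')]`, with an equivalent lxml function timed the same way for comparison.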
Download files
Source Distribution
selectolax-0.2.13.tar.gz (1.3 MB)
Built Distributions
selectolax-0.2.13-cp39-cp39-win32.whl (558.2 kB)
selectolax-0.2.13-cp38-cp38-win32.whl (557.4 kB)