Fast HTML5 parser with CSS selectors.
Project description
A fast HTML5 parser with CSS selectors, built on the Modest engine.
Installation
From PyPI using pip:
pip install selectolax
Development version from GitHub:
git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install
How to compile selectolax while developing:
make clean
make dev
Basic examples
In [1]: from selectolax.parser import HTMLParser
...:
...: html = """
...: <h1 id="title" data-updated="20201101">Hi there</h1>
...: <div class="post">Lorem Ipsum is simply dummy text of the printing and typesetting industry. </div>
...: <div class="post">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>
...: """
...: tree = HTMLParser(html)
In [2]: tree.css_first('h1#title').text()
Out[2]: 'Hi there'
In [3]: tree.css_first('h1#title').attributes
Out[3]: {'id': 'title', 'data-updated': '20201101'}
In [4]: [node.text() for node in tree.css('.post')]
Out[4]:
['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.']
In [1]: html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
...: selector = "div > :nth-child(2n+1):not(:has(a))"
In [2]: for node in HTMLParser(html).css(selector):
...:     print(node.attributes, node.text(), node.tag)
...:     print(node.parent.tag)
...:     print(node.html)
...:
{'id': 'p1'} p
div
<p id="p1"></p>
{'id': 'p5'} text p
div
<p id="p5">text</p>
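To see why only p1 and p5 are printed, it can help to model the selector in plain Python. The sketch below does not use selectolax; it assumes the HTML5 rule that each unclosed <p> ends when the next <p> starts, so the <div> has six <p> children and only p3 contains an <a>. The names children and matches are illustrative, not part of any library.

```python
# Plain-Python model of the selector: div > :nth-child(2n+1):not(:has(a))
# Each tuple is (id, tags of descendants) for the six <p> children of <div>.
children = [
    ("p1", []),
    ("p2", []),
    ("p3", ["a"]),  # the only child that contains a link
    ("p4", []),
    ("p5", []),
    ("p6", []),
]


def matches(position, descendant_tags):
    """Apply both pseudo-classes to one child (position is 1-based)."""
    is_odd_child = position % 2 == 1    # :nth-child(2n+1) selects 1, 3, 5, ...
    has_link = "a" in descendant_tags   # :has(a)
    return is_odd_child and not has_link


matched = [
    node_id
    for position, (node_id, tags) in enumerate(children, start=1)
    if matches(position, tags)
]
print(matched)  # ['p1', 'p5']
```

p3 sits at an odd position but is dropped by :not(:has(a)), which is why the session above prints p1 and p5 only.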
Simple Benchmark
Average of 10 runs, each parsing 800 Google SERP pages and extracting the URLs from them.
Package | Time | Memory (peak)
---|---|---
selectolax | 2.38 sec. | 768.11 MB
lxml | 18.67 sec. | 769.21 MB
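A measurement of this kind can be sketched as follows. This is not the script behind the numbers above, only a minimal illustration of averaging wall-clock time over 10 runs. To keep it runnable without extra dependencies, the stand-in extract_links function uses the standard library's html.parser; average_runtime, LinkCollector, and the generated sample pages are all assumptions made for the example. It measures time only, not peak memory.

```python
import time
from html.parser import HTMLParser as StdlibHTMLParser
from statistics import mean


class LinkCollector(StdlibHTMLParser):
    """Stand-in parser that collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(page):
    """Parse one page and return the URLs found in it."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links


def average_runtime(extract, pages, experiments=10):
    """Return the mean wall-clock seconds needed to process all pages."""
    timings = []
    for _ in range(experiments):
        start = time.perf_counter()
        for page in pages:
            extract(page)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Hypothetical input: 800 tiny generated pages instead of saved SERP pages.
sample_pages = [
    '<p><a href="https://example.com/%d">result</a></p>' % i for i in range(800)
]
elapsed = average_runtime(extract_links, sample_pages)
print(f"{elapsed:.4f} sec.")
```

To compare parsers, pass a different function as the first argument, for example one built on tree.css('a') from the examples above, and run it over the same pages.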