A fast HTML5 parser with CSS selectors, built on the Modest engine.
Installation
From PyPI using pip:
pip install selectolax
Development version from GitHub:
git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install
How to compile selectolax while developing:
make clean
make dev
Basic examples
In [1]: from selectolax.parser import HTMLParser
...:
...: html = """
...: <h1 id="title" data-updated="20201101">Hi there</h1>
...: <div class="post">Lorem Ipsum is simply dummy text of the printing and typesetting industry. </div>
...: <div class="post">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>
...: """
...: tree = HTMLParser(html)
In [2]: tree.css_first('h1#title').text()
Out[2]: 'Hi there'
In [3]: tree.css_first('h1#title').attributes
Out[3]: {'id': 'title', 'data-updated': '20201101'}
In [4]: [node.text() for node in tree.css('.post')]
Out[4]:
['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.']
In [1]: html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
...: selector = "div > :nth-child(2n+1):not(:has(a))"
In [2]: for node in HTMLParser(html).css(selector):
   ...:     print(node.attributes, node.text(), node.tag)
   ...:     print(node.parent.tag)
   ...:     print(node.html)
   ...:
{'id': 'p1'} p
div
<p id="p1"></p>
{'id': 'p5'} text p
div
<p id="p5">text</p>
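To see why only p1 and p5 are printed, the selector can be traced by hand. The sketch below does not use selectolax; it is a plain-Python model of the two conditions in `div > :nth-child(2n+1):not(:has(a))`. The `children` list and the `matches` helper are illustrative names, not part of the library, and the list assumes the six paragraphs end up as siblings under the div (each unclosed `<p>` is closed by the next one), which agrees with the `div` parent shown in the output above.

```python
# Hand-written model of the <div>'s children after parsing:
# (id, whether the element contains an <a> descendant).
children = [
    ("p1", False),
    ("p2", False),
    ("p3", True),   # <p id=p3><a>link</a>
    ("p4", False),
    ("p5", False),
    ("p6", False),
]


def matches(position: int, has_link: bool) -> bool:
    """Apply the two pseudo-class conditions to one child (illustrative helper)."""
    # :nth-child(2n+1) selects 1-based positions 1, 3, 5, ...
    is_odd_child = position % 2 == 1
    # :not(:has(a)) rejects elements that contain an <a>
    return is_odd_child and not has_link


selected = [
    child_id
    for position, (child_id, has_link) in enumerate(children, start=1)
    if matches(position, has_link)
]
print(selected)  # ['p1', 'p5']
```

p3 sits at an odd position but contains a link, so it is filtered out by the `:not(:has(a))` part.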
Simple Benchmark
Average of 10 experiments to parse and retrieve URLs from 800 Google SERP pages.
Package | Time | Memory (peak)
---|---|---
selectolax | 2.38 sec. | 768.11 MB
lxml | 18.67 sec. | 769.21 MB
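The benchmark script itself is not shown here. As a rough sketch of how a timing like the one above could be taken, the helper below averages wall-clock time over several runs. `extract_urls_selectolax` and `average_seconds` are hypothetical names introduced for illustration, the list of pages is assumed to be supplied by the caller, and peak memory is not measured.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def extract_urls_selectolax(html: str) -> List[str]:
    """Return the href of every <a> element (illustrative extractor)."""
    # Imported lazily so the timing helper below has no third-party dependency.
    from selectolax.parser import HTMLParser

    tree = HTMLParser(html)
    return [
        node.attributes["href"]
        for node in tree.css("a")
        if node.attributes.get("href")  # skip missing or empty href values
    ]


def average_seconds(
    extract: Callable[[str], object],
    pages: Iterable[str],
    experiments: int = 10,
) -> float:
    """Average wall-clock seconds needed to run `extract` over all pages."""
    pages = list(pages)
    timings = []
    for _ in range(experiments):
        start = time.perf_counter()
        for html in pages:
            extract(html)
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

With a list of HTML strings in `pages`, `average_seconds(extract_urls_selectolax, pages)` gives one number per extractor; passing an lxml-based function with the same signature would allow a side-by-side comparison.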
Links
License