Fast HTML5 parser with CSS selectors.
Project description
A fast HTML5 parser with CSS selectors, built on the Modest engine.
Installation
From PyPI using pip:
pip install selectolax
Development version from GitHub:
git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install
How to compile selectolax while developing:
make clean
make dev
Basic examples
In [1]: from selectolax.parser import HTMLParser
   ...:
   ...: html = """
   ...: <h1 id="title" data-updated="20201101">Hi there</h1>
   ...: <div class="post">Lorem Ipsum is simply dummy text of the printing and typesetting industry. </div>
   ...: <div class="post">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>
   ...: """
   ...: tree = HTMLParser(html)
In [2]: tree.css_first('h1#title').text()
Out[2]: 'Hi there'
In [3]: tree.css_first('h1#title').attributes
Out[3]: {'id': 'title', 'data-updated': '20201101'}
In [4]: [node.text() for node in tree.css('.post')]
Out[4]:
['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.']
In [1]: html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
   ...: selector = "div > :nth-child(2n+1):not(:has(a))"
In [2]: for node in HTMLParser(html).css(selector):
   ...:     print(node.attributes, node.text(), node.tag)
   ...:     print(node.parent.tag)
   ...:     print(node.html)
   ...:
{'id': 'p1'} p
div
<p id="p1"></p>
{'id': 'p5'} text p
div
<p id="p5">text</p>
Simple Benchmark
Average of 10 experiments to parse and retrieve URLs from 800 Google SERP pages.
Package | Time | Memory (peak) |
---|---|---|
selectolax | 2.38 sec. | 768.11 MB |
lxml | 18.67 sec. | 769.21 MB |