Fast HTML5 parser with CSS selectors.
A fast HTML5 parser with CSS selectors, using the Modest engine.
Installation
From PyPI using pip:
pip install selectolax
Development version from GitHub:
git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install
How to compile selectolax while developing:
make clean
make dev
Basic examples
In [1]: from selectolax.parser import HTMLParser
...:
...: html = """
...: <h1 id="title" data-updated="20201101">Hi there</h1>
...: <div class="post">Lorem Ipsum is simply dummy text of the printing and typesetting industry. </div>
...: <div class="post">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>
...: """
...: tree = HTMLParser(html)
In [2]: tree.css_first('h1#title').text()
Out[2]: 'Hi there'
In [3]: tree.css_first('h1#title').attributes
Out[3]: {'id': 'title', 'data-updated': '20201101'}
In [4]: [node.text() for node in tree.css('.post')]
Out[4]:
['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.']
In [1]: html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
...: selector = "div > :nth-child(2n+1):not(:has(a))"
In [2]: for node in HTMLParser(html).css(selector):
...:     print(node.attributes, node.text(), node.tag)
...:     print(node.parent.tag)
...:     print(node.html)
...:
{'id': 'p1'}  p
div
<p id="p1"></p>
{'id': 'p5'} text p
div
<p id="p5">text</p>
Simple Benchmark
Average of 10 experiments to parse and retrieve URLs from 800 Google SERP pages.
Package | Time | Memory (peak) |
---|---|---|
selectolax | 2.38 sec. | 768.11 MB |
lxml | 18.67 sec. | 769.21 MB |