Extract HTML as JSON from one or more URLs
Project description
A tiny Python script that fetches web pages and converts the HTML elements matching a CSS selector into JSON.
$ pip install lurk
Usage
In Python
from lurk import lurk

# Print every <a> element on the page that has an href attribute.
for link in lurk('http://en.wikipedia.org/wiki/en', 'a'):
    if 'href' in link:
        print(link)
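The loop above checks each result for an 'href' key, and the Bash example in the next section uses an attribute selector of the form a[href*="..."]. As a rough illustration of what such a selector matches, here is a minimal sketch that uses only the Python 3 standard library. It is not lurk's implementation: the names AnchorCollector and select_links are hypothetical, the "text" key is an assumption made for this example, and only the "href contains a substring" case is handled.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect <a> tags whose href contains a substring (hypothetical helper)."""

    def __init__(self, href_contains=""):
        super().__init__()
        self.href_contains = href_contains
        self.matches = []
        self._current = None  # dict for the matching <a> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # Rough equivalent of a[href*="..."]: href exists and contains the substring.
        if href is not None and self.href_contains in href:
            self._current = dict(attributes, text="")

    def handle_data(self, data):
        # Accumulate the visible text inside the matching <a>, including nested tags.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.matches.append(self._current)
            self._current = None


def select_links(html, href_contains=""):
    """Return one dict per matching <a> element in the given HTML string."""
    parser = AnchorCollector(href_contains)
    parser.feed(html)
    parser.close()
    return parser.matches


SAMPLE_HTML = (
    '<p><a href="Foo.html#index-foo"><code>foo</code></a> '
    '<a href="Bar.html">bar</a> <a name="x">no href</a></p>'
)
index_links = select_links(SAMPLE_HTML, "#index-")
```

With the sample markup, only the first anchor matches "#index-". Calling select_links(SAMPLE_HTML) with the default empty substring keeps every anchor that has an href, which mirrors the 'href' in link check in the loop above.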
In Bash
Familiarize yourself with CSS attribute selectors.
$ lurk \
    http://www.gnu.org/software/libc/manual/html_node/Function-Index.html \
    'a[href*="#index-"]' \
    > links.json
This command saves a JSON array to links.json, with one object for each link in the GNU C Library function index:
[
  {
    "code": "*pthread_getspecific",
    "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"
  },
  {
    "code": "*sbrk",
    "href": "Resizing-the-Data-Segment.html#index-_002asbrk"
  },
  // ...
]
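The href values in this output are relative to the page they were scraped from. A small post-processing sketch, assuming the entries have exactly the "code" and "href" keys shown above, resolves them against the page URL with the standard library. The helper name absolute_links is hypothetical, and the two sample entries are copied from the output above rather than read from a real links.json.

```python
import json
from urllib.parse import urljoin

# The page URL passed to lurk in the command above.
BASE_URL = "http://www.gnu.org/software/libc/manual/html_node/Function-Index.html"

# Two entries copied from the sample output; a real run would read links.json instead.
SAMPLE = """[
  {"code": "*pthread_getspecific",
   "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"},
  {"code": "*sbrk",
   "href": "Resizing-the-Data-Segment.html#index-_002asbrk"}
]"""


def absolute_links(entries, base_url=BASE_URL):
    """Map each entry's "code" text to an absolute URL (hypothetical helper)."""
    return {entry["code"]: urljoin(base_url, entry["href"]) for entry in entries}


links = absolute_links(json.loads(SAMPLE))
for name, url in links.items():
    print(name, url)
```

To run this against a real file, replace json.loads(SAMPLE) with json.load(open("links.json")), provided the file holds plain JSON without the elision comment shown above.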
Project details
File details
Details for the file lurk-0.1.0.tar.gz.
File metadata
- Download URL: lurk-0.1.0.tar.gz
- Upload date:
- Size: 2.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4a5dbb3b47a0f1c3eecd635de4c7d532d4ebbdeddf9bff5331c1c95c9bb4741a
MD5 | 641192497af18aebc43e104edf3d22b9
BLAKE2b-256 | b34dfeea6076eaad7d5f1b03ca1c227eb797ce3846170ad58daa8efc81f17cad
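The SHA256 digest in the table above can be compared against a downloaded copy of lurk-0.1.0.tar.gz. The following is a minimal sketch using Python's hashlib; the function names sha256_of and matches_expected are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for lurk-0.1.0.tar.gz.
EXPECTED_SHA256 = "4a5dbb3b47a0f1c3eecd635de4c7d532d4ebbdeddf9bff5331c1c95c9bb4741a"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Example call, assuming the archive is in the current directory:
# matches_expected("lurk-0.1.0.tar.gz")
```

If the function returns False, the download is incomplete or differs from the file the digest was computed for.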
File details
Details for the file lurk-0.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: lurk-0.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 999858051d0db228f7b94634e119b30af29bc23ad43e6d3843c78fcd2fef6d65
MD5 | cf716e4c65f25586c0960bfd47b1171f
BLAKE2b-256 | e8377086cbd0e15131a75c9c6b69c00273b16c8bdb4155a88177aa7ac65bda73