Extract HTML from one or more URLs
Project description
A script that extracts the HTML elements matching a CSS selector from one or more web pages.
$ pip install lurk
usage
in python
In Python, lurk returns the matched elements as dictionaries that you can iterate over:
from lurk import lurk

# Print the href of every <a> element on the page that has one.
for link in lurk('http://en.wikipedia.org/wiki/en', 'a'):
    if 'href' in link:
        print(link['href'])
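Because each result behaves like a dictionary, ordinary dictionary handling applies to it. The following is a minimal sketch, not part of lurk: `collect_hrefs` is a hypothetical helper, and the sample list is stand-in data shaped like the dictionaries in the example above, so the snippet runs without network access.

```python
def collect_hrefs(elements):
    """Return the unique href values from dict-like elements, in first-seen order."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get("href")  # None when the element has no href
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


# Stand-in data (an assumption): dictionaries shaped like the ones iterated above.
sample = [
    {"href": "/wiki/Main_Page"},
    {"title": "no link here"},
    {"href": "/wiki/Main_Page"},
    {"href": "/wiki/English_language"},
]

print(collect_hrefs(sample))
# ['/wiki/Main_Page', '/wiki/English_language']
```

In real use, the result of a `lurk(url, selector)` call would take the place of `sample`.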
in bash
In bash, lurk prints JSON to standard output.
Familiarize yourself with CSS attribute selectors.
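As a rough illustration of what a substring attribute selector such as `a[href*="#index-"]` selects, here is a standard-library sketch. It is not how lurk is implemented; the `HrefContains` class and the sample markup are made up for this example, and it needs Python 3.

```python
from html.parser import HTMLParser


class HrefContains(HTMLParser):
    """Collect hrefs of <a> tags whose href contains a substring,
    mimicking the CSS selector a[href*="needle"]."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # [href*="x"] matches only when the attribute is present and
        # contains the non-empty substring "x".
        if href is not None and self.needle and self.needle in href:
            self.matches.append(href)


# Hypothetical markup: one matching link, one non-matching link, one <a> without href.
sample = (
    '<a href="Resizing-the-Data-Segment.html#index-_002asbrk">sbrk</a>'
    '<a href="index.html">Top</a>'
    '<a name="no-href">anchor</a>'
)

parser = HrefContains("#index-")
parser.feed(sample)
print(parser.matches)
# ['Resizing-the-Data-Segment.html#index-_002asbrk']
```

The related operators `^=` (starts with) and `$=` (ends with) follow the same pattern with a prefix or suffix test in place of the substring test.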
$ lurk \
    http://www.gnu.org/software/libc/manual/html_node/Function-Index.html \
    'a[href*="#index-"]' \
    > links.json
This command saves a JSON array of links to all GNU C Library functions into links.json:
[
  {
    "code": "*pthread_getspecific",
    "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"
  },
  {
    "code": "*sbrk",
    "href": "Resizing-the-Data-Segment.html#index-_002asbrk"
  },
  // ...
]
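The href values in this output are relative to the page they were scraped from. A small sketch of turning them into absolute URLs with the standard library follows. The `absolute_links` helper is hypothetical, and the inline JSON simply repeats the two entries shown above so the snippet runs on its own; it assumes every entry has the "code" and "href" keys seen in the sample.

```python
import json
from urllib.parse import urljoin

# The page the links were extracted from (taken from the command above).
BASE = "http://www.gnu.org/software/libc/manual/html_node/Function-Index.html"


def absolute_links(entries, base=BASE):
    """Return (code, absolute URL) pairs for entries that have both keys."""
    return [
        (entry["code"], urljoin(base, entry["href"]))
        for entry in entries
        if "code" in entry and "href" in entry
    ]


# Stand-in for the contents of links.json, limited to the two entries shown above.
sample_json = """
[
  {"code": "*pthread_getspecific",
   "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"},
  {"code": "*sbrk",
   "href": "Resizing-the-Data-Segment.html#index-_002asbrk"}
]
"""

links = absolute_links(json.loads(sample_json))
for code, url in links:
    print(code, url)
```

To process the real file, replace `json.loads(sample_json)` with `json.load(handle)` on an opened links.json.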
Download files
Source Distribution
lurk-0.1.3.tar.gz (2.4 kB)
Built Distribution
lurk-0.1.3-py2.py3-none-any.whl (4.1 kB)
File details
Details for the file lurk-0.1.3.tar.gz.
File metadata
- Download URL: lurk-0.1.3.tar.gz
- Upload date:
- Size: 2.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5c93d12d655d65cb7de0457522c99f050f2ab96fa2f34e2e348ef6d315ab1469
MD5 | 0fea7cd64e2bb83faa21b23e31ad32bf
BLAKE2b-256 | 5889d29d51c32ed231abe81b1b1731306af1df2c8e70bc64cca0c874c5255090
File details
Details for the file lurk-0.1.3-py2.py3-none-any.whl.
File metadata
- Download URL: lurk-0.1.3-py2.py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8df1928e56c4985225877202d59f20c3f685c3f3092daf04db309c69841a2dcb
MD5 | a1dd5d12f5d5aa56792d66ac4d90f888
BLAKE2b-256 | 0d5a6cb368063cb8409b0213e256c3ba611e931ddc14e8c277eaa32384d7fb5c
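A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only `hashlib`; the `sha256_of` and `verify` helpers are hypothetical names, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
SDIST_SHA256 = "5c93d12d655d65cb7de0457522c99f050f2ab96fa2f34e2e348ef6d315ab1469"

# Usage, assuming the archive was saved to the current directory:
#   verify("lurk-0.1.3.tar.gz", SDIST_SHA256)
```

pip can perform the same check during installation when a requirements file pins hashes and `--require-hashes` is used.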