Extract HTML from one or multiple URLs

Project description

A script that extracts the HTML elements matching a CSS selector from one or more web pages.

$ pip install lurk

usage

in python

In Python, iterating over the result of lurk gives one dictionary per matched element:

from lurk import lurk

for link in lurk('http://en.wikipedia.org/wiki/en', 'a'):
    if 'href' in link:
        print(link['href'])

in bash

In bash, lurk prints JSON to standard output.

Familiarize yourself with CSS attribute selectors.
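
As a rough illustration of what the attribute selector in the command below asks for, here is a minimal sketch of the common CSS attribute operators written as plain string checks. The helper name attr_matches is hypothetical and is not part of lurk; it only mimics how `=`, `^=`, `$=` and `*=` compare a single attribute value.

```python
def attr_matches(value, op, needle):
    """Mimic a CSS attribute selector operator on one attribute value.

    Hypothetical helper for illustration only; not part of lurk.
    """
    if op == '=':
        # [attr="x"]: the value must equal x exactly
        return value == needle
    if needle == '':
        # In CSS, ^=, $= and *= with an empty string match nothing
        return False
    if op == '^=':
        # [attr^="x"]: the value starts with x
        return value.startswith(needle)
    if op == '$=':
        # [attr$="x"]: the value ends with x
        return value.endswith(needle)
    if op == '*=':
        # [attr*="x"]: the value contains x anywhere
        return needle in value
    raise ValueError('unsupported operator: %r' % op)


href = 'Resizing-the-Data-Segment.html#index-_002asbrk'
print(attr_matches(href, '*=', '#index-'))  # True: substring match
print(attr_matches(href, '^=', '#index-'))  # False: not a prefix
```

So 'a[href*="#index-"]' selects every anchor whose href contains "#index-" somewhere, regardless of the file name in front of it.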

$ lurk \
http://www.gnu.org/software/libc/manual/html_node/Function-Index.html \
'a[href*="#index-"]' \
> links.json

This command saves a JSON array of links to all GNU C Library functions in links.json:

[
  {
    "code": "*pthread_getspecific",
    "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"
  },

  {
    "code": "*sbrk",
    "href": "Resizing-the-Data-Segment.html#index-_002asbrk"
  },

  // ...
]
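
The href values in this output are relative to the index page. A minimal sketch of one way to post-process them, assuming Python 3 and records shaped like the sample above ("code" and "href" keys); the helper name absolute_links is hypothetical and not part of lurk:

```python
import json
from urllib.parse import urljoin

# The page that was passed to lurk in the command above
BASE_URL = 'http://www.gnu.org/software/libc/manual/html_node/Function-Index.html'


def absolute_links(records, base_url=BASE_URL):
    """Map each record's "code" text to an absolute URL built from its "href"."""
    return {
        rec['code']: urljoin(base_url, rec['href'])
        for rec in records
        if 'code' in rec and 'href' in rec  # skip records missing either key
    }


# Inline sample mirroring one entry from the output above. With a real file:
#     with open('links.json') as fh:
#         records = json.load(fh)
records = json.loads('''[
  {"code": "*sbrk", "href": "Resizing-the-Data-Segment.html#index-_002asbrk"}
]''')

links = absolute_links(records)
print(links['*sbrk'])
```

urljoin replaces the last path segment of the base URL with the relative href, so each entry ends up pointing at the matching page and anchor under html_node/.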

Download files

Source Distribution

lurk-0.1.3.tar.gz (2.4 kB)

Uploaded Source

Built Distribution

lurk-0.1.3-py2.py3-none-any.whl (4.1 kB)

Uploaded Python 2 Python 3

File details

Details for the file lurk-0.1.3.tar.gz.

File metadata

  • Download URL: lurk-0.1.3.tar.gz
  • Size: 2.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for lurk-0.1.3.tar.gz
Algorithm Hash digest
SHA256 5c93d12d655d65cb7de0457522c99f050f2ab96fa2f34e2e348ef6d315ab1469
MD5 0fea7cd64e2bb83faa21b23e31ad32bf
BLAKE2b-256 5889d29d51c32ed231abe81b1b1731306af1df2c8e70bc64cca0c874c5255090

File details

Details for the file lurk-0.1.3-py2.py3-none-any.whl.

File hashes

Hashes for lurk-0.1.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8df1928e56c4985225877202d59f20c3f685c3f3092daf04db309c69841a2dcb
MD5 a1dd5d12f5d5aa56792d66ac4d90f888
BLAKE2b-256 0d5a6cb368063cb8409b0213e256c3ba611e931ddc14e8c277eaa32384d7fb5c
