Extract HTML as JSON from one or multiple URLs

Project description

A tiny Python script that extracts the HTML elements matching a CSS selector from web pages and converts them into JSON.

$ pip install lurk

usage

in python

from lurk import lurk

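# Print each matched <a> element that has an href attribute.
# print() with parentheses works on both Python 2 and Python 3.
for link in lurk('http://en.wikipedia.org/wiki/en', 'a'):
    if 'href' in link:
        print(link)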

in bash

Familiarize yourself with CSS attribute selectors.
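As a rough illustration of what an attribute selector such as `a[href*="#index-"]` means: it matches every `a` element whose `href` attribute contains the substring `#index-`. The sketch below reproduces only that one rule with Python's standard `html.parser` module. It does not show how lurk itself is implemented, and the names `SubstringHrefCollector`, `select_links`, and `sample_html` are made up for this example.

```python
from html.parser import HTMLParser


class SubstringHrefCollector(HTMLParser):
    """Collect the attributes of <a> tags whose href contains a substring.

    Hypothetical helper: mimics only the a[href*="..."] rule, not full CSS.
    """

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # [href*="needle"] requires the attribute to exist and contain needle.
        if href is not None and self.needle in href:
            self.matches.append(attributes)


def select_links(html, needle):
    """Return one attribute dict per matching <a> element in html."""
    collector = SubstringHrefCollector(needle)
    collector.feed(html)
    collector.close()
    return collector.matches


# Made-up sample markup; only the first link contains "#index-".
sample_html = """
<a href="Resizing-the-Data-Segment.html#index-_002asbrk"><code>*sbrk</code></a>
<a href="index.html">Top</a>
<a name="no-href">anchor</a>
"""

matched = select_links(sample_html, "#index-")
for attributes in matched:
    print(attributes["href"])
```

Links without an `href`, or with an `href` that lacks the substring, are skipped, which is the same filtering the selector in the command below asks for.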

$ lurk \
http://www.gnu.org/software/libc/manual/html_node/Function-Index.html \
'a[href*="#index-"]' \
> links.json

This command saves a JSON array of links to every function listed in the GNU C Library function index into links.json:

[
  {
    "code": "*pthread_getspecific",
    "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"
  },

  {
    "code": "*sbrk",
    "href": "Resizing-the-Data-Segment.html#index-_002asbrk"
  },

  // ...
]
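The `href` values in this output are relative to the page they were taken from. A minimal sketch of one way to post-process the result, assuming the file has the shape shown above: it resolves each `href` against the index URL with the standard `urllib.parse.urljoin`. The helper name `absolutize` is hypothetical, and the two sample entries are inlined here instead of being read from links.json so the snippet runs on its own.

```python
import json
from urllib.parse import urljoin

# The page passed to lurk in the command above.
BASE_URL = "http://www.gnu.org/software/libc/manual/html_node/Function-Index.html"


def absolutize(entries, base_url=BASE_URL):
    """Return copies of the entries with href resolved against base_url."""
    return [dict(entry, href=urljoin(base_url, entry["href"])) for entry in entries]


# Two entries copied from the example output; a real run would instead do
# json.load(open("links.json")).
sample = json.loads("""
[
  {"code": "*pthread_getspecific",
   "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"},
  {"code": "*sbrk",
   "href": "Resizing-the-Data-Segment.html#index-_002asbrk"}
]
""")

absolute = absolutize(sample)
for entry in absolute:
    print(entry["code"], entry["href"])
```

The original entries are left unmodified, so the relative form stays available if it is still needed.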

Download files

Source Distribution: lurk-0.1.0.tar.gz (2.5 kB)

Built Distribution: lurk-0.1.0-py2.py3-none-any.whl (4.1 kB, Python 2 and Python 3)
