
Extract HTML from one or more URLs

Project description

A script that extracts the HTML elements matching a CSS selector from web pages.

$ pip install lurk

usage

in python

In Python, lurk returns the matched elements as an iterable of dictionaries, one per element:

from lurk import lurk

for link in lurk('http://en.wikipedia.org/wiki/en', 'a'):
    if 'href' in link:
        print(link['href'])
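
A slightly more defensive variant of the loop above, as a sketch. It assumes only what the example shows: iterating over the result yields dictionary-like items that may or may not carry an 'href' key. The helper name collect_hrefs is hypothetical, and sample data stands in for a real lurk call so the snippet runs without network access.

```python
def collect_hrefs(elements):
    """Return the unique 'href' values from dict-like elements, in first-seen order."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get('href')  # None when the element has no href
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


# Stand-in for the items that lurk(url, 'a') is assumed to yield.
sample = [
    {'href': '/wiki/Main_Page'},
    {'name': 'top'},              # an anchor without href is skipped
    {'href': '/wiki/Main_Page'},  # a duplicate is dropped
    {'href': '/wiki/Help'},
]
print(collect_hrefs(sample))
```

With real data, the sample list would be replaced by the result of the lurk call shown above.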

in bash

In bash, lurk returns JSON.

Familiarize yourself with CSS attribute selectors.
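
As a quick refresher, CSS attribute selectors offer three substring operators: ^= (starts with), $= (ends with) and *= (contains). The sketch below mimics them in plain Python on a single attribute value. It only illustrates the matching rules and says nothing about how lurk evaluates selectors; attr_matches is a hypothetical helper.

```python
def attr_matches(value, operator, needle):
    """Mimic the CSS attribute operators =, ^=, $= and *= on one attribute value."""
    if operator == '=':
        return value == needle
    # Per the CSS rules, the substring operators never match an empty needle.
    if needle == '':
        return False
    if operator == '^=':
        return value.startswith(needle)
    if operator == '$=':
        return value.endswith(needle)
    if operator == '*=':
        return needle in value
    raise ValueError('unsupported operator: %r' % operator)


href = 'Resizing-the-Data-Segment.html#index-_002asbrk'
print(attr_matches(href, '*=', '#index-'))  # the test behind a[href*="#index-"]
print(attr_matches(href, '^=', 'Resizing'))
print(attr_matches(href, '$=', '.html'))
```

The last call is false because the value ends with the fragment, not with ".html".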

$ lurk \
http://www.gnu.org/software/libc/manual/html_node/Function-Index.html \
'a[href*="#index-"]' \
> links.json

This command saves a JSON array of links to all GNU C Library functions into links.json:

[
  {
    "code": "*pthread_getspecific",
    "href": "Thread_002dspecific-Data.html#index-_002apthread_005fgetspecific"
  },

  {
    "code": "*sbrk",
    "href": "Resizing-the-Data-Segment.html#index-_002asbrk"
  },

  // ...
]
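
The href values in this output are relative to the index page, so a follow-up step might resolve them into absolute URLs. The Python 3 sketch below does that with the standard library, assuming each entry has the "code" and "href" keys shown above. absolute_links and BASE_URL are hypothetical names, and an inline sample replaces links.json so the snippet runs on its own.

```python
import json
from urllib.parse import urljoin

# The page the links were extracted from (taken from the command above).
BASE_URL = 'http://www.gnu.org/software/libc/manual/html_node/Function-Index.html'


def absolute_links(entries, base_url=BASE_URL):
    """Map each entry's 'code' text to its href resolved against base_url."""
    return {
        entry['code']: urljoin(base_url, entry['href'])
        for entry in entries
        if 'code' in entry and 'href' in entry
    }


# Inline sample with the shape shown above; with a real file, use
#   with open('links.json') as f: entries = json.load(f)
entries = json.loads('''
[
  {"code": "*sbrk", "href": "Resizing-the-Data-Segment.html#index-_002asbrk"}
]
''')

for name, url in absolute_links(entries).items():
    print(name, url)
```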

Download files


Source Distribution

lurk-0.1.2.tar.gz (2.4 kB)

Uploaded Source

Built Distribution

lurk-0.1.2-py2.py3-none-any.whl (4.1 kB)

Uploaded Python 2 Python 3

File details

Details for the file lurk-0.1.2.tar.gz.

File metadata

  • Download URL: lurk-0.1.2.tar.gz
  • Upload date:
  • Size: 2.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for lurk-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       d248b5ce9de38b6a989535cec2bb5a630d46a3f4040acad6b896e368e42ca412
MD5          ad7e486bdc29be1978eafd8961a99515
BLAKE2b-256  9852b47bd6a69274dc40ab9055965e5ef8f8f6b4d868aa71d42dff9428b18e8e


File details

Details for the file lurk-0.1.2-py2.py3-none-any.whl.

File hashes

Hashes for lurk-0.1.2-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       ad0da97fc40d9334d2e318ef23281e6b15e102fe08d8a8016c7aef25b9c56c0e
MD5          da737c25557701748753baf369620c26
BLAKE2b-256  d8b385a026aa6baee45e989816271d590b132b534082284b8e9050924203bc89
