A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages.

selectorlib

Example

>>> from selectorlib import Extractor
>>> yaml_string = """
...     title:
...         css: "h1"
...         type: Text
...     link:
...         css: "h2 a"
...         type: Link
...     """
>>> extractor = Extractor.from_yaml_string(yaml_string)
>>> html = """
...     <h1>Title</h1>
...     <h2>Usage
...         <a class="headerlink" href="http://test">¶</a>
...     </h2>
...     """
>>> extractor.extract(html)
{'title': 'Title', 'link': 'http://test'}
