Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

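from purifier import request, html, xpath, maps, fields, one

# Steps are chained with "|"; each step receives the previous step's output.
scraper = (
    request()                              # fetch the page
    | html()                               # parse the response as HTML
    | xpath('//a[@class="titlelink"]')     # select the article links
    | maps(                                # process each selected link
        fields(                            # build one dict per link
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)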

result = scraper.scrape("https://news.ycombinator.com")

# Illustrative output; the actual entries depend on the page at scrape time.
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
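
To show how a `|`-chained pipeline like the one above can work in principle, here is a minimal, standard-library-only sketch. It is not Purifier's implementation: the `Step` class, its `run` method, and the `key` helper are hypothetical, and `maps`, `fields`, and `one` are toy re-implementations that only borrow the names from the example. A list of dicts stands in for parsed HTML.

```python
from __future__ import annotations

from typing import Any, Callable


class Step:
    """Hypothetical pipeline stage; `a | b` feeds the output of a into b."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: Step) -> Step:
        # Compose left to right: run this step first, then the next one.
        return Step(lambda value: other.func(self.func(value)))

    def run(self, value: Any) -> Any:
        return self.func(value)


def maps(step: Step) -> Step:
    """Toy version: apply `step` to every item of a list."""
    return Step(lambda items: [step.run(item) for item in items])


def fields(**steps: Step) -> Step:
    """Toy version: build a dict by running each named step on one input."""
    return Step(lambda value: {name: s.run(value) for name, s in steps.items()})


def one() -> Step:
    """Toy version: unwrap a list that must hold exactly one item."""

    def unwrap(items: list) -> Any:
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(unwrap)


def key(name: str) -> Step:
    """Hypothetical stand-in for xpath(): look up a key in a dict."""
    return Step(lambda record: record[name])


# In-memory input standing in for the selected links; each lookup returns
# a list, much as an XPath query returns a list of matches.
rows = [
    {"text": ["Old jokes"], "href": ["https://dynomight.net/old-jokes/"]},
]

pipeline = maps(fields(title=key("text") | one(), url=key("href") | one()))
result = pipeline.run(rows)
print(result)
```

Overloading `__or__` keeps each stage small and lets the pipeline read top to bottom in the order the data flows. In this sketch `one()` raises when a lookup returns zero or several matches, so messy input fails loudly instead of producing a wrong record.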

Installation

pip install purifier
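
To confirm that the install succeeded, the installed version can be read back from the package metadata. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if purifier is installed, otherwise None.
print(installed_version("purifier"))
```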

Download files

Source Distribution

purifier-0.2.13.tar.gz (16.1 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

Hashes for purifier-0.2.13.tar.gz

Algorithm    Hash digest
SHA256       d4c5a532fdcc96b743ea7be4f14708555e71afbd55fe058ac51132d974d7ca66
MD5          14bc3d83e1afac0a0b0cf9efb0d70d6f
BLAKE2b-256  9f160a1e9188b72e3939a66a5d772097e47fabb3e58ddf82b1b26349cf1e418e

Built Distribution

purifier-0.2.13-py3-none-any.whl (16.0 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

Hashes for purifier-0.2.13-py3-none-any.whl

Algorithm    Hash digest
SHA256       054363fbbf19046d7b93a2a0c4041ce66199eb7d4e1911bf71392ab7f9315005
MD5          885c0699cc6802ca14785adc27496c13
BLAKE2b-256  c42c656851519903137e40260fad8471046a241ca7ece3cf8f4b65d9b6ec0aa4
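
The published digests can be used to check a downloaded file before installing it. The sketch below uses only the standard library and assumes the source distribution has already been saved locally; the helper names `sha256_of` and `matches` and the local file path are illustrative, while the expected digest is the SHA256 value listed above for purifier-0.2.13.tar.gz.

```python
import hashlib

# SHA256 digest listed above for purifier-0.2.13.tar.gz.
EXPECTED_SHA256 = "d4c5a532fdcc96b743ea7be4f14708555e71afbd55fe058ac51132d974d7ca66"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the file sits in the current directory:
# matches("purifier-0.2.13.tar.gz", EXPECTED_SHA256)
```

pip can perform a similar check itself through its hash-checking mode (`--require-hashes` with a requirements file), which may be more convenient than a manual comparison.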
