Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")

# Illustrative result: a list of dicts, one per matched link.
# The actual titles and URLs depend on the page at scrape time.
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
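
The example above chains steps with the `|` operator. As a rough illustration of how that style of composition can be built in Python, here is a minimal sketch that overloads `__or__`. It is not Purifier's implementation: the `Step` class, its `run` method, and the `key` helper are hypothetical, and `maps` and `fields` here are toy stand-ins that only borrow the names from the example.

```python
from typing import Any, Callable


class Step:
    """One stage of a pipeline; wraps a function of a single value (hypothetical)."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def run(self, value: Any) -> Any:
        return self.func(value)


def maps(step: Step) -> Step:
    """Toy stand-in: apply `step` to every item of a list."""
    return Step(lambda items: [step.run(item) for item in items])


def fields(**steps: Step) -> Step:
    """Toy stand-in: build a dict by running each named step on the same input."""
    return Step(lambda value: {name: s.run(value) for name, s in steps.items()})


def key(name: str) -> Step:
    """Hypothetical helper: look up one key in a dict."""
    return Step(lambda mapping: mapping[name])


strip = Step(str.strip)

# Clean up messy rows: pick a key, then strip surrounding whitespace.
pipeline = maps(fields(title=key("t") | strip, url=key("u") | strip))

rows = [{"t": "  Old jokes ", "u": " https://dynomight.net/old-jokes/ "}]
cleaned = pipeline.run(rows)
print(cleaned)
```

Because every combinator returns another `Step`, small pieces such as `key("t") | strip` can be nested inside `fields` and `maps` without any special cases, which is what keeps this style of scraper definition short.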

Installation

pip install purifier
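
After installing, one way to confirm which version ended up in the environment is to query the package metadata with the standard library. This is a small sketch; the `installed_version` helper is a hypothetical name, and it assumes the distribution is registered under the name `purifier`, as in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None when purifier is not installed.
print(installed_version("purifier"))
```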

Docs

Download files

Source Distribution

purifier-0.2.14.tar.gz (16.3 kB)

Uploaded Source

Built Distribution

purifier-0.2.14-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.14.tar.gz.

File metadata

  • Download URL: purifier-0.2.14.tar.gz
  • Upload date:
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.14.tar.gz
Algorithm    Hash digest
SHA256       c830534ef84d3ac4601ba2715104085bebd0732855d7909137f78bb9cc62edec
MD5          1f6130c8dacd9bfda13de4a737521b2c
BLAKE2b-256  4ed760a6c815d53604b9a59e5dbace93f0f6ac1f7b137cc1d6a3da49b9b1bd45
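
The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the commented usage line assumes the archive has been saved to the current directory under its published file name.

```python
import hashlib

# SHA256 digest listed above for purifier-0.2.14.tar.gz.
EXPECTED_SHA256 = "c830534ef84d3ac4601ba2715104085bebd0732855d7909137f78bb9cc62edec"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the archive is in the current directory:
# matches("purifier-0.2.14.tar.gz", EXPECTED_SHA256)
```

pip can perform a comparable check itself when a requirements file pins hashes and is installed with `--require-hashes`.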

File details

Details for the file purifier-0.2.14-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.14-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.14-py3-none-any.whl
Algorithm    Hash digest
SHA256       2d62c4bbea58ea4aeb58b01e522827d91de9b2465b0a5910420e8f1b77a7674d
MD5          389d4c83c961b1be98a50da94616eb8d
BLAKE2b-256  492af399c7fd6786a892d0871763d49b67ab7896ad3e5a8c50d7d7078a830a71
