Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

# Run the scraper against the Hacker News front page.
result = scraper.scrape("https://news.ycombinator.com")

# Example output (truncated); actual entries depend on the current front page.
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
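To make the `|` chaining in the example easier to follow, here is a minimal sketch of how pipe-style composition can be built in plain Python. It is not Purifier's implementation: `Step` and `key` are hypothetical names, and `maps`, `fields` and `one` are simplified stand-ins whose exact behaviour in the library (for example, whether `one()` rejects empty results) is an assumption. Plain dicts stand in for the parsed `<a>` elements.

```python
class Step:
    """A single transformation that can be chained with `|` (hypothetical class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.func(self.func(value)))

    def __call__(self, value):
        return self.func(value)


def maps(step):
    """Stand-in: apply `step` to every item of a list."""
    return Step(lambda items: [step(item) for item in items])


def fields(**steps):
    """Stand-in: build a dict by running each named step on the same input."""
    return Step(lambda value: {name: step(value) for name, step in steps.items()})


def one():
    """Stand-in: unwrap a list that is assumed to hold exactly one element."""
    def unwrap(items):
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]
    return Step(unwrap)


def key(name):
    """Hypothetical helper: return the value stored under `name`, wrapped in a list."""
    return Step(lambda mapping: [mapping[name]])


# Dicts standing in for parsed anchor elements.
rows = [{"text": "Old jokes", "href": "https://dynomight.net/old-jokes/"}]

pipeline = maps(fields(title=key("text") | one(), url=key("href") | one()))
result = pipeline(rows)
print(result)
```

Each stage is a callable wrapped in an object whose `__or__` returns a new composed stage, so a whole pipeline is itself just another stage that can be nested inside `maps` or `fields`.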

Installation

pip install purifier
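After installing, one way to confirm which release is present is to query the installed distribution metadata from the standard library. This is a small sketch, assuming Python 3.8 or later and that the distribution name matches the `pip install` command above; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("purifier")
print(found if found is not None else "purifier is not installed")
```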

Download files

Source Distribution

purifier-0.2.11.tar.gz (16.1 kB)

Uploaded Source

Built Distribution

purifier-0.2.11-py3-none-any.whl (15.9 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.11.tar.gz.

File metadata

  • Download URL: purifier-0.2.11.tar.gz
  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.11.tar.gz
Algorithm Hash digest
SHA256 e47b912666973a886703a74aac3c77b43edb2128000501d1b539bfd8d3898081
MD5 132afa5f50a8a8be96883e3f7418402c
BLAKE2b-256 77bd9efbec76fd4920819fc354042422b85a742334e85267b3be64dfeb7b3c4f
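The digests listed here can be compared against a downloaded file. The following is a sketch using only `hashlib`; `sha256_of` and `matches_expected` are hypothetical helper names, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for purifier-0.2.11.tar.gz.
EXPECTED_SHA256 = "e47b912666973a886703a74aac3c77b43edb2128000501d1b539bfd8d3898081"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected one, ignoring letter case."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the archive is in the current directory):
# matches_expected("purifier-0.2.11.tar.gz")
```

The same check applies to the wheel by passing its listed SHA256 digest as `expected`.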

File details

Details for the file purifier-0.2.11-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.11-py3-none-any.whl
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.11-py3-none-any.whl
Algorithm Hash digest
SHA256 92c7881e29d72db7e4e2e3a70fc981dd64367afe90cc9671f5d85f31f395e8d4
MD5 ffad56e2ad300da78ce14e82035b0477
BLAKE2b-256 bf043fd75c70a25e6f4ae00a5108c84696d2458bd8dcd352c98e054914da9fc9
