Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
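
To make the pipe-style composition in the example easier to follow, here is a minimal, self-contained sketch of how a `|` pipeline of this shape could be built in plain Python. It is not Purifier's implementation: the `Step` class, its `run` method, and the toy versions of `maps`, `fields`, and `one` below are assumptions that only mimic the names used above, and the input data is made up.

```python
from typing import Any, Callable


class Step:
    """A hypothetical pipeline step (not Purifier's actual class)."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.func(self.func(value)))

    def run(self, value: Any) -> Any:
        return self.func(value)


def maps(step: Step) -> Step:
    """Apply `step` to every item of a list."""
    return Step(lambda items: [step.run(item) for item in items])


def fields(**steps: Step) -> Step:
    """Build a dict by running each named step on the same input."""
    return Step(lambda value: {name: s.run(value) for name, s in steps.items()})


def one() -> Step:
    """Unwrap a single-element list, failing loudly otherwise."""

    def unwrap(items: list) -> Any:
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(unwrap)


# Toy input standing in for already-parsed link nodes (made-up data).
links = [
    {"text": ["First"], "href": ["https://example.com/1"]},
    {"text": ["Second"], "href": ["https://example.com/2"]},
]

pipeline = maps(
    fields(
        title=Step(lambda node: node["text"]) | one(),
        url=Step(lambda node: node["href"]) | one(),
    )
)

rows = pipeline.run(links)
```

In this sketch each step is just a wrapped function, so composing with `|` never touches the network or the input until `run` is called; the real library may differ in how and when it evaluates steps.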

Installation

pip install purifier

Docs

Download files

Source Distribution

purifier-0.2.16.tar.gz (16.3 kB)

Uploaded Source

Built Distribution

purifier-0.2.16-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.16.tar.gz.

File metadata

  • Download URL: purifier-0.2.16.tar.gz
  • Upload date:
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.16.tar.gz
Algorithm Hash digest
SHA256 b8a25a027f5b33188a9af6d5d9c1fcd1ef9587092ed471ca6de668db9251239a
MD5 a4925614e7390f05c47771ebf3299981
BLAKE2b-256 74964560a645e8c91d47791cf88c71ff2f0243c4a4e1f71b2d5f699cb643c3c1
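
One way to check a downloaded copy of the source archive against the SHA256 digest listed above is sketched here with Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_expected` are hypothetical, introduced only for illustration; the digest is copied from the table above.

```python
import hashlib

# SHA256 digest listed above for purifier-0.2.16.tar.gz.
EXPECTED_SHA256 = "b8a25a027f5b33188a9af6d5d9c1fcd1ef9587092ed471ca6de668db9251239a"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected` with the path of a locally downloaded `purifier-0.2.16.tar.gz` (the location is an assumption) would then report whether the file matches the published digest.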

File details

Details for the file purifier-0.2.16-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.16-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.16-py3-none-any.whl
Algorithm Hash digest
SHA256 19ba1e1aab895ef964cd4f7905607edb5d2b6e28564161fc8ee7473bdfc28bee
MD5 f2e1c17d401de3c4a2d5076c6178feab
BLAKE2b-256 fcabfbed5979fe23d64873d2732b8df3b089ec5f4ba604fba88c0bb6eaf4b91b
