Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
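
The `|` chaining in the example above can be pictured with a small, self-contained sketch. This is not Purifier's implementation: the `Step` class, its `run` method, and the toy `maps`, `fields`, and `one` helpers below are hypothetical stand-ins that only borrow their names from the example, and the input is a plain string rather than a fetched page.

```python
# Hypothetical sketch of pipe-style composition; NOT Purifier's actual code.
class Step:
    """Wraps a one-argument function so steps can be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`.
        return Step(lambda value: other.func(self.func(value)))

    def run(self, value):
        return self.func(value)


def maps(step):
    """Apply `step` to every item of a list."""
    return Step(lambda items: [step.run(item) for item in items])


def fields(**steps):
    """Build a dict by running each named step on the same input."""
    return Step(lambda value: {name: s.run(value) for name, s in steps.items()})


def one():
    """Unwrap a single-element list, failing loudly otherwise."""
    def unwrap(items):
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]
    return Step(unwrap)


# Toy pipeline: split text into lines, then turn each line into a dict.
pipeline = Step(str.splitlines) | maps(
    fields(
        title=Step(lambda line: [line.split(" -> ")[0]]) | one(),
        url=Step(lambda line: [line.split(" -> ")[1]]) | one(),
    )
)

rows = pipeline.run("Old jokes -> https://dynomight.net/old-jokes/")
```

Each helper returns another `Step`, so nested pieces such as `fields(...)` inside `maps(...)` compose the same way as the top-level chain.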

Tutorial

See docs/Tutorial.md


Download files

Source Distribution

purifier-0.2.5.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

purifier-0.2.5-py3-none-any.whl (15.7 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.5.tar.gz.

File metadata

  • Download URL: purifier-0.2.5.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.5.tar.gz
Algorithm     Hash digest
SHA256        ce65eedd3311d48378da4ebfa4d4c57a1f32a1f022d60f9dd888801bdb21d8f9
MD5           af6443c9baf68472386aa725f94ec893
BLAKE2b-256   fa42c01b154ae7264a34b804f4bdc4dbc51d81eabee6a9ae27ac6eaa65b7f4fe
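
To check a downloaded archive against the SHA256 digest listed above, something like the following sketch could be used. The helper name `sha256_of` and the local file path in the usage comment are assumptions; only the standard-library `hashlib` module is involved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for purifier-0.2.5.tar.gz in the table above.
EXPECTED = "ce65eedd3311d48378da4ebfa4d4c57a1f32a1f022d60f9dd888801bdb21d8f9"

# Hypothetical usage, assuming the archive was saved to the current directory:
# assert sha256_of("purifier-0.2.5.tar.gz") == EXPECTED
```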

File details

Details for the file purifier-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        1fe719de5fbccd1ffd68d5734b8f06438b24c8c673a14ba33e6deae5ccb9783d
MD5           218460f23bea0f22e4f5b384ade3457d
BLAKE2b-256   8e390147f4ba1f070ba6c82bb3f493dcab78dcdba1da332cb458d53d26338418
