Purifier

A simple scraping library.

It lets you write short, readable scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
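
The example above chains steps with the `|` operator. As a rough illustration of how that kind of pipe composition can work in plain Python, here is a minimal sketch. It is not Purifier's implementation: `Step`, `each`, `record`, `first`, and `key` are hypothetical names invented for this sketch, and it runs on a toy list of dicts instead of fetching or parsing HTML.

```python
from typing import Any, Callable


class Step:
    """A hypothetical pipeline step: wraps a function of one argument."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other(self(value)))


def each(step: Step) -> Step:
    """Apply `step` to every item of a list (a map-style combinator)."""
    return Step(lambda items: [step(item) for item in items])


def record(**steps: Step) -> Step:
    """Build a dict by running each named step on the same input."""
    return Step(lambda value: {name: s(value) for name, s in steps.items()})


def first() -> Step:
    """Return the single element of a one-item list, failing otherwise."""

    def _first(items: list) -> Any:
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(_first)


def key(name: str) -> Step:
    """Look up `name` in a dict and wrap the value in a one-item list."""
    return Step(lambda row: [row[name]])


# Toy input standing in for already-parsed links (assumption: no network access).
rows = [
    {"text": "Old jokes", "href": "https://dynomight.net/old-jokes/"},
]

pipeline = each(record(title=key("text") | first(), url=key("href") | first()))
result = pipeline(rows)
```

Overloading `__or__` keeps each step small and independently testable, and the final pipeline is itself a step that can be nested inside other combinators.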

Tutorial

See docs/Tutorial.md


Source Distribution

purifier-0.2.6.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

purifier-0.2.6-py3-none-any.whl (15.8 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.6.tar.gz.

File metadata

  • Download URL: purifier-0.2.6.tar.gz
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.6.tar.gz
Algorithm    Hash digest
SHA256       0548ae12c8f741464d6e23232caf19e277405ad657f91865a5bfdf2df3527d2c
MD5          d88868191a4f8c4fce8a9b65558afa4c
BLAKE2b-256  1ea337f661a52ca9c48e250bc3119fc09f85f1bcd5735c883528a6cfb55c157a


File details

Details for the file purifier-0.2.6-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.6-py3-none-any.whl
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       fe6a4cddef54ecc232f96c5247f3f04a4ddb82d1c2b61e1cf78f3d6a15f65f2e
MD5          3f393a7c71b1b8a8228aae5bde10608c
BLAKE2b-256  6297553cb1265c4106cabf48499e57c01f1f9aa1f3945aee1e1cefa3172eee59

