Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

scraper = (
    request()
    | html()
    | xpath('//a[@class="titlelink"]')
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
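
To make the pipe syntax in the example less magical, here is a minimal standard-library sketch of how steps could be chained with the | operator. It is not Purifier's actual implementation: the Step class and the key helper are hypothetical, and maps, fields and one are toy stand-ins that only mirror the names used above.

```python
from typing import Any, Callable


class Step:
    """Hypothetical pipeline step: wraps a function and composes with |."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a first and feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def scrape(self, value: Any) -> Any:
        # Run the whole chain on an input value.
        return self.func(value)


def maps(step: Step) -> Step:
    # Toy stand-in: apply a step to every item of a list.
    return Step(lambda items: [step.func(item) for item in items])


def fields(**steps: Step) -> Step:
    # Toy stand-in: build a dict by running each named step on the same input.
    return Step(lambda value: {name: s.func(value) for name, s in steps.items()})


def one() -> Step:
    # Toy stand-in: unwrap a list that must contain exactly one element.
    def take_one(items: list) -> Any:
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(take_one)


def key(name: str) -> Step:
    # Hypothetical helper: look up a key in a dict (stands in for xpath()).
    return Step(lambda row: row[name])


# Toy input standing in for already-parsed links; no network access involved.
rows = [{"text": ["Old jokes"], "href": ["https://dynomight.net/old-jokes/"]}]

toy_scraper = maps(
    fields(
        title=key("text") | one(),
        url=key("href") | one(),
    )
)
toy_result = toy_scraper.scrape(rows)
```

In this sketch, one() raises ValueError when a selector matches zero or several items, so a messy row fails loudly instead of producing a half-filled record. Whether Purifier's own one() behaves this way is an assumption.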

Tutorial

See docs/Tutorial.md

Download files

Source Distribution

purifier-0.2.7.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

purifier-0.2.7-py3-none-any.whl (15.8 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.7.tar.gz.

File metadata

  • Download URL: purifier-0.2.7.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.7.tar.gz
Algorithm Hash digest
SHA256 64fee5f08e5fdd183389d96c3ecf1b2488ba79292ec1b918143f3335fc3e2e13
MD5 b0668675f498d4e053213bcc787b47d2
BLAKE2b-256 3930ff4aa8642db93d683aeed38347148b59b58aa742d38c464d6a2adf18439f
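
A downloaded file can be checked against the SHA256 digest listed above using only the Python standard library. The following is a minimal sketch; the function names sha256_of and matches_published_hash are illustrative, and the file is assumed to have been downloaded locally already.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash(Path("purifier-0.2.7.tar.gz"), "64fee5f08e5fdd183389d96c3ecf1b2488ba79292ec1b918143f3335fc3e2e13") should return True if the source distribution sits in the current directory and is intact.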

File details

Details for the file purifier-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.7-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 c4e97a93bf8244e848134381f07999bc27fef8fe48aac55e229c5576e52b86e2
MD5 96516f24f3ab43052d73c7aa51500c37
BLAKE2b-256 92afd5c036e486685d226f1d89d108e40b583437ac9399b30cc8dd9862d71246
