Purifier

A simple scraping library.

It lets you build short, readable scrapers by chaining small steps together, even when the input is messy.

Example usage

Extract titles and URLs of articles from Hacker News:

from purifier import request, html, xpath, maps, fields, one

# Steps are chained with "|"; each step receives the output of the previous one.
scraper = (
    request()
    | html()
    # Select every story link on the page.
    | xpath('//a[@class="titlelink"]')
    # Build one dict per link.
    | maps(
        fields(
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")

# The result is a list of dicts, one per matched link:
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]
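
To make the chaining style above more concrete, here is a minimal, self-contained sketch of how "|" composition can be built in plain Python. It is not Purifier's actual implementation and does not import the library. The Step class and the split, strip and pick helpers are hypothetical names invented for this illustration; maps, fields and one only mimic the behaviour the example above suggests, and the input text is made up.

```python
class Step:
    """A pipeline stage wrapping a function of one argument (hypothetical)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # "a | b" builds a new stage that runs a, then feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def run(self, value):
        return self.func(value)


def split(sep):
    # Hypothetical helper: split a string into a list of parts.
    return Step(lambda text: text.split(sep))


def strip():
    # Hypothetical helper: trim surrounding whitespace.
    return Step(lambda text: text.strip())


def pick(index):
    # Hypothetical helper: keep one element, still wrapped in a list.
    return Step(lambda items: [items[index]])


def maps(step):
    # Apply a step to every item of a list.
    return Step(lambda items: [step.run(item) for item in items])


def fields(**steps):
    # Run several named steps on the same input and collect a dict.
    return Step(lambda value: {name: s.run(value) for name, s in steps.items()})


def one():
    # Unwrap a list that must contain exactly one item.
    def take_one(items):
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(take_one)


pipeline = split("\n") | maps(
    split(";")
    | fields(
        title=pick(0) | one() | strip(),
        url=pick(1) | one() | strip(),
    )
)

raw = " Old jokes ; https://dynomight.net/old-jokes/ \nExample page;https://example.com"
rows = pipeline.run(raw)
```

Because every helper returns a Step, any stage can be swapped or nested without changing the others, which is the property that keeps pipelines like the Hacker News one short.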

Tutorial

See docs/Tutorial.md

Download files

Source Distribution: purifier-0.2.2.tar.gz (15.8 kB)

Built Distribution: purifier-0.2.2-py3-none-any.whl (15.7 kB, Python 3)