A simple scraping library.

Project description

Purifier

It lets you build simple, concise scrapers, even when the input is messy.

Example usage

Extract titles and URLs of articles from Hacker News:

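from purifier import request, html, xpath, maps, fields, one

# Each step passes its output to the next one through the `|` operator.
scraper = (
    request()                              # fetch the URL given to scrape()
    | html()                               # parse the response as HTML
    | xpath('//a[@class="titlelink"]')     # select the article link elements
    | maps(                                # process every selected element
        fields(                            # build one dict per element
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")

# The result is a list of dicts, for example:
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]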
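
The `|` chaining above reads as "run this step, then pass its output to the next one". The following is a minimal, standard-library-only sketch of how such chaining can be built with Python's `__or__` operator. It is not Purifier's actual implementation: the `Step` class is hypothetical, the helpers `maps`, `fields`, and `one` only borrow their names from the example, and their behavior here (in particular `one` requiring exactly one item) is an assumption.

```python
from typing import Any, Callable


class Step:
    """Hypothetical pipeline step: wraps a function of one argument."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def scrape(self, value: Any) -> Any:
        # Run the whole chain on the given input.
        return self.func(value)


def maps(step: Step) -> Step:
    # Apply `step` to every item of a list (assumed analogue of `maps`).
    return Step(lambda items: [step.scrape(item) for item in items])


def fields(**steps: Step) -> Step:
    # Build a dict by running each named step on the same input.
    return Step(lambda value: {name: s.scrape(value) for name, s in steps.items()})


def one() -> Step:
    # Assumed behavior: expect exactly one item and unwrap it.
    def unwrap(items: list) -> Any:
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(unwrap)


# Stand-in input: pre-extracted link data instead of real HTML.
links = [
    {"text": ["Old jokes"], "href": ["https://dynomight.net/old-jokes/"]},
]

pipeline = maps(
    fields(
        title=Step(lambda link: link["text"]) | one(),
        url=Step(lambda link: link["href"]) | one(),
    )
)

result = pipeline.scrape(links)
print(result)
```

Because every step is just a wrapped function, steps compose freely: the same `one()` step works after any step that yields a list, which is what keeps chained scrapers short.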

Tutorial

See docs/Tutorial.md

Download files

Source Distribution

purifier-0.2.8.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

purifier-0.2.8-py3-none-any.whl (15.8 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.8.tar.gz.

File metadata

  • Download URL: purifier-0.2.8.tar.gz
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.8.tar.gz
Algorithm Hash digest
SHA256 1db44696009f2010ead50a7eb61db25091994cba32e5c0845751f0f89f37d062
MD5 b86c215c924934a2cf24d5695ec070b4
BLAKE2b-256 fc71018a70bf7535810af30d3c2791621f0a482ca45b1eb6f1cf1f41b9bcc769
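
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(Path("purifier-0.2.8.tar.gz"), "1db44696009f2010ead50a7eb61db25091994cba32e5c0845751f0f89f37d062")` should return `True` for an unmodified copy of the source distribution, assuming the file sits in the current directory.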

File details

Details for the file purifier-0.2.8-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.8-py3-none-any.whl
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 55f48f92c10ab45c39fa88a719f5b84d978dc16034f71469cabcac0388fa42a6
MD5 4d310f1d484c4739d838e616e14c95eb
BLAKE2b-256 7c83fde2616ee9640cb97bf64d0f7f6308c7e28f251ade4d303b466d2cd477ab
