Purifier

A simple scraping library.

It allows you to easily create simple and concise scrapers, even when the input is quite messy.

Example usage

Extract titles and URLs of articles from Hacker News:

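from purifier import request, html, xpath, maps, fields, one

# Build the scraper as a pipeline; `|` passes each step's output to the next.
scraper = (
    request()  # fetch the page at the URL given to scrape()
    | html()  # parse the response as HTML
    | xpath('//a[@class="titlelink"]')  # select every article title link
    | maps(  # apply the inner scraper to each selected link
        fields(  # build one dict per link
            title=xpath("text()") | one(),
            url=xpath("@href") | one(),
        )
    )
)

result = scraper.scrape("https://news.ycombinator.com")

# `result` is a list of dicts, one per article, for example:
result == [
    {
        "title": "Why Is the Web So Monotonous? Google",
        "url": "https://reasonablypolymorphic.com/blog/monotonous-web/index.html",
    },
    {
        "title": "Old jokes",
        "url": "https://dynomight.net/old-jokes/",
    },
    ...
]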
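
To make the pipe style in the example above more concrete, here is a minimal, standard-library-only sketch of how `|` composition of this kind could be built. It is not Purifier's implementation. `Step`, `key`, and the local `maps`, `fields`, and `one` are hypothetical stand-ins that only mirror the names used above, `key` replaces `xpath` so the sketch needs no HTML parser, and the "exactly one item" behaviour of `one` is an assumption.

```python
from typing import Any, Callable


class Step:
    """Hypothetical pipeline step wrapping a single-argument function."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`.
        return Step(lambda value: other.func(self.func(value)))

    def __call__(self, value: Any) -> Any:
        return self.func(value)


def maps(step: Step) -> Step:
    """Apply `step` to every item of a list."""
    return Step(lambda items: [step(item) for item in items])


def fields(**steps: Step) -> Step:
    """Build a dict by running each named step on the same input."""
    return Step(lambda value: {name: step(value) for name, step in steps.items()})


def one() -> Step:
    """Unwrap a list that is assumed to hold exactly one item."""

    def unwrap(items: list) -> Any:
        if len(items) != 1:
            raise ValueError(f"expected exactly one item, got {len(items)}")
        return items[0]

    return Step(unwrap)


def key(name: str) -> Step:
    """Stand-in for a selector: return the matching values as a list."""
    return Step(lambda row: [row[name]])


# In-memory rows stand in for the link elements a real scraper would select.
rows = [{"text": "Old jokes", "href": "https://dynomight.net/old-jokes/"}]

pipeline = maps(
    fields(
        title=key("text") | one(),
        url=key("href") | one(),
    )
)

result = pipeline(rows)
print(result)
```

Because every helper returns a `Step`, small pieces such as `key("text") | one()` can be nested inside `fields` and `maps` in the same way the example nests its `xpath` calls.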

Tutorial

See docs/Tutorial.md

Download files

Source Distribution

purifier-0.2.9.tar.gz (16.0 kB)

Uploaded Source

Built Distribution

purifier-0.2.9-py3-none-any.whl (15.8 kB)

Uploaded Python 3

File details

Details for the file purifier-0.2.9.tar.gz.

File metadata

  • Download URL: purifier-0.2.9.tar.gz
  • Upload date:
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.9.tar.gz
Algorithm Hash digest
SHA256 388d1b24dbe8307eff93818cf946cb9f4b91cf43511ad99e8d97a51afaf5f26c
MD5 9baa52a78af2b8a618b48a0aeed754a0
BLAKE2b-256 b4f412b65ea8f64227fc8f87a16163a202d9b5017dfc3a5d69f2a0e8a98815d3
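
One way to check a downloaded archive against the SHA256 digest listed above is sketched below, using only Python's standard `hashlib`. The helper names `sha256_of` and `matches` are hypothetical, and the sketch assumes the archive was saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for purifier-0.2.9.tar.gz.
EXPECTED_SHA256 = "388d1b24dbe8307eff93818cf946cb9f4b91cf43511ad99e8d97a51afaf5f26c"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("purifier-0.2.9.tar.gz")  # assumed local file name
    if archive.exists():
        print("digest matches" if matches(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 16 kB archive but makes the helper reusable for larger downloads.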

File details

Details for the file purifier-0.2.9-py3-none-any.whl.

File metadata

  • Download URL: purifier-0.2.9-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.9.7 Linux/5.13.0-44-generic

File hashes

Hashes for purifier-0.2.9-py3-none-any.whl
Algorithm Hash digest
SHA256 ae86406de795e472204fec9918312dc69677b607f3640c375d959edad768fb36
MD5 b0d7adf8ec63fdf56f5f5e01e3b019da
BLAKE2b-256 ee0521afd77b181b5e7b33c16535b22a5877eb2d0f3431c420253e92535a497d
