Scrape HTML to dictionaries

Write scraping rules, get dictionaries.

scrapedict is a Python module that simplifies writing web scraping code. You describe the data you want as a dictionary of rules, which keeps scrapers readable and easy to adapt and maintain.

Features

  • The rules dictionary is straightforward and easy to read
  • Once you define the rules for one item, you can reuse them to extract multiple items
  • You get ✨dictionaries✨ of the data you want
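
To make the first two features concrete, here is a minimal sketch of the general idea behind a rules dictionary. It is not scrapedict's implementation: the helpers text_rule, attr_rule, extract_one and extract_many are hypothetical names invented for this illustration, and it uses the standard library's ElementTree paths on a small well-formed snippet rather than the selectors scrapedict accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers for illustration only; they are NOT scrapedict's API.

def text_rule(path):
    """Build a rule that reads the text of the first element matching path."""
    def rule(node):
        found = node.find(path)
        return "".join(found.itertext()).strip() if found is not None else None
    return rule

def attr_rule(path, name):
    """Build a rule that reads attribute `name` of the first element matching path."""
    def rule(node):
        found = node.find(path)
        return found.get(name) if found is not None else None
    return rule

def extract_one(fields, node):
    """Apply every rule to a single node and collect the results in a dict."""
    return {key: rule(node) for key, rule in fields.items()}

def extract_many(item_path, fields, root):
    """Reuse the same rules for every node matching item_path."""
    return [extract_one(fields, node) for node in root.findall(item_path)]

# Hand-written, well-formed sample markup (not taken from any real site).
SAMPLE = """
<ul>
  <li class="entry"><a href="/a">First</a></li>
  <li class="entry"><a href="/b">Second</a></li>
</ul>
"""

# The rules are defined once, for one item...
fields = {
    "title": text_rule("a"),
    "url": attr_rule("a", "href"),
}

# ...and then applied to every matching item in the document.
root = ET.fromstring(SAMPLE.strip())
items = extract_many(".//li[@class='entry']", fields, root)
print(items)
```

Each rule is a plain callable, so the fields dictionary reads as a description of the output, and the same dictionary serves one item or many.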

Installation

$ pip install scrapedict

Usage

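import requests
import scrapedict as sd

# Fetch the Urban Dictionary page for "larping"
response = requests.get("https://www.urbandictionary.com/define.php?term=larping")

# Map each output key to a rule that reads the text at a CSS selector
fields = {
    "word": sd.text(".word"),
    "meaning": sd.text(".meaning"),
    "example": sd.text(".example"),
}

# Apply the rules to the page HTML to get a single dictionary
item = sd.extract(fields, response.text)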

The orange site example

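import requests
import scrapedict as sd

# Fetch the Hacker News front page
response = requests.get("https://news.ycombinator.com/")

# Rules for one story: the link text and its href attribute
fields = {
    "title": sd.text(".titleline a"),
    "url": sd.attr(".titleline a", "href"),
}

# Apply the same rules to every element matching ".athing"
items = sd.extract_all(".athing", fields, response.text)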
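
Some links on the front page may be relative (for example, discussion posts hosted on the site itself), so a small post-processing step can be useful. The sketch below assumes that items is a list of dictionaries with "title" and "url" keys, as the rules above suggest. absolutize is a hypothetical helper, not part of scrapedict, and the sample data is hand-written rather than scraped.

```python
from urllib.parse import urljoin

BASE_URL = "https://news.ycombinator.com/"

def absolutize(items, base_url=BASE_URL):
    """Return copies of the items with each "url" resolved against base_url."""
    resolved = []
    for item in items:
        fixed = dict(item)  # copy so the input dictionaries stay unchanged
        if fixed.get("url"):
            fixed["url"] = urljoin(base_url, fixed["url"])
        resolved.append(fixed)
    return resolved

# Hand-written stand-in for the scraped result (assumed shape, not real data).
sample_items = [
    {"title": "External story", "url": "https://example.com/post"},
    {"title": "Ask HN: a question", "url": "item?id=1"},
]

resolved = absolutize(sample_items)
for entry in resolved:
    print(entry["title"], entry["url"])
```

Because the extracted items are ordinary dictionaries, steps like this need nothing beyond the standard library.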

Development

Dependencies are managed with Poetry.

Testing is done with Tox.

Download files

Source Distribution

scrapedict-0.2.1.tar.gz (2.0 kB)

Built Distribution

scrapedict-0.2.1-py3-none-any.whl (2.3 kB, Python 3)
