Provides a simple API with a Scrapy backend for requesting complex scraping tasks in an intuitive manner.

script-scraper: Intuitive queries for nested scraping tasks

What is it?

script-scraper is a Python package that provides a simple, intuitive interface designed to make web scraping fast to set up and easy to modify. It aims to help developers write simple, interpretable bots. script-scraper is built on top of the Scrapy framework, so the interface will feel familiar to Scrapy users.

Where to get it

The source code is currently hosted on GitHub at: https://github.com/vlad-yeghiazaryan/script-scraper

You can also install it with pip:

pip install script-scraper

Working example

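from scriptScraper.extractors import ScriptRunner

# Each key under "extract_rules" names a field in the scraped output.
request = {
  "urls": ["https://quotes.toscrape.com"],  # page(s) to start from
  "extract_rules": {
    "quotes": {
      "selector": ".quote",  # CSS selector matching each quote block
      "type": "list",        # collect all matches into a list
      "output": {            # fields extracted within each matched block
        "text": ".text",
        "author": ".author",
        "tags": {
          "selector": ".tag",
          "type": "list"
        },
        "about": {
          "selector": "span a",
          "type": "page",    # open the linked page and extract from it
          "follow": "href",  # attribute that holds the link to follow
          "output": {
            "author_name": ".author-title",
            "author_birth_date": ".author-born-date",
            "author_birth_location": ".author-born-location",
            "author_description": ".author-description"
          }
        }
      }
    }
  }
}

# Assumed meaning of the options: 1 s delay between requests,
# logging disabled, results also saved to data.json.
crawler = ScriptRunner(delay=1, log=False, output='data.json')
scraped_data = crawler.scrape(request)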
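The request is a nested dictionary: each key under extract_rules names an output field, and a rule that carries its own "output" mapping defines sub-fields (for a "page" rule, these appear to be scraped from the followed link). As a sketch of that nesting, the hypothetical helper output_fields below (not part of script-scraper) walks such a dict and lists the dotted field paths it declares. It uses only the standard library and does not perform any crawling.

```python
from typing import Any, Dict, List


# Hypothetical helper, not part of script-scraper: it only inspects a request
# dict shaped like the example above and lists the output fields it declares.
def output_fields(rules: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return a dotted path for every field declared in an extract_rules mapping."""
    fields: List[str] = []
    for name, rule in rules.items():
        path = f"{prefix}{name}"
        fields.append(path)
        # A bare string is just a selector; a dict may nest more rules under "output".
        if isinstance(rule, dict) and "output" in rule:
            fields.extend(output_fields(rule["output"], prefix=path + "."))
    return fields


request = {
    "urls": ["https://quotes.toscrape.com"],
    "extract_rules": {
        "quotes": {
            "selector": ".quote",
            "type": "list",
            "output": {
                "text": ".text",
                "author": ".author",
                "tags": {"selector": ".tag", "type": "list"},
                "about": {
                    "selector": "span a",
                    "type": "page",
                    "follow": "href",
                    "output": {
                        "author_name": ".author-title",
                        "author_birth_date": ".author-born-date",
                        "author_birth_location": ".author-born-location",
                        "author_description": ".author-description",
                    },
                },
            },
        }
    },
}

for field in output_fields(request["extract_rules"]):
    print(field)
```

Running it prints nine paths, from quotes through quotes.about.author_description, which makes it easy to see which fields come from the listing page and which come from the followed author page.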


Download files

Source Distribution

script-scraper-alpha-0.0.1.tar.gz (5.1 kB)

Built Distribution

script_scraper_alpha-0.0.1-py3-none-any.whl (5.9 kB)
