Provides a simple API with a Scrapy backend for requesting complicated scraping tasks in an intuitive manner.

Project description

script-scraper: Intuitive queries for nested scraping tasks

What is it?

script-scraper is a Python package that provides a simple, intuitive interface designed to make web scraping fast and easy to modify. It aims to help developers write simple, interpretable bots. script-scraper is built on top of the Scrapy framework, so users work with a familiar interface.

Where to get it

The source code is currently hosted on GitHub at: https://github.com/vlad-yeghiazaryan/script-scraper

You can also install it with pip by running:

pip install script-scraper-alpha

Working example

from scriptScraper.extractors import ScriptRunner

# Request: start URLs plus nested extraction rules keyed by output field name.
request = {
  "urls": ["https://quotes.toscrape.com"],
  "extract_rules": {
    "quotes": {
      "selector": ".quote",
      "type": "list",
      "output": {
        "text": ".text",
        "author": ".author",
        "tags": {
          "selector": ".tag",
          "type": "list"
        },
        "about": {
          "selector": "span a",
          "type": "page",
          "follow": "href",
          "output": {
            "author_name": ".author-title",
            "author_birth_date": ".author-born-date",
            "author_birth_location": ".author-born-location",
            "author_description": ".author-description"
          }
        }
      }
    },
    "next_page": ".next a"
  }
}

# Create the runner and execute the request defined above.
crawler = ScriptRunner(delay=1, log=False, output='data.json')
scraped_data = crawler.scrape(request)
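
The nesting in the request above can be easier to follow when flattened. The following is a minimal sketch that uses only the standard library and does not call script-scraper; `walk_rules` is a hypothetical helper, and the `"item"` label for rules without an explicit `"type"` is an assumption made for illustration, not a documented default of the package.

```python
from typing import Any, Dict, Iterator, Tuple

# Trimmed copy of the "extract_rules" from the working example.
extract_rules: Dict[str, Any] = {
    "quotes": {
        "selector": ".quote",
        "type": "list",
        "output": {
            "text": ".text",
            "tags": {"selector": ".tag", "type": "list"},
            "about": {
                "selector": "span a",
                "type": "page",
                "follow": "href",
                "output": {"author_name": ".author-title"},
            },
        },
    },
    "next_page": ".next a",
}


def walk_rules(rules: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (dotted field path, selector, rule type) for every rule, depth first."""
    for name, rule in rules.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(rule, str):
            # Shorthand form: a bare selector string. "item" is our own label (assumption).
            yield path, rule, "item"
            continue
        yield path, rule["selector"], rule.get("type", "item")
        # Recurse into nested "output" rules, if any.
        yield from walk_rules(rule.get("output", {}), path)


fields = list(walk_rules(extract_rules))
for path, selector, kind in fields:
    print(f"{path:<28} {kind:<5} {selector}")
```

Each printed row is one field of the request: its dotted path in the nested rules, its type, and its selector. Listing the rules this way is a quick check that a long request is nested as intended before handing it to the crawler.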

Download files

Download the file for your platform.

Source Distribution

script-scraper-alpha-0.0.6.tar.gz (5.2 kB)

Uploaded Source

Built Distribution

script_scraper_alpha-0.0.6-py3-none-any.whl (5.9 kB)

Uploaded Python 3

File details

Details for the file script-scraper-alpha-0.0.6.tar.gz.

File metadata

  • Download URL: script-scraper-alpha-0.0.6.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.2

File hashes

Hashes for script-scraper-alpha-0.0.6.tar.gz
Algorithm Hash digest
SHA256 eb5f9317339328f02203fd26006492a2488a953343036d4a739583d8728489eb
MD5 f688893d005c1d3ae44f23e1683f08ff
BLAKE2b-256 abe11b3cc1cb8675836b38eab3f1628bee90861d06aeaaabea1e9307e08b0acf
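
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify_download` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for script-scraper-alpha-0.0.6.tar.gz.
EXPECTED_SHA256 = "eb5f9317339328f02203fd26006492a2488a953343036d4a739583d8728489eb"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive saved in the current directory.
    archive = Path("script-scraper-alpha-0.0.6.tar.gz")
    if archive.exists():
        print("digest matches" if verify_download(archive) else "digest MISMATCH")
    else:
        print("archive not found; download it first")
```

The same check works for the wheel by substituting its file name and its own SHA256 digest.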

File details

Details for the file script_scraper_alpha-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: script_scraper_alpha-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.2

File hashes

Hashes for script_scraper_alpha-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 9a5031087309669829592e2154e9abd72e2c989793aeb6bc5433562ee466ad0a
MD5 0f41f697a441210bde4c7bdfe51185d1
BLAKE2b-256 dbf8a7adeea55bbfe0647ea78d562e88855926dc38b1930e1db8dee2de5fe625