Provides a simple API with a Scrapy backend for requesting complicated scraping tasks in an intuitive manner.
script-scraper: Intuitive queries for nested scraping tasks
What is it?
script-scraper is a Python package that provides a simple, intuitive interface designed to make web scraping fast and easy to modify. It aims to help developers write simple, readable bots. script-scraper is built on top of the Scrapy framework, giving users a familiar interface.
Where to get it
The source code is currently hosted on GitHub at: https://github.com/vlad-yeghiazaryan/script-scraper
You can also install it with pip, by running:
pip install script-scraper-alpha
Working example
from scriptScraper.extractors import ScriptRunner

request = {
    "urls": ["https://quotes.toscrape.com"],
    "extract_rules": {
        "quotes": {
            "selector": ".quote",
            "type": "list",
            "output": {
                "text": ".text",
                "author": ".author",
                "tags": {
                    "selector": ".tag",
                    "type": "list"
                },
                "about": {
                    "selector": "span a",
                    "type": "page",
                    "follow": "href",
                    "output": {
                        "author_name": ".author-title",
                        "author_birth_date": ".author-born-date",
                        "author_birth_location": ".author-born-location",
                        "author_description": ".author-description"
                    }
                }
            }
        },
        "next_page": ".next a"
    }
}

crawler = ScriptRunner(delay=1, log=False, output='data.json')
scraped_data = crawler.scrape(request)
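The nesting in the request above can be hard to read at a glance. The following is a minimal sketch, in plain Python, of a helper that flattens an "extract_rules" dictionary into dotted field paths with their selectors and declared types. The function name walk_rules is hypothetical and is not part of script-scraper; it only inspects the dictionary and does not scrape anything. Rules written as a bare selector string carry no explicit "type" in the example, so the sketch reports None for them rather than guessing a default.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def walk_rules(
    rules: Dict[str, Any], prefix: str = ""
) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (dotted_path, selector, declared_type) for every rule.

    Hypothetical helper for reading a request; not a script-scraper API.
    """
    for name, rule in rules.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(rule, str):
            # Shorthand form: the value is just a CSS selector.
            yield path, rule, None
        else:
            # Full form: a dict with "selector" and optional "type"/"output".
            yield path, rule["selector"], rule.get("type")
            # Recurse into nested output fields, if any.
            yield from walk_rules(rule.get("output", {}), path)


# A trimmed copy of the extract_rules from the working example.
rules = {
    "quotes": {
        "selector": ".quote",
        "type": "list",
        "output": {
            "text": ".text",
            "tags": {"selector": ".tag", "type": "list"},
        },
    },
    "next_page": ".next a",
}

for path, selector, declared_type in walk_rules(rules):
    print(f"{path}: selector={selector!r}, type={declared_type}")
```

Running it on the full request from the working example lists every field, including the ones nested under "about", which makes it easier to check a large request before handing it to the crawler.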
Hashes for script-scraper-alpha-0.0.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 81e2e549748bbbdb2b741bd43af4cd3be3b07a4684f08464fd083bc3b50e45cf
MD5 | f5ea86a992bbbd9b88d3492f25579e34
BLAKE2b-256 | b31c035331d41dd561c0afc2f9a7ad1866a9410bd8515ca65e32cacca1f368b3
Hashes for script_scraper_alpha-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1f2c913785c8ed7656dc1a927706891c250b2e0aed8fa479891c0c850b18a2f0
MD5 | 6ac89fdfa5ba209e7115b930e56fccbb
BLAKE2b-256 | efa1b38080224ead11aa96c6b3f9eba9424f21de5ad25d06537ba0362fbc3d2e
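If you download a distribution file manually, the published digests can be checked with Python's standard hashlib module. The sketch below is one way to do that, assuming the file has been saved locally; the helper names sha256_of and matches_expected are illustrative, and the local file path is whatever you chose when downloading. The expected value is the SHA256 digest listed for script-scraper-alpha-0.0.5.tar.gz.

```python
import hashlib

# SHA256 digest listed for script-scraper-alpha-0.0.5.tar.gz.
EXPECTED_SHA256 = (
    "81e2e549748bbbdb2b741bd43af4cd3be3b07a4684f08464fd083bc3b50e45cf"
)


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected("script-scraper-alpha-0.0.5.tar.gz") should return True for an intact download of the source distribution saved under that name in the current directory.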