Provides a simple API with a Scrapy backend for requesting complicated scraping tasks in an intuitive manner.
Project description
script-scraper: Intuitive queries for nested scraping tasks
What is it?
script-scraper is a Python package that provides a simple, intuitive interface for writing web scrapers that are quick to build and easy to modify. It aims to help developers write simple, readable bots. script-scraper is built on top of the Scrapy framework, so its interface should feel familiar to Scrapy users.
Where to get it
The source code is currently hosted on GitHub at: https://github.com/vlad-yeghiazaryan/script-scraper
You can also install it with pip, by running:
pip install script-scraper-alpha
Working example
from scriptScraper.extractors import ScriptRunner

request = {
    "urls": ["https://quotes.toscrape.com"],
    "extract_rules": {
        "quotes": {
            "selector": ".quote",
            "type": "list",
            "output": {
                "text": ".text",
                "author": ".author",
                "tags": {
                    "selector": ".tag",
                    "type": "list"
                },
                "about": {
                    "selector": "span a",
                    "type": "page",
                    "follow": "href",
                    "output": {
                        "author_name": ".author-title",
                        "author_birth_date": ".author-born-date",
                        "author_birth_location": ".author-born-location",
                        "author_description": ".author-description"
                    }
                }
            }
        },
        "next_page": ".next a"
    }
}

crawler = ScriptRunner(delay=1, log=False, output='data.json')
scraped_data = crawler.scrape(request)
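The nesting in extract_rules can be hard to read at a glance, so the sketch below walks a rules dictionary and lists every field with its selector and declared type. It is only an illustration of the structure used in the example above: iter_fields is a hypothetical helper, not part of script-scraper, and it assumes that a bare string is a CSS selector and that nested fields live under an "output" key, as they do in the request shown. It needs neither the package nor network access.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# (dotted path, selector, declared type or None when the rule gives no type)
Field = Tuple[str, str, Optional[str]]


def iter_fields(rules: Dict[str, Any], prefix: str = "") -> Iterator[Field]:
    """Hypothetical helper (not part of script-scraper): flatten nested rules."""
    for name, rule in rules.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(rule, str):
            # Assumption: a bare string is shorthand for a CSS selector.
            yield path, rule, None
        elif isinstance(rule, dict):
            yield path, rule.get("selector", ""), rule.get("type")
            # Assumption: nested fields are declared under "output".
            yield from iter_fields(rule.get("output", {}), path)


# A shortened copy of the rules from the working example.
extract_rules = {
    "quotes": {
        "selector": ".quote",
        "type": "list",
        "output": {
            "text": ".text",
            "tags": {"selector": ".tag", "type": "list"},
            "about": {
                "selector": "span a",
                "type": "page",
                "follow": "href",
                "output": {"author_name": ".author-title"},
            },
        },
    },
    "next_page": ".next a",
}

fields = list(iter_fields(extract_rules))
for path, selector, kind in fields:
    print(f"{path:26} {selector:16} {kind or '-'}")
```

Printing the flattened paths before running a crawl is a cheap way to spot a misplaced key, for example an "output" block that ended up one level too deep.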
Project details
Download files
File details
Details for the file script-scraper-alpha-0.0.6.tar.gz.
File metadata
- Download URL: script-scraper-alpha-0.0.6.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb5f9317339328f02203fd26006492a2488a953343036d4a739583d8728489eb
MD5 | f688893d005c1d3ae44f23e1683f08ff
BLAKE2b-256 | abe11b3cc1cb8675836b38eab3f1628bee90861d06aeaaabea1e9307e08b0acf
File details
Details for the file script_scraper_alpha-0.0.6-py3-none-any.whl.
File metadata
- Download URL: script_scraper_alpha-0.0.6-py3-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9a5031087309669829592e2154e9abd72e2c989793aeb6bc5433562ee466ad0a
MD5 | 0f41f697a441210bde4c7bdfe51185d1
BLAKE2b-256 | dbf8a7adeea55bbfe0647ea78d562e88855926dc38b1930e1db8dee2de5fe625