Scrapy middleware for Zenscrape
Scrapy Zenscrape Middleware
Acknowledgements
Thanks to arimbr and ScrapingBee; this project is an adaptation of their work.
Installation
pip install scrapy-zenscrape
Configuration
Add your ZENSCRAPE_API_KEY and the ZenscrapeMiddleware to your project's settings.py. Don't forget to set CONCURRENT_REQUESTS according to your Zenscrape plan.
ZENSCRAPE_API_KEY = 'REPLACE-WITH-YOUR-API-KEY'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_zenscrape.ZenscrapeMiddleware': 700,
}
CONCURRENT_REQUESTS = 1
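If you would rather not commit the key to version control, the same settings can be read from the environment. The following is a minimal sketch of such a settings.py fragment; the environment variable names ZENSCRAPE_API_KEY and ZENSCRAPE_CONCURRENCY are assumptions chosen for illustration, not something the package itself reads.

```python
import os

# Read the API key from the environment (variable name is an assumption),
# falling back to the placeholder used in the example above.
ZENSCRAPE_API_KEY = os.environ.get("ZENSCRAPE_API_KEY", "REPLACE-WITH-YOUR-API-KEY")

DOWNLOADER_MIDDLEWARES = {
    "scrapy_zenscrape.ZenscrapeMiddleware": 700,
}

# Keep this at or below the concurrency your Zenscrape plan allows.
# ZENSCRAPE_CONCURRENCY is a hypothetical variable; the default stays at 1.
CONCURRENT_REQUESTS = int(os.environ.get("ZENSCRAPE_CONCURRENCY", "1"))
```

Scrapy only sees the resulting module-level names, so the middleware configuration is unchanged.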
Usage
Inherit your spiders from ZenscrapeSpider and yield a ZenscrapeRequest.
Below you can see an example from the spider in httpbin.py.
from scrapy_zenscrape import ZenscrapeSpider, ZenscrapeRequest


class HttpbinSpider(ZenscrapeSpider):
    name = 'httpbin'
    start_urls = [
        'https://httpbin.org',
    ]

    def start_requests(self):
        for url in self.start_urls:
            # Uncomment the parameters, headers and cookies you need.
            yield ZenscrapeRequest(url, params={
                # 'render': False,
                # 'block_ads': True,
                # 'block_resources': False,
                # 'premium': True,
                # 'location': 'fr',
                # 'wait_for': 5,
                # 'wait_for_css': '#swagger-ui',
            },
            headers={
                # 'Accept-Language': 'En-US',
            },
            cookies={
                # 'name_1': 'value_1',
            })

    def parse(self, response):
        ...
You can pass Zenscrape parameters in the params argument of a ZenscrapeRequest. Headers and cookies are passed as in a normal Scrapy Request. ZenscrapeRequest converts all parameters, headers and cookies to the format expected by the API.
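To give a rough idea of what such a formatting step can involve, here is a standalone sketch that wraps a target URL and its parameters into a single API URL and flattens a cookie dict into one string. It is not the package's implementation: the endpoint, the apikey and url query names, and the cookie layout are all placeholders assumed for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Placeholder endpoint: an assumption for illustration, not the real API URL.
API_ENDPOINT = "https://api.example.invalid/get"


def build_api_url(target_url, api_key, params=None):
    """Wrap target_url in a query string addressed to the hypothetical API."""
    # 'apikey' and 'url' are assumed query names.
    query = {"apikey": api_key, "url": target_url}
    for name, value in (params or {}).items():
        # Booleans are lower-cased so they appear as true/false in the query.
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{API_ENDPOINT}?{urlencode(query)}"


def format_cookies(cookies):
    """Flatten a cookie dict into one 'name=value;name=value' string (assumed layout)."""
    return ";".join(f"{name}={value}" for name, value in cookies.items())


example_url = build_api_url(
    "https://httpbin.org",
    "REPLACE-WITH-YOUR-API-KEY",
    {"render": False, "location": "fr"},
)
example_cookies = format_cookies({"name_1": "value_1", "name_2": "value_2"})
```

The point of the sketch is only that the spider keeps working with ordinary URLs, dicts and callbacks, while the request object takes care of encoding them for the remote service.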
Examples
Add your API key to settings.py.
To run the examples you need to clone this repository. In your terminal, go to examples/httpbin/httpbin and run the example spider with:
scrapy crawl httpbin
Source Distribution
File details
Details for the file scrapy-zenscrape-0.0.2.tar.gz.
File metadata
- Download URL: scrapy-zenscrape-0.0.2.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | cb9914a189841dd83840a50139a3bd8eb13fe3952aff3ccb93beb09d3c9bf2b6
MD5 | 56333aaad870d20dbe00133eda6af66d
BLAKE2b-256 | aea1a04432db9e931cd2c2f096c2d1b469212f419a425f2303d90911022ebbbf