Redis-based components for Scrapy.
Documentation: https://github.com/rmax/scrapy-redis/wiki.
Contribution: https://github.com/rmax/scrapy-redis/wiki/Getting-Started
LICENSE: MIT license
Features
Distributed crawling/scraping
You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls.
Distributed post-processing
Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue.
Scrapy plug-and-play components
Scheduler + Duplication Filter, Item Pipeline, Base Spiders.
In this forked version: added support for JSON data in Redis
The data contains `url`, `meta`, and other optional parameters. `meta` is nested JSON that contains sub-data. This feature extracts the data and sends another FormRequest with the `url`, `meta`, and additional `formdata`.
For example:
{ "url": "https://exaple.com", "meta": {"job-id":"123xsd", "start-date":"dd/mm/yy"}, "url_cookie_key":"fertxsas" }
This data can be accessed in a Scrapy spider through the response's request object, for example response.request.url, response.request.meta, and response.request.cookies.
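The handling of such a queue entry can be sketched in plain Python. This is a minimal illustration, not the fork's confirmed code: the helper name `split_payload` and the rule that every key other than `url` and `meta` becomes form data are assumptions made for the example.

```python
import json

# Keys mapped directly onto the request. Treating every other key as
# form data is an assumption for illustration.
RESERVED_KEYS = ("url", "meta")


def split_payload(raw):
    """Decode one JSON queue entry into (url, meta, formdata)."""
    if isinstance(raw, bytes):
        # Redis clients commonly return bytes, so decode before parsing.
        raw = raw.decode("utf-8")
    data = json.loads(raw)
    url = data["url"]
    meta = data.get("meta", {})
    # Form data values are sent as strings.
    formdata = {k: str(v) for k, v in data.items() if k not in RESERVED_KEYS}
    return url, meta, formdata


raw = (
    b'{"url": "https://exaple.com", '
    b'"meta": {"job-id": "123xsd", "start-date": "dd/mm/yy"}, '
    b'"url_cookie_key": "fertxsas"}'
)
url, meta, formdata = split_payload(raw)
print(url)
print(meta)
print(formdata)
```

In a spider, the three values could then be passed on as `scrapy.FormRequest(url, meta=meta, formdata=formdata)`.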
Requirements
Python 3.7+
Redis >= 5.0
Scrapy >= 2.0
redis-py >= 4.0
Installation
From pip
pip install scrapy-redis
From GitHub
git clone https://github.com/darkrho/scrapy-redis.git
cd scrapy-redis
python setup.py install
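To use the JSON data feature, make sure scrapy-redis has not also been installed through pip. If it has, uninstall that copy first:
pip uninstall scrapy-redis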
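Once the package is installed, the components listed under Features are enabled through the Scrapy project settings. A minimal sketch, assuming this fork keeps the upstream scrapy-redis module paths and setting names; the Redis URL and the pipeline priority are placeholder values:

```python
# settings.py (sketch; module paths assumed to match upstream scrapy-redis)

# Store the request queue in Redis so several spider instances can share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all spider instances through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and seen-request set between runs instead of clearing them.
SCHEDULER_PERSIST = True

# Push scraped items into a Redis list for separate post-processing workers.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Placeholder connection string; point this at your own Redis server.
REDIS_URL = "redis://localhost:6379"
```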
Alternative Choice
Frontera is a web crawling framework consisting of a crawl frontier and distribution/scaling primitives, allowing you to build a large-scale online web crawler.
File details
Details for the file kk-scrapy-redis-0.0.1.tar.gz.
File metadata
- Download URL: kk-scrapy-redis-0.0.1.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3bf58890b199ca95ffafd8f02021b46ef84eb7c576cdab6356f0c59db46d0100
MD5 | 6cce6aca958fe30ede75435918fc483f
BLAKE2b-256 | daf915a8f9f443adad4e834276f1323f8d89c91504aee3209b0d8661a51514c8
File details
Details for the file kk_scrapy_redis-0.0.1-py2.py3-none-any.whl.
File metadata
- Download URL: kk_scrapy_redis-0.0.1-py2.py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5d68d754df75b35064596e97dfbff8ff7817c7da1f48fad07b77c84cfaed5ee9
MD5 | 6d2e555212ad6e97ec5c01a9773bf426
BLAKE2b-256 | 41457d21adc8c7abe40d7afdcdf7bfa8f1339cc58601533dc42b56f5db32f0cf