Redis-based components for Scrapy.
Documentation: https://github.com/rmax/scrapy-redis/wiki.
Contribution: https://github.com/rmax/scrapy-redis/wiki/Getting-Started
LICENSE: MIT license
Features
Distributed crawling/scraping
You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls.
Distributed post-processing
Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue.
Scrapy plug-and-play components
Scheduler + Duplication Filter, Item Pipeline, Base Spiders.
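As an illustration of the distributed post-processing feature, here is a minimal sketch of a worker that drains an items queue. It assumes the items were pushed to a Redis list as JSON strings and that `client` behaves like a redis-py client; the function name `process_items` and the key used in the usage note are hypothetical, not part of this package.

```python
import json


def process_items(client, key, handle, timeout=5, max_items=None):
    """Pop JSON-encoded items from a Redis list and pass each one to `handle`.

    `client` is assumed to expose a redis-py style `blpop(key, timeout=...)`.
    Returns the number of items processed.
    """
    processed = 0
    while max_items is None or processed < max_items:
        # blpop waits up to `timeout` seconds and returns None if the list stays empty.
        popped = client.blpop(key, timeout=timeout)
        if popped is None:
            break
        _key, raw = popped
        # json.loads accepts both str and bytes, so raw Redis replies work as-is.
        handle(json.loads(raw))
        processed += 1
    return processed
```

With redis-py installed, a call might look like `process_items(redis.Redis.from_url("redis://localhost:6379"), "myspider:items", print)`, where the key name is an assumption about how the pipeline is configured. Several such workers can run at once, since each pop removes the item from the shared list.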
In this forked version: added support for JSON data in Redis
The data contains `url`, `meta`, and other optional parameters. `meta` is a nested JSON object that contains sub-data. This function extracts the data and sends a FormRequest with the `url`, the `meta`, and the additional form data.
For example:
{ "url": "https://exaple.com", "meta": {"job-id":"123xsd", "start-date":"dd/mm/yy"}, "url_cookie_key":"fertxsas" }
This data can be accessed in the Scrapy spider through the response, for example: request.url, request.meta, request.cookies
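The extraction step described above can be sketched in plain Python. This is only an illustration of the described behaviour under stated assumptions, not the package's own code: the helper name `split_payload` is hypothetical, and it assumes every key other than `url` and `meta` is passed on as form data.

```python
import json


def split_payload(raw):
    """Split a JSON payload into (url, meta, formdata)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    data = json.loads(raw)
    if "url" not in data:
        raise ValueError("payload has no 'url' field")
    url = data.pop("url")
    # `meta` is optional; fall back to an empty dict when it is absent.
    meta = data.pop("meta", {})
    # Whatever is left over is treated as additional form data (assumption).
    return url, meta, data


# Build the example payload from the text and split it again.
payload = json.dumps({
    "url": "https://exaple.com",
    "meta": {"job-id": "123xsd", "start-date": "dd/mm/yy"},
    "url_cookie_key": "fertxsas",
})
url, meta, formdata = split_payload(payload)
```

A payload like this would typically be pushed onto the list the spider reads from, for instance with redis-py's `lpush`; the exact key depends on the spider's configuration.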
Requirements
Python 3.7+
Redis >= 5.0
Scrapy >= 2.0
redis-py >= 4.0
Installation
From pip
pip install scrapy-redis
From GitHub
git clone https://github.com/darkrho/scrapy-redis.git
cd scrapy-redis
python setup.py install
To uninstall a copy installed through pip:
pip uninstall scrapy-redis
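Once the package is installed, the components listed under Features are usually enabled from the Scrapy project's settings.py. The fragment below is a sketch based on the setting names used by upstream scrapy-redis; it assumes this fork keeps the same `scrapy_redis` module paths and that Redis runs locally on the default port.

```python
# settings.py (sketch; module paths assumed to match upstream scrapy-redis)

# Store the request queue in Redis so several spider instances can share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Share the duplicate-request filter across all spider instances.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and the dupefilter in Redis between runs.
SCHEDULER_PERSIST = True

# Push scraped items into Redis for post-processing.
ITEM_PIPELINES = {"scrapy_redis.pipelines.RedisPipeline": 300}

# Connection URL (assumption: local Redis, default port).
REDIS_URL = "redis://localhost:6379"
```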
Alternative Choice
Frontera is a web crawling framework consisting of a crawl frontier and distribution/scaling primitives, allowing you to build a large-scale online web crawler.
History
Download files
File details
Details for the file kk-scrapy-redis-0.0.1.tar.gz.
File metadata
- Download URL: kk-scrapy-redis-0.0.1.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bf58890b199ca95ffafd8f02021b46ef84eb7c576cdab6356f0c59db46d0100 |
| MD5 | 6cce6aca958fe30ede75435918fc483f |
| BLAKE2b-256 | daf915a8f9f443adad4e834276f1323f8d89c91504aee3209b0d8661a51514c8 |
File details
Details for the file kk_scrapy_redis-0.0.1-py2.py3-none-any.whl.
File metadata
- Download URL: kk_scrapy_redis-0.0.1-py2.py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d68d754df75b35064596e97dfbff8ff7817c7da1f48fad07b77c84cfaed5ee9 |
| MD5 | 6d2e555212ad6e97ec5c01a9773bf426 |
| BLAKE2b-256 | 41457d21adc8c7abe40d7afdcdf7bfa8f1339cc58601533dc42b56f5db32f0cf |