Redis-based components for Scrapy.

Project description

Features

  • Distributed crawling/scraping

    You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls.

  • Distributed post-processing

    Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue.

  • Scrapy plug-and-play components

    Scheduler + Duplication Filter, Item Pipeline, Base Spiders.

  • In this forked version: support for JSON-encoded data in Redis

    The data contains a url, `meta` and other optional parameters. `meta` is a nested JSON object that contains sub-data. The fork extracts this data and sends a FormRequest with the url, the meta and the additional form data.

    For example:

    { "url": "https://exaple.com", "meta": {"job-id":"123xsd", "start-date":"dd/mm/yy"}, "url_cookie_key":"fertxsas" }

    This data can be accessed in a Scrapy spider through the response's request, for example: request.url, request.meta, request.cookies
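The JSON handling described in the last feature can be pictured with a minimal sketch. It is not the fork's actual implementation: the helper name `parse_redis_payload`, the `RESERVED_KEYS` set and the rule that every key other than `url` and `meta` becomes form data are assumptions made for illustration.

```python
import json

# Keys with a dedicated meaning in the example payload. Treating every other
# key as form data is an assumption of this sketch, not confirmed behavior.
RESERVED_KEYS = {"url", "meta"}


def parse_redis_payload(raw):
    """Split a JSON payload read from Redis into (url, meta, formdata)."""
    # Redis clients commonly return bytes, so decode before parsing.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    data = json.loads(raw)
    url = data["url"]
    meta = data.get("meta") or {}
    # Form fields are sent as strings, so coerce the remaining values.
    formdata = {k: str(v) for k, v in data.items() if k not in RESERVED_KEYS}
    return url, meta, formdata


payload = (
    b'{"url": "https://example.com", '
    b'"meta": {"job-id": "123xsd"}, '
    b'"url_cookie_key": "fertxsas"}'
)
url, meta, formdata = parse_redis_payload(payload)
```

Inside a spider, the three values could then be handed to `scrapy.FormRequest(url=url, meta=meta, formdata=formdata)`, which matches the description above of sending a FormRequest with the url, the meta and the additional form data.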

Requirements

  • Python 3.7+

  • Redis >= 5.0

  • Scrapy >= 2.0

  • redis-py >= 4.0

Installation

From pip

pip install scrapy-redis

From GitHub

If scrapy-redis is already installed through pip, uninstall it first:

pip uninstall scrapy-redis

Then install from source:

git clone https://github.com/darkrho/scrapy-redis.git
cd scrapy-redis
python setup.py install
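Once the package is installed, the plug-and-play components listed under Features are usually switched on in the project's `settings.py`. The sketch below uses the class paths and setting names of the upstream scrapy-redis project; whether this fork keeps the same module layout is an assumption, and the Redis URL is a placeholder for a local instance.

```python
# settings.py (sketch; values follow the upstream scrapy-redis project)

# Keep the request queue in Redis so several spider instances can share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all spider instances through Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Push scraped items into Redis so post-processing workers can consume them.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Placeholder connection URL for a local Redis server (assumption).
REDIS_URL = "redis://localhost:6379"
```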

Alternative Choice

Frontera is a web crawling framework consisting of a crawl frontier and distribution/scaling primitives, which lets you build a large-scale online web crawler.

Download files

Source Distribution

kk-scrapy-redis-0.0.1.tar.gz (14.4 kB view details)

Uploaded Source

Built Distribution

kk_scrapy_redis-0.0.1-py2.py3-none-any.whl (16.4 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file kk-scrapy-redis-0.0.1.tar.gz.

File metadata

  • Download URL: kk-scrapy-redis-0.0.1.tar.gz
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for kk-scrapy-redis-0.0.1.tar.gz
Algorithm Hash digest
SHA256 3bf58890b199ca95ffafd8f02021b46ef84eb7c576cdab6356f0c59db46d0100
MD5 6cce6aca958fe30ede75435918fc483f
BLAKE2b-256 daf915a8f9f443adad4e834276f1323f8d89c91504aee3209b0d8661a51514c8

File details

Details for the file kk_scrapy_redis-0.0.1-py2.py3-none-any.whl.

File hashes

Hashes for kk_scrapy_redis-0.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 5d68d754df75b35064596e97dfbff8ff7817c7da1f48fad07b77c84cfaed5ee9
MD5 6d2e555212ad6e97ec5c01a9773bf426
BLAKE2b-256 41457d21adc8c7abe40d7afdcdf7bfa8f1339cc58601533dc42b56f5db32f0cf
