rabbitmq-spider
rabbitmq-spider is an open-source tool that helps with web scraping by using RabbitMQ and Scrapy to distribute and scale scraping tasks across multiple instances.
Inspired by scrapy-redis.
Features
- RabbitMQ is used only as the source of task messages from which requests are generated; it does not replace Scrapy’s own request queue.
- It can automatically acknowledge (ack) or negatively acknowledge (nack) messages based on the response results.
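To make the ack/nack feature concrete, here is a minimal sketch of how a message could be settled once the response for its request is known. It is not taken from the rabbitmq-spider source: the "2xx means success" rule, the `settle` helper and the `RecordingChannel` stand-in are all assumptions made for illustration. Only the `basic_ack` and `basic_nack` method names mirror the pika channel API.

```python
# Illustrative sketch only. The 2xx rule is an assumption, not the
# library's documented behaviour.


def settle(channel, delivery_tag, status, requeue=True):
    """Ack the message on a 2xx response, otherwise nack it.

    `channel` is any object exposing pika-style basic_ack / basic_nack.
    Returns the action taken so that callers can log it.
    """
    if 200 <= status < 300:
        channel.basic_ack(delivery_tag=delivery_tag)
        return "ack"
    channel.basic_nack(delivery_tag=delivery_tag, requeue=requeue)
    return "nack"


class RecordingChannel:
    """Hypothetical stand-in for a channel; records calls instead of sending."""

    def __init__(self):
        self.calls = []

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag, requeue=True):
        self.calls.append(("nack", delivery_tag, requeue))


if __name__ == "__main__":
    channel = RecordingChannel()
    print(settle(channel, 1, 200))
    print(settle(channel, 2, 503))
    print(channel.calls)
```

With `requeue=True`, a nacked message goes back to the queue and can be delivered again, which is usually what you want for transient failures.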
Installation
pip install rabbitmq_spider
Usage
1. Add the following values to your Scrapy project settings:
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = '5672'
RABBITMQ_USERNAME = 'guest'
RABBITMQ_PASSWORD = 'guest'
RABBITMQ_VIRTUAL_HOST = '/'
SPIDER_MIDDLEWARES = {
    'rabbitscrape.middlewares.RabbitmqSpiderMiddleware': 49,
}
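As a rough illustration of how these settings describe one broker connection, the sketch below folds them into a single AMQP URL. The `amqp_url` helper is hypothetical and is not part of rabbitmq-spider; how the library itself consumes the settings is not shown in this README. Note that the port is converted to an integer and that the virtual host `/` has to be percent-encoded as `%2F` inside a URL.

```python
from urllib.parse import quote

# Hypothetical helper, for illustration only: it mirrors the setting names
# above but is not provided by rabbitmq-spider.


def amqp_url(settings):
    """Build an amqp:// URL from the RABBITMQ_* settings."""
    user = quote(str(settings["RABBITMQ_USERNAME"]), safe="")
    password = quote(str(settings["RABBITMQ_PASSWORD"]), safe="")
    host = settings["RABBITMQ_HOST"]
    port = int(settings["RABBITMQ_PORT"])  # accepts '5672' or 5672
    # The virtual host is a URL path segment, so "/" becomes "%2F".
    vhost = quote(str(settings["RABBITMQ_VIRTUAL_HOST"]), safe="")
    return f"amqp://{user}:{password}@{host}:{port}/{vhost}"


if __name__ == "__main__":
    example = {
        "RABBITMQ_HOST": "localhost",
        "RABBITMQ_PORT": "5672",
        "RABBITMQ_USERNAME": "guest",
        "RABBITMQ_PASSWORD": "guest",
        "RABBITMQ_VIRTUAL_HOST": "/",
    }
    print(amqp_url(example))
```

A URL in this form can be handy for checking the same credentials with other AMQP clients before starting the spider.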
2. Subclass RabbitMQSpider in your spider:
import json

from rabbitmq_spider.spiders import RabbitMQSpider
from scrapy import Request


class YourSpider(RabbitMQSpider):
    """Demo"""

    name = 'demo'
    routing_key = 'demo.queue'

    def make_request_from_data(self, data):
        # Each message body is expected to be JSON with a 'url' field.
        msg_dict = json.loads(data)
        url = msg_dict['url']
        return Request(url)

    def parse(self, response, **kwargs):
        # Replace with your own parsing logic.
        self.logger.debug(response.status)
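The spider above reads JSON messages that carry a `url` field, so something has to put those messages on the queue. The following is a minimal producer sketch using the pika client, under several assumptions: that `demo.queue` already exists and is reachable through the default exchange, that the broker runs with the default guest credentials, and that pika is installed separately. `build_message` and `publish_urls` are hypothetical helper names and are not part of rabbitmq-spider.

```python
import json


def build_message(url):
    """Encode one task in the shape make_request_from_data expects."""
    return json.dumps({"url": url}).encode("utf-8")


def publish_urls(urls, queue="demo.queue", host="localhost", port=5672,
                 username="guest", password="guest", virtual_host="/"):
    """Publish one message per URL to `queue` via the default exchange.

    Assumes the queue already exists; it is deliberately not declared here,
    because the declaration arguments the spider uses are not known.
    """
    import pika  # third-party: pip install pika

    params = pika.ConnectionParameters(
        host=host,
        port=port,
        virtual_host=virtual_host,
        credentials=pika.PlainCredentials(username, password),
    )
    connection = pika.BlockingConnection(params)
    try:
        channel = connection.channel()
        for url in urls:
            # With the default exchange, routing_key is the queue name.
            channel.basic_publish(
                exchange="", routing_key=queue, body=build_message(url)
            )
    finally:
        connection.close()


if __name__ == "__main__":
    # Only builds a message; call publish_urls([...]) with a broker running.
    print(build_message("https://example.com"))
```

Calling `publish_urls(["https://example.com"])` while the spider is running should then result in one request per published URL, provided the queue name matches the spider's `routing_key`.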
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file rabbitmq_spider-0.0.2.tar.gz.
File metadata
- Download URL: rabbitmq_spider-0.0.2.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5858a06d25566684994316b75c3e9e606a71289caabf244a74f59af29aa863e2
MD5 | 09f8ee6da130849c0eb5acb523f20d19
BLAKE2b-256 | 48d7ec0c2427b74bf9cf46828481b03130db407a66b40e093e500e776069037e
File details
Details for the file rabbitmq_spider-0.0.2-py3-none-any.whl.
File metadata
- Download URL: rabbitmq_spider-0.0.2-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | c0f3ee276f4ff62cc87e7b36e20a3e22df291451b89b9770f40d868b48ad1ee9
MD5 | a2a7a5437788341dd3f077a7f74aee5c
BLAKE2b-256 | 037077ebe694f25f37c12865636a8423010adb4589b9d11fc4261bde8bd49dff