rabbitmq-spider
rabbitmq-spider is an open-source tool that helps with web scraping by using RabbitMQ and Scrapy to distribute and scale scraping tasks across multiple instances.
Inspired by scrapy-redis.
Features
- RabbitMQ is used only as the source of task messages from which requests are generated; it is not used to implement Scrapy’s own request queue.
- It can automatically acknowledge (ack) or negatively acknowledge (nack) messages based on the response results.
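The ack/nack behaviour described above can be pictured with a small sketch. This is not the library's implementation: the README does not say which response results count as success, so the "2xx means ack" rule, the `settle_message` helper, and the `RecordingChannel` stand-in are all assumptions made for illustration. The only real API mirrored here is the pair of pika channel methods `basic_ack` and `basic_nack`.

```python
def settle_message(channel, delivery_tag, status, requeue=True):
    """Ack the message on a 2xx response, nack it otherwise (assumed rule)."""
    if 200 <= status < 300:
        channel.basic_ack(delivery_tag=delivery_tag)
        return "ack"
    # requeue=True asks the broker to put the message back on the queue.
    channel.basic_nack(delivery_tag=delivery_tag, requeue=requeue)
    return "nack"


class RecordingChannel:
    """Hypothetical stand-in for a pika channel that only records calls."""

    def __init__(self):
        self.calls = []

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag, requeue=True):
        self.calls.append(("nack", delivery_tag, requeue))


channel = RecordingChannel()
settle_message(channel, 1, 200)  # successful response
settle_message(channel, 2, 503)  # failed response
print(channel.calls)  # [('ack', 1), ('nack', 2, True)]
```

With a real pika channel the same function would work unchanged, since it only relies on `basic_ack` and `basic_nack`.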
Installation
pip install rabbitmq_spider
Usage
1. Add config values to your Scrapy settings:
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = '5672'
RABBITMQ_USERNAME = 'guest'
RABBITMQ_PASSWORD = 'guest'
RABBITMQ_VIRTUAL_HOST = '/'
SPIDER_MIDDLEWARES = {
    'rabbitscrape.middlewares.RabbitmqSpiderMiddleware': 49,
}
2. Make your spider inherit from RabbitMQSpider:
import json

from rabbitmq_spider.spiders import RabbitMQSpider
from scrapy import Request


class YourSpider(RabbitMQSpider):
    """Demo"""

    name = 'demo'
    routing_key = 'demo.queue'

    def make_request_from_data(self, data):
        # This demo expects each message body to be JSON with a 'url' key.
        msg_dict = json.loads(data)
        url = msg_dict['url']
        return Request(url)

    def parse(self, response, **kwargs):
        # Log the HTTP status of each fetched page.
        self.logger.debug(response.status)
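To feed the demo spider, something has to publish messages in the JSON shape that `make_request_from_data` parses. The following is a minimal producer sketch using pika, not part of rabbitmq-spider itself. It assumes that the spider's `routing_key` (`demo.queue`) is also the name of a queue that already exists, so that publishing through the default exchange reaches it; `build_message` and `publish_url` are hypothetical helper names, and the connection defaults simply mirror the config values from step 1.

```python
import json


def build_message(url):
    """Serialize a task as {"url": ...}, the shape the demo spider parses."""
    return json.dumps({"url": url})


def publish_url(url, queue="demo.queue", host="localhost", port=5672,
                username="guest", password="guest", virtual_host="/"):
    """Publish one URL task to RabbitMQ (assumes the queue already exists)."""
    # Imported here so build_message stays usable without pika installed.
    import pika

    params = pika.ConnectionParameters(
        host=host,
        port=port,
        virtual_host=virtual_host,
        credentials=pika.PlainCredentials(username, password),
    )
    connection = pika.BlockingConnection(params)
    try:
        channel = connection.channel()
        # The default exchange ("") routes a message to the queue whose
        # name equals the routing key.
        channel.basic_publish(exchange="", routing_key=queue,
                              body=build_message(url))
    finally:
        connection.close()
```

With a broker running locally, `publish_url("https://example.com")` would enqueue a single task for the spider to pick up.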
License
This project is licensed under the MIT License - see the LICENSE file for details.