rabbitmq-spider is an open-source tool that helps with web scraping by using RabbitMQ and Scrapy to distribute and scale scraping tasks across multiple instances.
Inspired by scrapy-redis.
Features
- It uses RabbitMQ only as the source of task messages; it does not use RabbitMQ to implement Scrapy’s queue.
- It can automatically acknowledge (ack) or negatively acknowledge (nack) messages based on the response results.
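As an illustration of the second feature, the sketch below shows one way an ack/nack decision could be derived from a response status. It is not the package's actual implementation: the function `settle_message`, the `RecordingChannel` stand-in, and the rule "any 2xx status counts as success" are all assumptions made for this example. Only the method names `basic_ack` and `basic_nack` mirror those of a pika channel.

```python
class RecordingChannel:
    """Stand-in for a pika channel: records calls instead of talking to a broker."""

    def __init__(self):
        self.calls = []

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag, requeue=True):
        self.calls.append(("nack", delivery_tag, requeue))


def settle_message(channel, delivery_tag, status, requeue_on_failure=True):
    """Ack on a 2xx response, otherwise nack (assumed policy, for illustration)."""
    if 200 <= status < 300:
        channel.basic_ack(delivery_tag=delivery_tag)
        return "ack"
    channel.basic_nack(delivery_tag=delivery_tag, requeue=requeue_on_failure)
    return "nack"


channel = RecordingChannel()
settle_message(channel, delivery_tag=1, status=200)  # successful response
settle_message(channel, delivery_tag=2, status=503)  # failed response
print(channel.calls)  # [('ack', 1), ('nack', 2, True)]
```

With a real broker, a nack with `requeue=True` puts the message back on the queue so another instance can retry it; whether rabbitmq-spider requeues by default is not stated here.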
Installation
pip install rabbitmq_spider
Usage
1. Add the following values to your Scrapy settings:
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = '5672'
RABBITMQ_USERNAME = 'guest'
RABBITMQ_PASSWORD = 'guest'
RABBITMQ_VIRTUAL_HOST = '/'
SPIDER_MIDDLEWARES = {
    'rabbitscrape.middlewares.RabbitmqSpiderMiddleware': 49,
}
2. Make your spider inherit from RabbitMQSpider:
import json

from rabbitmq_spider.spiders import RabbitMQSpider
from scrapy import Request


class YourSpider(RabbitMQSpider):
    """Demo"""

    name = 'demo'
    api = 'demo.queue'

    def make_request_from_data(self, data):
        # Each message body is expected to be JSON containing a "url" key.
        msg_dict = json.loads(data)
        url = msg_dict['url']
        return Request(url)

    def parse(self, response, **kwargs):
        self.logger.debug(response.status)
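The spider above only consumes messages, so something else has to produce them. The following is a minimal sketch of that producer side, under stated assumptions: the helper names `build_message`, `extract_url`, and `publish` are hypothetical, the URL is a placeholder, and it is assumed (not confirmed by this README) that the `api` attribute names the queue the spider reads from. The `publish` helper uses the pika client and needs a running broker, so the demo at the bottom only exercises the JSON round trip.

```python
import json


def build_message(url):
    """Encode a task as the JSON body that make_request_from_data parses."""
    return json.dumps({"url": url}).encode("utf-8")


def extract_url(data):
    """Mirror of the parsing step in make_request_from_data."""
    return json.loads(data)["url"]


def publish(body, queue="demo.queue", host="localhost", port=5672,
            username="guest", password="guest", virtual_host="/"):
    """Publish one message with pika (requires pika and a running broker)."""
    import pika  # imported lazily so the helpers above work without pika

    params = pika.ConnectionParameters(
        host=host,
        port=port,
        virtual_host=virtual_host,
        credentials=pika.PlainCredentials(username, password),
    )
    connection = pika.BlockingConnection(params)
    try:
        # The default exchange routes by queue name; the queue must already
        # exist, otherwise the broker silently drops the message.
        connection.channel().basic_publish(
            exchange="", routing_key=queue, body=body
        )
    finally:
        connection.close()


body = build_message("https://example.com/")
print(extract_url(body))  # https://example.com/
```

Calling `publish(body)` against a local broker would enqueue the task using the same connection values as the settings shown in step 1.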
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
rabbitmq_spider-0.0.1.tar.gz (5.0 kB)
Built Distribution
File details
Details for the file rabbitmq_spider-0.0.1.tar.gz.
File metadata
- Download URL: rabbitmq_spider-0.0.1.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72c70a5dc4ffa167985e59d5674a0948383997a3d19263de0f6d7dca5b10ec40
MD5 | 9217295abfebe428f159b81b59f23313
BLAKE2b-256 | f9e955197f4b10d57a055d313d4c3eda4a333e64c68df2243c2860972f922763
File details
Details for the file rabbitmq_spider-0.0.1-py3-none-any.whl.
File metadata
- Download URL: rabbitmq_spider-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11fd1be89205b92933a8b1560318114aa90e35bb66e9c4ffd7fa6548826dd2c8
MD5 | 2846d71f11c9d180d8d923d0aa82d944
BLAKE2b-256 | 1cac9ade27280f1e345aeabb6dfdbe387c54cf5756d415127e5a4406b7816fff