Asynchronous job library that consumes PDF URLs from RabbitMQ and publishes the PDF text back.
rabbitmq_pdfparser
rabbitmq_pdfparser is an asynchronous job library that consumes PDF URLs from RabbitMQ and publishes the extracted PDF text back to RabbitMQ. It stops when the queue is empty.
Installation
You can install this library easily with pip.
pip install rabbitmq-pdfparser
Usage
Messages sent to the source queue must have the following JSON format:
{"id": "foo", "url": "http://example.com/foo/bar.pdf"}
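A minimal sketch of producing and checking a message in this format, using only the standard library. The helper names make_job_message and parse_job_message are hypothetical and not part of rabbitmq_pdfparser; the resulting bytes are what you would publish to the source queue with your own RabbitMQ client.

```python
import json


def make_job_message(job_id: str, url: str) -> bytes:
    """Encode a job in the {"id": ..., "url": ...} format as UTF-8 JSON bytes."""
    # Hypothetical helper: the library documents only the JSON shape, not this function.
    return json.dumps({"id": job_id, "url": url}).encode("utf-8")


def parse_job_message(body: bytes) -> dict:
    """Decode a message body and check that it carries string "id" and "url" fields."""
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("message must be a JSON object")
    for key in ("id", "url"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"message is missing a string {key!r} field")
    return data


if __name__ == "__main__":
    body = make_job_message("foo", "http://example.com/foo/bar.pdf")
    print(body.decode("utf-8"))
    print(parse_job_message(body)["url"])
```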
As a library
import logging
import os
import asyncio
from rabbitmq_pdfparser import consume

if __name__ == '__main__':
    # Log to stderr; LOG_LEVEL and LOG_FORMAT override the defaults.
    logger = logging.getLogger("rabbitmq_pdfparser")
    logger.setLevel(os.environ.get('LOG_LEVEL', "DEBUG"))
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter(
            os.environ.get('LOG_FORMAT', "%(asctime)s [%(levelname)s] %(name)s: %(message)s")
        )
    )
    logger.addHandler(handler)

    # RabbitMQ connection and routing settings, read from the environment.
    config = {
        "mq_host": os.environ.get('MQ_HOST'),
        # MQ_PORT is optional; fall back to RabbitMQ's standard AMQP port.
        "mq_port": int(os.environ.get('MQ_PORT', 5672)),
        "mq_vhost": os.environ.get('MQ_VHOST'),
        "mq_user": os.environ.get('MQ_USER'),
        "mq_pass": os.environ.get('MQ_PASS'),
        "mq_source_queue": os.environ.get('MQ_SOURCE_QUEUE'),
        "mq_target_exchange": os.environ.get('MQ_TARGET_EXCHANGE'),
        "mq_target_routing_key": os.environ.get('MQ_TARGET_ROUTING_KEY')
    }

    # Create the loop explicitly: asyncio.get_event_loop() without a running loop is deprecated.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(
            consume(
                loop=loop,
                consumer_pool_size=10,
                config=config
            )
        )
    finally:
        loop.close()
This library uses the PyPDF2, aio_pika and aiohttp packages.
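The consume-until-empty behaviour with a fixed consumer_pool_size can be pictured as a plain asyncio worker pool. This is a hedged sketch of the general pattern, not the library's implementation: an asyncio.Queue stands in for the RabbitMQ source queue, and the hypothetical fetch_and_parse coroutine stands in for the aiohttp download and PyPDF2 text extraction.

```python
import asyncio


async def fetch_and_parse(job: dict) -> str:
    """Hypothetical stand-in for downloading the PDF and extracting its text."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"text of {job['url']}"


async def worker(queue: asyncio.Queue, results: list) -> None:
    # Each worker keeps pulling jobs until the queue is empty, then exits.
    while True:
        try:
            job = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        text = await fetch_and_parse(job)
        # In the real job, the text would be published back to RabbitMQ here.
        results.append((job["id"], text))


async def run_pool(jobs: list, pool_size: int = 10) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for job in jobs:
        queue.put_nowait(job)
    results: list = []
    # Start pool_size workers; gather returns once every worker has found the queue empty.
    await asyncio.gather(*(worker(queue, results) for _ in range(pool_size)))
    return results


if __name__ == "__main__":
    jobs = [{"id": str(i), "url": f"http://example.com/{i}.pdf"} for i in range(25)]
    done = asyncio.run(run_pool(jobs, pool_size=10))
    print(len(done))
```

Because all workers run on one event loop, checking the queue with get_nowait() is race-free, and the pool finishes on its own once the queue is drained, mirroring the "stops when the queue is empty" behaviour described above.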
Standalone
You can also run this library as a standalone PDF parser job: set the required environment variables and run rabbitmq_pdfparser. This use case fits well when you need to run it as a cron job or a Kubernetes Job.
Environment variables (required unless marked optional):
- MQ_HOST
- MQ_PORT (optional)
- MQ_VHOST
- MQ_USER
- MQ_PASS
- MQ_SOURCE_QUEUE (queue from which the job consumes URLs)
- MQ_TARGET_EXCHANGE (exchange to which the job publishes texts)
- MQ_TARGET_ROUTING_KEY (routing key the job uses when publishing texts)
- MQ_QUEUE_DURABLE (optional, default value: True)
- CONSUMER_POOL_SIZE (optional, default value: 10)
- LOG_LEVEL (logging level; see the Python logging module documentation)
Example Kubernetes job: see kube.yaml.
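If you build the config dictionary for library usage yourself, the environment variables listed above can be read with their documented defaults roughly as follows. This is a minimal sketch: load_config is a hypothetical helper, the 5672 fallback for MQ_PORT is an assumption (RabbitMQ's standard AMQP port, not a documented default of this library), and the accepted spellings for MQ_QUEUE_DURABLE are also assumptions. LOG_LEVEL is left to the logging setup.

```python
import os
from typing import Mapping, Optional

# Variables the list does not mark as optional (LOG_LEVEL is handled by logging setup).
REQUIRED = (
    "MQ_HOST", "MQ_VHOST", "MQ_USER", "MQ_PASS",
    "MQ_SOURCE_QUEUE", "MQ_TARGET_EXCHANGE", "MQ_TARGET_ROUTING_KEY",
)


def _parse_bool(value: str) -> bool:
    # Assumption: these spellings count as true; anything else counts as false.
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the documented variables and apply their defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required environment variables: " + ", ".join(missing))
    return {
        # Same keys as the config dictionary in the library example.
        "config": {
            "mq_host": env["MQ_HOST"],
            # Assumed fallback: 5672 is RabbitMQ's standard AMQP port.
            "mq_port": int(env.get("MQ_PORT") or "5672"),
            "mq_vhost": env["MQ_VHOST"],
            "mq_user": env["MQ_USER"],
            "mq_pass": env["MQ_PASS"],
            "mq_source_queue": env["MQ_SOURCE_QUEUE"],
            "mq_target_exchange": env["MQ_TARGET_EXCHANGE"],
            "mq_target_routing_key": env["MQ_TARGET_ROUTING_KEY"],
        },
        # Documented defaults: CONSUMER_POOL_SIZE=10, MQ_QUEUE_DURABLE=True.
        "consumer_pool_size": int(env.get("CONSUMER_POOL_SIZE") or "10"),
        "queue_durable": _parse_bool(env.get("MQ_QUEUE_DURABLE") or "True"),
    }
```

The "config" and "consumer_pool_size" entries map onto the config and consumer_pool_size arguments of consume() in the library example; how the library itself reads MQ_QUEUE_DURABLE is not documented here, so queue_durable is only returned for your own use.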
Download files
Built Distribution
File details
Details for the file rabbitmq_pdfparser-1.0.1-py3-none-any.whl.
File metadata
- Download URL: rabbitmq_pdfparser-1.0.1-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | b4d3dc79b7608be10cbeac50620a004a8b4845eca386aabaaba2950f5d474497
MD5 | adf414f55d72fed58d2a90407325eb13
BLAKE2b-256 | 12f30e0728147c26f42b26b1bb7e53668b8c2c65d4941ad09ff44b158ed6ea33