Kafka pipeline
os-scrapy-kafka-pipeline
This project provides a pipeline that sends Scrapy items to Kafka in JSON format.
Features:
- supports configuring the default Kafka brokers and topic in the settings.py file
- supports kafka-python producer init args
- supports connecting and sending dynamically to another Kafka cluster and topic through item meta
- items are sent to Kafka in JSON format; bytes values are base64-encoded
Install
pip install os-scrapy-kafka-pipeline
You can run the example spider directly from the project root path.
scrapy crawl example
Usage
Configuration
- enable the pipeline in the project settings.py file
  ITEM_PIPELINES = { "os_scrapy_kafka_pipeline.KafkaPipeline": 300, }
- configure the default Kafka brokers
  KAFKA_PRODUCER_BROKERS = ["broker01.kafka:9092", "broker02.kafka:9092"]
  - brokers in the item meta override this default value
  - the pipeline is not enabled when a Kafka connection cannot be started with this setting
  - an exception is raised when no brokers are configured
- configure the default Kafka producer
  KAFKA_PRODUCER_CONFIGS = {"client_id": "id01", "retries": 1}
  - this is the global config; dynamic connections also use it
  - bootstrap_servers in this config has no effect when KAFKA_PRODUCER_BROKERS is already configured
- configure the default topic
  KAFKA_PRODUCER_TOPIC = "topic01"
  - the topic in item.meta overrides this setting
  - an exception is raised when no topic is configured
- configure the kafka-python log level (default: "WARNING")
  KAFKA_PRODUCER_LOGLEVEL = "DEBUG"
- configure the Kafka producer close timeout (default: None)
  KAFKA_PRODUCER_CLOSE_TIMEOUT = 10
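Put together, the settings above might look like the following in settings.py. This is only a sketch that reuses the placeholder broker addresses, client id, and topic from the list; replace them with the values for your own cluster.

```python
# settings.py (sketch): all values are the placeholders used in this README

# Enable the pipeline; 300 is the order value shown above.
ITEM_PIPELINES = {
    "os_scrapy_kafka_pipeline.KafkaPipeline": 300,
}

# Default brokers; brokers given in the item meta override these.
KAFKA_PRODUCER_BROKERS = ["broker01.kafka:9092", "broker02.kafka:9092"]

# Extra kafka-python producer arguments. bootstrap_servers is left out
# on purpose, because the brokers are already set above.
KAFKA_PRODUCER_CONFIGS = {"client_id": "id01", "retries": 1}

# Default topic; the topic given in the item meta overrides it.
KAFKA_PRODUCER_TOPIC = "topic01"

# Optional tuning.
KAFKA_PRODUCER_LOGLEVEL = "DEBUG"   # default is "WARNING"
KAFKA_PRODUCER_CLOSE_TIMEOUT = 10   # default is None
```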
Dynamic Kafka Connection with item.meta
- you can set the topic, key, partition, and brokers using item.meta
- the item must have a meta member of type dict
- options:
  meta = { "kafka.topic": "topic01", "kafka.key": "key01", "kafka.partition": 1, "kafka.brokers": "broker01.kafka:9092,broker02.kafka:9092" }
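As a minimal sketch of how these options could be assembled, the helpers below build the meta dict from plain Python values and show the "meta overrides default" rule for the topic. build_kafka_meta and resolve_topic are hypothetical names used for illustration and are not part of os-scrapy-kafka-pipeline; the ValueError is likewise an assumption, since the exact exception the pipeline raises is not stated here.

```python
from typing import Any, Dict, List, Optional


def build_kafka_meta(
    topic: Optional[str] = None,
    key: Optional[str] = None,
    partition: Optional[int] = None,
    brokers: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the meta dict using the option keys above."""
    meta: Dict[str, Any] = {}
    if topic is not None:
        meta["kafka.topic"] = topic
    if key is not None:
        meta["kafka.key"] = key
    if partition is not None:
        meta["kafka.partition"] = partition
    if brokers:
        # The options example gives brokers as one comma-separated string.
        meta["kafka.brokers"] = ",".join(brokers)
    return meta


def resolve_topic(meta: Dict[str, Any], default_topic: Optional[str]) -> str:
    """Hypothetical helper: the topic in meta wins over the default one."""
    topic = meta.get("kafka.topic") or default_topic
    if not topic:
        # Assumed exception type; the README only says an exception is raised.
        raise ValueError("no Kafka topic configured")
    return topic


meta = build_kafka_meta(
    topic="topic01",
    key="key01",
    partition=1,
    brokers=["broker01.kafka:9092", "broker02.kafka:9092"],
)
print(meta)
```

The resulting dict would then be stored as the item's meta member before the item is yielded from the spider.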
Storage Format
Items are sent to Kafka in JSON format; bytes values are base64-encoded.
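To illustrate the storage format, here is a rough sketch using only the standard library. item_to_json is a hypothetical helper, not the pipeline's actual serializer, and it assumes the item has already been converted to a plain dict.

```python
import base64
import json
from typing import Any, Dict


def _encode_bytes(value: Any) -> str:
    """Fallback for json.dumps: turn bytes into base64 text."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(bytes(value)).decode("ascii")
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


def item_to_json(item: Dict[str, Any]) -> str:
    """Serialize a dict-like item; bytes values become base64 strings."""
    return json.dumps(item, default=_encode_bytes)


payload = item_to_json({"url": "http://example.com", "body": b"hello"})
print(payload)
```

A consumer reading such a message would need to know which fields were originally bytes in order to base64-decode them, because the JSON itself only carries strings.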
Unit Tests
sh scripts/test.sh
License
MIT licensed.
Source Distribution
Hashes for os_scrapy_kafka_pipeline-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f639951b1bcc67e12e91cc9012b46462c40175cd1e5d8face6a5a9a52fe76e5f
MD5 | 8aed56b3cbf19df33d25f84ab23c7258
BLAKE2b-256 | f6e7061b350742b6a008fea9196abe9ca0124e8710d33b56c595e9ed79aa2bc0