Collection of reusable scrapy item pipelines
scrapy-item-pipelines
Various scrapy item pipelines
PushToKafkaPipeline
Item pipeline that pushes items to Kafka. Each item is converted to JSON and pushed to a configured Kafka topic.
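As a rough illustration of the "converted into JSON" step, the sketch below serializes a dict-like item to UTF-8 bytes, which is the form a Kafka producer typically expects for a message value. The helper name item_to_kafka_value is hypothetical and not part of this package; the package's actual serialization options may differ.

```python
import json


def item_to_kafka_value(item):
    """Serialize a dict-like item to UTF-8 encoded JSON bytes.

    Hypothetical helper for illustration only; not taken from the package.
    """
    # dict(item) works for plain dicts and for dict-like item objects.
    return json.dumps(dict(item), sort_keys=True).encode("utf-8")


payload = item_to_kafka_value(
    {"field_name": "value", "another_unique_field": "id-1"}
)
print(payload)
```

Sorting the keys is only a choice made here to keep the output stable between runs.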
Settings
SL_SCRAPY_ITEM_PIPELINES_SETTINGS = {
    "push_to_kafka_hosts": "localhost:9092",  # Kafka broker hosts, separated with commas.
    "push_to_kafka_default_topic": "",  # Default Kafka topic.
}
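Since push_to_kafka_hosts is a single comma-separated string, a consumer of this setting has to split it into individual broker addresses at some point. The sketch below shows one way that could be done; parse_kafka_hosts is a hypothetical helper, the second broker address and the topic name "items" are made-up example values, and how the package itself parses the setting is an assumption.

```python
def parse_kafka_hosts(hosts):
    """Split a comma-separated host string into a list of broker addresses.

    Hypothetical helper for illustration only; not taken from the package.
    """
    # Strip surrounding whitespace and drop empty entries such as a trailing comma.
    return [host.strip() for host in hosts.split(",") if host.strip()]


# Example settings; "broker-2:9092" and "items" are made-up values.
settings = {
    "push_to_kafka_hosts": "localhost:9092, broker-2:9092",
    "push_to_kafka_default_topic": "items",
}

bootstrap_servers = parse_kafka_hosts(settings["push_to_kafka_hosts"])
print(bootstrap_servers)
```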
Usage
If different item types should be pushed to different Kafka topics, define the topic on the item class with kafka_topic.
To push a data key along with each item, set kafka_data_key on the item class to the name of the item field whose value should be used as the key. If no kafka_data_key is defined, no data key is pushed.
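import scrapy


class DemoItem(scrapy.Item):
    kafka_topic = "topic-to-push-items"  # Topic for items of this class.
    kafka_data_key = "another_unique_field"  # Name of the field whose value is used as the data key.

    field_name = scrapy.Field()
    another_unique_field = scrapy.Field()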
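The per-item topic and data key described above could be resolved along the following lines. This is a sketch under stated assumptions, not the package's implementation: FakeItem is a dict-based stand-in for scrapy.Item so the example runs without Scrapy installed, resolve_topic_and_key is a hypothetical helper, and falling back to the default topic when kafka_topic is absent is an assumption based on the push_to_kafka_default_topic setting.

```python
class FakeItem(dict):
    """Dict-based stand-in for scrapy.Item (assumption, keeps the sketch stdlib-only)."""


class DemoItem(FakeItem):
    kafka_topic = "topic-to-push-items"
    kafka_data_key = "another_unique_field"


class PlainItem(FakeItem):
    """Item class that defines neither a topic nor a data key."""


def resolve_topic_and_key(item, default_topic):
    """Return (topic, key) for an item; hypothetical helper for illustration."""
    # Use the class-level topic if present, otherwise the configured default.
    topic = getattr(item, "kafka_topic", None) or default_topic
    # kafka_data_key names the field whose value becomes the message key.
    key_field = getattr(item, "kafka_data_key", None)
    key = item.get(key_field) if key_field else None
    return topic, key


demo = DemoItem(field_name="a", another_unique_field="id-1")
plain = PlainItem(field_name="b")

print(resolve_topic_and_key(demo, "default-topic"))
print(resolve_topic_and_key(plain, "default-topic"))
```

Here the first item is routed to its own topic with the field value as key, while the second falls back to the default topic with no key.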
After configuring, add scrapy_item_pipelines.streaming.PushToKafkaPipeline to the ITEM_PIPELINES setting.
ITEM_PIPELINES = {
    # ...
    # ...
    "scrapy_item_pipelines.streaming.PushToKafkaPipeline": 999,
}