Collection of reusable Scrapy item pipelines

scrapy-item-pipelines

A collection of reusable Scrapy item pipelines.

PushToKafkaPipeline

Item pipeline that pushes items to Kafka. Each item is converted to JSON and pushed to a configured Kafka topic.

Settings

SL_SCRAPY_ITEM_PIPELINES_SETTINGS = {
    "push_to_kafka_hosts": "localhost:9092",  # Kafka broker hosts, separated by commas.
    "push_to_kafka_default_topic": "",  # Default Kafka topic.
}
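
To illustrate the comma-separated format of push_to_kafka_hosts, here is a minimal sketch that splits the value into individual broker addresses. The helper parse_kafka_hosts, the second broker address, and the topic name are hypothetical and used only for illustration; how the package itself reads this setting is not documented here.

```python
# Example settings using the two documented keys.
# The second broker and the topic name are made-up values.
SL_SCRAPY_ITEM_PIPELINES_SETTINGS = {
    "push_to_kafka_hosts": "localhost:9092, localhost:9093",
    "push_to_kafka_default_topic": "scraped-items",
}


def parse_kafka_hosts(settings):
    """Hypothetical helper: split the comma-separated hosts string into a list."""
    raw = settings.get("push_to_kafka_hosts", "")
    # Strip surrounding whitespace and drop empty entries (e.g. a trailing comma).
    return [host.strip() for host in raw.split(",") if host.strip()]


hosts = parse_kafka_hosts(SL_SCRAPY_ITEM_PIPELINES_SETTINGS)
print(hosts)
```

A list of host:port strings like this is the form most Kafka client libraries accept for their bootstrap servers.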

Usage

To push different item types to different Kafka topics, define kafka_topic on the item class. To send a data key along with each item, set kafka_data_key on the item class to the name of the item field whose value should be used as the key. If kafka_data_key is not defined, no data key is pushed.

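import scrapy


class DemoItem(scrapy.Item):
    # Topic that items of this class are pushed to.
    kafka_topic = "topic-to-push-items"
    # Name of the field whose value is pushed as the data key.
    kafka_data_key = "another_unique_field"

    field_name = scrapy.Field()
    another_unique_field = scrapy.Field()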
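
The topic and data key rules described above can be sketched as follows. This is not the package's implementation: build_kafka_message is a hypothetical helper, the fallback to a default topic is an assumption based on the push_to_kafka_default_topic setting, and DemoItem is a plain dict stand-in so the sketch runs without Scrapy installed.

```python
import json


class DemoItem(dict):
    """Stand-in for a scrapy.Item subclass (assumption: a dict keeps this dependency-free)."""

    kafka_topic = "topic-to-push-items"
    kafka_data_key = "another_unique_field"


def build_kafka_message(item, default_topic):
    """Hypothetical helper: return (topic, key, value) for an item."""
    # Use the item's own topic if it defines one, otherwise the default (assumed behaviour).
    topic = getattr(item, "kafka_topic", None) or default_topic

    # The data key is the value of the field named by kafka_data_key, if defined.
    key_field = getattr(item, "kafka_data_key", None)
    key = None
    if key_field is not None:
        key = str(item[key_field]).encode("utf-8")

    # The item itself is serialized to JSON.
    value = json.dumps(dict(item)).encode("utf-8")
    return topic, key, value


item = DemoItem(field_name="example", another_unique_field="abc-123")
topic, key, value = build_kafka_message(item, default_topic="fallback-topic")
print(topic, key, value)
```

An item class that defines neither attribute would, under these assumptions, go to the default topic with no key.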

After configuring the settings and item classes, add scrapy_item_pipelines.streaming.PushToKafkaPipeline to the ITEM_PIPELINES setting.

ITEM_PIPELINES = {
    # ...
    # ...
    "scrapy_item_pipelines.streaming.PushToKafkaPipeline": 999,
}
