PDP Kafka package

PDP Kafka Reader

Requirements

  • git
  • python 3.6+
  • pip

You also need access to the git repository: generate an SSH key and register it in Skyway Bitbucket.

Install

pip install pdp_kafka_reader

Usage

CLI

You can use the kafka-reader CLI tool to extract data from a specific topic into a file. An example of usage:

kafka-reader export-avro -k kafka-options.json -s schema.json -t my_kafka_topic -o out.parquet

Check all options with kafka-reader -h.
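
The -k and -s flags in the command above each point to a JSON file. As a rough sketch, the two files could be generated as follows. This assumes the options file accepts the same keys as the kafka_options dictionary passed to the Python reader, and that schema.json holds a standard Avro record schema. The record name ExampleEvent, its fields, and the write_json helper are hypothetical and only serve the illustration.

```python
import json
from pathlib import Path

# Assumption: the CLI options file uses the same keys as the Python reader.
# The topic itself is passed on the command line with -t.
kafka_options = {
    "kafka.bootstrap.servers": "my-kafka-server:9092",
}

# Hypothetical Avro schema; replace it with the schema of your topic.
avro_schema = {
    "type": "record",
    "name": "ExampleEvent",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "value", "type": "double"},
    ],
}


def write_json(path, payload):
    """Write payload to path as indented JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


options_path = write_json("kafka-options.json", kafka_options)
schema_path = write_json("schema.json", avro_schema)
```

Keeping both files under version control next to the export command makes a run easier to reproduce.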

Python KafkaReader

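from pyspark.sql import SparkSession

from pdp_kafka_reader.kafka_reader import KafkaAvroReader

# Assumption: the reader expects an active SparkSession.
spark = SparkSession.builder.appName("kafka-reader-example").getOrCreate()

kafka_options = {
    "kafka.bootstrap.servers": "my-kafka-server:9092",
    "subscribe": "test_avro"
}

# The Avro schema is passed to the reader as a JSON string.
with open("schema.json") as schema_file:
    avro_schema = schema_file.read()

reader = KafkaAvroReader(spark)
df = reader.read_avro(kafka_options, avro_schema, "my_kafka_topic")
df.show()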
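
Because the schema is handed to the reader as a raw string, a malformed schema.json may only surface as an error inside Spark. A minimal sketch of an optional pre-check, using only the standard library, is shown here. It assumes the topic carries Avro records (a top-level schema of type "record"). The load_avro_schema helper is hypothetical and not part of pdp_kafka_reader.

```python
import json


def load_avro_schema(text):
    """Parse an Avro schema string and check that it looks like a record schema."""
    schema = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("expected an Avro schema of type 'record'")
    if not schema.get("name"):
        raise ValueError("record schema must have a 'name'")
    fields = schema.get("fields")
    if not isinstance(fields, list):
        raise ValueError("record schema must have a 'fields' list")
    for field in fields:
        # Every Avro record field needs at least a name and a type.
        if not isinstance(field, dict) or "name" not in field or "type" not in field:
            raise ValueError(f"field is missing 'name' or 'type': {field!r}")
    return schema


# Hypothetical schema string standing in for the contents of schema.json.
example = (
    '{"type": "record", "name": "ExampleEvent", '
    '"fields": [{"name": "id", "type": "string"}]}'
)
schema = load_avro_schema(example)
field_names = [field["name"] for field in schema["fields"]]
```

The check covers structure only. It does not verify that the schema matches the messages actually stored in the topic.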

Testing

The testing environment is defined in docker-compose.yml. Start the Docker containers and run tox.

Download files

Source Distribution

pdp-kafka-reader-0.0.5.tar.gz (171.8 kB)

Built Distribution

pdp_kafka_reader-0.0.5-py3-none-any.whl (172.6 kB, Python 3)
