PDP Kafka package

PDP Kafka Reader

Requirements

  • git
  • Python 3.6+
  • pip

You also need access to the git repository: generate an SSH key pair and add the public key to your Skyway Bitbucket account.

Install

pip install pdp_kafka_reader

Usage

CLI

The kafka-csv-export CLI tool exports Avro data from a given Kafka topic to a CSV file. Example usage:

kafka-csv-export -k kafka-options.json -s schema.json -t my_kafka_topic -o out.csv
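
The format of kafka-options.json is not described here. As a minimal sketch, assuming the file holds the same options as the kafka_options dictionary in the Python example below, it could be generated like this (the helper name write_kafka_options is hypothetical, not part of the package):

```python
import json
from pathlib import Path

# Assumption: the CLI accepts the same Kafka options as the Python reader.
kafka_options = {
    "kafka.bootstrap.servers": "my-kafka-server:9092",
    "subscribe": "test_avro",
}


def write_kafka_options(path, options):
    """Serialize the options dictionary to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(options, indent=2))
    return path


options_path = write_kafka_options("kafka-options.json", kafka_options)
print(options_path.read_text())
```

The resulting file can then be passed to the tool with the -k flag, as in the command above.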

Python KafkaReader

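from pyspark.sql import SparkSession

from pdp_kafka_reader.kafka_reader import KafkaAvroReader

# The reader needs an active SparkSession; getOrCreate() reuses one if it exists.
spark = SparkSession.builder.appName("pdp-kafka-reader-example").getOrCreate()

kafka_options = {
    "kafka.bootstrap.servers": "my-kafka-server:9092",
    "subscribe": "test_avro"
}

# Read the Avro schema as a JSON string and close the file afterwards.
with open("schema.json") as schema_file:
    avro_schema = schema_file.read()

reader = KafkaAvroReader(spark)
df = reader.read_avro(kafka_options, avro_schema, "my_kafka_topic")
df.show()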
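
Both the CLI and read_avro take the Avro schema from schema.json, whose contents are not shown here. The sketch below writes a hypothetical record schema to that file and reads it back as text, the form the example above passes to read_avro. The record name, namespace, and fields are illustrative assumptions; replace them with the actual schema of your topic.

```python
import json

# Hypothetical Avro record schema; the name and fields are placeholders.
avro_schema = {
    "type": "record",
    "name": "ExampleEvent",
    "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        # Optional field: a union with null, defaulting to null.
        {"name": "comment", "type": ["null", "string"], "default": None},
    ],
}

# Write the schema as JSON.
with open("schema.json", "w") as schema_file:
    json.dump(avro_schema, schema_file, indent=2)

# Read it back as a string.
with open("schema.json") as schema_file:
    schema_text = schema_file.read()
```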

Testing

The testing environment is defined in docker-compose.yml. Start the Docker containers, then run tox.

Download files

Source Distribution

pdp-kafka-reader-0.0.3.tar.gz (171.1 kB)

Built Distribution

pdp_kafka_reader-0.0.3-py3-none-any.whl (171.2 kB, Python 3)
