Skip to main content

Kafka clients

Project description

dgkafka

Python package for working with Apache Kafka supporting multiple data formats.

Installation

pip install dgkafka

For Avro support (requires additional dependencies):

pip install dgkafka[avro]

For Json support (requires additional dependencies):

pip install dgkafka[json]

Features

  • Producers and consumers for different data formats:
    • Raw messages (bytes/strings)
    • JSON
    • Avro (with Schema Registry integration)
  • Robust error handling
  • Comprehensive operation logging
  • Context manager support
  • Flexible configuration

Quick Start

Basic Producer/Consumer

from dgkafka import KafkaProducer, KafkaConsumer

# Producer
with KafkaProducer(bootstrap_servers='localhost:9092') as producer:
    producer.produce('test_topic', 'Hello, Kafka!')

# Consumer
with KafkaConsumer(bootstrap_servers='localhost:9092', group_id='test_group') as consumer:
    consumer.subscribe(['test_topic'])
    for msg in consumer.consume():
        print(msg.value())

JSON Support

from dgkafka import JsonKafkaProducer, JsonKafkaConsumer

# Producer
with JsonKafkaProducer(bootstrap_servers='localhost:9092') as producer:
    producer.produce('json_topic', {'key': 'value'})

# Consumer
with JsonKafkaConsumer(bootstrap_servers='localhost:9092', group_id='json_group') as consumer:
    consumer.subscribe(['json_topic'])
    for msg in consumer.consume():
        print(msg.value())  # Automatically deserialized JSON

Avro Support

from dgkafka import AvroKafkaProducer, AvroKafkaConsumer

# Producer
value_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
    ]
}

with AvroKafkaProducer(
    schema_registry_url='http://localhost:8081',
    bootstrap_servers='localhost:9092',
    default_value_schema=value_schema
) as producer:
    producer.produce('avro_topic', {'name': 'Alice', 'age': 30})

# Consumer
with AvroKafkaConsumer(
    schema_registry_url='http://localhost:8081',
    bootstrap_servers='localhost:9092',
    group_id='avro_group'
) as consumer:
    consumer.subscribe(['avro_topic'])
    for msg in consumer.consume():
        print(msg.value())  # Automatically deserialized Avro object

Classes

Base Classes

  • KafkaProducer - base message producer
  • KafkaConsumer - base message consumer

Specialized Classes

  • JsonKafkaProducer - JSON message producer (inherits from KafkaProducer)
  • JsonKafkaConsumer - JSON message consumer (inherits from KafkaConsumer)
  • AvroKafkaProducer - Avro message producer (inherits from KafkaProducer)
  • AvroKafkaConsumer - Avro message consumer (inherits from KafkaConsumer)

Configuration

All classes accept standard Kafka configuration parameters:

config = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'my_group',
    'auto.offset.reset': 'earliest'
}

Avro classes require additional parameter:

  • schema_registry_url - Schema Registry URL

Logging

All classes use dglog.Logger for logging. You can provide a custom logger:

from dglog import Logger

logger = Logger()
producer = KafkaProducer(logger_=logger, ...)

Best Practices

  1. Always use context managers (with) for proper resource cleanup
  2. Implement error handling and retry logic for production use
  3. Pre-register Avro schemas in Schema Registry
  4. Configure appropriate acks and retries parameters for producers
  5. Monitor consumer lag and producer throughput

Advanced Usage

Custom Serialization

# Custom Avro serializer
class CustomAvroProducer(AvroKafkaProducer):
    def _serialize_value(self, value):
        # Custom serialization logic
        return super()._serialize_value(value)

Message Headers

# Adding headers to messages
headers = {
    'correlation_id': '12345',
    'message_type': 'user_update'
}

producer.produce(
    topic='events',
    value=message_data,
    headers=headers
)

Error Handling

from confluent_kafka import KafkaException

try:
    with AvroKafkaProducer(...) as producer:
        producer.produce(...)
except KafkaException as e:
    print(f"Kafka error occurred: {e}")

Performance Tips

  1. Batch messages when possible (batch.num.messages config)
  2. Adjust linger.ms for better batching
  3. Use compression.type (lz4, snappy, or gzip)
  4. Tune fetch.max.bytes and max.partition.fetch.bytes for consumers

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dgkafka-1.0.0a7.tar.gz (11.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dgkafka-1.0.0a7-py3-none-any.whl (11.7 kB view details)

Uploaded Python 3

File details

Details for the file dgkafka-1.0.0a7.tar.gz.

File metadata

  • Download URL: dgkafka-1.0.0a7.tar.gz
  • Upload date:
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for dgkafka-1.0.0a7.tar.gz
Algorithm Hash digest
SHA256 c80ffe6c67529aa91396fcd4fde2d4543dd388cce998bdc8e2a7cb557ec8bb4e
MD5 12d621a8800c5ddf7f09243d8331f3e8
BLAKE2b-256 b45cbaa95b9aa99b31fca5a82192f144a345518d38b24ca3c196f170a5439c9a

See more details on using hashes here.

File details

Details for the file dgkafka-1.0.0a7-py3-none-any.whl.

File metadata

  • Download URL: dgkafka-1.0.0a7-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for dgkafka-1.0.0a7-py3-none-any.whl
Algorithm Hash digest
SHA256 b265836f6736e974519b4be6edb576eabcf2f394be52c6ff1ef80331e403596b
MD5 3aa3506d93d25982e5e0990a369d22c4
BLAKE2b-256 cfeb2309c86626e04cd9711b4e551e680e0d0b5f758c22c3deaa2bcc8f33611a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page