A Kafka aggregator based on the Faust stream processing library.

Project description

kafka-aggregator

A Kafka aggregator based on the Faust Python Stream Processing library.

kafka-aggregator development is based on the Safir application template.

Overview

kafka-aggregator uses Faust’s windowing feature to aggregate a stream of messages from Kafka.

kafka-aggregator implements a Faust agent (a “stream processor”) that adds messages from a source topic to a Faust table. The table is configured as a tumbling window with two parameters: a size, which is the window duration (time interval), and an expiration time, which specifies how long the data allocated to each window is stored. Every time a window expires, a callback function is called to aggregate the messages allocated to that window. The window size therefore controls the frequency of the aggregated stream.
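
The tumbling-window behaviour described above can be illustrated without Kafka or Faust. The following is a minimal sketch in plain Python, not the kafka-aggregator implementation: the names `window_start` and `aggregate_tumbling` are hypothetical, messages are assumed to be `(timestamp, value)` pairs, and the list comprehension at the end stands in for the callback that runs when a window expires.

```python
from collections import defaultdict
from statistics import mean


def window_start(timestamp, size):
    """Return the start of the tumbling window that contains timestamp."""
    return timestamp - (timestamp % size)


def aggregate_tumbling(messages, size):
    """Group (timestamp, value) pairs into fixed, non-overlapping windows
    and produce one aggregated record per window."""
    windows = defaultdict(list)
    for timestamp, value in messages:
        windows[window_start(timestamp, size)].append(value)
    # Stand-in for the window-close callback: one output record per window.
    return [
        {"window_start": start, "count": len(values), "mean": mean(values)}
        for start, values in sorted(windows.items())
    ]


# Five source messages spread over two 10-second windows.
messages = [(0.5, 1.0), (3.0, 3.0), (9.9, 5.0), (10.0, 10.0), (12.5, 20.0)]
result = aggregate_tumbling(messages, size=10)
```

With a window size of 10 seconds, the five input messages collapse into two aggregated records, which is why the window size sets the frequency of the aggregated stream.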

kafka-aggregator uses faust-avro to add Avro serialization and Schema Registry support to Faust. faust-avro can parse Faust models into Avro Schemas.

See the docs for more information.

Change log

0.2.0 (2020-08-14)

  • Add first and third quartiles (q1 and q3) to the list of summary statistics computed by the aggregator.

  • Ability to configure the list of summary statistics to be computed.

  • Pin top-level requirements.

  • Add Kafka Connect to the docker-compose setup.

  • Use only one Schema Registry by default to simplify local execution.

  • First release to PyPI.
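
A configurable list of summary statistics, including the quartiles added in this release, might look like the sketch below. It uses only the Python standard library and is not taken from the kafka-aggregator source: `OPERATIONS` and `summarize` are hypothetical names, and the quartile method (the default "exclusive" method of `statistics.quantiles`) is an assumption, since the change log does not say how q1 and q3 are computed.

```python
import statistics


def q1(values):
    """First quartile, using the default method of statistics.quantiles."""
    return statistics.quantiles(values, n=4)[0]


def q3(values):
    """Third quartile, using the default method of statistics.quantiles."""
    return statistics.quantiles(values, n=4)[2]


# Hypothetical registry mapping a configured name to its function.
OPERATIONS = {
    "min": min,
    "q1": q1,
    "mean": statistics.mean,
    "median": statistics.median,
    "q3": q3,
    "stdev": statistics.stdev,
    "max": max,
}


def summarize(values, operations):
    """Compute only the configured summary statistics for values."""
    unknown = [name for name in operations if name not in OPERATIONS]
    if unknown:
        raise ValueError(f"Unsupported summary statistics: {unknown}")
    return {name: OPERATIONS[name](values) for name in operations}


summary = summarize([1, 2, 3, 4, 5, 6, 7], ["min", "q1", "median", "q3", "max"])
```

Keeping the statistics in a name-to-function mapping means the configured list can be validated up front and extended without touching the aggregation code.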

0.1.0 (2020-07-13)

Initial release of kafka-aggregator with the following features:

  • Use Faust’s windowing feature to aggregate a stream of messages.

  • Use faust-avro to add Avro serialization and Schema Registry support to Faust.

  • Support for an optional internal Schema Registry to store schemas for the aggregated topics.

  • Create aggregation topic schemas from the source topic schemas and from the list of summary statistics to be computed.

  • Ability to create Faust records dynamically from aggregation topic schemas.

  • Ability to auto-generate code for the Faust agents (stream processors).

  • Compute summary statistics for numeric fields: min(), mean(), median(), stdev(), max().

  • Add an example module that initializes a number of source topics in Kafka, controls the number of fields in each topic, and produces messages for those topics at a given frequency.

  • Use Kafdrop to inspect messages from source and aggregated topics.

  • Add kafka-aggregator documentation site.
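
The idea of deriving an aggregation topic schema from a source topic schema and a list of summary statistics can be sketched as follows. This is an illustration under stated assumptions, not the project's actual schema layout: the function name `make_aggregation_schema`, the `-aggregated` name suffix, and the `<statistic>_<field>` naming are all hypothetical, and only fields with a simple numeric Avro type (no unions) are considered.

```python
# Avro primitive types treated as numeric in this sketch.
NUMERIC_TYPES = {"int", "long", "float", "double"}


def make_aggregation_schema(source_schema, operations, suffix="-aggregated"):
    """Build an Avro record schema (as a dict) with one double field
    per (summary statistic, numeric source field) pair."""
    fields = []
    for field in source_schema["fields"]:
        field_type = field["type"]
        # Skip unions, nested records, and non-numeric primitives.
        if not (isinstance(field_type, str) and field_type in NUMERIC_TYPES):
            continue
        for operation in operations:
            fields.append(
                {"name": f"{operation}_{field['name']}", "type": "double"}
            )
    return {
        "type": "record",
        "name": source_schema["name"] + suffix,
        "fields": fields,
    }


# Hypothetical source topic schema with one non-numeric field.
source_schema = {
    "type": "record",
    "name": "example-topic",
    "fields": [
        {"name": "sensor", "type": "string"},
        {"name": "temperature", "type": "double"},
        {"name": "pressure", "type": "float"},
    ],
}
aggregation_schema = make_aggregation_schema(source_schema, ["min", "max"])
```

Non-numeric fields such as `sensor` are dropped here because the summary statistics listed above are defined only for numeric fields.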

Download files

Source Distribution

kafka-aggregator-0.2.0.tar.gz (78.7 kB)

Uploaded Source

Built Distribution

kafka_aggregator-0.2.0-py3-none-any.whl (18.1 kB)

Uploaded Python 3
