Python library for building stream processing applications with Apache Kafka
100% Python Stream Processing for Apache Kafka
Quix Streams is a cloud-native library for processing data in Kafka using pure Python. It is designed to give you the power of a distributed system in a lightweight library by combining Kafka's low-level scalability and resiliency features with an easy-to-use Python interface that eases newcomers into stream processing.
It has the following benefits:
- Streaming DataFrame API (similar to pandas DataFrame) for tabular data transformations.
- Custom stateful operations via a state object.
- Custom reducing and aggregating over tumbling and hopping time windows.
- Exactly-once processing semantics via Kafka transactions.
- Pure Python with no need for a server-side engine.
Use Quix Streams to build simple Kafka producer/consumer applications or leverage stream processing to build complex event-driven systems, real-time data pipelines and AI/ML products.
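To make the windowing idea above concrete, here is a plain-Python sketch (not the Quix Streams API) of what a tumbling window does: it buckets events into fixed, non-overlapping time intervals and aggregates within each bucket. The function name and the event format are illustrative assumptions.

```python
from collections import defaultdict

def tumbling_window_sum(events, window_ms):
    """Assign each (timestamp_ms, value) event to a fixed, non-overlapping
    window and sum the values per window. Illustrative sketch only."""
    windows = defaultdict(float)
    for ts, value in events:
        window_start = (ts // window_ms) * window_ms  # floor to window boundary
        windows[window_start] += value
    return dict(windows)

events = [(0, 1.0), (500, 2.0), (1000, 3.0), (1700, 4.0), (2100, 5.0)]
print(tumbling_window_sum(events, 1000))
# {0: 3.0, 1000: 7.0, 2000: 5.0}
```

A hopping window works the same way except that windows overlap, so one event can land in several buckets; Quix Streams manages the windowing, state and checkpointing for you on top of Kafka.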
Getting Started 🏄
Install Quix Streams
```shell
# PyPI
python -m pip install quixstreams

# or conda
conda install -c conda-forge quixio::quixstreams
```
Requirements
Python 3.9+, Apache Kafka 0.10+.
See requirements.txt for the full list of dependencies.
Documentation
Example
Here's an example of how to process data from a Kafka Topic with Quix Streams:
```python
from quixstreams import Application

# A minimal application reading temperature data in Celsius from a Kafka topic,
# converting it to Fahrenheit and producing alerts to another topic.

# Define an application that will connect to Kafka
app = Application(
    broker_address="localhost:9092",  # Kafka broker address
)

# Define the Kafka topics
temperature_topic = app.topic("temperature-celsius", value_deserializer="json")
alerts_topic = app.topic("temperature-alerts", value_serializer="json")

# Create a Streaming DataFrame connected to the input Kafka topic
sdf = app.dataframe(topic=temperature_topic)

# Convert temperature to Fahrenheit by transforming the input message
# (with an anonymous or user-defined function)
sdf = sdf.apply(lambda value: {"temperature_F": (value["temperature"] * 9 / 5) + 32})

# Filter values above the threshold
sdf = sdf[sdf["temperature_F"] > 150]

# Produce alerts to the output topic
sdf = sdf.to_topic(alerts_topic)

# Run the streaming application (the app automatically tracks the sdf!)
app.run()
```
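Because the transformation and filtering steps are plain Python callables, the logic can be reasoned about (and unit-tested) without a running Kafka broker. A standalone sketch of the same convert-and-filter step, with hypothetical sample messages:

```python
# Same logic as in the example above, applied to in-memory sample data
def to_fahrenheit(value):
    return {"temperature_F": (value["temperature"] * 9 / 5) + 32}

def is_alert(value):
    return value["temperature_F"] > 150

messages = [{"temperature": 20}, {"temperature": 80}, {"temperature": 100}]
alerts = [v for v in map(to_fahrenheit, messages) if is_alert(v)]
print(alerts)  # [{'temperature_F': 176.0}, {'temperature_F': 212.0}]
```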
Tutorials
To see Quix Streams in action, check out the Quickstart and Tutorials in the docs.
Key Concepts
There are two primary objects:

- `StreamingDataFrame` - a predefined declarative pipeline to process and transform incoming messages.
- `Application` - manages the Kafka-related setup, teardown and message lifecycle (consuming, committing). It processes each message with the dataframe you provide for it to run.

Under the hood, the `Application` will:

- Consume and deserialize messages.
- Process them with your `StreamingDataFrame`.
- Produce the results to the output topic.
- Automatically checkpoint processed messages and state for resiliency.
- Scale using Kafka's built-in consumer groups mechanism.
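The steps above can be sketched as a plain-Python loop. The consumer, producer and commit interfaces here are hypothetical stand-ins (simple lists and callables, not library APIs); the real `Application` additionally handles serialization, checkpointing and consumer-group rebalancing:

```python
def run_pipeline(consumer, process, produce, commit):
    """Sketch of the Application loop: consume, process with the
    dataframe's logic, produce, then checkpoint the offset.
    All four interfaces are illustrative stand-ins."""
    for offset, message in consumer:
        result = process(message)
        if result is not None:   # filtered-out messages produce nothing
            produce(result)
        commit(offset)           # checkpoint after processing

consumed = [(0, {"temperature": 100}), (1, {"temperature": 20})]
produced, committed = [], []

def process(msg):
    # same convert-and-filter logic as the temperature example
    value = {"temperature_F": (msg["temperature"] * 9 / 5) + 32}
    return value if value["temperature_F"] > 150 else None

run_pipeline(consumed, process, produced.append, committed.append)
print(produced)   # [{'temperature_F': 212.0}]
print(committed)  # [0, 1]
```

Note that even the filtered-out message has its offset committed, so it is never reprocessed after a restart.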
Deployment
You can run Quix Streams pipelines anywhere Python is installed.
Deploy to your own infrastructure or to Quix Cloud on AWS, Azure, GCP or on-premise for a fully managed platform.
You'll get self-service DevOps, CI/CD and monitoring, all built with best-in-class engineering practices learned from Formula 1 racing.
Please see the Connecting to Quix Cloud page to learn how to use Quix Streams and Quix Cloud together.
Roadmap 📍
This library is being actively developed by a full-time team.
Here are some of the planned improvements:
- Windowed aggregations over Tumbling & Hopping windows
- Stateful operations and recovery based on Kafka changelog topics
- Group-by operation
- Exactly-once delivery guarantees for Kafka message processing (a.k.a. Kafka transactions)
- Support for Avro and Protobuf formats
- Schema Registry support
- Joins
- Windowed aggregations over Sliding windows
For a more detailed overview of the planned features, please look at the Roadmap Board.
Get Involved 🤝
- Please use GitHub issues to report bugs and suggest new features.
- Join the Quix Community on Slack, a vibrant group of Kafka Python developers, data engineers and newcomers to Apache Kafka, who are learning and leveraging Quix Streams for real-time data processing.
- Watch and subscribe to @QuixStreams on YouTube for code-along tutorials from scratch and interesting community highlights.
- Follow us on X and LinkedIn where we share our latest tutorials, forthcoming community events and the occasional meme.
- If you have any questions or feedback - write to us at support@quix.io!
License 📗
Quix Streams is licensed under the Apache 2.0 license.
View a copy of the License file here.
Hashes for quixstreams-3.0.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | f3d3bf384982920cd81b1dd06fe302c421f8be24305802b7f782d800e38e34c5 |
| MD5 | f33e4f82d45db8f838d5bb98258b8743 |
| BLAKE2b-256 | 40237b6390ef736770211d6d30def3add74c5793ee114e5570aed154f6e83de1 |