to deploy and orchestrate geo-replicated Apache Kafka clusters using Apache MirrorMaker 2.
Project description
Kafka Geo-Replication Demo Orchestrator
Project Overview
This project provides a Python library and command-line interface (CLI) tool to deploy and orchestrate geo-replicated Apache Kafka clusters using Apache MirrorMaker 2.
The project is composed of two modules:
- core: Exposes Python API to execute operations
- cli: Command line interface for core
The goal is to simulate a multi-region or multi-cloud environment in a simplified manner and to demonstrate how data can be reliably replicated for High Availability (HA), Disaster Recovery (DR), and data centralization scenarios.
The inspiration for this project comes from a user request to Aiven discussing stretched clusters - highlighting the growing need for resilient and distributed data architectures where data must be accessible and protected across different failure domains.
This tool aims to demonstrate not only the functionality of Kafka and MirrorMaker 2 but also how automation can simplify the deployment and management of such configurations—a key concept for companies providing abstraction layers over complex cloud services.
Problem Solved / Use Case
In real-world scenarios, companies might need to:
- Maintain a copy of their Kafka data in a separate geographical region for Disaster Recovery.
- Migrate data from an on-premise Kafka cluster to a cloud cluster (or vice-versa, or between different clouds).
- Aggregate data from multiple Kafka clusters (e.g., from microservices) into a central cluster for analytics or monitoring.
- Ensure high availability of critical event streams.
This tool allows for simulating and demonstrating how MirrorMaker 2 addresses these needs.
Tool Features
- Automated Kafka Cluster Deployment: Spins up two distinct Kafka clusters ("primary" and "secondary") locally using Docker Compose, simulating separate environments.
- MirrorMaker 2 Configuration and Startup: Dynamically generates the configuration for MirrorMaker 2 and starts it to replicate data from the primary to the secondary cluster.
- Integrated Event Producer: Allows for easily sending sample messages to the primary cluster to populate topics.
- Integrated Event Consumer: Allows for reading messages from the secondary cluster to verify successful replication.
- Simplified Management: CLI commands to start the entire stack, produce/consume messages, and clean up resources.
Technologies Used
- Python 3.x
- Apache Kafka (running in Docker containers)
- Apache MirrorMaker 2 (running in a Docker container)
- Docker & Docker Compose
- Python Libraries:
clickfor the CLI interfacepython-kafkafor interacting with Kafka (chosen overconfluent-kafka-pythonas this project is intended for testing and demonstration purposes only, not for production use where direct Kafka interaction is critical)
Prerequisites
Before you begin, ensure you have installed:
- Python 3.8+
- Docker
- Docker Compose
Setup and Installation
- Clone the repository:
git clone https://github.com/dmgiangi/kafka-mirror-kit.git cd kafka-mirror-kit
- Create and activate a virtual environment (recommended):
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Python dependencies:
pip install -e . # Install in development mode # or pip install . # Regular installation
Continuous Integration
This project uses GitHub Actions for continuous integration. When a pull request is made to the main branch, the
workflow automatically runs all tests to ensure code quality.
Workflow Details
The workflow performs the following steps:
- Sets up a Python 3.8 environment
- Installs all dependencies from pyproject.toml
- Runs all tests using pytest
You can view the workflow configuration in .github/workflows/tests.yml.
Release Process
This project uses Google's release-please-action to automate the release process. Release-please creates release PRs when changes are pushed to the main branch, and automatically updates version numbers and generates changelogs based on conventional commit messages.
How It Works
- When commits are pushed to the
mainbranch, the release-please action analyzes the commit messages. - If there are new features, bug fixes, or other notable changes, release-please creates or updates a release PR.
- The release PR updates the version in
pyproject.toml, updates the changelog, and makes any other necessary version-related changes. - When the release PR is merged, release-please automatically creates a GitHub release with the appropriate tag.
Commit Message Format
To properly trigger version bumps, commit messages should follow the Conventional Commits specification:
feat: add new feature- Triggers a minor version bump (0.1.0 -> 0.2.0)fix: resolve bug- Triggers a patch version bump (0.1.0 -> 0.1.1)docs: update documentation- Included in changelog but doesn't bump versionrefactor: improve code structure- Included in changelog but doesn't bump versionperf: improve performance- Included in changelog but doesn't bump versionchore: update dependencies- Not included in changelogtest: add tests- Not included in changelogci: update CI configuration- Not included in changelogbuild: update build process- Not included in changelog
For breaking changes, add BREAKING CHANGE: in the commit message body or append ! after the type:
feat!: add new feature with breaking changes
BREAKING CHANGE: This feature breaks backward compatibility
This will trigger a major version bump (1.0.0 -> 2.0.0).
You can view the release workflow configuration in .github/workflows/release-please.yml.
Usage
The tool provides a simple command-line interface. After installing the package with pip install -e ., you can use the
kmk command directly:
-
Deploy the infrastructure (Kafka Clusters + MirrorMaker 2):
kmk deployThis command will start the Docker containers for the two Kafka clusters and a MirrorMaker 2 instance configured to replicate all topics (or specific topics if configured) from the
primarycluster to thesecondarycluster. Replicated topics on thesecondarycluster will typically have a prefix (e.g.,primary.topic_name). -
Produce messages to the primary cluster:
kmk produce --topic my_topic --messages 10
This will send 10 sample messages to the
my_topictopic on theprimarycluster. -
Consume messages from the secondary cluster (for verification):
kmk consume --topic primary.my_topic --cluster secondary --messages 10
This will attempt to read 10 messages from the replicated
primary.my_topictopic on thesecondarycluster. -
Check status (optional, to be implemented):
kmk status -
Destroy the infrastructure:
kmk destroyThis will stop and remove all Docker containers created by the tool.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file kafka_mirror_kit-0.1.0.tar.gz.
File metadata
- Download URL: kafka_mirror_kit-0.1.0.tar.gz
- Upload date:
- Size: 16.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0ec302701397c1f641824c0c8068c71fbc7571560bc0e2d20414931f512075c4
|
|
| MD5 |
e3a354422e9b4cd0bca35f0d896d3042
|
|
| BLAKE2b-256 |
ab2a7b186d54ee2847f9e52553cbab63cc9c3a57abfe7b6a16bca3cf62ddef66
|
Provenance
The following attestation bundles were made for kafka_mirror_kit-0.1.0.tar.gz:
Publisher:
publish.yml on dmgiangi/kafka-mirror-kit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
kafka_mirror_kit-0.1.0.tar.gz -
Subject digest:
0ec302701397c1f641824c0c8068c71fbc7571560bc0e2d20414931f512075c4 - Sigstore transparency entry: 227386244
- Sigstore integration time:
-
Permalink:
dmgiangi/kafka-mirror-kit@5cccb9010f950c4508e84c04f9304434fdc9f008 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/dmgiangi
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@5cccb9010f950c4508e84c04f9304434fdc9f008 -
Trigger Event:
release
-
Statement type:
File details
Details for the file kafka_mirror_kit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: kafka_mirror_kit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1e49b448c1060604ac9ca43a6df1fc518bfbee88bda1aaf31313b0faf675fb0d
|
|
| MD5 |
095c2efb41bc0be181e2447872ab134f
|
|
| BLAKE2b-256 |
6f7b7b0212521d850b60f2a4c71b535442c10c3c4b366509ef5e16f0e665238a
|
Provenance
The following attestation bundles were made for kafka_mirror_kit-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on dmgiangi/kafka-mirror-kit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
kafka_mirror_kit-0.1.0-py3-none-any.whl -
Subject digest:
1e49b448c1060604ac9ca43a6df1fc518bfbee88bda1aaf31313b0faf675fb0d - Sigstore transparency entry: 227386251
- Sigstore integration time:
-
Permalink:
dmgiangi/kafka-mirror-kit@5cccb9010f950c4508e84c04f9304434fdc9f008 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/dmgiangi
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@5cccb9010f950c4508e84c04f9304434fdc9f008 -
Trigger Event:
release
-
Statement type: