to deploy and orchestrate geo-replicated Apache Kafka clusters using Apache MirrorMaker 2.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

dmgiangi

These details have not been verified by PyPI

Project description

Kafka Geo-Replication Demo Orchestrator

Project Overview

This project provides a Python library and command-line interface (CLI) tool to deploy and orchestrate geo-replicated Apache Kafka clusters using Apache MirrorMaker 2.

The project is composed of two modules:

core: Exposes Python API to execute operations
cli: Command line interface for core

The goal is to simulate a multi-region or multi-cloud environment in a simplified manner and to demonstrate how data can be reliably replicated for High Availability (HA), Disaster Recovery (DR), and data centralization scenarios.

The inspiration for this project comes from a user request to Aiven discussing stretched clusters - highlighting the growing need for resilient and distributed data architectures where data must be accessible and protected across different failure domains.

This tool aims to demonstrate not only the functionality of Kafka and MirrorMaker 2 but also how automation can simplify the deployment and management of such configurations—a key concept for companies providing abstraction layers over complex cloud services.

Problem Solved / Use Case

In real-world scenarios, companies might need to:

Maintain a copy of their Kafka data in a separate geographical region for Disaster Recovery.
Migrate data from an on-premise Kafka cluster to a cloud cluster (or vice-versa, or between different clouds).
Aggregate data from multiple Kafka clusters (e.g., from microservices) into a central cluster for analytics or monitoring.
Ensure high availability of critical event streams.

This tool allows for simulating and demonstrating how MirrorMaker 2 addresses these needs.

Tool Features

Automated Kafka Cluster Deployment: Spins up two distinct Kafka clusters ("primary" and "secondary") locally using Docker Compose, simulating separate environments.
MirrorMaker 2 Configuration and Startup: Dynamically generates the configuration for MirrorMaker 2 and starts it to replicate data from the primary to the secondary cluster.
Integrated Event Producer: Allows for easily sending sample messages to the primary cluster to populate topics.
Integrated Event Consumer: Allows for reading messages from the secondary cluster to verify successful replication.
Simplified Management: CLI commands to start the entire stack, produce/consume messages, and clean up resources.

Technologies Used

Python 3.x
Apache Kafka (running in Docker containers)
Apache MirrorMaker 2 (running in a Docker container)
Docker & Docker Compose
Python Libraries:
- click for the CLI interface
- python-kafka for interacting with Kafka (chosen over confluent-kafka-python as this project is intended for testing and demonstration purposes only, not for production use where direct Kafka interaction is critical)

Prerequisites

Before you begin, ensure you have installed:

Python 3.8+
Docker
Docker Compose

Setup and Installation

Clone the repository:

git clone https://github.com/dmgiangi/kafka-mirror-kit.git
cd kafka-mirror-kit

Create and activate a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python dependencies:

pip install -e .  # Install in development mode
# or
pip install .     # Regular installation

Continuous Integration

This project uses GitHub Actions for continuous integration. When a pull request is made to the main branch, the workflow automatically runs all tests to ensure code quality.

Workflow Details

The workflow performs the following steps:

Sets up a Python 3.8 environment
Installs all dependencies from pyproject.toml
Runs all tests using pytest

You can view the workflow configuration in .github/workflows/tests.yml.

Release Process

This project uses Google's release-please-action to automate the release process. Release-please creates release PRs when changes are pushed to the main branch, and automatically updates version numbers and generates changelogs based on conventional commit messages.

How It Works

When commits are pushed to the main branch, the release-please action analyzes the commit messages.
If there are new features, bug fixes, or other notable changes, release-please creates or updates a release PR.
The release PR updates the version in pyproject.toml, updates the changelog, and makes any other necessary version-related changes.
When the release PR is merged, release-please automatically creates a GitHub release with the appropriate tag.

Commit Message Format

To properly trigger version bumps, commit messages should follow the Conventional Commits specification:

feat: add new feature - Triggers a minor version bump (0.1.0 -> 0.2.0)
fix: resolve bug - Triggers a patch version bump (0.1.0 -> 0.1.1)
docs: update documentation - Included in changelog but doesn't bump version
refactor: improve code structure - Included in changelog but doesn't bump version
perf: improve performance - Included in changelog but doesn't bump version
chore: update dependencies - Not included in changelog
test: add tests - Not included in changelog
ci: update CI configuration - Not included in changelog
build: update build process - Not included in changelog

For breaking changes, add BREAKING CHANGE: in the commit message body or append ! after the type:

feat!: add new feature with breaking changes

BREAKING CHANGE: This feature breaks backward compatibility

This will trigger a major version bump (1.0.0 -> 2.0.0).

You can view the release workflow configuration in .github/workflows/release-please.yml.

Usage

The tool provides a simple command-line interface. After installing the package with pip install -e ., you can use the kmk command directly:

Deploy the infrastructure (Kafka Clusters + MirrorMaker 2):
```
kmk deploy
```
This command will start the Docker containers for the two Kafka clusters and a MirrorMaker 2 instance configured to replicate all topics (or specific topics if configured) from the primary cluster to the secondary cluster. Replicated topics on the secondary cluster will typically have a prefix (e.g., primary.topic_name).
Produce messages to the primary cluster:
```
kmk produce --topic my_topic --messages 10
```
This will send 10 sample messages to the my_topic topic on the primary cluster.
Consume messages from the secondary cluster (for verification):
```
kmk consume --topic primary.my_topic --cluster secondary --messages 10
```
This will attempt to read 10 messages from the replicated primary.my_topic topic on the secondary cluster.
Check status (optional, to be implemented):
```
kmk status
```
Destroy the infrastructure:
```
kmk destroy
```
This will stop and remove all Docker containers created by the tool.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

dmgiangi

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Jun 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kafka_mirror_kit-0.1.0.tar.gz (16.9 kB view details)

Uploaded Jun 2, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

kafka_mirror_kit-0.1.0-py3-none-any.whl (13.7 kB view details)

Uploaded Jun 2, 2025 Python 3

File details

Details for the file kafka_mirror_kit-0.1.0.tar.gz.

File metadata

Download URL: kafka_mirror_kit-0.1.0.tar.gz
Upload date: Jun 2, 2025
Size: 16.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kafka_mirror_kit-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0ec302701397c1f641824c0c8068c71fbc7571560bc0e2d20414931f512075c4`
MD5	`e3a354422e9b4cd0bca35f0d896d3042`
BLAKE2b-256	`ab2a7b186d54ee2847f9e52553cbab63cc9c3a57abfe7b6a16bca3cf62ddef66`

See more details on using hashes here.

Provenance

The following attestation bundles were made for kafka_mirror_kit-0.1.0.tar.gz:

Publisher: publish.yml on dmgiangi/kafka-mirror-kit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kafka_mirror_kit-0.1.0.tar.gz
- Subject digest: 0ec302701397c1f641824c0c8068c71fbc7571560bc0e2d20414931f512075c4
- Sigstore transparency entry: 227386244
- Sigstore integration time: Jun 2, 2025
Source repository:
- Permalink: dmgiangi/kafka-mirror-kit@5cccb9010f950c4508e84c04f9304434fdc9f008
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/dmgiangi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5cccb9010f950c4508e84c04f9304434fdc9f008
- Trigger Event: release

File details

Details for the file kafka_mirror_kit-0.1.0-py3-none-any.whl.

File metadata

Download URL: kafka_mirror_kit-0.1.0-py3-none-any.whl
Upload date: Jun 2, 2025
Size: 13.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for kafka_mirror_kit-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e49b448c1060604ac9ca43a6df1fc518bfbee88bda1aaf31313b0faf675fb0d`
MD5	`095c2efb41bc0be181e2447872ab134f`
BLAKE2b-256	`6f7b7b0212521d850b60f2a4c71b535442c10c3c4b366509ef5e16f0e665238a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for kafka_mirror_kit-0.1.0-py3-none-any.whl:

Publisher: publish.yml on dmgiangi/kafka-mirror-kit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kafka_mirror_kit-0.1.0-py3-none-any.whl
- Subject digest: 1e49b448c1060604ac9ca43a6df1fc518bfbee88bda1aaf31313b0faf675fb0d
- Sigstore transparency entry: 227386251
- Sigstore integration time: Jun 2, 2025
Source repository:
- Permalink: dmgiangi/kafka-mirror-kit@5cccb9010f950c4508e84c04f9304434fdc9f008
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/dmgiangi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5cccb9010f950c4508e84c04f9304434fdc9f008
- Trigger Event: release

kafka-mirror-kit 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Kafka Geo-Replication Demo Orchestrator

Project Overview

Problem Solved / Use Case

Tool Features

Technologies Used

Prerequisites

Setup and Installation

Continuous Integration

Workflow Details

Release Process

How It Works

Commit Message Format

Usage

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance