A CLI tool to dump and replay Kafka messages using Parquet

kafka-replay-cli

A lightweight, local-first CLI tool for dumping and replaying Kafka messages using Parquet files. Built for observability, debugging, and safe testing of event streams.


Features

  • Dump Kafka topics into Parquet files
  • Replay messages from Parquet back into Kafka
  • Filter replays by timestamp range and key
  • Optional throttling during replay (simulate timing)
  • Query dumped Parquet files with SQL via DuckDB

Installation

  1. Clone this repo:
git clone https://github.com/yourusername/kafka-replay-cli
cd kafka-replay-cli
  2. Install with dependencies:
pip install -e .

Usage

Dump messages from a topic to Parquet

kafka-replay-cli dump \
  --topic test-topic \
  --output test.parquet \
  --bootstrap-servers localhost:9092 \
  --max-messages 1000

Replay messages from a Parquet file

kafka-replay-cli replay \
  --input test.parquet \
  --topic replayed-topic \
  --bootstrap-servers localhost:9092 \
  --throttle-ms 100
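
To make the effect of --throttle-ms concrete, here is a minimal Python sketch of a throttled replay loop. It is an illustration only, not the tool's actual implementation: the function name, the (key, value) message shape, and the choice to pause after every message are all assumptions.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical message shape for this sketch: (key, value) as raw bytes.
Message = Tuple[bytes, bytes]


def replay_with_throttle(
    messages: Iterable[Message],
    send: Callable[[bytes, bytes], None],
    throttle_ms: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each message in order, pausing throttle_ms after each one.

    `send` stands in for a Kafka producer call; `sleep` is injectable so
    the loop can be exercised without actually waiting.
    """
    sent = 0
    for key, value in messages:
        send(key, value)
        sent += 1
        if throttle_ms > 0:
            # Assumption: the delay is a fixed pause between messages.
            sleep(throttle_ms / 1000.0)
    return sent
```

With throttle_ms=100, a loop like this would emit at most about ten messages per second, which can help when the downstream consumer under test should not be flooded.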

Add timestamp and key filters

kafka-replay-cli replay \
  --input test.parquet \
  --topic replayed-topic \
  --start-ts "2024-01-01T00:00:00Z" \
  --end-ts "2024-01-02T00:00:00Z" \
  --key-filter "user-123"
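
The filtering logic can be pictured with the following Python sketch. It is a rough model under stated assumptions, not the tool's code: it assumes both timestamp bounds are inclusive, that the key filter is an exact match on the UTF-8 encoded key, and that rows arrive as (timestamp, key, value) tuples. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical row shape for this sketch: (timestamp, key, value).
Row = Tuple[datetime, bytes, bytes]


def parse_ts(text: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-01-01T00:00:00Z as UTC-aware."""
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def filter_rows(
    rows: Iterable[Row],
    start_ts: Optional[str] = None,
    end_ts: Optional[str] = None,
    key_filter: Optional[str] = None,
) -> Iterator[Row]:
    """Yield only the rows inside the time window that match the key."""
    start = parse_ts(start_ts) if start_ts else None
    end = parse_ts(end_ts) if end_ts else None
    wanted = key_filter.encode("utf-8") if key_filter is not None else None
    for ts, key, value in rows:
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:  # assumed inclusive upper bound
            continue
        if wanted is not None and key != wanted:  # assumed exact match
            continue
        yield ts, key, value
```

Each filter is optional in this sketch, so omitting one simply skips that check.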

🔍 Querying Kafka Messages with DuckDB

You can run SQL directly on dumped Parquet files using the query command:

kafka-replay-cli query \
  --input test.parquet \
  --sql "SELECT timestamp, CAST(key AS VARCHAR) FROM input WHERE CAST(value AS VARCHAR) LIKE '%login%'"

⚠️ Note: Kafka key and value fields are stored as binary (BLOB) in the Parquet file for full fidelity.
To search or filter them using LIKE, you must explicitly cast them to VARCHAR.
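
A similar distinction shows up if you post-process dumped rows in Python instead of SQL: binary payloads are bytes, and a text search needs an explicit decode first. The sketch below is a Python analogy, not a description of how DuckDB performs the cast; the function name is hypothetical and UTF-8 payloads are an assumption.

```python
def value_contains(value: bytes, needle: str, encoding: str = "utf-8") -> bool:
    """Return True if the decoded payload contains the given text."""
    # Testing `needle in value` directly would raise TypeError, because
    # a str cannot be searched for inside bytes.
    # errors="replace" keeps non-text payloads from raising
    # UnicodeDecodeError; undecodable bytes become U+FFFD instead.
    text = value.decode(encoding, errors="replace")
    return needle in text
```

For example, value_contains(b'{"event": "login"}', "login") is True, while a payload that is not valid text simply fails to match rather than raising an error.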


Output to file:

kafka-replay-cli query \
  --input test.parquet \
  --sql "SELECT key FROM input" \
  --output results.json

📜 License

MIT


🙋‍♂️ Maintainer

Konstantinas Mamonas
Feel free to fork, open issues, or suggest improvements.
