A CLI tool to dump and replay Kafka messages using Parquet
kafka-replay-cli
A lightweight, local-first CLI tool for dumping and replaying Kafka messages using Parquet files. Built for observability, debugging, and safe testing of event streams.
Features
- Dump Kafka topics into Parquet files
- Replay messages from Parquet back into Kafka
- Filter replays by timestamp range and key
- Optional throttling during replay (to simulate message timing)
Installation
- Clone this repo:
git clone https://github.com/yourusername/kafka-replay-cli
cd kafka-replay-cli
- Install with dependencies:
pip install -e .
Usage
Dump messages from a topic to Parquet
kafka-replay-cli dump \
--topic test-topic \
--output test.parquet \
--bootstrap-servers localhost:9092 \
--max-messages 1000
Replay messages from a Parquet file
kafka-replay-cli replay \
--input test.parquet \
--topic replayed-topic \
--bootstrap-servers localhost:9092 \
--throttle-ms 100
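To make the effect of --throttle-ms concrete, here is a minimal Python sketch of a throttled replay loop. It is not the tool's actual implementation: the function name replay_with_throttle, the (key, value) message shape, and the send callback are all hypothetical, and it assumes the throttle is a fixed pause between consecutive messages.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical message shape for this sketch: (key, value) as raw bytes.
Message = Tuple[bytes, bytes]


def replay_with_throttle(
    messages: Iterable[Message],
    send: Callable[[bytes, bytes], None],
    throttle_ms: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send messages in order, pausing throttle_ms between consecutive sends.

    Assumption: the throttle is a fixed delay, not a replay of the
    original inter-message gaps.
    """
    sent = 0
    for key, value in messages:
        # No pause before the first message, only between messages.
        if sent > 0 and throttle_ms > 0:
            sleep(throttle_ms / 1000.0)  # milliseconds to seconds
        send(key, value)
        sent += 1
    return sent
```

In a real replay, send would wrap a Kafka producer call. The sleep parameter is injectable only so the pacing can be checked without actually waiting.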
Add timestamp and key filters
kafka-replay-cli replay \
--input test.parquet \
--topic replayed-topic \
--start-ts "2024-01-01T00:00:00Z" \
--end-ts "2024-01-02T00:00:00Z" \
--key-filter "user-123"
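The filters above can be pictured as a simple predicate applied to each stored row. The sketch below is illustrative only: the Record type, parse_ts, and filter_records are hypothetical names, and it assumes inclusive timestamp bounds and an exact match on the UTF-8 encoded key, which may differ from what the tool actually does.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    # Hypothetical row shape; the real Parquet schema may differ.
    timestamp: datetime
    key: bytes
    value: bytes


def parse_ts(text: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def filter_records(
    records: Iterable[Record],
    start_ts: Optional[str] = None,
    end_ts: Optional[str] = None,
    key_filter: Optional[str] = None,
) -> Iterator[Record]:
    """Yield records inside [start_ts, end_ts] whose key equals key_filter."""
    start = parse_ts(start_ts) if start_ts else None
    end = parse_ts(end_ts) if end_ts else None
    # Keys are stored as bytes, so encode the filter before comparing.
    wanted = key_filter.encode("utf-8") if key_filter is not None else None
    for rec in records:
        if start is not None and rec.timestamp < start:
            continue
        if end is not None and rec.timestamp > end:
            continue
        if wanted is not None and rec.key != wanted:
            continue
        yield rec
```

Each filter is optional, so omitting all three yields every record unchanged.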
🔍 Querying Kafka Messages with DuckDB
You can run SQL directly on dumped Parquet files using the query command:
kafka-replay-cli query \
--input test.parquet \
--sql "SELECT timestamp, CAST(key AS VARCHAR) FROM input WHERE CAST(value AS VARCHAR) LIKE '%login%'"
⚠️ Note: Kafka key and value fields are stored as binary (BLOB) in the Parquet file for full fidelity. To search or filter them using LIKE, you must explicitly cast them to VARCHAR.
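The same distinction shows up if you post-process dumped rows in Python: a binary payload has to be decoded to text before a text pattern can be matched against it. The helper below is a hypothetical sketch of that idea, not part of the tool. It assumes UTF-8 payloads and replaces undecodable bytes rather than raising, which is a choice made for this example and not necessarily how the SQL cast behaves.

```python
def value_contains(value: bytes, needle: str, encoding: str = "utf-8") -> bool:
    """Decode a binary payload to text, then do a substring match.

    Roughly analogous to casting the value to text before applying
    LIKE '%needle%' in the query example.
    """
    # errors="replace" keeps payloads that are not valid text from raising.
    text = value.decode(encoding, errors="replace")
    return needle in text
```

Without the decode step, testing a str against bytes (for example "login" in b"...") raises a TypeError in Python, which is the same kind of type mismatch the explicit cast avoids.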
Output to file:
kafka-replay-cli query \
--input test.parquet \
--sql "SELECT key FROM input" \
--output results.json
📜 License
MIT
🙋♂️ Maintainer
Konstantinas Mamonas
Feel free to fork, open issues, or suggest improvements.