
A CLI tool to dump and replay Kafka messages using Parquet


kafka-replay-cli

A lightweight, local-first CLI tool for dumping and replaying Kafka messages using Parquet files. Built for observability, debugging, and safe testing of event streams.


Features

  • Dump Kafka topics into Parquet files
  • Replay messages from Parquet back into Kafka
  • Filter replays by timestamp range and key
  • Optional throttling during replay to pace message delivery (--throttle-ms)

📦 Installation

Install from PyPI:

pip install kafka-replay-cli

Usage

Dump messages from a topic to Parquet

kafka-replay-cli dump \
  --topic test-topic \
  --output test.parquet \
  --bootstrap-servers localhost:9092 \
  --max-messages 1000
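To make the dump step concrete, here is a rough Python model of what it conceptually does; it is not the tool's actual code. It assumes each consumed record carries a timestamp (epoch milliseconds, as Kafka reports it) plus raw key and value bytes, which are the columns the DuckDB examples further down query. The KafkaRecord and dump_to_columns names are hypothetical, and a plain generator stands in for a real Kafka consumer.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Dict, Iterable, List, Optional


@dataclass
class KafkaRecord:
    # Hypothetical record shape: only the fields the query examples reference.
    timestamp: int            # assumed: epoch milliseconds
    key: Optional[bytes]      # raw bytes, never decoded (keeps full fidelity)
    value: Optional[bytes]


def dump_to_columns(records: Iterable[KafkaRecord], max_messages: int) -> Dict[str, List]:
    """Collect at most `max_messages` records into per-field column lists.

    Parquet stores data column by column, so each field becomes its own list.
    """
    columns: Dict[str, List] = {"timestamp": [], "key": [], "value": []}
    for rec in islice(records, max_messages):  # stop after max_messages records
        columns["timestamp"].append(rec.timestamp)
        columns["key"].append(rec.key)
        columns["value"].append(rec.value)
    return columns


# Usage: a finite generator stands in for a Kafka consumer.
stream = (KafkaRecord(1704067200000 + i, f"user-{i}".encode(), b"{}") for i in range(5000))
cols = dump_to_columns(stream, max_messages=1000)
print(len(cols["key"]))  # 1000
```

Keeping key and value as bytes rather than decoding them is what lets a later replay reproduce the original payloads byte for byte; in this sketch the islice cap plays the role of --max-messages.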

Replay messages from a Parquet file

kafka-replay-cli replay \
  --input test.parquet \
  --topic replayed-topic \
  --bootstrap-servers localhost:9092 \
  --throttle-ms 100
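A minimal sketch of throttled replay, assuming --throttle-ms inserts a fixed pause between consecutive sends (the README does not spell out the exact semantics). The replay function and its send callback are hypothetical; the callback stands in for a Kafka producer so the example runs without a broker.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

Message = Tuple[Optional[bytes], Optional[bytes]]  # (key, value)


def replay(messages: Iterable[Message],
           send: Callable[[Optional[bytes], Optional[bytes]], None],
           throttle_ms: int = 0) -> int:
    """Send each (key, value) pair in order, pausing `throttle_ms` between sends."""
    sent = 0
    delay = throttle_ms / 1000.0
    for key, value in messages:
        if sent and delay:
            time.sleep(delay)  # pause between messages, not before the first one
        send(key, value)
        sent += 1
    return sent


# Usage: collect "sent" messages in a list instead of producing to Kafka.
delivered = []
count = replay([(b"user-1", b"a"), (b"user-2", b"b")],
               lambda k, v: delivered.append((k, v)), throttle_ms=100)
print(count)  # 2
```

With a fixed pause, throughput is capped at roughly 1000 / throttle_ms messages per second before send latency, so --throttle-ms 100 allows at most about 10 messages per second.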

Add timestamp and key filters

kafka-replay-cli replay \
  --input test.parquet \
  --topic replayed-topic \
  --start-ts "2024-01-01T00:00:00Z" \
  --end-ts "2024-01-02T00:00:00Z" \
  --key-filter "user-123"
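The filters can be modelled as a single pass over the dumped rows. This sketch assumes timestamps are stored as epoch milliseconds, treats --start-ts as inclusive and --end-ts as exclusive, and treats --key-filter as an exact match on the UTF-8 encoded key; all three are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
Row = Tuple[int, Optional[bytes], Optional[bytes]]  # (timestamp_ms, key, value)


def iso_to_epoch_ms(ts: str) -> int:
    """Parse an ISO 8601 string like '2024-01-01T00:00:00Z' to epoch milliseconds."""
    # fromisoformat only accepts a trailing 'Z' on Python 3.11+, so normalise it.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return (dt - EPOCH) // timedelta(milliseconds=1)  # exact integer arithmetic


def filter_rows(rows: Iterable[Row], start_ts: Optional[str] = None,
                end_ts: Optional[str] = None,
                key_filter: Optional[str] = None) -> Iterator[Row]:
    """Yield rows with start <= timestamp < end and, optionally, an exact key match."""
    start = iso_to_epoch_ms(start_ts) if start_ts else None
    end = iso_to_epoch_ms(end_ts) if end_ts else None
    wanted = key_filter.encode("utf-8") if key_filter is not None else None
    for ts, key, value in rows:
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        if wanted is not None and key != wanted:  # compare bytes, no decoding
            continue
        yield ts, key, value


print(iso_to_epoch_ms("2024-01-01T00:00:00Z"))  # 1704067200000
```

Encoding the filter string once and comparing bytes, rather than decoding every key, means keys that are not valid UTF-8 are simply skipped instead of raising errors.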

🔍 Querying Kafka Messages with DuckDB

You can run SQL directly on dumped Parquet files with the query command, which uses DuckDB. Inside the SQL statement, the Parquet file is referenced as the table input:

kafka-replay-cli query \
  --input test.parquet \
  --sql "SELECT timestamp, CAST(key AS VARCHAR) FROM input WHERE CAST(value AS VARCHAR) LIKE '%login%'"

⚠️ Note: Kafka key and value fields are stored as binary (BLOB) in the Parquet file for full fidelity.
To search or filter them using LIKE, you must explicitly cast them to VARCHAR.
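The same binary-versus-text distinction exists in Python, which makes the cast easier to picture. In this sketch (the function name is hypothetical), decoding the bytes plays the role of CAST(value AS VARCHAR) and a substring test plays the role of LIKE '%login%'. Decoding with errors="replace" is an assumption and need not match how DuckDB handles invalid UTF-8.

```python
from typing import Optional


def value_contains(value: Optional[bytes], needle: str) -> bool:
    """Rough Python analogue of CAST(value AS VARCHAR) LIKE '%needle%'."""
    if value is None:
        return False  # LIKE on NULL is never true, so such rows are filtered out
    text = value.decode("utf-8", errors="replace")  # bytes -> str, like the cast
    return needle in text  # case-sensitive, like SQL LIKE


print(value_contains(b'{"event": "login"}', "login"))  # True
print(value_contains(b'{"event": "LOGIN"}', "login"))  # False
```

Like DuckDB's LIKE, Python's in operator is case-sensitive; DuckDB also provides ILIKE for case-insensitive matching.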


Write query results to a file:

kafka-replay-cli query \
  --input test.parquet \
  --sql "SELECT key FROM input" \
  --output results.json
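Binary columns also matter when writing results to JSON, since JSON has no bytes type. The README does not say how the query command encodes BLOB values in results.json, so the sketch below shows only one hypothetical approach: decode UTF-8 where possible and fall back to a tagged base64 object otherwise. The function names and the {"base64": ...} shape are illustrative assumptions. Casting key to VARCHAR in the SQL, as in the earlier query, avoids the question entirely.

```python
import base64
import json
from typing import Any, Dict, Iterable


def to_json_safe(v: Any) -> Any:
    """Convert bytes to a JSON-friendly value; leave other values untouched."""
    if isinstance(v, (bytes, bytearray)):
        try:
            return bytes(v).decode("utf-8")
        except UnicodeDecodeError:
            # Hypothetical fallback: wrap non-UTF-8 payloads as base64.
            return {"base64": base64.b64encode(bytes(v)).decode("ascii")}
    return v


def rows_to_json(rows: Iterable[Dict[str, Any]]) -> str:
    """Serialise query rows (dicts of column -> value) as a JSON array."""
    return json.dumps([{k: to_json_safe(v) for k, v in row.items()} for row in rows])


print(rows_to_json([{"key": b"user-123"}]))  # [{"key": "user-123"}]
```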

📜 License

MIT


🙋‍♂️ Maintainer

Konstantinas Mamonas
Feel free to fork, open issues, or suggest improvements.



Download files


Source Distribution

kafka_replay_cli-0.1.1.tar.gz (7.8 kB)

Built Distribution

kafka_replay_cli-0.1.1-py3-none-any.whl (7.7 kB)

File details

Details for the file kafka_replay_cli-0.1.1.tar.gz.

File metadata

  • Download URL: kafka_replay_cli-0.1.1.tar.gz
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for kafka_replay_cli-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d146f39b49cfc39d551ece91c348aa9d0e5b9edb900ee03e2575ed3ecc27c35a
MD5 a17fce9429d276e4bb8d5bcc7dd7f393
BLAKE2b-256 86c3cb487170c6e048663e8f5ed688fb86f80de42a0d90baffac1551fef3c8d3


File details

Details for the file kafka_replay_cli-0.1.1-py3-none-any.whl.


File hashes

Hashes for kafka_replay_cli-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7a5fcfda1b5b4a00275e762b9f3e2872f2d9ab5b5fcf2db634d2447755ac8a00
MD5 3ecdfe84c1c69cdd13a5017afec72ae1
BLAKE2b-256 eb1255e83d3f57975c5fd26a8fff1db9289af6653dd6e40a24e4d88362685f50

