
Kafka Store provides an easy way of archiving data from Kafka


## Kafka Store

[![Build Status](https://travis-ci.org/smyte/kafka_store.svg?branch=master)](https://travis-ci.org/smyte/kafka_store)

Kafka Store provides a safe method for long-term archiving of Kafka topics.

## Features

  • Simple guarantee. When set up properly, Kafka Store ensures that every single message in a Kafka topic is backed up to Google Cloud Storage exactly once, with a predictable filename and in a fault-tolerant manner.

  • Saves large, compressed, Avro-encoded files to your server with low memory requirements.

  • Optionally logs files to a MySQL table with offset ranges for quicker lookup.
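To illustrate why logging offset ranges speeds up lookup, here is a minimal sketch. It is not Kafka Store's actual schema or code: the `ArchivedFile` record, its field names, and the sample filenames are hypothetical stand-ins for rows of the MySQL table. Assuming the rows are sorted by first offset and do not overlap, a binary search finds the file that holds a given offset without opening any archive.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class ArchivedFile(NamedTuple):
    # Hypothetical row shape; the real table's columns may differ.
    filename: str
    first_offset: int
    last_offset: int  # inclusive


def find_file(rows: List[ArchivedFile], offset: int) -> Optional[ArchivedFile]:
    """Return the row whose offset range contains `offset`, or None.

    `rows` must be sorted by first_offset and must not overlap.
    """
    starts = [row.first_offset for row in rows]
    # Index of the last row that starts at or before `offset`.
    idx = bisect_right(starts, offset) - 1
    if idx < 0:
        return None
    row = rows[idx]
    return row if offset <= row.last_offset else None


# Made-up sample rows for illustration only.
rows = [
    ArchivedFile("file-a", 0, 999),
    ArchivedFile("file-b", 1000, 1499),
    ArchivedFile("file-c", 1500, 2999),
]
```

With these sample rows, `find_file(rows, 1200)` returns the `file-b` row, and an offset past the last archived range returns `None`.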

## Comparison to Secor

This tool is very similar to a previously released tool called [Secor](https://github.com/pinterest/secor). We started out using Secor, but wrote a replacement primarily because we needed the predictable filename guarantee, and because we hit many production problems while trying to use a tool that was far more complicated than necessary for our use case.

  • Our guarantee is stronger. By using the new timestamp feature of Kafka we can ensure that each message always ends up in the same file. Because our files are always named with the offset of the initial message, streaming from S3 is simplified: the filename of the next dump is predictable (final_offset + 1).

  • We only target long-term archiving. There is no support for output partitioning, transformation, etc.

  • There is no statistics interface. We recommend alarming based on Kafka lag.
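The predictable-filename property mentioned above can be sketched as follows. This is an illustration under assumptions: the naming pattern (`{topic}/{partition}/{offset}.avro` with zero-padded offsets) and both function names are hypothetical, not Kafka Store's actual layout. The only part taken from the text is that a file is named for its first offset, so the file after one ending at `final_offset` starts at `final_offset + 1`.

```python
def archive_name(topic: str, partition: int, first_offset: int) -> str:
    # Hypothetical naming scheme; zero-padding keeps names sorted by offset.
    return f"{topic}/{partition}/{first_offset:020d}.avro"


def next_archive_name(topic: str, partition: int, final_offset: int) -> str:
    """Name of the file that follows one whose last message is `final_offset`."""
    return archive_name(topic, partition, final_offset + 1)
```

A consumer that has just finished reading a file can therefore compute the next name directly instead of listing the bucket.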

## Requirements

  • Timestamps must be enabled on your Kafka broker. This requires a newer version of Kafka with protocol version 0.10.0.0 or later enabled.

  • Your librdkafka must support timestamps. If you’re using compression you might want to check our [un-merged patch](https://github.com/edenhill/librdkafka/pull/858).

  • We do not (yet) support compacted topics.

## Example

## Future work

We’re releasing a product that works as we need it to, but we’re very aware it won’t fulfill all (or even most) potential use cases. Unfortunately, as a startup we don’t have the time to spare to complete the items below, but we’re happy to review pull requests and work with the community to get required features out the door.

  • A configuration file, rather than taking all options via the command line. This will be a prerequisite for most of the other tasks.

  • Full support for Google Cloud authentication. At the moment we’re running inside GCE so the default authentication just works.

  • Support for S3, Azure, and other long-term storage systems.

  • Consuming from multiple topics on the same instance. At the moment we only support a single topic.

## History

0.1.0 (2016-11-04)

  • First release on PyPI.
