
A high-performance framework for dynamic data


KCK

The old two-hard-things joke ("There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors") starts with cache invalidation because it comes up so often. Caching can mean 10X or even 1000X performance improvements, so we developers love to cache things.

And, invariably, caching things means we have to come up with strategies for keeping the cached data fresh and in sync with the source data. Event-based cache invalidation is one way to keep cached data reasonably fresh, and it can be tricky.

KCK is a set of tools for building caches with less pain, and it has some nifty tricks to squeeze another 10X or 100X out of the performance numbers for certain workloads.

Features

  • Sophisticated data pipelines can be written simply. Folks with SQL chops can build a backend for their new React application in an afternoon. With a little Python, it's pretty straightforward to turn petabytes of corporate data into simple statistics for the C-level dashboard.
  • It's really fast. KCK manages data flowing in and out of Postgres so it can keep its stable of data products up to date, but it serves data from a cache built on Cassandra, so data that is already cached reaches the application without waiting on Postgres. KCK also helps make sure the data is in the cache, and fresh, before an application needs it.
  • Plays well with others. KCK makes it easy to use SQL and SQLAlchemy, but it's Django-friendly too and it'll happily accept data from any source you can talk to with Python.
  • Includes an HTTP server with JWT. KCK is accessible via an included HTTP server that can optionally use JWT to authenticate clients.
  • Makes tiny database servers look fast. Seriously. KCK reduces database pressure to a minimum, then it spreads it out so your database writes don't have to compete with a deluge of read traffic from your web servers and background tasks. And cached writes are on the TODO list.
  • Designed to scale. Both the HTTP servers and the Cassandra cluster on which KCK depends can scale horizontally.

Status

None of this code should be used in production.

With that said, the core parts are the cache, the HTTP service, the refresh worker, and the process worker. The status of each is detailed below.

The cache

The cache is working and it's pretty nifty. Cache misses can cause primers to fire, returning the data and storing it in the cache on the way out so it'll be more quickly available the next time it's requested. Cache entries can be invalidated or they can be set to expire after a certain amount of time. But cache entries can also be automatically refreshed as data is updated, or at a set interval, or even when the system boots up.
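The behavior described above (a primer firing on a miss, expiry, invalidation, and explicit refresh) can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not KCK's API: the `PrimedCache` class, its method names, and the in-memory dictionary standing in for Cassandra are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class PrimedCache:
    """Toy in-memory read-through cache (hypothetical; not KCK's real API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._primers: Dict[str, Callable[[], Any]] = {}
        self._ttls: Dict[str, Optional[float]] = {}
        # key -> (value, expiry time or None for "never expires")
        self._entries: Dict[str, Tuple[Any, Optional[float]]] = {}

    def register(self, key: str, primer: Callable[[], Any],
                 ttl: Optional[float] = None) -> None:
        """Associate a primer (a function that computes the value) with a key."""
        self._primers[key] = primer
        self._ttls[key] = ttl

    def refresh(self, key: str) -> Any:
        """Run the primer and store its result, replacing any cached value."""
        value = self._primers[key]()
        ttl = self._ttls[key]
        expires_at = None if ttl is None else self._clock() + ttl
        self._entries[key] = (value, expires_at)
        return value

    def fetch(self, key: str) -> Any:
        """Return the cached value; on a miss or expiry, fire the primer."""
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                return value
        return self.refresh(key)

    def invalidate(self, key: str) -> None:
        """Drop the cached value so the next fetch fires the primer again."""
        self._entries.pop(key, None)


# A dictionary stands in for the source database.
source = {"user_count": 10}
cache = PrimedCache()
cache.register("user_count", lambda: source["user_count"])

first = cache.fetch("user_count")   # miss: the primer runs and the result is stored
source["user_count"] = 11
stale = cache.fetch("user_count")   # hit: the cached value is served
cache.refresh("user_count")         # e.g. triggered when the source data is updated
fresh = cache.fetch("user_count")   # hit: the refreshed value is served
```

The point of refreshing on update, rather than only invalidating, is that the next reader gets a hit instead of paying for the primer.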

The HTTP service

The HTTP service is working. It's very basic, just a /fetch and an /update endpoint and, optionally, JWT authentication so it can be used as a backend for mobile apps or newer JavaScript web apps made with React or Angular. So there's a lot of power in a pretty simple wrapper and it's easily consumed by other services, languages, etc.
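As a rough idea of what a client call might look like, here is a sketch that builds a request for the /fetch endpoint using only the Python standard library. Only the /fetch path and the optional JWT come from the description above; the base URL, the `key` query parameter, and the use of a Bearer `Authorization` header are assumptions for illustration and may not match what the server actually expects.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: wherever the KCK HTTP server happens to be listening.
BASE_URL = "http://localhost:8080"


def build_fetch_request(key: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for a cached value.

    The `key` query parameter is hypothetical; check the server code for
    the real parameter names.
    """
    query = urlencode({"key": key})
    headers = {}
    if token is not None:
        # Assumption: the JWT is passed as a standard Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"{BASE_URL}/fetch?{query}", headers=headers, method="GET")


req = build_fetch_request("user_count", token="example.jwt.token")
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a server.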

The refresh and process workers are in progress

To be fully functional, KCK needs a refresh worker and a process worker running, and neither of those is working yet.

The process worker is pretty simple and a good chunk of use cases don't require it at all. It mostly just needs to run a single method every so often, so that will go quickly once I sit down to write it.

The refresh worker, unfortunately, is more important than the process worker, so I'm working on it first. I've just completed an overhaul of the background refresh queue code and it's working in a very simple way, but it needs to be scalable and it needs to choose tasks to refresh a bit more carefully than it currently does before it's performing up to spec. So it's still a few weeks out.
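One common way to have a worker choose refresh tasks more carefully is a priority queue that orders keys by when they are due and collapses duplicate requests for the same key. The sketch below shows that idea with the standard library's `heapq`; it is not the KCK refresh queue, and the `RefreshQueue` class and its methods are hypothetical names used only for this example.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class RefreshQueue:
    """Toy refresh queue: earliest-due first, one pending request per key."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._due: Dict[str, float] = {}      # key -> earliest pending due time
        self._counter = itertools.count()     # tie-breaker for equal due times

    def request(self, key: str, due: float) -> None:
        """Ask for `key` to be refreshed at `due`; keep only the earliest request."""
        current = self._due.get(key)
        if current is not None and current <= due:
            return  # an equal or earlier refresh is already pending
        self._due[key] = due
        heapq.heappush(self._heap, (due, next(self._counter), key))

    def pop_due(self, now: float) -> Optional[str]:
        """Return the next key whose refresh is due, or None if nothing is due."""
        while self._heap and self._heap[0][0] <= now:
            due, _, key = heapq.heappop(self._heap)
            if self._due.get(key) == due:
                del self._due[key]
                return key
            # Otherwise this heap entry was superseded by an earlier request.
        return None


queue = RefreshQueue()
queue.request("report:daily", due=30.0)
queue.request("user:42", due=10.0)
queue.request("user:42", due=20.0)       # ignored: an earlier refresh is pending
queue.request("report:daily", due=5.0)   # supersedes the request due at 30.0

order = []
key = queue.pop_due(now=15.0)
while key is not None:
    order.append(key)
    key = queue.pop_due(now=15.0)
```

A single in-process heap like this does not scale across workers; a shared store would be needed for that, which is outside what this sketch tries to show.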

System overview

Every piece is scalable and the most user-facing components are the most scalable. The diagram below shows the basics of the KCK system structure.

[Diagram: Scaling KCK]

