Zero-downtime, resumable, parallel re-encryption tool for Apache Cassandra

cassandra-rekey

Why

Cassandra encryption keys need rotating, either on a schedule (compliance, KMS lifecycle) or after a leaked-key incident. Re-encrypting petabytes of column-level encrypted blobs is hard because the job:

  • Cannot afford downtime
  • Must resume after pod restarts, network blips, or operator pauses
  • Must not double-encrypt or skip rows under retries
  • Must throttle to protect read-path SLOs
  • Must support multiple tables under a single rotation job

cassandra-rekey does this generically. Plug in any encrypt/decrypt provider (Fernet, AWS KMS, GCP KMS, custom IDPS-style services). Run from a CLI. Resume any time.

How it works

flowchart LR
    A[CLI plan] --> B[Token-Range Planner]
    B --> C[(rekey_jobs / rekey_chunks<br/>meta tables)]
    A2[CLI run] --> D[Async Executor]
    D --> C
    D --> E[Worker pool<br/>asyncio + semaphore]
    E --> F[Crypto Provider<br/>decrypt → re-encrypt]
    F --> G[(Target tables)]
    E --> H[Backpressure monitor<br/>read p99]
    H --> E

  1. Plan: split the Murmur3 token ring into N chunks per table, persist to rekey_chunks.
  2. Run: workers claim PENDING chunks, paginate with token(pk) >= ? AND token(pk) < ?, decrypt with the old key, re-encrypt with the new key, write back with a key-version column. Idempotent on retry.
  3. State: every chunk transition (PENDING → RUNNING → DONE) hits the meta table. Pause = stop the executor; resume = re-run, executor only picks up non-DONE chunks.
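
The Plan step can be illustrated with a minimal sketch of token-ring splitting. It assumes only that Murmur3Partitioner tokens are signed 64-bit integers; the names TokenChunk and split_token_ring are hypothetical and are not taken from cassandra-rekey itself.

```python
from dataclasses import dataclass
from typing import List

MIN_TOKEN = -(2**63)   # smallest signed 64-bit token
MAX_TOKEN = 2**63 - 1  # largest signed 64-bit token


@dataclass(frozen=True)
class TokenChunk:
    start: int           # inclusive lower bound
    end: int             # exclusive upper bound, except for the last chunk
    end_inclusive: bool  # True only for the chunk that ends at MAX_TOKEN


def split_token_ring(chunks: int) -> List[TokenChunk]:
    """Split the full token range into `chunks` contiguous pieces."""
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 tokens in total
    # Integer arithmetic keeps the bounds exact; floats would lose precision.
    bounds = [MIN_TOKEN + (span * i) // chunks for i in range(chunks + 1)]
    result = []
    for i in range(chunks):
        last = i == chunks - 1
        # 2**63 does not fit in a bigint, so the last chunk stops at MAX_TOKEN.
        end = MAX_TOKEN if last else bounds[i + 1]
        result.append(TokenChunk(bounds[i], end, end_inclusive=last))
    return result
```

In this sketch every chunk maps onto the token(pk) >= ? AND token(pk) < ? scan from step 2, except the last one, which would need <= on its upper bound so that the largest token is not skipped. Whether the tool handles the top of the ring this way is an assumption.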

Install

pip install cassandra-rekey
# or with KMS support
pip install 'cassandra-rekey[kms]'

Quick start

cassandra-rekey init-state --config config.yaml
cassandra-rekey doctor    --config config.yaml          # read-only pre-flight
cassandra-rekey plan      --config config.yaml
cassandra-rekey run       --config config.yaml --job-id <uuid> --workers 16
cassandra-rekey status    --config config.yaml --job-id <uuid>
cassandra-rekey pause     --config config.yaml --job-id <uuid>
cassandra-rekey resume    --config config.yaml --job-id <uuid>

Config (config.yaml)

cluster:
  contact_points: [cass-1, cass-2, cass-3]
  port: 9042
  local_dc: us-west-2
  keyspace: app_data

state:
  keyspace: rekey_meta

provider:
  type: fernet
  old_key_env: REKEY_OLD_KEY
  new_key_env: REKEY_NEW_KEY

tables:
  - name: users
    partition_key: [tenant_id, user_id]   # composite PK supported
    clustering_key: [event_time]          # used in UPDATE WHERE
    encrypted_columns: [ssn, email]
    chunks: 256
    preserve_ttl: true                    # read TTL(col), write USING TTL ?
    preserve_writetime: true              # read WRITETIME(col), write USING TIMESTAMP ?
    use_lwt_guard: true                   # IF key_version = old_version
    consistency:
      read: LOCAL_QUORUM
      write: LOCAL_QUORUM
  - name: accounts
    partition_key: [account_id]
    encrypted_columns: [account_number]
    chunks: 1024

execution:
  workers: 16
  read_p99_threshold_ms: 50
  pause_on_backpressure: true
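
To make the per-table options concrete, here is a rough sketch of how one tables entry could translate into a chunk scan and a write-back statement. The function build_statements is hypothetical and the real statement shapes may differ. One known constraint is reflected: Cassandra rejects a custom timestamp on a conditional update, so this sketch omits USING TIMESTAMP whenever the LWT guard is on. How cassandra-rekey itself reconciles preserve_writetime with use_lwt_guard is not stated here.

```python
from typing import Dict, List, Tuple


def build_statements(keyspace: str, table: Dict) -> Tuple[str, str]:
    """Return (select_cql, update_cql) for one `tables` entry (illustrative only)."""
    pk: List[str] = table["partition_key"]
    ck: List[str] = table.get("clustering_key", [])
    enc: List[str] = table["encrypted_columns"]
    lwt: bool = table.get("use_lwt_guard", False)
    name = f"{keyspace}.{table['name']}"

    # The full partition-key tuple goes into token(...) for the chunk scan.
    token_expr = f"token({', '.join(pk)})"
    select_cols = pk + ck + enc + ["key_version"]
    if table.get("preserve_ttl"):
        select_cols += [f"TTL({c})" for c in enc]
    if table.get("preserve_writetime"):
        select_cols += [f"WRITETIME({c})" for c in enc]
    select = (f"SELECT {', '.join(select_cols)} FROM {name} "
              f"WHERE {token_expr} >= ? AND {token_expr} < ?")

    using = []
    if table.get("preserve_ttl"):
        using.append("TTL ?")
    # Conditional updates cannot carry a custom timestamp in Cassandra.
    if table.get("preserve_writetime") and not lwt:
        using.append("TIMESTAMP ?")

    parts = [f"UPDATE {name}"]
    if using:
        parts.append("USING " + " AND ".join(using))
    parts.append("SET " + ", ".join(f"{c} = ?" for c in enc + ["key_version"]))
    # Clustering columns are part of WHERE so only the exact row is touched.
    parts.append("WHERE " + " AND ".join(f"{c} = ?" for c in pk + ck))
    if lwt:
        parts.append("IF key_version = ?")
    return select, " ".join(parts)
```

For the accounts entry above, this yields UPDATE app_data.accounts SET account_number = ?, key_version = ? WHERE account_id = ?, since none of the optional flags are set.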

Crypto providers

Built-in:

  • fernet — AES-128-CBC + HMAC, symmetric key from env var
  • aws_kms — envelope encryption with KMS data keys (optional, install with [kms])

Custom: implement EncryptProvider:

from cassandra_rekey.crypto.base import EncryptProvider

class MyProvider(EncryptProvider):
    def decrypt(self, ciphertext: bytes) -> bytes: ...
    def encrypt(self, plaintext: bytes) -> bytes: ...
    @property
    def key_version(self) -> str: ...

Register via entry point or pass as provider.module.
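
For unit tests it can help to have a provider that does no real cryptography. The sketch below defines a local stand-in for the EncryptProvider interface shown above so that it runs without the package installed, plus a hypothetical TaggedFakeProvider. It assumes, as the Fernet config suggests, that a single provider instance decrypts with the old key and encrypts with the new one. That assumption is not confirmed by the source.

```python
from abc import ABC, abstractmethod


class EncryptProvider(ABC):
    """Local stand-in mirroring the three members shown in the README."""

    @abstractmethod
    def decrypt(self, ciphertext: bytes) -> bytes: ...

    @abstractmethod
    def encrypt(self, plaintext: bytes) -> bytes: ...

    @property
    @abstractmethod
    def key_version(self) -> str: ...


class TaggedFakeProvider(EncryptProvider):
    """Test double only: 'encrypts' by prefixing a version tag. NOT secure."""

    def __init__(self, old_version: str, new_version: str) -> None:
        self._old_prefix = f"fake:{old_version}:".encode()
        self._new_prefix = f"fake:{new_version}:".encode()
        self._new_version = new_version

    def decrypt(self, ciphertext: bytes) -> bytes:
        # Refuse anything not tagged with the old version, as a real
        # provider would fail to decrypt data written under another key.
        if not ciphertext.startswith(self._old_prefix):
            raise ValueError("ciphertext is not tagged with the old key version")
        return ciphertext[len(self._old_prefix):]

    def encrypt(self, plaintext: bytes) -> bytes:
        return self._new_prefix + plaintext

    @property
    def key_version(self) -> str:
        return self._new_version


def rekey_value(provider: EncryptProvider, ciphertext: bytes) -> bytes:
    """Decrypt with the old key, then re-encrypt with the new key."""
    return provider.encrypt(provider.decrypt(ciphertext))
```

A fake like this makes it easy to assert that a value was rekeyed exactly once: a second pass over the same value raises instead of silently double-encrypting.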

Cassandra concepts handled

Concept | Handling
Composite partition key | Full tuple passed to token(...) for chunk scans
Clustering key | Included in UPDATE WHERE so wide partitions update precise rows
TTL | preserve_ttl: true reads TTL(col) and writes USING TTL ? (uses min TTL across encrypted cols of a row)
WRITETIME | preserve_writetime: true reads WRITETIME(col) and writes USING TIMESTAMP ?
Concurrent app writes | use_lwt_guard: true adds IF key_version = <old>; races are reported instead of corrupting rows
Counters | Rejected at doctor/plan time: counter columns cannot be SET
Encrypted partition/clustering columns | Rejected: requires copy-and-swap, which is out of scope
Encrypted columns with secondary indexes | Rejected: the index would hold stale ciphertext after rekey
Tombstones / null rows | Skipped before decrypt attempt
Schema drift mid-rekey | schema_fingerprint snapshotted at plan, verified at run; aborts on drift
Consistency level | Per-table consistency.read / consistency.write (defaults to LOCAL_QUORUM)
Multi-DC | Run from one DC; use LOCAL_QUORUM reads/writes; do not run repair during the rekey window
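
The rejection rules and the schema-drift check can be pictured with a small sketch. Both functions are hypothetical. In particular, the fingerprint here is just a SHA-256 over sorted column names and CQL types, and the actual schema_fingerprint may cover more or be computed differently.

```python
import hashlib
from typing import Dict, List


def schema_fingerprint(columns: Dict[str, str]) -> str:
    """Hash column names and CQL types in a stable (sorted) order."""
    canonical = "\n".join(f"{name}:{cql_type}"
                          for name, cql_type in sorted(columns.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()


def preflight_errors(table: Dict, columns: Dict[str, str],
                     indexed_columns: List[str]) -> List[str]:
    """Return reasons a table entry would be rejected (empty list means OK).

    `columns` maps column name to CQL type; `indexed_columns` lists columns
    that carry a secondary index. Both are assumed inputs for this sketch.
    """
    errors: List[str] = []
    key_cols = set(table["partition_key"]) | set(table.get("clustering_key", []))
    for col in table["encrypted_columns"]:
        if col not in columns:
            errors.append(f"{col}: not present in table schema")
            continue
        if columns[col] == "counter":
            errors.append(f"{col}: counter columns cannot be SET")
        if col in key_cols:
            errors.append(f"{col}: encrypted key columns need copy-and-swap")
        if col in indexed_columns:
            errors.append(f"{col}: secondary index would hold stale ciphertext")
    return errors
```

Under this scheme, a run would recompute the fingerprint from the live schema and abort if it differs from the value stored at plan time.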

Out of scope (intentionally)

  • Wide-partition cursoring (clustering-key paged scan) — file an issue if you hit OOM on huge partitions
  • Map / set / list / UDT element-wise re-encryption
  • Per-node permit pool
  • Janitor for stuck RUNNING chunks (planned)

Safety guarantees

  • Idempotent — each row carries a key_version column; workers skip rows already at the new version.
  • Resumable — chunk state is the source of truth, not in-memory progress.
  • Bounded blast radius — a worker crash only loses the in-flight chunk, which is retried.
  • Backpressure — read p99 monitor pauses workers when the cluster is hot.
  • Dry run: --dry-run reads and decrypts but does not write, so decode errors surface safely.
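
The backpressure guarantee can be sketched as a gate that workers wait on before each chunk. This is a minimal illustration, assuming an EWMA over observed read latencies and an asyncio.Semaphore to bound concurrency. BackpressureGate, run_chunks, and the alpha default are hypothetical names and values, not the tool's API.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional


class BackpressureGate:
    """Closes when the smoothed read latency exceeds a threshold."""

    def __init__(self, threshold_ms: float, alpha: float = 0.2) -> None:
        self.threshold_ms = threshold_ms
        self.alpha = alpha  # weight given to the newest sample
        self.ewma_ms: Optional[float] = None
        self._open = asyncio.Event()
        self._open.set()  # start open: no latency observed yet

    @property
    def is_open(self) -> bool:
        return self._open.is_set()

    def observe(self, latency_ms: float) -> None:
        """Fold one latency sample into the EWMA and update the gate."""
        if self.ewma_ms is None:
            self.ewma_ms = latency_ms
        else:
            self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms
        if self.ewma_ms > self.threshold_ms:
            self._open.clear()  # cluster is hot: hold new chunks
        else:
            self._open.set()    # cluster has cooled: release waiters

    async def wait_until_cool(self) -> None:
        await self._open.wait()


async def run_chunks(chunk_ids: Iterable[int],
                     process: Callable[[int], Awaitable[None]],
                     gate: BackpressureGate, workers: int) -> List[int]:
    """Process chunks with at most `workers` in flight, pausing on backpressure."""
    sem = asyncio.Semaphore(workers)
    done: List[int] = []

    async def worker(chunk_id: int) -> None:
        async with sem:
            await gate.wait_until_cool()
            await process(chunk_id)
            done.append(chunk_id)

    await asyncio.gather(*(worker(c) for c in chunk_ids))
    return done
```

The gate only delays chunks that have not started yet, so in-flight work finishes normally. Create the gate inside the running event loop, for example within the coroutine passed to asyncio.run, to stay compatible with older Python versions.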

Architecture

See docs/architecture.md.

Roadmap

  • Core plan/run/resume
  • Multi-table fan-out
  • Cassandra-backed state store
  • Pause / resume CLI commands
  • Backpressure monitor (EWMA over read latency)
  • AWS KMS provider (envelope encryption)
  • Janitor for stuck RUNNING chunks
  • testcontainers integration test
  • Prometheus metrics endpoint
  • Web dashboard

Releasing

See docs/release.md. Releases are triggered by pushing a vX.Y.Z tag; the workflow uses PyPI Trusted Publisher OIDC, so no API token is stored in GitHub.

License

Apache 2.0. See LICENSE.
