PQC-native per-tenant KV cache encryption for multi-tenant LLM inference. ML-KEM-768 derived session keys, AES-256-GCM per-entry encryption, automatic key rotation, tenant isolation enforcement.

PQC KV Cache Encryption

Per-tenant, quantum-safe encryption for the LLM KV cache. Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh AES-256-GCM envelope whose key is derived per session via ML-KEM-768, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.

The Problem

Long-context LLM inference keeps past token activations in the KV cache - a per-layer, per-position tensor store that can run to multiple GB. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests) that cache sits in plaintext process memory:

  • Side-channel reads. A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
  • Cross-request leakage. A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
  • Harvest-now-decrypt-later. Even if host-level encryption is on, traffic protected by a classical key exchange (ECDH) and recorded today can be decrypted once a cryptographically relevant quantum computer (CRQC) exists.
  • Regulated workloads. Healthcare, finance, and legal inference pipelines carry multi-year retention requirements on conversation state (commonly five to seven years or more); confidentiality that rests on classical key exchange alone may not hold for that whole window.

The Solution

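  • ML-KEM-768 derives a fresh 32-byte symmetric key per TenantSession. In production the tenant presents a KEM public key and the inference server runs Encapsulate; this reference implementation delegates KEM key generation to quantumshield and derives the session key from the resulting key material.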
  • AES-256-GCM encrypts every KVCacheEntry. One nonce per entry, AAD binds EntryMetadata + sequence_number + key_len so tampering with layer/position/sequence surfaces as a DecryptionError.
  • TenantIsolationManager holds a session per tenant and refuses cross-tenant decrypts even when asked explicitly; a misrouted ciphertext raises TenantIsolationError before AES touches the bytes.
  • KeyRotationPolicy rotates the per-session key after N entries or T seconds, resetting the sequence counter.
  • KVAuditLog is append-only and records encrypt, decrypt, rotate, and isolation-violation events.
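
The AAD binding described above can be illustrated with a small standard-library sketch. It is not the library's actual serialization: the `Meta` class is a hypothetical stand-in for EntryMetadata, and canonical JSON is only one plausible encoding. The point is that any change to a bound field produces different AAD bytes, which is what makes the AES-GCM tag check fail on tampered metadata.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for EntryMetadata (not the library class)."""
    tenant_id: str
    session_id: str
    layer_idx: int
    position: int
    token_id: int


def build_aad(meta: Meta, sequence_number: int, key_len: int) -> bytes:
    """Serialize the bound fields deterministically (sorted keys, no whitespace)."""
    fields = asdict(meta)
    fields["sequence_number"] = sequence_number
    fields["key_len"] = key_len
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


meta = Meta("tenant-alice", "sess-1", layer_idx=0, position=12, token_id=2048)
aad = build_aad(meta, sequence_number=7, key_len=64)

# Same entry with position changed from 12 to 13: the AAD bytes differ,
# so a tag computed over the original AAD would no longer verify.
tampered = build_aad(Meta("tenant-alice", "sess-1", 0, 13, 2048), 7, 64)
```

A deterministic encoding matters here because the encryptor and decryptor must rebuild byte-identical AAD from the same fields.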

Installation

pip install pqc-kv-cache-encryption

Development:

pip install -e ".[dev]"

Quick Start

import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (ML-KEM-768 derived AES-256-GCM key).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in an authenticated AES-256-GCM envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),   # raw bytes of K vector
    value_tensor_bytes=os.urandom(64), # raw bytes of V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. The decryptor checks tenant id and
#    nonce replay; AES-GCM verifies the tag over ciphertext and AAD.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes

Multi-tenant with strict isolation:

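from pqc_kv_cache import (
    TenantIdentity,
    TenantIsolationError,
    TenantIsolationManager,
)

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

# alice_entry is a KVCacheEntry built as in the Quick Start, with
# EntryMetadata carrying Alice's tenant_id and session_id.
alice_enc = mgr.encrypt("tenant-alice", alice_entry)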

# Bob can NEVER decrypt Alice's entry, even when using his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")

Architecture

+-----------------------------+              +-----------------------------+
|  Tenant Alice               |              |  Tenant Bob                 |
|  (client)                   |              |  (client)                   |
+--------------+--------------+              +--------------+--------------+
               |                                             |
               |  ML-KEM-768 handshake (per session)         |
               v                                             v
+---------------------------------------------------------------------------+
|                   Inference Server (multi-tenant)                         |
|                                                                           |
|  TenantIsolationManager                                                   |
|    +------------------------+        +------------------------+           |
|    | TenantSession (alice)  |        | TenantSession (bob)    |           |
|    |   symmetric_key (32B)  |        |   symmetric_key (32B)  |           |
|    |   next_sequence        |        |   next_sequence        |           |
|    |   entries_encrypted    |        |   entries_encrypted    |           |
|    +----------+-------------+        +----------+-------------+           |
|               |                                 |                         |
|               v                                 v                         |
|    CacheEncryptor / CacheDecryptor   CacheEncryptor / CacheDecryptor      |
|       AES-256-GCM + AAD                 AES-256-GCM + AAD                 |
|       + tenant-id enforcement           + tenant-id enforcement           |
|               |                                 |                         |
|               v                                 v                         |
|    +---------------------+        +---------------------+                 |
|    | EncryptedEntry      |        | EncryptedEntry      |                 |
|    |  (alice ciphertext) |        |  (bob ciphertext)   |                 |
|    +---------+-----------+        +---------+-----------+                 |
|              |                              |                             |
|              +-----------+------------------+                             |
|                          v                                                |
|             +---------------------------+                                 |
|             |  KV cache in GPU/host RAM |  (only ciphertext lives here)   |
|             +---------------------------+                                 |
|                                                                           |
|  KeyRotationPolicy  -- rotates session keys on entry count / age          |
|  KVAuditLog         -- encrypt / decrypt / rotate / isolation-violation   |
+---------------------------------------------------------------------------+

Cryptography

| Primitive | Purpose | Algorithm |
|---|---|---|
| Per-session key | Fresh 32-byte symmetric key per tenant session | ML-KEM-768 |
| Per-entry encryption | Confidentiality + integrity of K/V tensor bytes | AES-256-GCM |
| AAD binding | EntryMetadata + sequence_number + key_len -> tag | AES-GCM tag |
| Session-key derivation | SHA3-256 over KEM keypair bytes (production: Decapsulate) | SHA3-256 |

Signing and KEM keys are delegated to quantumshield, which prefers real liboqs ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.
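
As a rough illustration of the session-key derivation row, the sketch below hashes key material with SHA3-256 to obtain a 32-byte key. Several things here are assumptions: the function name `derive_session_key`, the length-prefixed input layout, and the context argument are hypothetical, and the key bytes are random stand-ins with the ML-KEM-768 key sizes, not a real keypair from quantumshield.

```python
import hashlib
import os

# ML-KEM-768 key sizes in bytes (encapsulation key / decapsulation key).
ML_KEM_768_PUBLIC_KEY_BYTES = 1184
ML_KEM_768_SECRET_KEY_BYTES = 2400


def derive_session_key(public_key: bytes, secret_key: bytes, context: bytes) -> bytes:
    """Hash length-prefixed inputs with SHA3-256 into a 32-byte AES-256 key.

    Length prefixes keep the inputs unambiguous, so shifting bytes from one
    field into the next cannot produce the same hash input.
    """
    h = hashlib.sha3_256()
    for part in (context, public_key, secret_key):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


# Stand-in bytes only: a real deployment would use an actual ML-KEM-768
# keypair, or the shared secret returned by Decapsulate.
stand_in_public = os.urandom(ML_KEM_768_PUBLIC_KEY_BYTES)
stand_in_secret = os.urandom(ML_KEM_768_SECRET_KEY_BYTES)

alice_key = derive_session_key(stand_in_public, stand_in_secret, b"tenant-alice")
bob_key = derive_session_key(stand_in_public, stand_in_secret, b"tenant-bob")
```

In a production handshake the hash input would be the KEM shared secret rather than keypair bytes, as the table notes.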

Threat Model

| Adversary capability | Coverage |
|---|---|
| Read KV cache pages for another tenant | All entries are AES-256-GCM encrypted; attacker sees only ciphertext. |
| Replay a previously captured EncryptedEntry | CacheDecryptor tracks seen nonces and raises NonceReplayError. |
| Tamper with EntryMetadata (layer_idx, position, tenant_id) | AAD binding -> AES-GCM tag fails -> DecryptionError. |
| Submit another tenant's ciphertext through a valid session | TenantIsolationError raised before AES touches bytes. |
| Long-lived session key exposure | KeyRotationPolicy rotates on entry-count / age; sequence counter resets. |
| Session outlives its TTL | SessionExpiredError on every encrypt/decrypt after expires_at. |
| Harvest-now-decrypt-later on the KEM handshake | ML-KEM-768 provides IND-CCA2 security under quantum adversaries. |
| Orphaned tenant state after disconnect | close_session() drops the session and its key from memory. |
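
The ordering of the isolation and replay checks can be sketched without any cryptography. The code below is a simplified model, not the CacheDecryptor source: `DecryptGuard`, `check`, `mark_decrypted`, and `outcome` are hypothetical names, and the exception classes are local stand-ins that only share names with the library's errors. Recording a nonce only after a successful decrypt is also an assumption about a sensible design.

```python
class TenantIsolationError(Exception):
    """Local stand-in for the library error of the same name."""


class NonceReplayError(Exception):
    """Local stand-in for the library error of the same name."""


class DecryptGuard:
    """Pre-decrypt checks for one tenant session (illustrative only)."""

    def __init__(self, tenant_id: str) -> None:
        self.tenant_id = tenant_id
        self._seen_nonces = set()

    def check(self, entry_tenant_id: str, nonce_hex: str) -> None:
        # Tenant check comes first, so a misrouted ciphertext is rejected
        # before any key material is touched.
        if entry_tenant_id != self.tenant_id:
            raise TenantIsolationError(
                f"entry belongs to {entry_tenant_id!r}, session to {self.tenant_id!r}"
            )
        if nonce_hex in self._seen_nonces:
            raise NonceReplayError(f"nonce {nonce_hex} already used")

    def mark_decrypted(self, nonce_hex: str) -> None:
        # Called only after the AEAD tag verified, so unauthenticated
        # input cannot fill the seen-nonce set.
        self._seen_nonces.add(nonce_hex)


def outcome(guard: DecryptGuard, entry_tenant_id: str, nonce_hex: str) -> str:
    try:
        guard.check(entry_tenant_id, nonce_hex)
    except TenantIsolationError:
        return "isolation-violation"
    except NonceReplayError:
        return "replay"
    guard.mark_decrypted(nonce_hex)  # stand-in for a successful AES-GCM decrypt
    return "ok"


alice = DecryptGuard("tenant-alice")
nonce = "00" * 12
results = [
    outcome(alice, "tenant-alice", nonce),     # first use of the nonce
    outcome(alice, "tenant-alice", nonce),     # same entry replayed
    outcome(alice, "tenant-bob", "11" * 12),   # another tenant's ciphertext
]
```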

Performance Considerations

This library is written in pure Python and is intended as the cryptographic envelope for multi-tenant LLM inference, not a hot-path encryption kernel. Production deployments wrap the same patterns in:

  • A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
  • A driver-side AES-GCM engine (H100 confidential compute, AMD SEV-SNP).
  • A batched nonce / sequence allocator to amortize session bookkeeping across a batch of requests.

The envelope formats (EncryptedEntry, AAD shape, TenantSession state machine) are deliberately portable so that the native kernel and the Python reference implementation produce interoperable ciphertexts.
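
One way to picture the batched nonce / sequence allocator mentioned above is a counter that hands out contiguous blocks of sequence numbers under a single lock acquisition. This is a sketch under assumptions, not part of the library: `SequenceBlockAllocator` is a hypothetical name, and the 96-bit nonce layout (4-byte per-key prefix plus 8-byte big-endian counter) is a common construction that the source does not specify.

```python
import threading


class SequenceBlockAllocator:
    """Reserve sequence numbers in blocks to amortize locking (illustrative)."""

    def __init__(self, nonce_prefix: bytes) -> None:
        if len(nonce_prefix) != 4:
            raise ValueError("nonce_prefix must be exactly 4 bytes")
        self._prefix = nonce_prefix
        self._next = 0
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        """Return `count` consecutive sequence numbers, taking the lock once."""
        if count <= 0:
            raise ValueError("count must be positive")
        with self._lock:
            start = self._next
            self._next += count
        return range(start, start + count)

    def nonce_for(self, sequence_number: int) -> bytes:
        """Build a 12-byte AES-GCM nonce: 4-byte prefix + 8-byte counter."""
        return self._prefix + sequence_number.to_bytes(8, "big")


allocator = SequenceBlockAllocator(nonce_prefix=b"\x00\x00\x00\x01")
first_batch = allocator.reserve(3)   # one lock acquisition for three entries
second_batch = allocator.reserve(2)
nonces = [allocator.nonce_for(s) for s in (*first_batch, *second_batch)]
```

Counter-based nonces are unique only under a single key, so an allocator like this would need to be replaced or reset whenever the session key rotates.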

API Reference

TenantIdentity

Frozen dataclass identifying a tenant. Fields: tenant_id: str, display_name: str = "".

establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession

Derive a fresh 32-byte symmetric key for tenant via ML-KEM-768 and return a TenantSession.

TenantSession

Holds symmetric_key, next_sequence, entries_encrypted, created_at, expires_at. Methods: is_valid(), check_valid(), consume_sequence(), rotate_key(new_key), to_public_dict().

KVCacheEntry / EncryptedEntry / EntryMetadata

KVCacheEntry holds metadata, key_tensor_bytes, value_tensor_bytes. EncryptedEntry holds metadata, nonce (hex), ciphertext (hex), key_len, sequence_number. EntryMetadata is frozen and carries tenant_id, session_id, layer_idx, position, token_id, kv_role.

CacheEncryptor(session) / CacheDecryptor(session)

encrypt_entry(KVCacheEntry) -> EncryptedEntry and decrypt_entry(EncryptedEntry) -> KVCacheEntry. Both enforce tenant-id match. Decryptor tracks nonces for replay protection.

KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)

should_rotate(session) -> (bool, RotationTrigger | None) and rotate(session) -> bytes (new 32-byte key). RotationTrigger is ENTRY_COUNT, TIME_ELAPSED, or MANUAL.
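
The rotation decision can be modelled in a few lines. The sketch below mirrors the documented defaults and return shape, but it is not the library code: `SessionState` is a hypothetical stand-in for TenantSession, the new key comes from `secrets.token_bytes` rather than a fresh ML-KEM derivation, and resetting `entries_encrypted` alongside the sequence counter is an assumption.

```python
import enum
import secrets
import time
from dataclasses import dataclass


class RotationTrigger(enum.Enum):
    ENTRY_COUNT = "entry_count"
    TIME_ELAPSED = "time_elapsed"


@dataclass
class SessionState:
    """Hypothetical stand-in for the mutable parts of TenantSession."""
    symmetric_key: bytes
    key_created_at: float
    entries_encrypted: int = 0
    next_sequence: int = 0


def should_rotate(state, max_entries=100_000, max_age_seconds=300, now=None):
    """Return (True, trigger) when either limit is reached, else (False, None)."""
    now = time.time() if now is None else now
    if state.entries_encrypted >= max_entries:
        return True, RotationTrigger.ENTRY_COUNT
    if now - state.key_created_at >= max_age_seconds:
        return True, RotationTrigger.TIME_ELAPSED
    return False, None


def rotate(state, now=None):
    """Install a fresh 32-byte key and reset the per-key counters."""
    new_key = secrets.token_bytes(32)  # stand-in for a new KEM-derived key
    state.symmetric_key = new_key
    state.key_created_at = time.time() if now is None else now
    state.entries_encrypted = 0
    state.next_sequence = 0
    return new_key


state = SessionState(symmetric_key=secrets.token_bytes(32), key_created_at=1000.0)
state.entries_encrypted = 5
state.next_sequence = 5

by_count = should_rotate(state, max_entries=5, now=1001.0)
by_age = should_rotate(state, max_entries=100, max_age_seconds=300, now=1300.0)
not_yet = should_rotate(state, max_entries=100, max_age_seconds=300, now=1299.0)

old_key = state.symmetric_key
new_key = rotate(state, now=1300.0)
```

Checking the entry count before the age means a session that hits both limits at once reports ENTRY_COUNT; the library may order these differently.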

TenantIsolationManager

create_session(tenant), get_session(tenant_id), encrypt(tenant_id, entry), decrypt(tenant_id, enc), close_session(tenant_id), list_active_tenants().

KVAuditLog / KVAuditEntry

log_encrypt(...), log_decrypt(...), log_rotate(...), log_isolation_violation(...), entries(limit, tenant_id, operation), export_json().
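
To make the append-only behaviour concrete, here is a minimal in-memory sketch with the same filtering parameters as entries(limit, tenant_id, operation). The class and record names (`AppendOnlyLog`, `AuditRecord`) and the record fields are hypothetical; the real KVAuditLog may store different fields and enforce immutability differently.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """Immutable record; hypothetical fields, not the KVAuditEntry schema."""
    timestamp: float
    operation: str
    tenant_id: str
    detail: str = ""


class AppendOnlyLog:
    """Exposes append and read operations only; no update or delete."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, operation: str, tenant_id: str, detail: str = "") -> AuditRecord:
        record = AuditRecord(time.time(), operation, tenant_id, detail)
        self._records.append(record)
        return record

    def entries(
        self,
        limit: Optional[int] = None,
        tenant_id: Optional[str] = None,
        operation: Optional[str] = None,
    ) -> List[AuditRecord]:
        """Return matching records, oldest first; `limit` keeps the newest N."""
        matched = [
            r for r in self._records
            if (tenant_id is None or r.tenant_id == tenant_id)
            and (operation is None or r.operation == operation)
        ]
        if limit is None:
            return matched
        return matched[max(0, len(matched) - limit):]

    def export_json(self) -> str:
        return json.dumps([asdict(r) for r in self._records])


log = AppendOnlyLog()
log.append("encrypt", "tenant-alice")
log.append("decrypt", "tenant-alice")
log.append("isolation_violation", "tenant-bob", detail="tried tenant-alice entry")

alice_ops = [r.operation for r in log.entries(tenant_id="tenant-alice")]
violations = log.entries(operation="isolation_violation")
exported = json.loads(log.export_json())
```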

Errors

All under KVCacheError: TenantIsolationError, SessionExpiredError, DecryptionError, NonceReplayError, KeyRotationRequiredError, UnknownTenantError.

Why PQC Matters for the KV Cache

Inference logs and intermediate conversation state are retained for years in regulated industries, commonly five to seven or more:

  • Healthcare (HIPAA): 6-year minimum retention on any PHI-bearing record, including the model context that reasoned over it.
  • Finance (SEC 17a-4, MiFID II): 5-7 year retention on all communications with a client, including AI-assisted drafting.
  • Legal (privilege / e-discovery): communications privilege only survives if the confidentiality chain is intact.

The same adversary who is recording your classical TLS session today - harvest-now-decrypt-later - is also recording the residual state of your inference servers. A PQC envelope around the KV cache is what keeps that state confidential past the arrival of a cryptographically relevant quantum computer.

Examples

  • examples/basic_kv_encryption.py - single tenant, encrypt/decrypt 3 entries, inspect audit log.
  • examples/multi_tenant_isolation.py - Alice and Bob co-resident, cross-tenant decrypt is rejected.
  • examples/key_rotation.py - KeyRotationPolicy with max_entries=5, observe rotation mid-stream.

License

Apache License 2.0 - see LICENSE.
