PQC-native per-tenant KV cache encryption for multi-tenant LLM inference. ML-KEM-768 derived session keys, AES-256-GCM per-entry encryption, automatic key rotation, tenant isolation enforcement.
PQC KV Cache Encryption
Per-tenant, quantum-safe encryption for the LLM KV cache. Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh AES-256-GCM envelope whose key is derived per session via ML-KEM-768, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.
The Problem
Long-context LLM inference keeps past token activations in the KV cache - a per-layer, per-position tensor store that can run to multiple GB. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests) that cache sits in plaintext process memory:
- Side-channel reads. A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
- Cross-request leakage. A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
- Harvest-now-decrypt-later. Even if host-level encryption is on, a classical key exchange (ECDH) recorded today can be broken retroactively once a cryptographically relevant quantum computer (CRQC) exists.
- Regulated workloads. Healthcare, finance, and legal inference pipelines have 7+ year retention requirements on conversation state; classical confidentiality alone no longer clears the audit bar.
The Solution
- ML-KEM-768 derives a fresh 32-byte symmetric key per TenantSession. In production the tenant presents a KEM public key and the inference server runs Encapsulate; here we delegate to quantumshield.
- AES-256-GCM encrypts every KVCacheEntry. One nonce per entry; the AAD binds EntryMetadata + sequence_number + key_len, so tampering with layer/position/sequence surfaces as a DecryptionError.
- TenantIsolationManager holds a session per tenant and refuses cross-tenant decrypts even when asked explicitly; a misrouted ciphertext raises TenantIsolationError before AES touches the bytes.
- KeyRotationPolicy rotates the per-session key after N entries or T seconds, resetting the sequence counter.
- KVAuditLog is append-only and records encrypt, decrypt, rotate, and isolation-violation events.
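To make the AAD binding concrete, here is a minimal sketch of how the metadata, sequence number, and key length could be serialized into one byte string for AES-GCM to authenticate. The `Meta` class, the `build_aad` helper, and the byte layout are assumptions for illustration; the library's actual AAD encoding is not documented here.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Meta:
    """Hypothetical stand-in for EntryMetadata (field names from the API reference)."""
    tenant_id: str
    session_id: str
    layer_idx: int
    position: int
    token_id: int


def _length_prefixed(text: str) -> bytes:
    """Length-prefix a string so adjacent variable-length fields cannot be confused."""
    raw = text.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw


def build_aad(meta: Meta, sequence_number: int, key_len: int) -> bytes:
    """Serialize every field the GCM tag should cover (assumed layout)."""
    fixed = struct.pack(
        ">IIIQI",
        meta.layer_idx,
        meta.position,
        meta.token_id,
        sequence_number,
        key_len,
    )
    return _length_prefixed(meta.tenant_id) + _length_prefixed(meta.session_id) + fixed


meta = Meta("tenant-alice", "sess-1", layer_idx=0, position=12, token_id=2048)
aad = build_aad(meta, sequence_number=7, key_len=64)

# Changing any bound field changes the AAD, so the original tag no longer verifies.
moved = build_aad(Meta("tenant-alice", "sess-1", 1, 12, 2048), 7, 64)
```

Because the AAD is authenticated but not encrypted, a ciphertext replayed under a different layer, position, or sequence number fails tag verification even though the ciphertext bytes are untouched.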
Installation
pip install pqc-kv-cache-encryption
Development:
pip install -e ".[dev]"
Quick Start
import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (ML-KEM-768 derived AES-256-GCM key).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in an authenticated AES-256-GCM envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),    # raw bytes of K vector
    value_tensor_bytes=os.urandom(64),  # raw bytes of V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. The decryptor checks the tenant ID and
#    nonce replay; AES-GCM verifies the AAD and the authentication tag.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes
Multi-tenant with strict isolation:
from pqc_kv_cache import TenantIdentity, TenantIsolationError, TenantIsolationManager

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

# alice_entry is a KVCacheEntry whose metadata names tenant-alice,
# built the same way as `entry` in the example above.
alice_enc = mgr.encrypt("tenant-alice", alice_entry)

# Bob can NEVER decrypt Alice's entry, even when using his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")
Architecture
+-----------------------------+ +-----------------------------+
| Tenant Alice | | Tenant Bob |
| (client) | | (client) |
+--------------+--------------+ +--------------+--------------+
| |
| ML-KEM-768 handshake (per session) |
v v
+---------------------------------------------------------------------------+
| Inference Server (multi-tenant) |
| |
| TenantIsolationManager |
| +------------------------+ +------------------------+ |
| | TenantSession (alice) | | TenantSession (bob) | |
| | symmetric_key (32B) | | symmetric_key (32B) | |
| | next_sequence | | next_sequence | |
| | entries_encrypted | | entries_encrypted | |
| +----------+-------------+ +----------+-------------+ |
| | | |
| v v |
| CacheEncryptor / CacheDecryptor CacheEncryptor / CacheDecryptor |
| AES-256-GCM + AAD AES-256-GCM + AAD |
| + tenant-id enforcement + tenant-id enforcement |
| | | |
| v v |
| +---------------------+ +---------------------+ |
| | EncryptedEntry | | EncryptedEntry | |
| | (alice ciphertext) | | (bob ciphertext) | |
| +---------+-----------+ +---------+-----------+ |
| | | |
| +-----------+------------------+ |
| v |
| +---------------------------+ |
| | KV cache in GPU/host RAM | (only ciphertext lives here) |
| +---------------------------+ |
| |
| KeyRotationPolicy -- rotates session keys on entry count / age |
| KVAuditLog -- encrypt / decrypt / rotate / isolation-violation |
+---------------------------------------------------------------------------+
Cryptography
| Primitive | Purpose | Algorithm |
|---|---|---|
| Per-session key | Fresh 32-byte symmetric key per tenant session | ML-KEM-768 |
| Per-entry encryption | Confidentiality + integrity of K/V tensor bytes | AES-256-GCM |
| AAD binding | EntryMetadata + sequence_number + key_len -> tag | AES-GCM tag |
| Session-key derivation | SHA3-256 over KEM keypair bytes (production: Decapsulate) | SHA3-256 |
Signing and KEM keys are delegated to quantumshield, which prefers real liboqs ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.
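As a rough illustration of the session-key derivation row in the table, the sketch below hashes KEM keypair bytes down to a 32-byte AES-256 key with SHA3-256. The `derive_session_key` helper, the input ordering, and the domain-separation label are assumptions, and random bytes stand in for a real ML-KEM-768 keypair; a production deployment would instead use the shared secret from Encapsulate / Decapsulate.

```python
import hashlib
import os

# Encoded ML-KEM-768 key sizes in bytes, as specified in FIPS 203.
ML_KEM_768_PUBLIC_KEY_LEN = 1184
ML_KEM_768_SECRET_KEY_LEN = 2400


def derive_session_key(public_key: bytes, secret_key: bytes) -> bytes:
    """Hash KEM keypair bytes to a 32-byte symmetric key (reference-mode sketch)."""
    h = hashlib.sha3_256()
    h.update(b"pqc-kv-cache/session-key/v1")  # hypothetical domain-separation label
    h.update(public_key)
    h.update(secret_key)
    return h.digest()


# Random placeholders stand in for a real ML-KEM-768 keypair.
pk = os.urandom(ML_KEM_768_PUBLIC_KEY_LEN)
sk = os.urandom(ML_KEM_768_SECRET_KEY_LEN)
session_key = derive_session_key(pk, sk)
```

SHA3-256 emits exactly 32 bytes, which is why its digest can be used directly as an AES-256 key without truncation or expansion.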
Threat Model
| Adversary capability | Coverage |
|---|---|
| Read KV cache pages for another tenant | All entries are AES-256-GCM encrypted; attacker sees only ciphertext. |
| Replay a previously captured EncryptedEntry | CacheDecryptor tracks seen nonces and raises NonceReplayError. |
| Tamper with EntryMetadata (layer_idx, position, tenant_id) | AAD binding -> AES-GCM tag fails -> DecryptionError. |
| Submit another tenant's ciphertext through a valid session | TenantIsolationError raised before AES touches bytes. |
| Long-lived session key exposure | KeyRotationPolicy rotates on entry-count / age; sequence counter resets. |
| Session outlives its TTL | SessionExpiredError on every encrypt/decrypt after expires_at. |
| Harvest-now-decrypt-later on the KEM handshake | ML-KEM-768 provides IND-CCA2 security under quantum adversaries. |
| Orphaned tenant state after disconnect | close_session() drops the session and its key from memory. |
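The ordering of the checks in the table matters: the tenant check runs before any cipher call, and the replay check runs before a nonce is accepted a second time. A minimal sketch of such a gate follows; `DecryptGate` and both exception classes are local stand-ins written for this example, not the library's own classes.

```python
class IsolationViolation(Exception):
    """Local stand-in for the library's TenantIsolationError."""


class ReplayDetected(Exception):
    """Local stand-in for the library's NonceReplayError."""


class DecryptGate:
    """Checks that run before any AES-GCM call for one tenant session."""

    def __init__(self, tenant_id: str) -> None:
        self.tenant_id = tenant_id
        self._seen_nonces = set()

    def admit(self, entry_tenant_id: str, nonce: bytes) -> None:
        # Tenant check first: a misrouted ciphertext never reaches the cipher.
        if entry_tenant_id != self.tenant_id:
            raise IsolationViolation(
                f"entry for {entry_tenant_id!r} presented to {self.tenant_id!r}"
            )
        # Replay check second: each nonce is accepted at most once per session.
        if nonce in self._seen_nonces:
            raise ReplayDetected(nonce.hex())
        self._seen_nonces.add(nonce)


gate = DecryptGate("tenant-bob")
nonce = bytes(12)

try:
    gate.admit("tenant-alice", nonce)  # Alice's entry routed to Bob's session
    outcome = "admitted"
except IsolationViolation:
    outcome = "blocked"
```

This sketch records the nonce before decryption is attempted, so an entry whose tag later fails cannot be retried; whether the library does the same is not stated above.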
Performance Considerations
This library is written in pure Python and is intended as the cryptographic envelope for multi-tenant LLM inference, not a hot-path encryption kernel. Production deployments wrap the same patterns in:
- A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
- A driver-side AES-GCM engine (H100 confidential compute, AMD SEV-SNP).
- A batched nonce / sequence allocator to amortize session bookkeeping across a batch of requests.
The envelope formats (EncryptedEntry, AAD shape, TenantSession state machine) are deliberately portable so that the native kernel and the Python reference implementation produce interoperable ciphertexts.
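One way to picture the batched nonce / sequence allocator mentioned above: reserve a contiguous range of sequence numbers under a single lock acquisition, then derive each 96-bit GCM nonce from a per-session prefix plus the sequence number. `NonceBlockAllocator` and the counter-based nonce layout are assumptions for illustration; the text above does not specify how the library generates nonces.

```python
import struct
import threading


class NonceBlockAllocator:
    """Hands out contiguous sequence ranges so a batch takes the lock once."""

    def __init__(self, session_prefix: bytes) -> None:
        if len(session_prefix) != 4:
            raise ValueError("session_prefix must be exactly 4 bytes")
        self._prefix = session_prefix
        self._next = 0
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        """Atomically reserve `count` consecutive sequence numbers."""
        if count < 0:
            raise ValueError("count must be non-negative")
        with self._lock:
            start = self._next
            self._next += count
        return range(start, start + count)

    def nonce_for(self, sequence_number: int) -> bytes:
        """12-byte nonce: 4-byte session prefix + 8-byte big-endian counter."""
        return self._prefix + struct.pack(">Q", sequence_number)


alloc = NonceBlockAllocator(b"\x00\x00\x00\x01")
first_batch = alloc.reserve(3)   # sequence numbers 0, 1, 2
second_batch = alloc.reserve(5)  # sequence numbers 3 through 7
nonces = [alloc.nonce_for(i) for i in (*first_batch, *second_batch)]
```

Counter-derived nonces are unique only under a single key, so an allocator like this would need to be recreated or reset together with the key whenever the session key rotates.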
API Reference
TenantIdentity
tenant_id: str, display_name: str = "" — frozen dataclass identifying a tenant.
establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession
Derive a fresh 32-byte symmetric key for tenant via ML-KEM-768 and return a TenantSession.
TenantSession
Holds symmetric_key, next_sequence, entries_encrypted, created_at, expires_at. Methods: is_valid(), check_valid(), consume_sequence(), rotate_key(new_key), to_public_dict().
KVCacheEntry / EncryptedEntry / EntryMetadata
KVCacheEntry holds metadata, key_tensor_bytes, value_tensor_bytes. EncryptedEntry holds metadata, nonce (hex), ciphertext (hex), key_len, sequence_number. EntryMetadata is frozen and carries tenant_id, session_id, layer_idx, position, token_id, kv_role.
CacheEncryptor(session) / CacheDecryptor(session)
encrypt_entry(KVCacheEntry) -> EncryptedEntry and decrypt_entry(EncryptedEntry) -> KVCacheEntry. Both enforce tenant-id match. Decryptor tracks nonces for replay protection.
KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)
should_rotate(session) -> (bool, RotationTrigger | None) and rotate(session) -> bytes (new 32-byte key). RotationTrigger is ENTRY_COUNT, TIME_ELAPSED, or MANUAL.
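The rotation decision can be sketched as a pure function of two session counters. The classes below are hypothetical stand-ins that mirror the documented defaults; in particular, measuring age from a `key_created_at` timestamp (rather than from session creation) is an assumption.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple


class Trigger(Enum):
    ENTRY_COUNT = "entry_count"
    TIME_ELAPSED = "time_elapsed"


@dataclass
class SessionState:
    """Hypothetical minimal view of a session: only what the policy reads."""
    entries_encrypted: int = 0
    key_created_at: float = field(default_factory=time.time)


@dataclass(frozen=True)
class RotationSketch:
    max_entries: int = 100_000
    max_age_seconds: float = 300.0

    def should_rotate(
        self, session: SessionState, now: Optional[float] = None
    ) -> Tuple[bool, Optional[Trigger]]:
        """Return (True, trigger) once either limit is reached, else (False, None)."""
        now = time.time() if now is None else now
        if session.entries_encrypted >= self.max_entries:
            return True, Trigger.ENTRY_COUNT
        if now - session.key_created_at >= self.max_age_seconds:
            return True, Trigger.TIME_ELAPSED
        return False, None


policy = RotationSketch(max_entries=5, max_age_seconds=300)
busy = SessionState(entries_encrypted=5, key_created_at=1000.0)
decision = policy.should_rotate(busy, now=1001.0)
```

Passing `now` explicitly keeps the policy deterministic under test; callers that omit it get the wall-clock time.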
TenantIsolationManager
create_session(tenant), get_session(tenant_id), encrypt(tenant_id, entry), decrypt(tenant_id, enc), close_session(tenant_id), list_active_tenants().
KVAuditLog / KVAuditEntry
log_encrypt(...), log_decrypt(...), log_rotate(...), log_isolation_violation(...), entries(limit, tenant_id, operation), export_json().
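An append-only log with filtered reads and JSON export can be sketched in a few lines. `AuditRecord` and `AppendOnlyLog` are hypothetical simplifications; the real KVAuditEntry fields and the exact semantics of `limit` are not documented above.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """Immutable record: frozen so a stored event cannot be edited in place."""
    operation: str
    tenant_id: str
    timestamp: float


class AppendOnlyLog:
    """Exposes append and read operations only; no update or delete."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, operation: str, tenant_id: str) -> None:
        self._records.append(AuditRecord(operation, tenant_id, time.time()))

    def entries(
        self,
        limit: Optional[int] = None,
        tenant_id: Optional[str] = None,
        operation: Optional[str] = None,
    ) -> List[AuditRecord]:
        """Return matching records, oldest first, capped at `limit` if given."""
        matched = [
            r
            for r in self._records
            if (tenant_id is None or r.tenant_id == tenant_id)
            and (operation is None or r.operation == operation)
        ]
        return matched[:limit]

    def export_json(self) -> str:
        return json.dumps([asdict(r) for r in self._records])


log = AppendOnlyLog()
log.append("encrypt", "tenant-alice")
log.append("isolation-violation", "tenant-bob")
log.append("decrypt", "tenant-alice")
violations = log.entries(operation="isolation-violation")
```

The append-only property here is enforced only by the class's interface; a production log would typically also need tamper-evident storage.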
Errors
All under KVCacheError: TenantIsolationError, SessionExpiredError, DecryptionError, NonceReplayError, KeyRotationRequiredError, UnknownTenantError.
Why PQC Matters for the KV Cache
Inference logs and intermediate conversation state are retained for five to seven years or more in regulated industries:
- Healthcare (HIPAA): 6-year minimum retention on any PHI-bearing record, including the model context that reasoned over it.
- Finance (SEC 17a-4, MiFID II): 5-7 year retention on all communications with a client, including AI-assisted drafting.
- Legal (privilege / e-discovery): communications privilege only survives if the confidentiality chain is intact.
The same adversary who is recording your classical TLS session today - harvest-now-decrypt-later - is also recording the residual state of your inference servers. A PQC envelope around the KV cache is what keeps that state confidential past the arrival of a cryptographically relevant quantum computer.
Examples
- examples/basic_kv_encryption.py - single tenant, encrypt/decrypt 3 entries, inspect audit log.
- examples/multi_tenant_isolation.py - Alice and Bob co-resident, cross-tenant decrypt is rejected.
- examples/key_rotation.py - KeyRotationPolicy with max_entries=5, observe rotation mid-stream.
License
Apache License 2.0 - see LICENSE.