PQC-native per-tenant KV cache encryption for multi-tenant LLM inference. ML-KEM-768 derived session keys, AES-256-GCM per-entry encryption, automatic key rotation, tenant isolation enforcement.

These details have not been verified by PyPI

PQC KV Cache Encryption

Per-tenant, quantum-safe encryption for the LLM KV cache. Multi-tenant inference servers store gigabytes of KV cache in shared host/device RAM. A side-channel or a compromised co-tenant can lift another user's private conversation state directly out of that cache. This library wraps every KV cache entry in a fresh AES-256-GCM envelope whose key is derived per session via ML-KEM-768, enforces strict tenant isolation at the cryptographic boundary, rotates keys on a configurable policy, and ships with an append-only audit log for every encrypt / decrypt / rotate / isolation-violation event.

The Problem

Long-context LLM inference keeps past token activations in the KV cache - a per-layer, per-position tensor store that can run to multiple GB. On a multi-tenant inference server (vLLM, TGI, or any production stack sharing a GPU across requests) that cache sits in plaintext process memory:

Side-channel reads. A malicious co-tenant with timing or page-table-based primitives can read another tenant's cache pages.
Cross-request leakage. A bug in cache eviction or session routing can hand one tenant's intermediate state to another.
Harvest-now-decrypt-later. Even if host-level encryption is on, a classical key exchange (ECDH) recorded today can be broken once a cryptographically relevant quantum computer (CRQC) exists.
Regulated workloads. Healthcare, finance, and legal inference pipelines carry multi-year retention requirements on conversation state (commonly 5 to 7 years or more); classical confidentiality alone may not clear the audit bar over that horizon.

The Solution

ML-KEM-768 derives a fresh 32-byte symmetric key per TenantSession. In production the tenant presents a KEM public key and the inference server runs Encapsulate; here we delegate to quantumshield.
AES-256-GCM encrypts every KVCacheEntry. One nonce per entry, AAD binds EntryMetadata + sequence_number + key_len so tampering with layer/position/sequence surfaces as a DecryptionError.
TenantIsolationManager holds a session per tenant and refuses cross-tenant decrypts even when asked explicitly; a misrouted ciphertext raises TenantIsolationError before AES touches the bytes.
KeyRotationPolicy rotates the per-session key after N entries or T seconds, resetting the sequence counter.
KVAuditLog is append-only and records encrypt, decrypt, rotate, and isolation-violation events.
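
The AAD binding described above can be illustrated with a minimal, stdlib-only sketch. It only shows how metadata, sequence_number, and key_len might be serialized into canonical bytes; the JSON layout, the `build_aad` helper, and the local `EntryMetadata` stand-in are assumptions for illustration, not the library's actual wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EntryMetadata:
    # Stand-in with the field names used in the Quick Start; the real class may differ.
    tenant_id: str
    session_id: str
    layer_idx: int
    position: int
    token_id: int


def build_aad(meta: EntryMetadata, sequence_number: int, key_len: int) -> bytes:
    """Serialize everything the GCM tag should cover (hypothetical layout)."""
    fields = asdict(meta)
    fields["sequence_number"] = sequence_number
    fields["key_len"] = key_len
    # sort_keys and fixed separators give identical bytes on encrypt and decrypt.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


meta = EntryMetadata("tenant-alice", "sess-1", layer_idx=0, position=12, token_id=2048)
aad = build_aad(meta, sequence_number=7, key_len=64)

# Relabeling the entry as layer 1 changes the AAD bytes, so a tag computed
# over the original AAD would fail to verify.
moved = EntryMetadata("tenant-alice", "sess-1", layer_idx=1, position=12, token_id=2048)
aad_moved = build_aad(moved, sequence_number=7, key_len=64)
```

In an AES-GCM implementation these bytes would be supplied as the associated data on both encrypt and decrypt, so any mismatch surfaces as an authentication failure rather than silently wrong tensors.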

Installation

pip install pqc-kv-cache-encryption

Development:

pip install -e ".[dev]"

Quick Start

import os

from pqc_kv_cache import (
    CacheDecryptor,
    CacheEncryptor,
    EntryMetadata,
    KVCacheEntry,
    TenantIdentity,
    establish_tenant_session,
)

# 1. Establish a per-tenant session (ML-KEM-768 derived AES-256-GCM key).
tenant = TenantIdentity(tenant_id="tenant-alice", display_name="Alice Corp")
session = establish_tenant_session(tenant)

# 2. Wrap a KV cache entry in an authenticated (AES-256-GCM) envelope.
meta = EntryMetadata(
    tenant_id=tenant.tenant_id,
    session_id=session.session_id,
    layer_idx=0,
    position=12,
    token_id=2048,
)
entry = KVCacheEntry(
    metadata=meta,
    key_tensor_bytes=os.urandom(64),   # raw bytes of K vector
    value_tensor_bytes=os.urandom(64), # raw bytes of V vector
)
enc = CacheEncryptor(session).encrypt_entry(entry)

# 3. Decrypt with the same session. The decryptor checks tenant and replay; AES-GCM verifies the AAD.
decrypted = CacheDecryptor(session).decrypt_entry(enc)
assert decrypted.key_tensor_bytes == entry.key_tensor_bytes

Multi-tenant with strict isolation:

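import os

from pqc_kv_cache import (
    EntryMetadata, KVCacheEntry, TenantIdentity,
    TenantIsolationError, TenantIsolationManager,
)

mgr = TenantIsolationManager()
mgr.create_session(TenantIdentity(tenant_id="tenant-alice"))
mgr.create_session(TenantIdentity(tenant_id="tenant-bob"))

# Build an entry bound to Alice's session, as in the single-tenant example.
alice_meta = EntryMetadata(
    tenant_id="tenant-alice", session_id=mgr.get_session("tenant-alice").session_id,
    layer_idx=0, position=0, token_id=1,
)
alice_entry = KVCacheEntry(
    metadata=alice_meta, key_tensor_bytes=os.urandom(64), value_tensor_bytes=os.urandom(64),
)
alice_enc = mgr.encrypt("tenant-alice", alice_entry)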

# Bob can NEVER decrypt Alice's entry, even when using his own valid session.
try:
    mgr.decrypt("tenant-bob", alice_enc)
except TenantIsolationError:
    print("blocked at the isolation boundary")
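
The "blocked before AES touches the bytes" behaviour can be sketched without the library. The example below is a simplified illustration: `check_isolation`, `guarded_decrypt`, and `fake_decrypt` are hypothetical names, and the exception class is a local stand-in for the library's `TenantIsolationError`.

```python
class TenantIsolationError(Exception):
    """Local stand-in: raised when a ciphertext is presented under the wrong tenant."""


def check_isolation(requesting_tenant: str, entry_tenant: str) -> None:
    # Compare identities first; no key material is touched on a mismatch.
    if requesting_tenant != entry_tenant:
        raise TenantIsolationError(
            f"{requesting_tenant!r} may not decrypt an entry owned by {entry_tenant!r}"
        )


def guarded_decrypt(requesting_tenant, entry_tenant, ciphertext, decrypt_fn):
    """Run the isolation check, then hand off to the real decrypt routine."""
    check_isolation(requesting_tenant, entry_tenant)
    return decrypt_fn(ciphertext)


calls = []


def fake_decrypt(ciphertext: bytes) -> bytes:
    # Placeholder for AES-GCM decryption; records that it was reached.
    calls.append(ciphertext)
    return ciphertext[::-1]


ok = guarded_decrypt("tenant-alice", "tenant-alice", b"abc", fake_decrypt)

try:
    guarded_decrypt("tenant-bob", "tenant-alice", b"abc", fake_decrypt)
    blocked = False
except TenantIsolationError:
    blocked = True
```

After this runs, `calls` holds a single element: the cross-tenant attempt never reached the decrypt function.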

Architecture

+-----------------------------+              +-----------------------------+
|  Tenant Alice               |              |  Tenant Bob                 |
|  (client)                   |              |  (client)                   |
+--------------+--------------+              +--------------+--------------+
               |                                             |
               |  ML-KEM-768 handshake (per session)         |
               v                                             v
+---------------------------------------------------------------------------+
|                   Inference Server (multi-tenant)                         |
|                                                                           |
|  TenantIsolationManager                                                   |
|    +------------------------+        +------------------------+           |
|    | TenantSession (alice)  |        | TenantSession (bob)    |           |
|    |   symmetric_key (32B)  |        |   symmetric_key (32B)  |           |
|    |   next_sequence        |        |   next_sequence        |           |
|    |   entries_encrypted    |        |   entries_encrypted    |           |
|    +----------+-------------+        +----------+-------------+           |
|               |                                 |                         |
|               v                                 v                         |
|    CacheEncryptor / CacheDecryptor   CacheEncryptor / CacheDecryptor      |
|       AES-256-GCM + AAD                 AES-256-GCM + AAD                 |
|       + tenant-id enforcement           + tenant-id enforcement           |
|               |                                 |                         |
|               v                                 v                         |
|    +---------------------+        +---------------------+                 |
|    | EncryptedEntry      |        | EncryptedEntry      |                 |
|    |  (alice ciphertext) |        |  (bob ciphertext)   |                 |
|    +---------+-----------+        +---------+-----------+                 |
|              |                              |                             |
|              +-----------+------------------+                             |
|                          v                                                |
|             +---------------------------+                                 |
|             |  KV cache in GPU/host RAM |  (only ciphertext lives here)   |
|             +---------------------------+                                 |
|                                                                           |
|  KeyRotationPolicy  -- rotates session keys on entry count / age          |
|  KVAuditLog         -- encrypt / decrypt / rotate / isolation-violation   |
+---------------------------------------------------------------------------+

Cryptography

Primitive	Purpose	Algorithm
Per-session key	Fresh 32-byte symmetric key per tenant session	ML-KEM-768
Per-entry encryption	Confidentiality + integrity of K/V tensor bytes	AES-256-GCM
AAD binding	`EntryMetadata` + `sequence_number` + `key_len` -> tag	AES-GCM tag
Session-key derivation	SHA3-256 over KEM keypair bytes (production: Decapsulate)	SHA3-256

Signing and KEM keys are delegated to quantumshield, which prefers real liboqs ML-KEM / ML-DSA when available and falls back to a transitional backend otherwise.
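
As a rough illustration of the session-key derivation row, the sketch below hashes stand-in keypair bytes with SHA3-256 to obtain a 32-byte key. No real KEM is involved: the key material is random bytes sized like an ML-KEM-768 keypair, and the concatenation order, the `context` label, and the `derive_session_key` name are assumptions rather than the library's actual derivation.

```python
import hashlib
import os

# Random stand-ins sized like ML-KEM-768 keys (1184-byte encapsulation key,
# 2400-byte decapsulation key). This is NOT a KEM keypair.
public_key = os.urandom(1184)
secret_key = os.urandom(2400)


def derive_session_key(pk: bytes, sk: bytes, context: bytes = b"") -> bytes:
    """Hash keypair bytes plus a hypothetical context label down to 32 bytes."""
    h = hashlib.sha3_256()
    h.update(pk)
    h.update(sk)
    h.update(context)
    return h.digest()


session_key = derive_session_key(public_key, secret_key, b"tenant-alice")
other_key = derive_session_key(public_key, secret_key, b"tenant-bob")
```

In a production handshake the 32-byte secret would instead come out of Encapsulate/Decapsulate, as the table notes.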

Threat Model

Adversary capability	Coverage
Read KV cache pages for another tenant	All entries are AES-256-GCM encrypted; attacker sees only ciphertext.
Replay a previously captured `EncryptedEntry`	`CacheDecryptor` tracks seen nonces and raises `NonceReplayError`.
Tamper with `EntryMetadata` (layer_idx, position, tenant_id)	AAD binding -> AES-GCM tag fails -> `DecryptionError`.
Submit another tenant's ciphertext through a valid session	`TenantIsolationError` raised before AES touches bytes.
Long-lived session key exposure	`KeyRotationPolicy` rotates on entry-count / age; sequence counter resets.
Session outlives its TTL	`SessionExpiredError` on every encrypt/decrypt after `expires_at`.
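Harvest-now-decrypt-later on the KEM handshake	ML-KEM-768 targets IND-CCA2 security against quantum adversaries (NIST security category 3); this applies only when quantumshield runs its real ML-KEM backend rather than the transitional fallback.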
Orphaned tenant state after disconnect	`close_session()` drops the session and its key from memory.
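
The replay row above can be pictured as a set of previously seen nonces. This is a minimal sketch under that assumption; `ReplayGuard` is a hypothetical name, and the exception is a local stand-in for the library's `NonceReplayError`.

```python
import os


class NonceReplayError(Exception):
    """Local stand-in for the library's replay error."""


class ReplayGuard:
    """Remembers every nonce it has accepted and rejects repeats."""

    def __init__(self) -> None:
        self._seen = set()

    def check_and_record(self, nonce: bytes) -> None:
        if nonce in self._seen:
            raise NonceReplayError(nonce.hex())
        self._seen.add(nonce)


guard = ReplayGuard()
nonce = os.urandom(12)  # 96-bit nonce, the usual AES-GCM size
guard.check_and_record(nonce)  # first use is accepted

try:
    guard.check_and_record(nonce)  # second use of the same nonce
    replayed = False
except NonceReplayError:
    replayed = True
```

Because the set grows with every entry, a real implementation would likely need to bound it, for example by clearing it when the session key rotates.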

Performance Considerations

This library is written in pure Python and is intended as the cryptographic envelope for multi-tenant LLM inference, not a hot-path encryption kernel. Production deployments wrap the same patterns in:

A CUDA / ROCm kernel that operates on the K/V tensors in device memory.
A driver-side AES-GCM engine (H100 confidential compute, AMD SEV-SNP).
A batched nonce / sequence allocator to amortize session bookkeeping across a batch of requests.

The envelope formats (EncryptedEntry, AAD shape, TenantSession state machine) are deliberately portable so that the native kernel and the Python reference implementation produce interoperable ciphertexts.
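
One way to picture the batched sequence allocator mentioned above is a counter that hands out contiguous ranges under a single lock, so a batch of requests pays for one synchronization instead of one per entry. `SequenceAllocator` and `reserve` are hypothetical names for this sketch.

```python
import threading


class SequenceAllocator:
    """Hands out contiguous blocks of sequence numbers under one lock."""

    def __init__(self, start: int = 0) -> None:
        self._next = start
        self._lock = threading.Lock()

    def reserve(self, count: int) -> range:
        if count <= 0:
            raise ValueError("count must be positive")
        with self._lock:
            first = self._next
            self._next += count
        # The caller owns every number in this range exclusively.
        return range(first, first + count)


alloc = SequenceAllocator()
batch_a = alloc.reserve(4)  # sequence numbers 0..3
batch_b = alloc.reserve(3)  # sequence numbers 4..6
```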

API Reference

`TenantIdentity`

tenant_id: str, display_name: str = "" — frozen dataclass identifying a tenant.

`establish_tenant_session(tenant, algorithm=KEMAlgorithm.ML_KEM_768, ttl_seconds=900) -> TenantSession`

Derive a fresh 32-byte symmetric key for tenant via ML-KEM-768 and return a TenantSession.

`TenantSession`

Holds symmetric_key, next_sequence, entries_encrypted, created_at, expires_at. Methods: is_valid(), check_valid(), consume_sequence(), rotate_key(new_key), to_public_dict().
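
The session state machine can be sketched roughly as follows. This is not the library's class: `SketchSession` is a hypothetical stand-in, and the details marked as assumptions (what `consume_sequence` increments, what `rotate_key` resets) are guesses consistent with the description above.

```python
import os
import time
from dataclasses import dataclass, field


class SessionExpiredError(Exception):
    """Local stand-in for the library's expiry error."""


@dataclass
class SketchSession:
    symmetric_key: bytes
    ttl_seconds: float = 900.0
    next_sequence: int = 0
    entries_encrypted: int = 0
    created_at: float = field(default_factory=time.time)

    @property
    def expires_at(self) -> float:
        return self.created_at + self.ttl_seconds

    def is_valid(self, now=None) -> bool:
        return (time.time() if now is None else now) < self.expires_at

    def check_valid(self, now=None) -> None:
        if not self.is_valid(now):
            raise SessionExpiredError("session is past expires_at")

    def consume_sequence(self) -> int:
        # Assumption: each consumed sequence number counts as one encrypted entry.
        self.check_valid()
        seq = self.next_sequence
        self.next_sequence += 1
        self.entries_encrypted += 1
        return seq

    def rotate_key(self, new_key: bytes) -> None:
        if len(new_key) != 32:
            raise ValueError("expected a 32-byte key")
        self.symmetric_key = new_key
        self.next_sequence = 0      # sequence counter resets on rotation
        self.entries_encrypted = 0  # assumption: entry counter resets too


s = SketchSession(os.urandom(32))
first = s.consume_sequence()
second = s.consume_sequence()
s.rotate_key(os.urandom(32))
after_rotation = s.consume_sequence()
```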

`KVCacheEntry` / `EncryptedEntry` / `EntryMetadata`

KVCacheEntry holds metadata, key_tensor_bytes, value_tensor_bytes. EncryptedEntry holds metadata, nonce (hex), ciphertext (hex), key_len, sequence_number. EntryMetadata is frozen and carries tenant_id, session_id, layer_idx, position, token_id, kv_role.
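
One plausible reading of key_len is that the K and V bytes are concatenated into a single plaintext before encryption and split again afterwards. The sketch below shows only that packing and the hex round trip; `pack_kv` and `unpack_kv` are hypothetical helpers, and the real envelope layout may differ.

```python
from typing import Tuple


def pack_kv(key_bytes: bytes, value_bytes: bytes) -> Tuple[bytes, int]:
    """Concatenate K and V; key_len records where K ends."""
    return key_bytes + value_bytes, len(key_bytes)


def unpack_kv(plaintext: bytes, key_len: int) -> Tuple[bytes, bytes]:
    if not 0 <= key_len <= len(plaintext):
        raise ValueError("key_len out of range")
    return plaintext[:key_len], plaintext[key_len:]


blob, key_len = pack_kv(b"K" * 4, b"V" * 6)

# EncryptedEntry stores nonce and ciphertext as hex strings; bytes.hex() and
# bytes.fromhex() round-trip losslessly.
as_hex = blob.hex()
k, v = unpack_kv(bytes.fromhex(as_hex), key_len)
```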

`CacheEncryptor(session)` / `CacheDecryptor(session)`

encrypt_entry(KVCacheEntry) -> EncryptedEntry and decrypt_entry(EncryptedEntry) -> KVCacheEntry. Both enforce tenant-id match. Decryptor tracks nonces for replay protection.

`KeyRotationPolicy(max_entries=100_000, max_age_seconds=300)`

should_rotate(session) -> (bool, RotationTrigger | None) and rotate(session) -> bytes (new 32-byte key). RotationTrigger is ENTRY_COUNT, TIME_ELAPSED, or MANUAL.
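
The rotation decision can be sketched as two threshold checks. In this illustration `SessionState`, `key_created_at`, and `RotationPolicySketch` are hypothetical, and the order in which the triggers are evaluated is an assumption.

```python
import enum
import time
from dataclasses import dataclass
from typing import Optional, Tuple


class RotationTrigger(enum.Enum):
    ENTRY_COUNT = "entry_count"
    TIME_ELAPSED = "time_elapsed"
    MANUAL = "manual"


@dataclass
class SessionState:
    entries_encrypted: int
    key_created_at: float  # hypothetical: when the current key was installed


@dataclass
class RotationPolicySketch:
    max_entries: int = 100_000
    max_age_seconds: float = 300.0

    def should_rotate(
        self, session: SessionState, now: Optional[float] = None
    ) -> Tuple[bool, Optional[RotationTrigger]]:
        now = time.time() if now is None else now
        # Assumption: the entry-count trigger is checked before the age trigger.
        if session.entries_encrypted >= self.max_entries:
            return True, RotationTrigger.ENTRY_COUNT
        if now - session.key_created_at >= self.max_age_seconds:
            return True, RotationTrigger.TIME_ELAPSED
        return False, None


policy = RotationPolicySketch(max_entries=5, max_age_seconds=300)
fresh = SessionState(entries_encrypted=0, key_created_at=1000.0)
busy = SessionState(entries_encrypted=5, key_created_at=1000.0)
old = SessionState(entries_encrypted=1, key_created_at=1000.0)

fresh_result = policy.should_rotate(fresh, now=1001.0)
busy_result = policy.should_rotate(busy, now=1001.0)
old_result = policy.should_rotate(old, now=1300.0)
```

Passing `now` explicitly keeps the check deterministic in tests; a caller in a serving loop would omit it.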

`TenantIsolationManager`

create_session(tenant), get_session(tenant_id), encrypt(tenant_id, entry), decrypt(tenant_id, enc), close_session(tenant_id), list_active_tenants().

`KVAuditLog` / `KVAuditEntry`

log_encrypt(...), log_decrypt(...), log_rotate(...), log_isolation_violation(...), entries(limit, tenant_id, operation), export_json().
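
A minimal sketch of an append-only log with the filtering described above. The method arguments are elided in this README, so the parameters used here, the `detail` field, and the `AuditEntry` and `AuditLogSketch` names are assumptions for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: float
    operation: str  # "encrypt", "decrypt", "rotate", or "isolation_violation"
    tenant_id: str
    detail: str = ""


class AuditLogSketch:
    """Entries can be appended and read, never modified or removed."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def _append(self, operation: str, tenant_id: str, detail: str = "") -> None:
        self._entries.append(AuditEntry(time.time(), operation, tenant_id, detail))

    def log_encrypt(self, tenant_id: str, detail: str = "") -> None:
        self._append("encrypt", tenant_id, detail)

    def log_isolation_violation(self, tenant_id: str, detail: str = "") -> None:
        self._append("isolation_violation", tenant_id, detail)

    def entries(self, limit: Optional[int] = None, tenant_id: Optional[str] = None,
                operation: Optional[str] = None) -> List[AuditEntry]:
        out = [
            e for e in self._entries
            if (tenant_id is None or e.tenant_id == tenant_id)
            and (operation is None or e.operation == operation)
        ]
        # Keep the most recent `limit` matches; returns a copy either way.
        return out if limit is None else out[max(0, len(out) - limit):]

    def export_json(self) -> str:
        return json.dumps([asdict(e) for e in self._entries])


log = AuditLogSketch()
log.log_encrypt("tenant-alice", "layer 0, position 12")
log.log_encrypt("tenant-alice", "layer 0, position 13")
log.log_isolation_violation("tenant-bob", "tried to decrypt a tenant-alice entry")

violations = log.entries(operation="isolation_violation")
latest_alice = log.entries(limit=1, tenant_id="tenant-alice")
```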

Errors

All under KVCacheError: TenantIsolationError, SessionExpiredError, DecryptionError, NonceReplayError, KeyRotationRequiredError, UnknownTenantError.
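
Because every error shares one base class, callers can handle specific failures first and fall back to the base. The classes below are local stand-ins that mirror the names listed above, and the mapping from error to action is a hypothetical policy, not something the library prescribes.

```python
class KVCacheError(Exception):
    """Local stand-in for the library's base error."""


class TenantIsolationError(KVCacheError): pass
class SessionExpiredError(KVCacheError): pass
class DecryptionError(KVCacheError): pass
class NonceReplayError(KVCacheError): pass
class KeyRotationRequiredError(KVCacheError): pass
class UnknownTenantError(KVCacheError): pass


def handle(exc_type) -> str:
    """Raise the given error type and report which handler caught it."""
    try:
        raise exc_type("demo")
    except TenantIsolationError:
        return "security-alert"
    except (SessionExpiredError, KeyRotationRequiredError):
        return "re-establish-session"
    except KVCacheError:
        # Everything else in the hierarchy lands here.
        return "drop-entry"


isolation_action = handle(TenantIsolationError)
expiry_action = handle(SessionExpiredError)
replay_action = handle(NonceReplayError)
```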

Why PQC Matters for the KV Cache

Inference logs and intermediate conversation state are retained for years in regulated industries, typically five to seven or more:

Healthcare (HIPAA): 6-year minimum retention on any PHI-bearing record, including the model context that reasoned over it.
Finance (SEC 17a-4, MiFID II): 5-7 year retention on all communications with a client, including AI-assisted drafting.
Legal (privilege / e-discovery): communications privilege only survives if the confidentiality chain is intact.

An adversary who records classical TLS sessions today for harvest-now-decrypt-later can just as well capture the residual state of your inference servers. A PQC envelope around the KV cache is meant to keep that state confidential past the arrival of a cryptographically relevant quantum computer.

Examples

examples/basic_kv_encryption.py - single tenant, encrypt/decrypt 3 entries, inspect audit log.
examples/multi_tenant_isolation.py - Alice and Bob co-resident, cross-tenant decrypt is rejected.
examples/key_rotation.py - KeyRotationPolicy with max_entries=5, observe rotation mid-stream.

License

Apache License 2.0 - see LICENSE.

Project details

This version

0.1.0

Apr 20, 2026

Download files

Source Distribution

pqc_kv_cache_encryption-0.1.0.tar.gz (19.1 kB)

Uploaded Apr 20, 2026 Source

Built Distribution

pqc_kv_cache_encryption-0.1.0-py3-none-any.whl (17.4 kB)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file pqc_kv_cache_encryption-0.1.0.tar.gz.

File metadata

Download URL: pqc_kv_cache_encryption-0.1.0.tar.gz
Upload date: Apr 20, 2026
Size: 19.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pqc_kv_cache_encryption-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ca3e0d7be631b28772e1f14474ab1a0a875688dd64fbe53e1c7ead8ec29f4d2d`
MD5	`65f1f1d76d653a67e1d02175ab4d5f73`
BLAKE2b-256	`4c1820906b27a4b753eab7cd46cecdb12d01a8f2428f205b0ac9e734d594d5e3`

File details

Details for the file pqc_kv_cache_encryption-0.1.0-py3-none-any.whl.

File metadata

Download URL: pqc_kv_cache_encryption-0.1.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 17.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for pqc_kv_cache_encryption-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2f2fc3bd544f4ef9874b4d3417dfa96b5529c9803504833e6d49d0ceb05804fd`
MD5	`ee2e7d4874a630cc4cf0180bede7d7fa`
BLAKE2b-256	`33235ca8cd1bbc78481226348eb45c977f726a3be406492d4fa638e68c5d9399`
