Quantum-Safe Columnar Storage Format with row-granular lazy decryption
QPQT - Quantum-Safe Columnar Storage Format
A purpose-built binary columnar file format (.qpqt) with native
post-quantum cryptography and row-granular lazy decryption, a capability
no existing columnar format offers.
Cryptographic stack: ML-KEM-768 (FIPS 203) + HKDF-SHA-256 + AES-256-GCM (FIPS 197, SP 800-38D)
Quick Start
pip install qpqt
import qpqt
# Generate a quantum-safe keypair
pub, sec = qpqt.keygen()
kid = qpqt.generate_key_id()
# Encrypt - ssn column is ML-KEM-768 + AES-256-GCM protected
w = qpqt.Writer("customers.qpqt",
                column_names=["id", "state", "ssn"],
                column_types=["int32", "string", "string"],
                pqc_columns=["ssn"],
                public_key=pub, key_id=kid)
w.write_batch({"id":[1,2,3], "state":["CA","NY","TX"], "ssn":["111","222","333"]}, 3)
w.close()
# Read - lazy decryption, only matching rows decrypted
r = qpqt.Reader("customers.qpqt")
r.set_secret_key(sec)
data = r.query(where={"id": 2})
The wheel bundles liboqs and OpenSSL. No system dependencies needed.
The Problem
Enterprises face a dual mandate: regulatory pressure to adopt post-quantum cryptography (CNSA 2.0, NIST FIPS 203, deadline 2035) and the need to maintain query performance on large-scale columnar data warehouses.
The naive approach - applying ML-KEM-768 at the row level - costs 9,600ms for 1M rows even with 4-core parallelization. That establishes the upper bound of the problem: PQC done wrong is unusable at analytical query scale.
The Solution
QPQT redesigns the storage format around PQC cost:
- Hybrid KEM construction - ML-KEM-768 is used once per 4,096-row page to encapsulate an AES-256-GCM page key. This reduces KEM operations from 1M to 250 per million rows.
- Fully separated column sections - structural (unencrypted) and PQC columns are physically isolated on disk at 4KB OS page boundaries. Predicates run on structural columns without loading the PQC section into CPU cache.
- Row-granular lazy decryption - predicates execute on cheap structural columns first. Only the individual rows that survive the predicate trigger KEM decapsulation and AES-GCM decryption.
- O(1) manifest lookup - a flat crypto manifest in the footer maps any row to its page key via pointer arithmetic.
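To make the page arithmetic behind the first and last points concrete, here is a minimal sketch that maps a global row index to a manifest slot with integer division only. It assumes pages never span row groups (which matches the 250 encapsulations quoted above for 1M rows stored in 100,000-row groups) and that the manifest holds one fixed-size entry per page. The function names and the entry layout are hypothetical illustrations, not QPQT's actual API or on-disk format.

```python
ROWS_PER_PAGE = 4_096         # from the text: one ML-KEM encapsulation per page
ROWS_PER_ROW_GROUP = 100_000  # from the File Format diagram
KEM_CT_BYTES = 1_088          # ML-KEM-768 ciphertext size (FIPS 203)

# Assumption: pages do not span row groups, so the last page of each
# group may be only partially filled.
PAGES_PER_GROUP = -(-ROWS_PER_ROW_GROUP // ROWS_PER_PAGE)  # ceiling division


def manifest_slot(row: int) -> int:
    """Return the index of the page-key entry covering a global row index."""
    if row < 0:
        raise ValueError("row index must be non-negative")
    group, row_in_group = divmod(row, ROWS_PER_ROW_GROUP)
    return group * PAGES_PER_GROUP + row_in_group // ROWS_PER_PAGE


def manifest_offset(row: int, manifest_start: int = 0) -> int:
    """Byte offset of the KEM ciphertext for `row` in a flat manifest.

    Hypothetical layout: entries hold only the ciphertext, back to back.
    """
    return manifest_start + manifest_slot(row) * KEM_CT_BYTES


def total_pages(total_rows: int) -> int:
    """Number of pages (and therefore KEM encapsulations) for a file."""
    full_groups, tail = divmod(total_rows, ROWS_PER_ROW_GROUP)
    return full_groups * PAGES_PER_GROUP + -(-tail // ROWS_PER_PAGE)


print(total_pages(1_000_000), manifest_slot(123_456))
```

Because the lookup is a `divmod` and a multiplication, its cost does not depend on file size, which is what the O(1) claim amounts to under these assumptions.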
Performance - Honest Three-Way Comparison
Benchmarked on Kaggle Xeon CPU (4 cores), 1M rows, real ML-KEM-768 + AES-256-GCM.
Two baselines are measured, not estimated:
- Naive per-row PQC - row-level ML-KEM encapsulation. Establishes the upper bound of the problem. This is what a quick liboqs integration produces.
- Competent per-page PQC - the correct hybrid KEM construction (per-page ML-KEM + AES-GCM, exactly like QPQT) but stored in a plain layout with no column separation and no lazy decryption. Decrypts every row in the queried column because decryption is chunk-granular. This isolates QPQT's actual contribution.
| Selectivity | Naive per-row | Competent per-page | QPQT | QPQT vs competent |
|---|---|---|---|---|
| 1% | 9,600ms | 2,150ms | 78ms | 27.6x |
| 5% | 9,600ms | 2,111ms | 163ms | 12.9x |
| 10% | 9,600ms | 2,113ms | 264ms | 8.0x |
| 25% | 9,600ms | 2,103ms | 557ms | 3.8x |
| 50% | 9,600ms | 2,148ms | 1,055ms | 2.0x |
| 100% | 9,600ms | 2,147ms | 2,098ms | 1.02x (no advantage) |
QPQT's contribution is row-granular lazy decryption. At low selectivity - the common case for analytical queries - it decrypts far fewer rows than a competent but layout-unaware implementation, which yields an 8.0x to 27.6x speedup between 10% and 1% selectivity. The advantage shrinks as selectivity rises and reaches parity at 100%: when every row survives the predicate, there is nothing to skip.
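The "QPQT vs competent" column is simply the ratio of the two measured latencies. The short sketch below recomputes it from the millisecond values in the table, so the figures can be checked or extended with new measurements. It should reproduce the published column to within rounding. The variable names are illustrative only.

```python
# Latencies in milliseconds, copied from the benchmark table:
# selectivity (%) -> (competent per-page, QPQT)
measurements = {
    1: (2150, 78),
    5: (2111, 163),
    10: (2113, 264),
    25: (2103, 557),
    50: (2148, 1055),
    100: (2147, 2098),
}


def speedup(competent_ms: float, qpqt_ms: float) -> float:
    """How many times faster QPQT is than the competent per-page baseline."""
    return competent_ms / qpqt_ms


speedups = {sel: speedup(c, q) for sel, (c, q) in measurements.items()}

for sel, ratio in speedups.items():
    print(f"{sel:>3}% selectivity: {ratio:.2f}x")
```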
| Metric | Value |
|---|---|
| Write throughput (1M rows) | 534K rows/sec (1,871ms) |
| Structural scan (no crypto) | 5ms, 188M rows/sec |
| File size (1M rows) | 80MB |
| Storage vs naive per-row ML-KEM | 80MB vs ~1,084MB (92% reduction) |
Cryptographic Design
ML-KEM-768 keypair -> secret key stored in KMS (file holds only key_id)
|
Per page (4,096 rows):
ML-KEM-768 encapsulate(public_key)
|-- kem_ciphertext -> CRYPTO MANIFEST
+-- shared_secret (32 bytes)
|
HKDF-SHA-256(shared_secret, page_context)
+-- aes_page_key (32 bytes, unique per page)
|
AES-256-GCM per row
|-- IV (12B, deterministic)
|-- ciphertext (= plaintext length)
+-- auth_tag (16B, tamper detection)
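The key-derivation step in the middle of this diagram can be sketched with the Python standard library alone. The block below implements HKDF-SHA-256 as specified in RFC 5869 and derives a 32-byte page key from a shared secret. Two things are assumptions made for illustration: the shared secret is random bytes standing in for real ML-KEM-768 output, and the `page_context` encoding (label, file UUID, page index) is hypothetical, since the exact context bytes are not specified here.

```python
import hashlib
import hmac
import os
import struct


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF-SHA-256")
    # Extract: an empty salt is treated as hash_len zero bytes.
    prk = hmac.new(salt or bytes(hash_len), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def page_context(file_uuid: bytes, page_index: int) -> bytes:
    """Hypothetical context encoding: a label, the file UUID, and the page index."""
    return b"qpqt-page-key" + file_uuid + struct.pack("<Q", page_index)


# Stand-ins: in QPQT the shared secret comes from ML-KEM-768 encapsulation.
shared_secret = os.urandom(32)
file_uuid = os.urandom(16)

aes_page_key = hkdf_sha256(shared_secret, page_context(file_uuid, page_index=0))
print(len(aes_page_key))
```

Binding the page index into the context is a defensive choice in this sketch: even if the same shared secret were ever fed in twice, distinct pages would still derive distinct AES keys.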
IV construction and GCM nonce safety
QPQT uses deterministic AES-GCM IVs. This is safe because nonce uniqueness is
guaranteed within every key scope. Each 4,096-row page derives its own unique
AES-256 key via ML-KEM encapsulation + HKDF-SHA-256. The IV only needs to be
unique under a given key, and within a single page key the (row_index, column_index) tuple is unique by construction. The file_uuid component
prevents cross-file collision in the event a page key is ever reused across
files. There is no nonce reuse under any single key - the failure mode that
breaks GCM does not occur.
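A small sketch may help show why a deterministic IV is collision-free within one page key. The 12-byte layout used below (4-byte file tag, 6-byte row index, 2-byte column index) is purely an assumption for illustration; the text above only states that the IV is built from the file_uuid and the (row_index, column_index) tuple, not how the bytes are arranged.

```python
import struct
import uuid

IV_BYTES = 12


def build_iv(file_uuid: uuid.UUID, row_index: int, column_index: int) -> bytes:
    """Hypothetical 12-byte layout: 4-byte file tag | 6-byte row | 2-byte column."""
    if not 0 <= row_index < 2**48:
        raise ValueError("row_index does not fit in 6 bytes")
    if not 0 <= column_index < 2**16:
        raise ValueError("column_index does not fit in 2 bytes")
    file_tag = file_uuid.bytes[:4]           # truncated file identifier
    row = row_index.to_bytes(6, "big")
    col = struct.pack(">H", column_index)
    return file_tag + row + col


# Every (row, column) pair in one 4,096-row page with three columns.
fid = uuid.uuid4()
ivs = {build_iv(fid, r, c) for r in range(4096) for c in range(3)}
print(len(ivs))
```

Because the row and column fields are fixed-width, the encoding is injective for a given file: two different (row_index, column_index) pairs can never produce the same IV, which is the uniqueness property GCM requires under a single key.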
All components are NIST-approved and quantum-safe:
- ML-KEM-768: FIPS 203 (replaces RSA/ECDH for key establishment)
- AES-256-GCM: FIPS 197 (AES) with SP 800-38D (GCM); quantum-safe as a symmetric primitive, since Grover's algorithm only halves the effective key strength, leaving 128-bit security
- HKDF-SHA-256: RFC 5869 / SP 800-56C
Why a Separate Format (and not Parquet)?
Parquet already has Modular Encryption - why not derive its AES key from ML-KEM and get quantum-safe Parquet today?
For encryption alone, you could. The encryption is not the contribution.
The contribution is row-granular lazy decryption. Parquet supports predicate pushdown and can skip entire encrypted column chunks via footer statistics. What it cannot do is decrypt only the surviving rows within a chunk that the predicate did not eliminate wholesale. Parquet decrypts at chunk granularity, not surviving-row granularity. Closing that gap requires physically separated structural columns and a per-row-addressable key manifest - a different file layout.
The three conditions no existing format satisfies simultaneously:
- Structural columns physically separated from encrypted columns at OS-page boundaries, so the filter never pages the encrypted section into cache.
- Every row's decryption key addressable in O(1) without decrypting anything first - the flat manifest in the footer.
- Decryption expressible at single-row granularity within a page. Parquet treats the chunk as an atomic encrypted unit.
The idea is simple. The format that makes it executable is the contribution.
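The difference between chunk-granular and row-granular decryption can be illustrated with a toy model that only counts decrypt calls. Nothing here is real cryptography or QPQT's API: `CountingCipher` is a hypothetical stand-in whose "decryption" is a byte reversal, used solely to show how many rows each strategy touches for the same query.

```python
from typing import Callable, List, Sequence


class CountingCipher:
    """Toy stand-in for per-row AES-GCM; it only counts decrypt calls."""

    def __init__(self) -> None:
        self.calls = 0

    def decrypt(self, ciphertext: bytes) -> bytes:
        self.calls += 1
        return ciphertext[::-1]  # byte reversal, NOT real cryptography


def chunk_granular(structural: Sequence[int], encrypted: Sequence[bytes],
                   predicate: Callable[[int], bool],
                   cipher: CountingCipher) -> List[bytes]:
    """Decrypt the whole column chunk, then apply the filter."""
    plaintext = [cipher.decrypt(ct) for ct in encrypted]
    return [pt for value, pt in zip(structural, plaintext) if predicate(value)]


def row_granular(structural: Sequence[int], encrypted: Sequence[bytes],
                 predicate: Callable[[int], bool],
                 cipher: CountingCipher) -> List[bytes]:
    """Filter on the structural column first, then decrypt only the survivors."""
    survivors = [i for i, value in enumerate(structural) if predicate(value)]
    return [cipher.decrypt(encrypted[i]) for i in survivors]


ids = list(range(1_000))                                    # structural column
ssn_ciphertexts = [f"ssn-{i}".encode()[::-1] for i in ids]  # "encrypted" column


def one_percent(value: int) -> bool:
    """Predicate that keeps 1% of the rows."""
    return value % 100 == 0


eager, lazy = CountingCipher(), CountingCipher()
eager_rows = chunk_granular(ids, ssn_ciphertexts, one_percent, eager)
lazy_rows = row_granular(ids, ssn_ciphertexts, one_percent, lazy)
print(eager.calls, lazy.calls)
```

Both strategies return the same rows; only the number of decrypt calls differs, and that count is what the benchmark table measures at scale.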
File Format
+-----------------------------------------------------+
| FILE HEADER (48 bytes) |
| magic + version + file_uuid + total_rows + offsets |
+-----------------------------------------------------+
| SCHEMA BLOCK (variable) |
+-----------------------------------------------------+
| KEY REFERENCE BLOCK (32 bytes) - key_id, not the key|
+-----------------------------------------------------+
| ROW GROUP 0 (100,000 rows) |
| |-- SECTION 1: Structural columns (unencrypted) |
| | [tightly packed, padded to 4KB boundary] |
| +-- SECTION 2: PQC columns (AES-256-GCM per row) |
| [starts on 4KB OS page boundary] |
+-----------------------------------------------------+
| ROW GROUP 1 ... N |
+-----------------------------------------------------+
| FILE FOOTER |
| |-- Row group offset table |
| |-- CRYPTO MANIFEST (flat array, O(1) lookup) |
| +-- FOOTER HEADER (40 bytes) + CRC32 |
+-----------------------------------------------------+
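The 4KB alignment between the two sections of a row group comes down to one line of modular arithmetic. The helper below is a minimal sketch of that padding calculation; the function names are illustrative and the 400,000-byte example (100,000 rows of a single int32 column) is an assumed workload, not a figure from QPQT itself.

```python
OS_PAGE = 4096  # 4KB OS page size used for section alignment


def padding_to_boundary(offset: int, boundary: int = OS_PAGE) -> int:
    """Bytes of padding needed so that `offset` lands on the next boundary."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return -offset % boundary


def align_up(offset: int, boundary: int = OS_PAGE) -> int:
    """Smallest multiple of `boundary` that is >= `offset`."""
    return offset + padding_to_boundary(offset, boundary)


# Assumed example: a structural section holding 100,000 int32 values.
structural_bytes = 100_000 * 4
pqc_section_start = align_up(structural_bytes)
print(padding_to_boundary(structural_bytes), pqc_section_start)
```

An offset that is already aligned needs no padding, so an empty or exactly page-sized structural section adds no overhead.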
Key Management
# Python
pub, sec = qpqt.keygen()
kid = qpqt.generate_key_id()
# CLI (build from source)
./qpqt keygen --out-pub pub.bin --out-sec sec.bin
- Public key (1184 bytes) - safe to share with writers.
- Secret key (2400 bytes) - never share, never commit.
- Key ID (16 bytes) - stored in the file header, not the key itself.
If you lose the secret key, data encrypted with its public key is permanently unrecoverable.
| Environment | Recommended key storage |
|---|---|
| Local dev | Outside repo, e.g. ~/.qpqt/keys/ |
| AWS | AWS KMS + Secrets Manager |
| Azure | Azure Key Vault |
| GCP | Cloud KMS |
| Databricks | dbutils.secrets |
| On-premise | HashiCorp Vault or HSM |
Key rotation never requires rewriting existing data files - QPQT stores a
key_id reference in the header, not the key itself.
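For the local-dev row of the table, a minimal sketch of writing the secret key outside the repository with owner-only permissions might look like this. `save_secret_key` is a hypothetical helper, not part of the qpqt package; it assumes a POSIX filesystem and takes the 2400-byte secret key size from the list above.

```python
import os
from pathlib import Path

PUBLIC_KEY_BYTES = 1184  # ML-KEM-768 public key size
SECRET_KEY_BYTES = 2400  # ML-KEM-768 secret key size


def save_secret_key(secret_key: bytes, path: Path) -> None:
    """Write a secret key with owner-only permissions, refusing to overwrite."""
    if len(secret_key) != SECRET_KEY_BYTES:
        raise ValueError(
            f"expected {SECRET_KEY_BYTES} bytes, got {len(secret_key)}"
        )
    # Create the key directory (e.g. ~/.qpqt/keys/) readable by the owner only.
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # O_EXCL makes the call fail if the file exists, so an existing key
    # is never silently replaced (a lost key means unrecoverable data).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(secret_key)


# Usage (assumed path, matching the local-dev suggestion in the table):
# save_secret_key(sec, Path.home() / ".qpqt" / "keys" / "customers.sec")
```

Refusing to overwrite is a deliberate choice in this sketch: since losing a secret key makes the data unrecoverable, accidental replacement is treated as an error.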
Build from Source
For CLI usage or contributing:
# Prerequisites: Ubuntu 22.04+, CMake 3.16+, C++17, OpenSSL dev headers
bash scripts/install_deps.sh # builds liboqs from source
mkdir build && cd build
cmake .. && make -j$(nproc)
./qpqt_tests # run all 39 tests
./qpqt_bench # reproduce the benchmark table
Ecosystem Integration
| Tool | How |
|---|---|
| Python / pandas | pip install qpqt |
| CLI | qpqt encrypt/decrypt/inspect on CSV or Parquet (build from source) |
| DuckDB / Polars / Spark | qpqt_arrow export produces structural columns as Arrow IPC |
Roadmap
- v0.1 (current): PyPI wheel, full crypto stack, CLI, Python bindings, Arrow export, 39 tests
- v0.2: pandas read_qpqt/to_qpqt one-liners, Parquet read/write in CLI, DuckDB recipe
- v1.0: Spark DataSource connector, ML-DSA-65 metadata signatures, threat model doc
- v2.0: Distributed operation, S3/Azure direct integration
License
MIT
Author
Rohan Prabhakar