Parquet encryption support for Polars with AES-256-GCM page-level encryption (not production-ready)
Project description
polars-parquet-encrypt
Parquet encryption support for Polars with AES-256-GCM page-level encryption.
Features
- AES-256-GCM encryption: Industry-standard authenticated encryption
- Page-level encryption: Each data and dictionary page encrypted independently
- Optimized performance:
  - Context reuse per column chunk (1000× fewer allocations)
  - In-place decryption with scratch buffer reuse (zero-copy plaintext extraction)
- Simple API: Easy-to-use encryption_key parameter
- Cross-platform: Pre-built wheels for macOS (Intel & ARM) and Linux (x86_64 & ARM64)
Installation
pip install polars-parquet-encrypt
Usage
Basic Encryption/Decryption
import polars as pl
import os
# Generate 32-byte key for AES-256
key = os.urandom(32)
# Write encrypted parquet file
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "salary": [50000, 60000, 75000, 80000, 95000],
})
df.write_parquet("encrypted.parquet", encryption_key=key)
# Read encrypted parquet file
df_read = pl.read_parquet("encrypted.parquet", encryption_key=key)
print(df_read)
Lazy Scanning with Encryption
# Lazy scan with encryption
lf = pl.scan_parquet("encrypted.parquet", encryption_key=key)
result = lf.filter(pl.col("salary") > 70000).collect()
print(result)
Multiple Row Groups
# Write with specific row group size
df.write_parquet(
    "encrypted.parquet",
    encryption_key=key,
    row_group_size=1000,  # Optimize for your workload
)
Security Features
Encryption
- Confidentiality: Page content encrypted with AES-256-GCM
- Integrity: GCM authentication tag (16 bytes) prevents tampering
- Unique nonces: Each page gets a random 12-byte nonce
- Format:
[nonce(12) | ciphertext | tag(16)]
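The layout above can be illustrated with a small parsing sketch. It only slices bytes according to the stated format and performs no cryptography; the helper name split_encrypted_page is hypothetical and is not part of this package's API.

```python
NONCE_LEN = 12  # bytes, per the layout above
TAG_LEN = 16    # bytes, GCM authentication tag


def split_encrypted_page(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a [nonce | ciphertext | tag] blob into its three parts."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short to hold a nonce and a tag")
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:len(blob) - TAG_LEN]
    tag = blob[len(blob) - TAG_LEN:]
    return nonce, ciphertext, tag


# Dummy bytes only; no real encryption happens here.
blob = bytes(12) + b"page-bytes" + bytes([0xFF]) * 16
nonce, ciphertext, tag = split_encrypted_page(blob)
print(len(nonce), len(ciphertext), len(tag))  # 12 10 16
```

Whatever the page size, the nonce and tag together always add 28 bytes around the ciphertext.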
What's Encrypted
- ✅ Data pages: All column values encrypted
- ✅ Dictionary pages: Dictionary-encoded values encrypted
- ❌ Footer metadata: Schema, row counts, column names remain unencrypted (Plaintext Footer Mode)
What's Protected
| Threat | Protected |
|---|---|
| Data confidentiality | ✅ Yes - AES-256-GCM encryption |
| Tampering detection | ✅ Yes - GCM authentication tag |
| Wrong key detection | ✅ Yes - Decryption fails with wrong key |
| Metadata leakage | ❌ No - Footer is plaintext |
| Page reordering | ⚠️ Limited - Empty AAD (no position binding) |
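To make "position binding" concrete: with a non-empty AAD, the GCM tag covers a page's position as well as its content, so a page moved to another location fails authentication. The sketch below shows one hypothetical way such an AAD could be built. This package uses an empty AAD, so the function page_aad and its byte layout are assumptions for illustration only.

```python
import struct


def page_aad(row_group: int, column: int, page: int) -> bytes:
    """Hypothetical AAD: three little-endian unsigned 16-bit ordinals."""
    return struct.pack("<HHH", row_group, column, page)


# Two pages at different positions get different AAD values, so a tag
# computed for one position would not verify at the other.
print(page_aad(0, 2, 5).hex())  # 000002000500
print(page_aad(0, 2, 6).hex())  # 000002000600
```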
Performance
Optimizations
Write Path:
- Encryption context created once per column chunk (not per page)
- Eliminates per-page key cloning and context allocation
- Better CPU cache locality
Read Path:
- In-place decryption using decrypt_in_place_detached()
- Scratch buffer reused across all pages in column chunk
- Zero-copy plaintext extraction with split_off()
- 1999× fewer allocations, 1000× less memory copying
Overhead
File size overhead = 28 bytes × number of pages (12-byte nonce + 16-byte tag per page)
Example:
- 100 MB file with 10,000 pages
- Overhead: 28 × 10,000 = 280 KB (~0.27% increase)
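The arithmetic above can be reproduced with a short sketch. The function names are hypothetical helpers, and the percentage assumes a 100 MB file measured as 100 × 1024 × 1024 bytes.

```python
PER_PAGE_OVERHEAD = 12 + 16  # nonce + authentication tag, in bytes


def encryption_overhead(num_pages: int) -> int:
    """Total bytes added by encryption for a file with num_pages pages."""
    return PER_PAGE_OVERHEAD * num_pages


def overhead_percent(file_size_bytes: int, num_pages: int) -> float:
    """Overhead relative to the unencrypted file size, in percent."""
    return 100 * encryption_overhead(num_pages) / file_size_bytes


print(encryption_overhead(10_000))  # 280000
print(round(overhead_percent(100 * 1024 * 1024, 10_000), 2))  # 0.27
```

Because the overhead scales with page count rather than data volume, files with many small pages pay proportionally more.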
Requirements
- Python: >= 3.10
- Key size: Exactly 32 bytes (AES-256 only, AES-128/192 not supported)
- Polars: >= 0.20.0
Key Management
⚠️ Important: This library only handles encryption/decryption. You must:
- Generate secure random keys: os.urandom(32) or proper KMS
- Store keys securely (not in code or version control)
- Manage key distribution to authorized users
- Handle key rotation (requires rewriting files)
Example: Environment Variable
import base64
import os

# Generate a key once and print an export line that stores it as base64
key = os.urandom(32)
print(f"export PARQUET_KEY={base64.b64encode(key).decode()}")

# Later: load the key from the environment variable
key = base64.b64decode(os.environ["PARQUET_KEY"])
# df is the DataFrame from the basic example above
df.write_parquet("encrypted.parquet", encryption_key=key)
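Since the key must be exactly 32 bytes, it can help to validate it when loading. The following is a minimal sketch using only the standard library; load_key and the demo variable name PARQUET_KEY_DEMO are hypothetical and not part of this package.

```python
import base64
import os

KEY_LEN = 32  # AES-256 requires exactly 32 bytes


def load_key(env_var: str = "PARQUET_KEY") -> bytes:
    """Read a base64-encoded key from the environment and check its length."""
    encoded = os.environ.get(env_var)
    if encoded is None:
        raise KeyError(f"environment variable {env_var} is not set")
    key = base64.b64decode(encoded, validate=True)
    if len(key) != KEY_LEN:
        raise ValueError(f"expected a {KEY_LEN}-byte key, got {len(key)} bytes")
    return key


# Demo with a throwaway variable so a real PARQUET_KEY is left untouched.
os.environ["PARQUET_KEY_DEMO"] = base64.b64encode(os.urandom(32)).decode()
key = load_key("PARQUET_KEY_DEMO")
print(len(key))  # 32
```

Failing early with a clear message is usually easier to debug than an error raised later during a write or read.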
Platform Support
Pre-built wheels available for:
- macOS: ARM64 (Apple Silicon), x86_64 (Intel)
- Linux: x86_64, ARM64 (aarch64)
- Python: 3.10, 3.11, 3.12
For other platforms, installation will build from source (requires Rust toolchain).
Error Handling
try:
    df = pl.read_parquet("encrypted.parquet", encryption_key=wrong_key)
except pl.exceptions.ComputeError as e:
    # A failed GCM tag check surfaces as an aead::Error message
    if "aead::Error" in str(e):
        print("Wrong encryption key or corrupted data")
    else:
        raise
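The string check above can be factored into a small reusable predicate. This is a sketch under the assumption, taken from the example above, that decryption failures carry "aead::Error" in their message; is_decryption_failure is a hypothetical helper, and the demo uses a plain RuntimeError as a stand-in so that it runs without any encrypted file.

```python
def is_decryption_failure(exc: BaseException) -> bool:
    """Return True if the error message looks like an AEAD failure."""
    return "aead::Error" in str(exc)


# Stand-in exceptions; a real caller would pass the caught ComputeError.
print(is_decryption_failure(RuntimeError("decrypt failed: aead::Error")))  # True
print(is_decryption_failure(RuntimeError("file not found")))  # False
```

Matching on message text is fragile, so a check like this may need updating if the error wording changes.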
Technical Details
- Algorithm: AES-256-GCM (Galois/Counter Mode)
- Key size: 32 bytes (256 bits)
- Nonce size: 12 bytes (96 bits, random per page)
- Authentication tag: 16 bytes (128 bits)
- AAD: Empty (simplified approach, no ordinal tracking)
For more details, see PARQUET_ENCRYPTION_DESIGN.md
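Because nonces are random rather than counter-based, two pages encrypted under the same key could in principle draw the same nonce. The sketch below estimates that chance with the standard birthday approximation; it is a back-of-the-envelope calculation, not an analysis of this package, and the function name is hypothetical.

```python
def nonce_collision_probability(num_pages: int, nonce_bits: int = 96) -> float:
    """Birthday approximation p ≈ n(n-1) / 2^(bits+1), valid while p is small."""
    return num_pages * (num_pages - 1) / 2 ** (nonce_bits + 1)


# Total pages ever encrypted under one key, across all files.
for pages in (10**6, 10**9, 2**32):
    print(pages, nonce_collision_probability(pages))
```

The relevant count is the total number of pages encrypted with a given key, which is one reason to rotate keys for very large datasets.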
License
MIT License - see LICENSE file for details.
Contributing
Issues and pull requests welcome at: https://gitlab.com/anonym1/polars/-/issues
Acknowledgments
Built on top of Polars - blazingly fast DataFrames in Rust and Python.
Download files
File details
Details for the file polars_parquet_encrypt-0.1.0-cp310-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: polars_parquet_encrypt-0.1.0-cp310-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 178.7 kB
- Tags: CPython 3.10+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06b5c1dc986bce8033262cf7820a6011fe788f5b260b59da9850257b83158ece |
| MD5 | d292163db04911c82b47fae4d77120f8 |
| BLAKE2b-256 | 5f835bf62e1fa459409e75aeec30867d123d9b87e0a20e07cbd82883b77c4603 |
File details
Details for the file polars_parquet_encrypt-0.1.0-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: polars_parquet_encrypt-0.1.0-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 163.1 kB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40ce2d1d5a4da4b40ce9643f10ed238a869f411dd635424b1ce1aa672ca657ce |
| MD5 | 13de52004fd870b6a005c729b1fe234f |
| BLAKE2b-256 | bc530a23656e9e427989728e24ed8881bb127315c953c344ca9103f321091600 |