Blazingly fast DataFrame library with Parquet encryption support (AES-256-GCM), not production ready

polars-parquet-encrypt

Blazingly fast DataFrame library with Parquet encryption support

This package is a full replacement for Polars with built-in AES-256-GCM page-level encryption for Parquet files.

⚠️ Not production ready - This is a test/research package

Why This Package?

The official PyPI polars package doesn't include encryption support. This package provides:

  • Full Polars functionality - Everything from standard Polars
  • Encryption built-in - No need to build from source
  • Drop-in replacement - Just pip install and use

Installation

pip install polars-parquet-encrypt

That's it! No Rust toolchain, no maturin, no source builds required.

Usage

Basic Encryption/Decryption

import polars as pl
import os

# Generate 32-byte key for AES-256
key = os.urandom(32)

# Write encrypted parquet file
df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "salary": [50000, 60000, 75000, 80000, 95000]
})

df.write_parquet("encrypted.parquet", encryption_key=key)

# Read encrypted parquet file
df_read = pl.read_parquet("encrypted.parquet", encryption_key=key)
print(df_read)

Lazy Scanning with Encryption

# Lazy scan with encryption
lf = pl.scan_parquet("encrypted.parquet", encryption_key=key)
result = lf.filter(pl.col("salary") > 70000).collect()
print(result)

Cloud Storage (Azure, S3, etc.)

# Works with cloud storage too
storage_options = {"account_name": "myaccount"}

df.write_parquet(
    "abfs://container/encrypted.parquet",
    encryption_key=key,
    storage_options=storage_options
)

df_read = pl.read_parquet(
    "abfs://container/encrypted.parquet",
    encryption_key=key,
    storage_options=storage_options
)

Security Features

Encryption

  • Algorithm: AES-256-GCM (authenticated encryption)
  • Key size: Exactly 32 bytes (256 bits)
  • Nonce: Unique 12-byte random nonce per page
  • Authentication tag: 16-byte GCM tag for integrity
  • Format: [nonce(12) | ciphertext | tag(16)] per page
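The per-page layout listed above can be illustrated with a small standard-library sketch. It performs no encryption; it only shows how a blob laid out as [nonce(12) | ciphertext | tag(16)] splits into its parts, and how much size the framing adds per page. The names `split_encrypted_page` and `page_overhead` are hypothetical helpers for illustration and are not part of this package's API.

```python
NONCE_LEN = 12  # bytes, per the layout described above
TAG_LEN = 16    # bytes, GCM authentication tag


def split_encrypted_page(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a page blob laid out as [nonce | ciphertext | tag]."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short to hold a nonce and a tag")
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:len(blob) - TAG_LEN]
    tag = blob[len(blob) - TAG_LEN:]
    return nonce, ciphertext, tag


def page_overhead(num_pages: int) -> int:
    """Bytes added by the framing: one nonce and one tag per page."""
    return num_pages * (NONCE_LEN + TAG_LEN)
```

Because AES-GCM ciphertext has the same length as its plaintext, the framing in this layout costs 28 bytes per page, so a column chunk with many small pages pays proportionally more than one with a few large pages.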

What's Encrypted

  • Data pages: All column values encrypted
  • Dictionary pages: Dictionary-encoded values encrypted
  • Footer metadata: Not encrypted. Schema, row counts, and column names remain plaintext

Performance

Optimizations

  • Encryption context created once per column chunk (not per page)
  • In-place decryption using decrypt_in_place_detached()
  • Scratch buffer reused across all pages in column chunk
  • Zero-copy plaintext extraction with split_off()

Platform Support

Pre-built wheels available for:

  • macOS: ARM64 (Apple Silicon), x86_64 (Intel)
  • Linux: x86_64, ARM64 (aarch64)
  • Python: 3.10, 3.11, 3.12+

Requirements

  • Python: >= 3.10
  • Encryption key: Exactly 32 bytes for AES-256
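Since the key must be exactly 32 bytes, it can help to check the length before passing it as `encryption_key`. The following is a minimal sketch using only the standard library; `validate_key`, `key_to_hex`, and `key_from_hex` are hypothetical helper names, not functions provided by this package, and how the package itself reports a wrong-sized key is not covered here.

```python
import os

KEY_LEN = 32  # bytes, i.e. 256 bits for AES-256


def validate_key(key: bytes) -> bytes:
    """Return the key as bytes if it is exactly 32 bytes long."""
    if not isinstance(key, (bytes, bytearray)):
        raise TypeError("encryption key must be bytes")
    if len(key) != KEY_LEN:
        raise ValueError(f"encryption key must be {KEY_LEN} bytes, got {len(key)}")
    return bytes(key)


def key_to_hex(key: bytes) -> str:
    """Encode a validated key as hex text, e.g. for a secrets manager."""
    return validate_key(key).hex()


def key_from_hex(text: str) -> bytes:
    """Decode hex text back into a validated 32-byte key."""
    return validate_key(bytes.fromhex(text.strip()))


key = validate_key(os.urandom(KEY_LEN))
```

A key restored with `key_from_hex` can then be passed to `write_parquet`, `read_parquet`, or `scan_parquet` as shown in the Usage section.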

License

MIT License - see LICENSE file for details.

Building from Source

Pre-built wheels are available on PyPI, but if you need to build from source:

macOS (Current Platform)

./quick-build.sh

Linux (Without Docker)

See BUILD-LINUX.md for complete instructions, or:

# On your Linux machine
./build-linux-native.sh

Quick reference: QUICK-START-LINUX.md

All Platforms

See BUILD.md for comprehensive build documentation.

Acknowledgments

Built on Polars - blazingly fast DataFrames in Rust and Python.

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

polars_parquet_encrypt-0.2.0-cp310-abi3-manylinux_2_39_x86_64.whl (47.4 MB)

Tags: CPython 3.10+, manylinux (glibc 2.39+), x86-64

polars_parquet_encrypt-0.2.0-cp310-abi3-macosx_11_0_arm64.whl (43.0 MB)

Tags: CPython 3.10+, macOS 11.0+, ARM64

File hashes

Hashes for polars_parquet_encrypt-0.2.0-cp310-abi3-manylinux_2_39_x86_64.whl:

Algorithm    Hash digest
SHA256       ccf983b7359249afb539ae0201d96f818f294f5e74157095d27834803e52ba65
MD5          32049983492bf284a3e173f894ac7ea7
BLAKE2b-256  8961e43618c2788d285a45512002f5269adb6e00a1b2ce3e99bc85d2dd01b92c

File hashes

Hashes for polars_parquet_encrypt-0.2.0-cp310-abi3-macosx_11_0_arm64.whl:

Algorithm    Hash digest
SHA256       99eed5700fb2542cbf1320a70c8bf99531f383173224878688b69a56af4f7bfe
MD5          d41ca7cfb216fefa6cb33f6bc4bdbef6
BLAKE2b-256  d5a88818ae24af0aaed181ab36531185c13bc752eadca0c00dce2e8859ad147d
