SuperTable — versioned data lake library for SQL analytics on Parquet + Redis.
SuperTable stores structured data as immutable Parquet snapshots on object storage (S3, MinIO, Azure Blob, GCP Cloud Storage, or local disk), keeps metadata, locks, and audit state in Redis, and queries everything through DuckDB (embedded) or Spark SQL. It is a Python library — there is no separate server process.
Installation
pip install supertable # core + local storage
pip install "supertable[s3]" # AWS S3
pip install "supertable[minio]" # MinIO
pip install "supertable[azure]" # Azure Blob
pip install "supertable[gcp]" # Google Cloud Storage
pip install "supertable[all]" # everything
Requirements: Python 3.10+, a reachable Redis 6+, and a configured storage backend (or local disk for development). See docs/02_configuration.md for environment variables.
Architecture
┌──────────────────────────────────────────────────┐
│  Python application                              │
│  (notebooks, ETL jobs, FastAPI handlers, etc.)   │
└──────────┬─────────────────────────┬─────────────┘
           │ DataWriter / DataReader │
           ▼                         ▼
   ┌───────────────┐      ┌────────────────────┐
   │ RedisCatalog  │      │ StorageInterface   │
   │  metadata     │      │  Parquet files     │
   │  locks        │      │  S3 / MinIO /      │
   │  audit chain  │      │  Azure / GCP /     │
   └───────────────┘      │  Local             │
                          └────────────────────┘
Data is organised as Organization → SuperTable → SimpleTable. Each
SimpleTable is a versioned, append-only collection of Parquet files
backed by a snapshot linked list — every write produces a new immutable
snapshot whose previous_snapshot points at the predecessor.
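The snapshot chain described above can be sketched in plain Python. This is an illustration of the linked-list idea only, not SuperTable's implementation: the `Snapshot` class, `append_snapshot`, and `history` are hypothetical names, the store is an in-memory dict rather than Redis, and the assumption that each snapshot inherits its predecessor's file list is ours.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical in-memory stand-in for one immutable snapshot."""
    snapshot_id: str
    files: tuple[str, ...]            # Parquet files visible in this version
    previous_snapshot: Optional[str]  # id of the predecessor, None for the first


def append_snapshot(store: dict[str, Snapshot], head: Optional[str],
                    snapshot_id: str, new_files: list[str]) -> str:
    """Register a new snapshot that points back at the current head."""
    # Assumption: an append keeps every file of the predecessor.
    inherited = store[head].files if head is not None else ()
    store[snapshot_id] = Snapshot(snapshot_id, inherited + tuple(new_files), head)
    return snapshot_id  # the caller treats this as the new head


def history(store: dict[str, Snapshot], head: Optional[str]) -> Iterator[Snapshot]:
    """Walk from the newest snapshot back to the first one."""
    current = head
    while current is not None:
        snap = store[current]
        yield snap
        current = snap.previous_snapshot


store: dict[str, Snapshot] = {}
head = append_snapshot(store, None, "v1", ["part-0001.parquet"])
head = append_snapshot(store, head, "v2", ["part-0002.parquet"])
head = append_snapshot(store, head, "v3", ["part-0003.parquet"])

versions = [s.snapshot_id for s in history(store, head)]
print(versions)
```

Because earlier snapshots are never mutated, any older version can be inspected by starting the walk from its id instead of the head.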
| Layer | Technology |
|---|---|
| Language | Python 3.10+ |
| Metadata store | Redis 6+ (standalone or Sentinel HA) |
| Query engine (primary) | DuckDB |
| Query engine (large) | Spark SQL via Thrift |
| Data format | Apache Parquet |
| Object storage | MinIO / S3 / Azure / GCP / local |
| Mirror formats | Delta Lake, Apache Iceberg, Parquet |
| Audit storage | Redis Streams + Parquet |
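The split between a primary and a large-query engine suggests a size-based choice. The sketch below shows one way such a rule could look; it is an assumption for illustration, not how SuperTable selects an engine. `Engine`, `choose_engine`, and both byte parameters are hypothetical names.

```python
from enum import Enum


class Engine(Enum):
    """Hypothetical engine labels used only in this sketch."""
    DUCKDB = "duckdb"
    SPARK = "spark"


def choose_engine(estimated_scan_bytes: int, memory_budget_bytes: int) -> Engine:
    """Pick DuckDB when the estimated scan fits the budget, otherwise Spark SQL."""
    if estimated_scan_bytes < 0 or memory_budget_bytes <= 0:
        raise ValueError(
            "estimated_scan_bytes must be >= 0 and memory_budget_bytes must be > 0"
        )
    if estimated_scan_bytes <= memory_budget_bytes:
        return Engine.DUCKDB
    return Engine.SPARK


GIB = 1024 ** 3
small = choose_engine(estimated_scan_bytes=2 * GIB, memory_budget_bytes=8 * GIB)
large = choose_engine(estimated_scan_bytes=64 * GIB, memory_budget_bytes=8 * GIB)
print(small.value, large.value)
```

The actual selection logic is covered in the Query Engine document listed under Documentation.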
Quick example
import pyarrow as pa

from supertable import SuperTable, DataWriter, DataReader, engine

# Example input: a small pyarrow.Table whose columns match the query below.
arrow_table = pa.table({
    "day": ["2024-01-01", "2024-01-01"],
    "client": ["a", "b"],
    "value": [1, 2],
})

# Bootstrap catalogue + storage
SuperTable(super_name="example", organization="my-org")

# Write: rows matching on the overwrite_columns are upserted
dw = DataWriter(super_name="example", organization="my-org")
columns, rows, inserted, deleted = dw.write(
    role_name="superadmin",
    simple_name="facts",
    data=arrow_table,
    overwrite_columns=["day", "client"],
    lineage={"source_type": "manual", "source_id": "my-job"},
)

# Read: run SQL against the table and let the library pick the engine
dr = DataReader(
    super_name="example",
    organization="my-org",
    query="SELECT day, sum(value) FROM facts GROUP BY day LIMIT 10",
)
df, status, message = dr.execute(role_name="superadmin", engine=engine.AUTO)
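To make the role of `overwrite_columns` in the write call concrete, here is a minimal sketch of upsert-by-key on plain Python rows. It is one plausible reading of the behaviour, assuming that incoming rows replace existing rows with the same values in the key columns; the `upsert` function is hypothetical and does not call SuperTable.

```python
def upsert(existing, incoming, overwrite_columns):
    """Replace existing rows whose key columns match an incoming row.

    Returns the merged rows and the number of existing rows that were replaced.
    """
    def key(row):
        return tuple(row[c] for c in overwrite_columns)

    incoming_keys = {key(r) for r in incoming}
    kept = [r for r in existing if key(r) not in incoming_keys]
    replaced = len(existing) - len(kept)
    return kept + list(incoming), replaced


existing = [
    {"day": "2024-01-01", "client": "a", "value": 1},
    {"day": "2024-01-01", "client": "b", "value": 2},
]
incoming = [
    {"day": "2024-01-01", "client": "a", "value": 10},  # same key: replaces
    {"day": "2024-01-02", "client": "a", "value": 3},   # new key: appended
]

merged, replaced = upsert(existing, incoming, ["day", "client"])
print(len(merged), replaced)
```

The exact write semantics, including deduplication and tombstones, are described in the Data Writer document.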
Demos
The package ships two runnable demos under supertable.demo:
# Numbered tutorial — runs the full lifecycle end-to-end.
supertable-demo-quickstart
# or
python -m supertable.demo.quickstart
# Synthetic webshop dataset.
supertable-demo-webshop-generate # build ~1.2M rows on disk
supertable-demo-webshop-load # load them into SuperTable
supertable-demo-webshop-topup # continuous incremental refresh
Both demos are also runnable as module steps. Examples:
python -m supertable.demo.quickstart.s01_01_01_create_super_table
python -m supertable.demo.quickstart.s03_08_read_snapshot_history
python -m supertable.demo.webshop.generate
See supertable/demo/README.md for the full script index.
What's included
- Versioned tables with snapshot isolation, upsert (overwrite_columns), soft deletes (delete_only=True), schema evolution, and staleness filtering
- DuckDB query engine: embedded, zero-copy reads from object storage
- Spark SQL via Thrift: for queries exceeding DuckDB memory limits
- RBAC: role types (superadmin, admin, writer, reader, meta) with row-level and column-level security enforced through view chains
- Audit logging: tamper-evident SHA-256 hash chain in Redis Streams with Parquet export
- Monitoring: MonitoringWriter pushes read/write/metric payloads to Redis lists; structured JSON logging with correlation IDs
- Ingestion: staging areas (Staging) and automated ingestion pipes (SuperPipe)
- Mirroring: optional Delta Lake / Iceberg / Parquet export after every write
- Snapshot history: every write chains to previous_snapshot, enabling point-in-time inspection without separate historical tables
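The tamper-evident audit chain listed above rests on a general technique: each entry's SHA-256 hash covers the previous entry's hash, so editing any past entry invalidates every later one. The sketch below shows that technique with the standard library only. The entry layout, the `GENESIS` value, and the function names are assumptions for illustration, and a Python list stands in for Redis Streams.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(previous_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    """Append an entry that is linked to the current tail of the chain."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev": previous_hash,
        "hash": entry_hash(previous_hash, payload),
    })


def verify(chain: list) -> bool:
    """Recompute every hash from the start; any edit breaks the chain."""
    previous_hash = GENESIS
    for entry in chain:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(previous_hash, entry["payload"]):
            return False
        previous_hash = entry["hash"]
    return True


chain: list = []
append_event(chain, {"action": "write", "table": "facts", "role": "superadmin"})
append_event(chain, {"action": "read", "table": "facts", "role": "reader"})

ok_before = verify(chain)
chain[0]["payload"]["table"] = "other"  # simulate tampering with an old entry
ok_after = verify(chain)
print(ok_before, ok_after)
```

Serialising the payload with sorted keys keeps the hash stable regardless of dict insertion order, which matters when entries are re-verified by a different process.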
Documentation
See docs/00_index.md for the full table of contents.
| # | Document | Description |
|---|---|---|
| 01 | Platform Overview | Architecture, package layout, deployment, data flow |
| 02 | Configuration | Environment variables and runtime settings |
| 03 | Data Model | Organization → SuperTable → SimpleTable hierarchy |
| 04 | Storage Backends | StorageInterface, S3, MinIO, Azure, GCP, local |
| 05 | Redis Catalog | Metadata store, key naming, operations, CAS |
| 06 | Data Writer | Write pipeline, locking, dedup, tombstones |
| 07 | Ingestion & Pipes | Staging areas, automated ingestion pipes |
| 08 | Distributed Locking | Redis locks, file locks, deadlock prevention |
| 09 | Query Engine | DuckDB Lite/Pro, Spark SQL, auto selection |
| 10 | Data Reader | Read facade, snapshot history, view chain |
| 11 | RBAC & Access Control | Roles, users, row/column security |
| 12 | Audit Logging | SHA-256 hash chain, DORA/SOC 2, SIEM |
| 13 | Table Mirroring | Delta Lake, Iceberg, Parquet export |
| 14 | Monitoring | Metrics writer, structured logging |
| 15 | Python SDK | Core classes, demos, example index |
License
SuperTable is licensed under the Functional Source License, Version 1.1,
ALv2 Future License (FSL-1.1-ALv2).
You may use, copy, modify, create derivative works, publicly perform, publicly display, and redistribute the software for any permitted purpose other than a Competing Use.
A Competing Use means making the software available to others in a commercial product or service that:
- substitutes for SuperTable;
- substitutes for another product or service offered by Kladna Soft Kft. using SuperTable; or
- offers the same or substantially similar functionality as SuperTable.
Permitted purposes include internal use, non-commercial education, non-commercial research, and professional services provided to a licensee using the software in accordance with the license.
Each version of the software becomes available under the Apache License 2.0 on the second anniversary of the date that version is made available.
See LICENSE for the full license terms.
Copyright © 2024-2026 Kladna Soft Kft.