
SuperTable — versioned data lake library for SQL analytics on Parquet + Redis.


SuperTable


SuperTable — versioned data lake library for SQL analytics.

SuperTable stores structured data as immutable Parquet snapshots on object storage (S3, MinIO, Azure Blob, GCP Cloud Storage, or local disk), keeps metadata, locks, and audit state in Redis, and queries everything through DuckDB (embedded) or Spark SQL. It is a Python library — there is no separate server process.

Installation

pip install supertable                # core + local storage
pip install "supertable[s3]"          # AWS S3
pip install "supertable[minio]"       # MinIO
pip install "supertable[azure]"       # Azure Blob
pip install "supertable[gcp]"         # Google Cloud Storage
pip install "supertable[all]"         # everything

Requirements: Python 3.10+, a reachable Redis 6+, and a configured storage backend (or local disk for development). See docs/02_configuration.md for environment variables.


Architecture

┌──────────────────────────────────────────────────┐
│                Python application                 │
│   (notebooks, ETL jobs, FastAPI handlers, etc.)   │
└──────────┬─────────────────────────┬──────────────┘
           │ DataWriter / DataReader │
           ▼                         ▼
   ┌───────────────┐        ┌────────────────────┐
   │  RedisCatalog │        │  StorageInterface  │
   │  metadata     │        │  Parquet files     │
   │  locks        │        │  S3 / MinIO /      │
   │  audit chain  │        │  Azure / GCP /     │
   └───────────────┘        │  Local             │
                            └────────────────────┘

Data is organised as Organization → SuperTable → SimpleTable. Each SimpleTable is a versioned, append-only collection of Parquet files backed by a snapshot linked list — every write produces a new immutable snapshot whose previous_snapshot points at the predecessor.
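The snapshot linked list described above can be pictured with a small in-memory model. This is only a sketch: the `Snapshot` class, the `commit` and `history` helpers, and the file names are hypothetical and are not part of the SuperTable API. Only the `previous_snapshot` field name comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for one immutable snapshot record."""
    snapshot_id: str
    files: tuple[str, ...]                   # Parquet files visible in this version
    previous_snapshot: Optional[str] = None  # id of the predecessor, None for the first write


def commit(store: dict[str, Snapshot], head: Optional[str],
           snapshot_id: str, files: tuple[str, ...]) -> str:
    """Record a new snapshot that points back at the current head; return the new head id."""
    store[snapshot_id] = Snapshot(snapshot_id, files, previous_snapshot=head)
    return snapshot_id


def history(store: dict[str, Snapshot], head: Optional[str]) -> list[str]:
    """Walk previous_snapshot pointers from the head back to the first write."""
    ids = []
    while head is not None:
        ids.append(head)
        head = store[head].previous_snapshot
    return ids


store: dict[str, Snapshot] = {}
head = commit(store, None, "v1", ("part-0.parquet",))
head = commit(store, head, "v2", ("part-0.parquet", "part-1.parquet"))
head = commit(store, head, "v3", ("part-2.parquet",))
print(history(store, head))  # newest first: ['v3', 'v2', 'v1']
```

Because earlier snapshots are never mutated, any older version stays readable by following the chain from the current head.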

| Layer | Technology |
|---|---|
| Language | Python 3.10+ |
| Metadata store | Redis 6+ (standalone or Sentinel HA) |
| Query engine (primary) | DuckDB |
| Query engine (large) | Spark SQL via Thrift |
| Data format | Apache Parquet |
| Object storage | MinIO / S3 / Azure / GCP / local |
| Mirror formats | Delta Lake, Apache Iceberg, Parquet |
| Audit storage | Redis Streams + Parquet |

Quick example

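import pyarrow as pa

from supertable import SuperTable, DataWriter, DataReader, engine

# Bootstrap catalogue + storage
SuperTable(super_name="example", organization="my-org")

# Sample input; the columns match the write and query below (values are illustrative)
arrow_table = pa.table({
    "day": ["2024-01-01", "2024-01-02"],
    "client": ["acme", "acme"],
    "value": [10, 20],
})

# Write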
dw = DataWriter(super_name="example", organization="my-org")
columns, rows, inserted, deleted = dw.write(
    role_name="superadmin",
    simple_name="facts",
    data=arrow_table,
    overwrite_columns=["day", "client"],
    lineage={"source_type": "manual", "source_id": "my-job"},
)

# Read
dr = DataReader(
    super_name="example",
    organization="my-org",
    query="SELECT day, sum(value) FROM facts GROUP BY day LIMIT 10",
)
df, status, message = dr.execute(role_name="superadmin", engine=engine.AUTO)
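The `overwrite_columns` argument in the write above is what drives upserts. One plausible reading, stated here as an assumption rather than as the library's documented behaviour, is that existing rows whose values in those columns match an incoming row are replaced, and all other rows are kept. The `upsert` helper below is hypothetical and works on plain dictionaries, not on SuperTable itself.

```python
def upsert(existing, incoming, key_columns):
    """Replace existing rows that share a key with an incoming row, then append the incoming rows.

    Illustrative only: this models one assumed meaning of overwrite_columns.
    """
    incoming_keys = {tuple(row[c] for c in key_columns) for row in incoming}
    kept = [
        row for row in existing
        if tuple(row[c] for c in key_columns) not in incoming_keys
    ]
    return kept + list(incoming)


existing = [
    {"day": "2024-01-01", "client": "acme", "value": 10},
    {"day": "2024-01-02", "client": "acme", "value": 20},
]
incoming = [
    {"day": "2024-01-02", "client": "acme", "value": 25},  # same key: replaces the old row
    {"day": "2024-01-03", "client": "acme", "value": 30},  # new key: plain insert
]

merged = upsert(existing, incoming, key_columns=["day", "client"])
for row in merged:
    print(row["day"], row["client"], row["value"])
```

The write pipeline documentation (docs/06) is the place to confirm the actual matching and deletion rules.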

Demos

The package ships two runnable demos under supertable.demo:

# Numbered tutorial — runs the full lifecycle end-to-end.
supertable-demo-quickstart
# or
python -m supertable.demo.quickstart

# Synthetic webshop dataset.
supertable-demo-webshop-generate    # build ~1.2M rows on disk
supertable-demo-webshop-load        # load them into SuperTable
supertable-demo-webshop-topup       # continuous incremental refresh

Both demos are also runnable as module steps. Examples:

python -m supertable.demo.quickstart.s01_01_01_create_super_table
python -m supertable.demo.quickstart.s03_08_read_snapshot_history
python -m supertable.demo.webshop.generate

See supertable/demo/README.md for the full script index.


What's included

  • Versioned tables with snapshot isolation, upsert (overwrite_columns), soft deletes (delete_only=True), schema evolution, and staleness filtering
  • DuckDB query engine — embedded, zero-copy reads from object storage
  • Spark SQL via Thrift — for queries exceeding DuckDB memory limits
  • RBAC — role types (superadmin, admin, writer, reader, meta) with row-level and column-level security enforced through view chains
  • Audit logging — tamper-evident SHA-256 hash chain in Redis Streams with Parquet export
  • Monitoring — MonitoringWriter pushes read/write/metric payloads to Redis lists; structured JSON logging with correlation IDs
  • Ingestion — staging areas (Staging) and automated ingestion pipes (SuperPipe)
  • Mirroring — optional Delta Lake / Iceberg / Parquet export after every write
  • Snapshot history — every write chains to previous_snapshot, enabling point-in-time inspection without separate historical tables
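To illustrate why a SHA-256 hash chain makes an audit log tamper-evident, here is a minimal standard-library sketch. It is not SuperTable's audit implementation: the entry layout, the `GENESIS` value, and the `append` and `verify` helpers are all assumptions made for the example, and an in-memory list stands in for Redis Streams.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first entry


def entry_hash(prev_hash, event):
    """Hash the previous entry's hash together with a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain, event):
    """Add an event whose hash depends on everything recorded before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev_hash,
                  "hash": entry_hash(prev_hash, event)})


def verify(chain):
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append(chain, {"action": "write", "table": "facts", "role": "writer"})
append(chain, {"action": "read", "table": "facts", "role": "reader"})
print(verify(chain))  # True

chain[0]["event"]["role"] = "superadmin"  # tamper with a recorded event
print(verify(chain))  # False
```

Since each hash covers its predecessor's hash, changing one entry invalidates that entry and forces every later hash to be recomputed, which is what makes silent edits detectable.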

Documentation

See docs/00_index.md for the full table of contents.

| # | Document | Description |
|---|---|---|
| 01 | Platform Overview | Architecture, package layout, deployment, data flow |
| 02 | Configuration | Environment variables and runtime settings |
| 03 | Data Model | Organization → SuperTable → SimpleTable hierarchy |
| 04 | Storage Backends | StorageInterface, S3, MinIO, Azure, GCP, local |
| 05 | Redis Catalog | Metadata store, key naming, operations, CAS |
| 06 | Data Writer | Write pipeline, locking, dedup, tombstones |
| 07 | Ingestion & Pipes | Staging areas, automated ingestion pipes |
| 08 | Distributed Locking | Redis locks, file locks, deadlock prevention |
| 09 | Query Engine | DuckDB Lite/Pro, Spark SQL, auto selection |
| 10 | Data Reader | Read facade, snapshot history, view chain |
| 11 | RBAC & Access Control | Roles, users, row/column security |
| 12 | Audit Logging | SHA-256 hash chain, DORA/SOC 2, SIEM |
| 13 | Table Mirroring | Delta Lake, Iceberg, Parquet export |
| 14 | Monitoring | Metrics writer, structured logging |
| 15 | Python SDK | Core classes, demos, example index |

License

SuperTable is licensed under the Functional Source License, Version 1.1, ALv2 Future License (FSL-1.1-ALv2).

You may use, copy, modify, create derivative works, publicly perform, publicly display, and redistribute the software for any permitted purpose other than a Competing Use.

A Competing Use means making the software available to others in a commercial product or service that:

  1. substitutes for SuperTable;
  2. substitutes for another product or service offered by Kladna Soft Kft. using SuperTable; or
  3. offers the same or substantially similar functionality as SuperTable.

Permitted purposes include internal use, non-commercial education, non-commercial research, and professional services provided to a licensee using the software in accordance with the license.

Each version of the software becomes available under the Apache License 2.0 on the second anniversary of the date that version is made available.

See LICENSE for the full license terms.

Copyright © 2024-2026 Kladna Soft Kft.
