
Django database backend powered by Apache Iceberg and Polars - bringing time travel, cloud-native storage, and blazing-fast analytics to Django


🚀 Django Iceberg: Django Database Backend for Apache Iceberg + Polars

License: MIT | Python 3.12+ | Django 6.0+ | Apache Iceberg | Polars

A Django database backend that replaces a traditional SQL database with Apache Iceberg tables and Polars query execution, bringing time travel, cloud-native storage, and fast analytics to your Django applications.

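from datetime import timedelta

import polars as pl
from django.db import models
from django.utils import timezone
from polars_iceberg.timetravel import TimeTravelMixin

# Write normal Django code
class Article(TimeTravelMixin, models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    published_at = models.DateTimeField()
    views = models.PositiveIntegerField(default=0)

# Get superpowers: Query data from last week
Article.objects.as_of(timezone.now() - timedelta(days=7)).all()

# See complete history of any record
article = Article.objects.get(pk=1)
article.history()

# Run fast analytics with Polars (group_by_dynamic needs the time column sorted)
df = Article.objects.values('published_at', 'views').to_polars().sort('published_at')
summary = df.group_by_dynamic('published_at', every='1w').agg(pl.col('views').sum())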

Why Django Iceberg?

Traditional databases such as PostgreSQL and MySQL were designed decades ago. Django Iceberg builds on modern data infrastructure to give Django apps:

  • 🕰️ Time Travel Queries - Query data as it existed at any point in history
  • ☁️ Cloud-Native Storage - Store your database on S3/GCS/Azure at 1/10th the cost
  • 10-100x Faster Analytics - Polars makes aggregations blazingly fast
  • 🔄 Zero-Downtime Schema Changes - Iceberg schema evolution without locks
  • 🔒 ACID on Object Storage - Transactional guarantees on cloud storage (per table; see Limitations)
  • 📊 Open Table Format - Your data works with Spark, DuckDB, Trino, etc.
  • 🌍 Multi-Cloud Portable - No vendor lock-in, run anywhere

See WHY.md for the full story of why this is the future of databases.


Quick Start

Installation

pip install django-iceberg

Configuration

In your Django settings.py:

DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "data/warehouse",              # Local path or s3://bucket/path
        "CATALOG_URI": "sqlite:///data/catalog.db", # SQLite for local, REST for prod
        "NAMESPACE": "default",
    }
}

Run Migrations

python manage.py migrate

Start Building

# models.py
from django.db import models
from polars_iceberg.timetravel import TimeTravelMixin

class Order(TimeTravelMixin, models.Model):
    customer_email = models.EmailField()
    total = models.DecimalField(max_digits=10, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

# Elsewhere in your app (e.g. views.py)
from datetime import timedelta
from decimal import Decimal

from django.utils import timezone

from .models import Order

# Use it like any Django model (pass a Decimal to a DecimalField to avoid float rounding)
Order.objects.create(customer_email="alice@example.com", total=Decimal("99.99"))

# Plus time travel
orders_yesterday = Order.objects.as_of(timezone.now() - timedelta(days=1))

Features

Time Travel Queries

Query historical data without complex triggers or audit tables:

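from datetime import datetime, timezone

# Assumes a User model that uses TimeTravelMixin (Django's built-in User does not)

# As of specific timestamp (timezone-aware, matching Django's default USE_TZ=True)
User.objects.as_of(datetime(2026, 1, 1, tzinfo=timezone.utc)).filter(is_active=True)

# List all snapshots
snapshots = User.objects.snapshots()
for snap in snapshots:
    print(f"Snapshot {snap.snapshot_id} at {snap.committed_at}")

# Complete history of a record
user = User.objects.get(pk=123)
for version in user.history():
    print(f"{version.snapshot_timestamp}: {version.email}")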
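Under the hood, an Iceberg table keeps a log of snapshots, and a timestamp query reads the snapshot that was current at that moment, i.e. the latest one committed at or before the requested time. The sketch below models that lookup, plus one way a `history()` view could be derived, over in-memory snapshots. `Snapshot`, `snapshot_as_of` and `history` are hypothetical names, not the polars_iceberg API, and real snapshots reference Parquet data files rather than holding rows.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    rows: dict[int, dict]  # hypothetical: pk -> row contents visible in this snapshot


def snapshot_as_of(snapshots: list[Snapshot], ts: datetime) -> Snapshot:
    """Return the latest snapshot committed at or before ts (snapshots sorted oldest first)."""
    times = [s.committed_at for s in snapshots]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise LookupError(f"no snapshot exists at or before {ts.isoformat()}")
    return snapshots[idx - 1]


def history(snapshots: list[Snapshot], pk: int) -> list[tuple[datetime, dict]]:
    """List each distinct version of one row, oldest first."""
    versions: list[tuple[datetime, dict]] = []
    for snap in snapshots:
        row = snap.rows.get(pk)
        # Record the row only when it first appears or its contents change
        if row is not None and (not versions or versions[-1][1] != row):
            versions.append((snap.committed_at, row))
    return versions


def utc(day: int) -> datetime:
    return datetime(2026, 1, day, tzinfo=timezone.utc)


snaps = [
    Snapshot(1, utc(1), {123: {"email": "a@old.example"}}),
    Snapshot(2, utc(5), {123: {"email": "a@old.example"}, 124: {"email": "b@example.com"}}),
    Snapshot(3, utc(9), {123: {"email": "a@new.example"}, 124: {"email": "b@example.com"}}),
]
print(snapshot_as_of(snaps, utc(6)).snapshot_id)  # 2
print([v[1]["email"] for v in history(snaps, 123)])  # ['a@old.example', 'a@new.example']
```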

Cloud-Native Storage

Deploy on scalable object storage:

import os

DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "s3://my-bucket/warehouse",
        "CATALOG_URI": "https://catalog.example.com",  # REST catalog
        "NAMESPACE": "production",
        "OPTIONS": {
            # Read credentials from the environment instead of committing them
            "s3.access-key-id": os.environ["AWS_ACCESS_KEY_ID"],
            "s3.secret-access-key": os.environ["AWS_SECRET_ACCESS_KEY"],
            "s3.region": "us-west-2",
        }
    }
}

Supported backends:

  • AWS S3 / S3 Express One Zone
  • Google Cloud Storage
  • Azure Blob Storage
  • MinIO
  • Local filesystem (development)

Blazing Fast Analytics

Polars is typically much faster than pandas for DataFrame operations (often cited as 10-100x, depending on the workload):

import polars as pl

# Export to Polars DataFrame
orders_df = Order.objects.values('customer_email', 'total', 'created_at').to_polars()

# Run complex aggregations (group_by replaced the old groupby name in Polars 0.19)
summary = (
    orders_df
    .group_by('customer_email')
    .agg([
        pl.col('total').sum().alias('total_spent'),
        pl.col('total').count().alias('order_count'),
        pl.col('created_at').max().alias('last_order'),
    ])
    .sort('total_spent', descending=True)
)

Schema Evolution

Add, remove, or change columns without downtime:

from django.db import migrations, models

# Django migrations just work
class Migration(migrations.Migration):
    operations = [
        migrations.AddField(
            model_name='article',
            name='view_count',
            field=models.IntegerField(default=0),
        ),
    ]

Iceberg schema changes are metadata-only: a change records a new schema version without rewriting existing data files.
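Iceberg can make these changes without rewriting data because columns are tracked by stable field IDs rather than by name or position: each change appends a schema version to the table metadata, and readers project older data files onto the latest schema by ID. A simplified in-memory sketch follows; `Field` and `TableMetadata` are hypothetical names. Iceberg reads a column added later as null for older rows unless the column carries a default, and how Django Iceberg applies `default=0` is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    field_id: int  # stable ID; survives renames and is never reused
    name: str


@dataclass
class TableMetadata:
    schemas: list[list[Field]]  # every schema version, latest last
    data_files: list[list[dict[int, object]]] = field(default_factory=list)  # rows keyed by field ID

    def add_column(self, name: str) -> None:
        """Metadata-only: append a schema version; no data file is touched."""
        next_id = max(f.field_id for s in self.schemas for f in s) + 1
        self.schemas.append(self.schemas[-1] + [Field(next_id, name)])

    def rename_column(self, old: str, new: str) -> None:
        """Metadata-only: same field ID, new name."""
        current = self.schemas[-1]
        self.schemas.append([Field(f.field_id, new if f.name == old else f.name) for f in current])

    def read(self) -> list[dict[str, object]]:
        """Project every stored row onto the latest schema by field ID."""
        latest = self.schemas[-1]
        return [
            {f.name: row.get(f.field_id) for f in latest}  # IDs missing from old files read as None
            for data_file in self.data_files
            for row in data_file
        ]


table = TableMetadata(schemas=[[Field(1, "id"), Field(2, "title")]])
table.data_files.append([{1: 1, 2: "Hello"}])  # written before the new column existed
table.add_column("view_count")
table.rename_column("title", "headline")
print(table.read())  # [{'id': 1, 'headline': 'Hello', 'view_count': None}]
```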

ACID Transactions

ACID guarantees on object storage (transactions are currently scoped to a single table):

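from django.db import transaction

from myapp.models import Account  # hypothetical model with a DecimalField `balance`

account = Account.objects.get(pk=1)
other_account = Account.objects.get(pk=2)

with transaction.atomic():
    account.balance -= 100
    account.save()

    other_account.balance += 100
    other_account.save()

    # Both updates commit atomically via one Iceberg snapshot
    # (both rows live in the same table; multi-table transactions are not yet supported)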
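Iceberg commits a write by atomically swapping the table's pointer to new metadata, using optimistic concurrency: the writer remembers the snapshot it started from, and the swap fails if another writer committed in the meantime, so the loser has to retry. A minimal sketch of that compare-and-swap step, with `Catalog` and `CommitConflict` as hypothetical stand-ins (a real catalog performs the swap in its database or REST service, not with an in-process lock):

```python
import threading
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Another writer committed first; reload the table and retry."""


@dataclass
class Catalog:
    # table name -> current snapshot ID (the pointer a commit swaps)
    current: dict[str, int] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def commit(self, table: str, base_snapshot: int, new_snapshot: int) -> None:
        """Compare-and-swap: succeed only if nobody committed since base_snapshot."""
        with self._lock:
            if self.current.get(table) != base_snapshot:
                raise CommitConflict(f"{table} moved past snapshot {base_snapshot}")
            self.current[table] = new_snapshot


catalog = Catalog(current={"bank_account": 1})
# Writer A staged both balance updates into snapshot 2 and commits first
catalog.commit("bank_account", base_snapshot=1, new_snapshot=2)
try:
    # Writer B also started from snapshot 1, so its commit is rejected
    catalog.commit("bank_account", base_snapshot=1, new_snapshot=3)
except CommitConflict as exc:
    print("retry needed:", exc)  # retry needed: bank_account moved past snapshot 1
```

Because both saves inside atomic() land in one new snapshot, readers see either neither update or both.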

Application-Level Constraints

Foreign keys, unique constraints, and NOT NULL validation:

from django.db import models

class Customer(models.Model):  # minimal model for this example
    email = models.EmailField()

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.PROTECT)
    order_number = models.CharField(max_length=20, unique=True)
    total = models.DecimalField(max_digits=10, decimal_places=2)

# Constraints are enforced before writes; any one of these values can raise IntegrityError
Order.objects.create(
    customer_id=999,  # ...if no Customer with pk=999 exists
    order_number="ORD-123",  # ...if another Order already uses this number
    total=None,  # ...because total is NOT NULL
)
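Iceberg has no foreign-key or unique constraints, so the backend checks them (along with NOT NULL) in application code before writing. The sketch below shows one plausible shape for those checks against rows that are already loaded; `ConstraintSpec` and `validate_insert` are hypothetical and not taken from constraints.py, and `IntegrityError` is a local stand-in for `django.db.IntegrityError`.

```python
from dataclasses import dataclass, field
from typing import Any


class IntegrityError(Exception):
    """Local stand-in for django.db.IntegrityError."""


@dataclass
class ConstraintSpec:
    not_null: tuple[str, ...] = ()
    unique: tuple[str, ...] = ()
    # column -> primary keys that currently exist in the referenced table
    foreign_keys: dict[str, set[Any]] = field(default_factory=dict)


def validate_insert(new_row: dict[str, Any], existing_rows: list[dict[str, Any]], spec: ConstraintSpec) -> None:
    """Raise IntegrityError if new_row would violate any constraint."""
    for col in spec.not_null:
        if new_row.get(col) is None:
            raise IntegrityError(f"NOT NULL constraint failed: {col}")
    for col in spec.unique:
        value = new_row.get(col)
        # NULLs never collide, matching SQL unique semantics
        if value is not None and any(row.get(col) == value for row in existing_rows):
            raise IntegrityError(f"UNIQUE constraint failed: {col}")
    for col, target_pks in spec.foreign_keys.items():
        value = new_row.get(col)
        if value is not None and value not in target_pks:
            raise IntegrityError(f"FOREIGN KEY constraint failed: {col}={value!r}")


spec = ConstraintSpec(
    not_null=("customer_id", "order_number", "total"),
    unique=("order_number",),
    foreign_keys={"customer_id": {1, 2}},
)
existing = [{"customer_id": 1, "order_number": "ORD-001", "total": 10}]
try:
    validate_insert({"customer_id": 999, "order_number": "ORD-123", "total": 5}, existing, spec)
except IntegrityError as exc:
    print(exc)  # FOREIGN KEY constraint failed: customer_id=999
```

Check-then-write validation like this is only safe if commits are serialized or re-validated on conflict; otherwise two concurrent writers could both pass the unique check.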

Architecture

Django ORM
    ↓
SQL Query (with parameters)
    ↓
QueryCompiler (SQL → Polars)
    ↓
Polars DataFrames (in-memory operations)
    ↓
IcebergManager (catalog + table operations)
    ↓
Apache Iceberg (ACID transactions)
    ↓
Parquet Files (S3 / GCS / Azure / Local)

Key Components:

  • query_compiler.py (713 lines) - Translates Django SQL to Polars operations
  • iceberg_manager.py (534 lines) - Manages Iceberg catalog and table I/O
  • base.py (521 lines) - Django database wrapper and cursor implementation
  • schema.py (431 lines) - Handles Django migrations and schema evolution
  • constraints.py (425 lines) - Application-level FK, unique, and NOT NULL validation
  • timetravel.py (275 lines) - Time travel QuerySet and Manager API
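To give a feel for what the SQL-to-Polars translation involves, here is a deliberately tiny sketch that turns an AND-only, parameterized WHERE clause of the kind Django generates into a row predicate. Plain Python operators stand in for Polars expressions (a real compiler would presumably build something like `pl.col(name) > value` instead), and `compile_where` is a hypothetical name, not part of query_compiler.py.

```python
import operator
import re
from typing import Any, Callable, Sequence

# Comparison operators Django emits for exact/gt/gte/lt/lte lookups
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Matches one condition such as: "app_order"."total" >= %s
_COND = re.compile(r'^"\w+"\."(\w+)"\s*(<=|>=|<>|=|<|>)\s*%s$')


def compile_where(where_sql: str, params: Sequence[Any]) -> Callable[[dict], bool]:
    """Turn an AND-only, parameterized WHERE clause into a row predicate."""
    conditions = [c.strip().strip("()") for c in re.split(r"\s+AND\s+", where_sql)]
    if len(conditions) != len(params):
        raise ValueError("expected exactly one %s placeholder per condition")
    checks = []
    for cond, value in zip(conditions, params):
        match = _COND.match(cond)
        if match is None:
            raise NotImplementedError(f"unsupported condition: {cond!r}")
        column, op = match.groups()
        checks.append((column, _OPS[op], value))

    def predicate(row: dict) -> bool:
        return all(fn(row[column], value) for column, fn, value in checks)

    return predicate


rows = [
    {"customer_email": "alice@example.com", "total": 120},
    {"customer_email": "bob@example.com", "total": 40},
]
where = '("app_order"."total" > %s AND "app_order"."customer_email" = %s)'
pred = compile_where(where, [50, "alice@example.com"])
print([r["customer_email"] for r in rows if pred(r)])  # ['alice@example.com']
```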

See polars_iceberg/CLAUDE.md for detailed architecture documentation.


Performance

Operation | PostgreSQL | Django Iceberg | Speedup
SELECT (indexed) | 5ms | 3-8ms | ~1x
Aggregation (10M rows) | 2,000ms | 50-200ms | 10-40x
Time travel query | N/A | 10ms |
Schema change | 5,000ms (locks) | <1ms (instant) | 5000x
Storage cost/TB/month | $200 | $20 | 10x savings

Best for:

  • Read-heavy workloads with analytics
  • Time-series data (logs, events, metrics)
  • Compliance requiring audit trails
  • Multi-tenant SaaS applications
  • Cloud-native architectures

Not ideal for:

  • High-frequency trading (microsecond latency)
  • Write-heavy OLTP (>100K writes/sec)
  • Complex JOIN-heavy queries
  • Very small datasets (<1GB)

Use Cases

SaaS with Audit Requirements

Challenge: Healthcare app needs HIPAA-compliant audit trails.

Solution: Time travel preserves the complete history of data changes. Combined with an access log, you can reconstruct exactly what a record contained when someone accessed it.

# Compliance report: Show patient record at time of access
patient_at_access = Patient.objects.as_of(access_timestamp).get(pk=patient_id)

E-Commerce Analytics

Challenge: Daily sales reports on millions of orders.

Solution: Polars aggregates data far faster than traditional GROUP BY queries (the Performance table estimates 10-40x for 10M-row aggregations).

import polars as pl

# Daily sales summary in seconds, not minutes (group_by_dynamic needs sorted timestamps)
df = Order.objects.values('created_at', 'total').to_polars().sort('created_at')
daily_sales = df.group_by_dynamic('created_at', every='1d').agg(pl.col('total').sum())
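For readers new to Polars, `group_by_dynamic(..., every='1d')` buckets rows into consecutive one-day windows and aggregates each window. The pure-Python sketch below computes roughly the same daily sums so the semantics are easy to check by hand; `daily_totals` is a hypothetical helper and is not how the backend executes the query.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal


def daily_totals(orders: list[tuple[datetime, Decimal]]) -> dict[date, Decimal]:
    """Sum order totals per calendar day, like group_by_dynamic(every='1d') with sum()."""
    totals: dict[date, Decimal] = defaultdict(Decimal)  # Decimal() == Decimal('0')
    for created_at, total in orders:
        totals[created_at.date()] += total
    return dict(sorted(totals.items()))  # windows in ascending time order


orders = [
    (datetime(2026, 1, 1, 9, 30), Decimal("10.00")),
    (datetime(2026, 1, 1, 17, 5), Decimal("5.50")),
    (datetime(2026, 1, 2, 8, 0), Decimal("20.00")),
]
print(daily_totals(orders))
# {datetime.date(2026, 1, 1): Decimal('15.50'), datetime.date(2026, 1, 2): Decimal('20.00')}
```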

Multi-Tenant Application

Challenge: Isolated data per customer, efficient analytics per tenant.

Solution: Partition by tenant_id, scale compute independently from storage.

from django.db import models

# Each tenant's data physically partitioned in Iceberg
class TenantData(models.Model):
    tenant_id = models.IntegerField(db_index=True)
    # ... other fields

# Fast per-tenant queries via Iceberg partition pruning
TenantData.objects.filter(tenant_id=42).count()  # Scans only tenant 42's files
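Partition pruning works because Iceberg manifests record each data file's partition values and row count, so a filter on the partition column can skip non-matching files before any Parquet is read. A minimal sketch, assuming the table is partitioned by the identity of `tenant_id`; `DataFile`, `prune` and `count_for_tenant` are hypothetical names, and answering `count()` purely from metadata is shown as a possibility, not a documented behavior of this backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    tenant_id: int  # partition value stored in the manifest entry
    record_count: int  # row count stored in the manifest entry


def prune(files: list[DataFile], tenant_id: int) -> list[DataFile]:
    """Keep only files whose partition value can satisfy tenant_id = <value>."""
    return [f for f in files if f.tenant_id == tenant_id]


def count_for_tenant(files: list[DataFile], tenant_id: int) -> int:
    # With no other filter and no row-level deletes, manifest row counts answer COUNT(*)
    return sum(f.record_count for f in prune(files, tenant_id))


files = [
    DataFile("tenant_id=41/00000.parquet", 41, 1_000),
    DataFile("tenant_id=42/00000.parquet", 42, 200),
    DataFile("tenant_id=42/00001.parquet", 42, 50),
]
print([f.path for f in prune(files, 42)])  # ['tenant_id=42/00000.parquet', 'tenant_id=42/00001.parquet']
print(count_for_tenant(files, 42))  # 250
```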

Deployment

Local Development

# Use SQLite catalog and local filesystem
DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "data/warehouse",
        "CATALOG_URI": "sqlite:///data/catalog.db",
        "NAMESPACE": "default",
    }
}

Production on AWS

import os

DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "s3://prod-data-lake/warehouse",
        "CATALOG_URI": "https://catalog.prod.example.com",  # REST catalog
        "NAMESPACE": "production",
        "OPTIONS": {
            "s3.region": "us-east-1",
            "s3.access-key-id": os.environ["AWS_ACCESS_KEY_ID"],
            "s3.secret-access-key": os.environ["AWS_SECRET_ACCESS_KEY"],
        }
    }
}

Docker Compose

services:
  django:
    build: .
    environment:
      - DATABASE_WAREHOUSE=s3://my-bucket/warehouse
      - DATABASE_CATALOG_URI=http://catalog:8080
    depends_on:
      - catalog

  catalog:
    image: apache/iceberg-rest-catalog:latest
    ports:
      - "8080:8080"
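The Compose file sets `DATABASE_WAREHOUSE` and `DATABASE_CATALOG_URI`, but the settings examples above hard-code their values. One way to connect the two in settings.py is sketched below; it assumes the backend does not read these variables by itself, and `build_databases` is a hypothetical helper. S3 credentials would still need to be passed through `OPTIONS` as in the AWS example.

```python
import os
from collections.abc import Mapping


def build_databases(env: Mapping[str, str]) -> dict:
    """Build DATABASES from environment variables, defaulting to local development."""
    return {
        "default": {
            "ENGINE": "polars_iceberg.backend",
            # Fall back to the local-development values when the variables are unset
            "WAREHOUSE": env.get("DATABASE_WAREHOUSE", "data/warehouse"),
            "CATALOG_URI": env.get("DATABASE_CATALOG_URI", "sqlite:///data/catalog.db"),
            "NAMESPACE": "default",
        }
    }


DATABASES = build_databases(os.environ)
```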

Comparison with Traditional Databases

Feature | PostgreSQL | MySQL | MongoDB | Django Iceberg
Django ORM Support | ✅ Full | ✅ Full | ⚠️ Via ODM | ✅ Full
Time Travel | | | ⚠️ Change streams | ✅ Native
Cloud-Native | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial | ✅ Full
Schema Evolution | 🐢 Slow | 🐢 Slow | ✅ Fast | ✅ Instant
Analytics Performance | 🐢 Moderate | 🐢 Moderate | 🐢 Moderate | ⚡ 10-100x
Storage Cost | 💰 High | 💰 High | 💰 High | 💰 10x Lower
Open Format | | | | ✅ Iceberg
Multi-Cloud | | | ⚠️ Atlas only | ✅ Any cloud

Limitations

Django Iceberg is production-ready for many use cases, but has known limitations:

  • No database-level JOINs: Joins are resolved in Python rather than by the storage engine (similar to how Django's prefetch_related already combines related rows in Python)
  • Full table scans for UPDATE/DELETE: Efficient for small to medium datasets, not for billions of rows
  • Write throughput: Optimized for <10K writes/sec per table (sufficient for most apps)
  • Transaction scope: Per-table only (multi-table transactions not yet supported)
  • Django features: Some advanced features disabled (see features.py)

See polars_iceberg/CLAUDE.md for detailed limitations.


Roadmap

  • Multi-table transactions
  • Query caching layer
  • Background compaction scheduler
  • DuckDB integration for complex queries
  • GraphQL subscriptions with time travel
  • Django Admin integration for snapshot browsing
  • Prometheus metrics exporter
  • Terraform module for AWS deployment

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Ways to contribute:

  • Report bugs and request features via GitHub Issues
  • Improve documentation
  • Add tests
  • Submit pull requests

Community

  • GitHub Discussions: Ask questions and share ideas
  • Discord: Join our community server (link TBD)
  • Twitter: Follow @djangoiceberg for updates

License

MIT License - see LICENSE for details.


Acknowledgments

Django Iceberg stands on the shoulders of giants:

  • Apache Iceberg: Netflix, Apple, LinkedIn, and the open source community
  • Polars: Ritchie Vink and contributors
  • Django: Django Software Foundation
  • PyArrow: Apache Arrow community

Credits

Created by the Django Iceberg team. Powered by modern data infrastructure.

Built with: Apache Iceberg 🧊 | Polars 🐻‍❄️ | Django 🦄 | PyArrow 🏹


Ready to join the database revolution? Start with the Quick Start above, or read WHY.md for why this is the future.



Download files


Source Distribution

django_iceberg-0.1.0.tar.gz (49.0 kB)

Uploaded Source

Built Distribution


django_iceberg-0.1.0-py3-none-any.whl (44.1 kB)

Uploaded Python 3

File details

Details for the file django_iceberg-0.1.0.tar.gz.

File metadata

  • Download URL: django_iceberg-0.1.0.tar.gz
  • Upload date:
  • Size: 49.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for django_iceberg-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ec2e33da1ea76887a33ce38d834a7a22c8abad65d740863c4bfc161072282fbd
MD5 f449ca2300e6809e5de196559861876b
BLAKE2b-256 7fde72d2a2ef78d25d7c271766bfaa74861b903e6faa5dee1cf31efbfa77fbd6


Provenance

The following attestation bundles were made for django_iceberg-0.1.0.tar.gz:

Publisher: publish.yml on theserverkid/django-iceberg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file django_iceberg-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: django_iceberg-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 44.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for django_iceberg-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aa8d007024a496866ad423d00584cf4be6781fcfb2a33f0c4bf16a5b6dc6f607
MD5 e5e7d591f0820511612d5616ed4ea8f7
BLAKE2b-256 22766b43918ecea420d1db2911dd01fd06449c3cc9984e8dbc3ac7c4daf6d568


Provenance

The following attestation bundles were made for django_iceberg-0.1.0-py3-none-any.whl:

Publisher: publish.yml on theserverkid/django-iceberg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
