Django database backend powered by Apache Iceberg and Polars - bringing time travel, cloud-native storage, and blazing-fast analytics to Django

🚀 Django Iceberg: Django Database Backend for Apache Iceberg + Polars

A Django database backend that replaces traditional SQL databases with Apache Iceberg tables and Polars query execution, bringing time travel, cloud-native storage, and blazing-fast analytics to your Django applications.

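from datetime import timedelta

import polars as pl
from django.db import models
from django.utils import timezone
from polars_iceberg.timetravel import TimeTravelMixin

# Write normal Django code
class Article(TimeTravelMixin, models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    published_at = models.DateTimeField()
    views = models.IntegerField(default=0)

# Get superpowers: query the data as it looked last week
Article.objects.as_of(timezone.now() - timedelta(days=7)).all()

# See the complete history of any record
article = Article.objects.first()
article.history()

# Run lightning-fast analytics with Polars (group_by_dynamic needs the time column sorted)
df = Article.objects.values('published_at', 'views').to_polars().sort('published_at')
summary = df.group_by_dynamic('published_at', every='1w').agg(pl.col('views').sum())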

Why Django Iceberg?

Traditional databases such as PostgreSQL and MySQL were designed decades ago, long before cloud object storage existed. Django Iceberg leverages modern data infrastructure to give Django apps:

🕰️ Time Travel Queries - Query data as it existed at any point in history
☁️ Cloud-Native Storage - Store your database on S3/GCS/Azure at 1/10th the cost
⚡ 10-100x Faster Analytics - Polars makes aggregations blazingly fast
🔄 Zero-Downtime Schema Changes - Iceberg schema evolution without locks
🔒 ACID on Object Storage - Atomic, snapshot-based commits on cloud storage (scoped to a single table)
📊 Open Table Format - Your data works with Spark, DuckDB, Trino, etc.
🌍 Multi-Cloud Portable - No vendor lock-in, run anywhere

See WHY.md for the full story of why this is the future of databases.

Quick Start

Installation

pip install django-iceberg

Configuration

In your Django settings.py:

DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "data/warehouse",              # Local path or s3://bucket/path
        "CATALOG_URI": "sqlite:///data/catalog.db", # SQLite for local, REST for prod
        "NAMESPACE": "default",
    }
}

Run Migrations

python manage.py migrate

Start Building

# models.py
from django.db import models
from polars_iceberg.timetravel import TimeTravelMixin

class Order(TimeTravelMixin, models.Model):
    customer_email = models.EmailField()
    total = models.DecimalField(max_digits=10, decimal_places=2)
    created_at = models.DateTimeField(auto_now_add=True)

# Elsewhere (views, shell, background tasks)
from datetime import timedelta
from decimal import Decimal
from django.utils import timezone

# Use it like any Django model
Order.objects.create(customer_email="alice@example.com", total=Decimal("99.99"))

# Plus time travel
orders_yesterday = Order.objects.as_of(timezone.now() - timedelta(days=1))

Features

Time Travel Queries

Query historical data without complex triggers or audit tables:

from datetime import datetime, timezone

# As of a specific (timezone-aware) timestamp
User.objects.as_of(datetime(2026, 1, 1, tzinfo=timezone.utc)).filter(is_active=True)

# List all snapshots
snapshots = User.objects.snapshots()
for snap in snapshots:
    print(f"Snapshot {snap.snapshot_id} at {snap.committed_at}")

# Complete history of a record
user = User.objects.get(pk=123)
for version in user.history():
    print(f"{version.snapshot_timestamp}: {version.email}")
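Conceptually, Iceberg time travel works by keeping every committed snapshot and resolving an "as of" timestamp to the latest snapshot committed at or before it. The sketch below models that idea in plain Python. SnapshotLog, commit, as_of and history are hypothetical names chosen for illustration, not the django-iceberg or PyIceberg API, and each snapshot stores a full copy of the rows for simplicity (real Iceberg snapshots reference immutable data files instead).

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    rows: dict  # primary key -> row; a full copy here, data-file references in Iceberg


@dataclass
class SnapshotLog:
    """Hypothetical append-only list of committed snapshots, oldest first."""
    snapshots: list = field(default_factory=list)

    def commit(self, committed_at, rows):
        # Every commit adds a new immutable snapshot; earlier ones are kept.
        # Commits are assumed to arrive in timestamp order.
        snap = Snapshot(len(self.snapshots) + 1, committed_at, dict(rows))
        self.snapshots.append(snap)
        return snap

    def as_of(self, ts):
        # Latest snapshot committed at or before ts (None if ts predates the table).
        times = [s.committed_at for s in self.snapshots]
        idx = bisect.bisect_right(times, ts)
        return self.snapshots[idx - 1] if idx else None

    def history(self, pk):
        # Yield (committed_at, row) each time the row's contents changed.
        previous = None
        for snap in self.snapshots:
            row = snap.rows.get(pk)
            if row is not None and row != previous:
                yield snap.committed_at, row
            previous = row


log = SnapshotLog()
log.commit(datetime(2026, 1, 1, tzinfo=timezone.utc), {123: {"email": "old@example.com"}})
log.commit(datetime(2026, 2, 1, tzinfo=timezone.utc), {123: {"email": "new@example.com"}})

mid_january = datetime(2026, 1, 15, tzinfo=timezone.utc)
print(log.as_of(mid_january).rows[123]["email"])        # old@example.com
print([row["email"] for _, row in log.history(123)])    # ['old@example.com', 'new@example.com']
```

Because old snapshots are never modified, answering a historical query is a lookup rather than a replay of an audit log.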

Cloud-Native Storage

Deploy on virtually unlimited object storage:

import os

DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "s3://my-bucket/warehouse",
        "CATALOG_URI": "https://catalog.example.com",  # REST catalog
        "NAMESPACE": "production",
        "OPTIONS": {
            # Read credentials from the environment instead of hard-coding them
            "s3.access-key-id": os.environ["AWS_ACCESS_KEY_ID"],
            "s3.secret-access-key": os.environ["AWS_SECRET_ACCESS_KEY"],
            "s3.region": "us-west-2",
        }
    }
}

Supported storage backends:

AWS S3 / S3 Express One Zone
Google Cloud Storage
Azure Blob Storage
MinIO
Local filesystem (development)

Blazing Fast Analytics

Polars is typically much faster than pandas for DataFrame operations (commonly cited as 10-100x, depending on the workload):

import polars as pl

# Export to Polars DataFrame
orders_df = Order.objects.values('customer_email', 'total', 'created_at').to_polars()

# Run complex aggregations at lightning speed
summary = (
    orders_df
    .group_by('customer_email')
    .agg([
        pl.col('total').sum().alias('total_spent'),
        pl.col('total').count().alias('order_count'),
        pl.col('created_at').max().alias('last_order'),
    ])
    .sort('total_spent', descending=True)
)

Schema Evolution

Add, remove, or change columns without downtime:

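# Django migrations just work
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ('blog', '0001_initial'),  # hypothetical app label and previous migration
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='view_count',
            field=models.IntegerField(default=0),
        ),
    ]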

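Iceberg records schema changes in table metadata, so adding, dropping, or renaming columns (and safe type promotions such as int to long) takes effect without rewriting existing data files.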
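Iceberg can evolve a schema without touching data because it tracks columns by unique field IDs rather than by name or position: readers match each file's columns to the current schema by ID. The plain-Python sketch below illustrates that projection. The dict-based schemas and the project helper are illustrative assumptions, not the Iceberg library's API, and added columns are filled with None here for simplicity.

```python
# Current table schema: field ID -> column name.
#   field 2 was renamed from "headline" to "title" (metadata-only change)
#   field 3 "view_count" was added by a later migration
#   field 4 "legacy_flag" was dropped
current_schema = {1: "id", 2: "title", 3: "view_count"}

# Rows from an old data file written before those changes. Values are keyed
# by field ID here because Iceberg records each column's field ID in the file.
old_file_rows = [
    {1: 10, 2: "Hello", 4: True},
    {1: 11, 2: "World", 4: False},
]


def project(rows, table_schema):
    """Read file rows through the current schema, matching columns by field ID.

    Renamed columns keep their data, dropped columns are ignored, and added
    columns the file never contained come back as None. The file is not rewritten.
    """
    return [{name: row.get(field_id) for field_id, name in table_schema.items()}
            for row in rows]


for row in project(old_file_rows, current_schema):
    print(row)
# {'id': 10, 'title': 'Hello', 'view_count': None}
# {'id': 11, 'title': 'World', 'view_count': None}
```

Matching by ID is what makes a rename safe: a name-based reader would lose the old "headline" values after the rename.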

ACID Transactions

ACID guarantees on object storage, scoped to a single table (see Limitations below):

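from django.db import transaction

# Account is an example model with a balance field; both rows live in the
# same table, because transactions are per-table
with transaction.atomic():
    account = Account.objects.get(pk=1)
    other_account = Account.objects.get(pk=2)

    account.balance -= 100
    account.save()

    other_account.balance += 100
    other_account.save()

    # Both updates commit atomically via a single Iceberg snapshot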
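Iceberg commits are atomic because each write produces new table metadata and then swaps the table's current-metadata pointer in the catalog with a compare-and-swap; if another writer committed first, the swap fails and the writer retries on top of the newer state. The sketch below models that optimistic-concurrency loop in plain Python. Catalog, swap and commit_with_retry are hypothetical names, and a threading.Lock stands in for the catalog's atomic compare-and-swap.

```python
import threading


class Catalog:
    """Hypothetical catalog holding one table's current snapshot and data."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current_snapshot = 0
        self.balances = {"alice": 500, "bob": 100}

    def read(self):
        # Readers get a consistent copy of one snapshot.
        with self._lock:
            return self.current_snapshot, dict(self.balances)

    def swap(self, expected_snapshot, new_balances):
        # Atomic compare-and-swap: succeed only if nobody committed in between.
        with self._lock:
            if self.current_snapshot != expected_snapshot:
                return False
            self.current_snapshot += 1
            self.balances = new_balances
            return True


def commit_with_retry(catalog, change, max_attempts=5):
    """Apply `change` to a fresh read of the table, retrying on conflict."""
    for _ in range(max_attempts):
        base_snapshot, balances = catalog.read()
        new_balances = change(balances)  # all updates go into ONE new snapshot
        if catalog.swap(base_snapshot, new_balances):
            return True
    return False


def transfer(balances):
    balances["alice"] -= 100
    balances["bob"] += 100
    return balances


catalog = Catalog()
print(commit_with_retry(catalog, transfer), catalog.balances)
# True {'alice': 400, 'bob': 200}
```

Because both balance changes land in one new snapshot, readers see either the old balances or the new ones, never a half-applied transfer.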

Application-Level Constraints

Foreign key, unique, and NOT NULL constraints are validated in application code, because Iceberg itself does not enforce foreign keys or uniqueness:

from django.db import IntegrityError, models

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.PROTECT)  # Customer defined elsewhere
    order_number = models.CharField(max_length=20, unique=True)
    total = models.DecimalField(max_digits=10, decimal_places=2)

# Constraints are enforced before the write is committed
try:
    Order.objects.create(
        customer_id=999,         # IntegrityError if customer 999 doesn't exist
        order_number="ORD-123",  # IntegrityError if the order number is a duplicate
        total=None,              # IntegrityError because total is NOT NULL
    )
except IntegrityError as exc:
    print(f"Write rejected: {exc}")
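A minimal sketch of what pre-write validation can look like, assuming the existing customer IDs and order numbers are available as in-memory sets. check_order and the local IntegrityError class are hypothetical stand-ins so the example runs without Django; this is not the code in constraints.py.

```python
class IntegrityError(Exception):
    """Stand-in for django.db.IntegrityError so the sketch runs without Django."""


def check_order(row, customer_ids, order_numbers):
    """Validate one Order row before it is written (hypothetical helper)."""
    # NOT NULL: every required column must have a value.
    for column in ("customer_id", "order_number", "total"):
        if row.get(column) is None:
            raise IntegrityError(f"NOT NULL constraint failed: order.{column}")
    # FOREIGN KEY: the referenced customer must exist.
    if row["customer_id"] not in customer_ids:
        raise IntegrityError(
            f"FOREIGN KEY constraint failed: customer {row['customer_id']} does not exist")
    # UNIQUE: the order number must not already be taken.
    if row["order_number"] in order_numbers:
        raise IntegrityError("UNIQUE constraint failed: order.order_number")


customers = {1, 2, 3}
order_numbers = {"ORD-001"}
check_order({"customer_id": 1, "order_number": "ORD-123", "total": 99}, customers, order_numbers)
try:
    check_order({"customer_id": 999, "order_number": "ORD-124", "total": 5},
                customers, order_numbers)
except IntegrityError as exc:
    print(exc)  # FOREIGN KEY constraint failed: customer 999 does not exist
```

Checks like these can race with concurrent writers unless the check and the write happen inside the same serialized commit.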

Architecture

Django ORM
    ↓
SQL Query (with parameters)
    ↓
QueryCompiler (SQL → Polars)
    ↓
Polars DataFrames (in-memory operations)
    ↓
IcebergManager (catalog + table operations)
    ↓
Apache Iceberg (ACID transactions)
    ↓
Parquet Files (S3 / GCS / Azure / Local)
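To illustrate the QueryCompiler step, the sketch below executes a tiny, already-parsed query plan (selected columns, comparison filters, ordering and a limit) against a list of row dicts. In a Polars-based backend these steps would correspond to DataFrame operations such as filter, sort, head and select; the QueryPlan structure and execute function are hypothetical and use only the standard library.

```python
import operator
from dataclasses import dataclass, field
from typing import Optional

# Comparison operators this toy compiler understands.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass
class QueryPlan:
    """Hypothetical parsed form of SELECT ... WHERE ... ORDER BY ... LIMIT."""
    columns: list
    where: list = field(default_factory=list)  # (column, op, param) triples, ANDed
    order_by: Optional[str] = None
    descending: bool = False
    limit: Optional[int] = None


def execute(plan, rows):
    """Run a QueryPlan over a list of row dicts, step by step."""
    # WHERE col op %s  ~  DataFrame.filter(...)
    result = [r for r in rows
              if all(OPS[op](r[col], param) for col, op, param in plan.where)]
    # ORDER BY  ~  DataFrame.sort(...)
    if plan.order_by is not None:
        result.sort(key=lambda r: r[plan.order_by], reverse=plan.descending)
    # LIMIT n  ~  DataFrame.head(n)
    if plan.limit is not None:
        result = result[:plan.limit]
    # SELECT columns  ~  DataFrame.select(...), applied last so ORDER BY can use any column
    return [{c: r[c] for c in plan.columns} for r in result]


users = [
    {"id": 1, "email": "a@example.com", "is_active": True, "age": 40},
    {"id": 2, "email": "b@example.com", "is_active": False, "age": 25},
    {"id": 3, "email": "c@example.com", "is_active": True, "age": 31},
]
# Roughly: SELECT id, email FROM users WHERE is_active = %s ORDER BY age DESC LIMIT 1
plan = QueryPlan(columns=["id", "email"], where=[("is_active", "=", True)],
                 order_by="age", descending=True, limit=1)
print(execute(plan, users))  # [{'id': 1, 'email': 'a@example.com'}]
```

Keeping query parameters separate from the parsed plan mirrors how Django passes SQL and its parameters to a backend cursor.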

Key Components:

query_compiler.py (713 lines) - Translates Django SQL to Polars operations
iceberg_manager.py (534 lines) - Manages Iceberg catalog and table I/O
base.py (521 lines) - Django database wrapper and cursor implementation
schema.py (431 lines) - Handles Django migrations and schema evolution
constraints.py (425 lines) - Application-level FK, unique, and NOT NULL validation
timetravel.py (275 lines) - Time travel QuerySet and Manager API

See polars_iceberg/CLAUDE.md for detailed architecture documentation.

Performance

Operation	PostgreSQL	Django Iceberg	Speedup
SELECT (indexed)	5ms	3-8ms	~1x
Aggregation (10M rows)	2,000ms	50-200ms	10-40x
Time travel query	N/A	10ms	N/A
Schema change	5,000ms (locks)	<1ms (instant)	5000x
Storage cost/TB/month	$200	$20	10x savings

Best for:

Read-heavy workloads with analytics
Time-series data (logs, events, metrics)
Compliance requiring audit trails
Multi-tenant SaaS applications
Cloud-native architectures

Not ideal for:

High-frequency trading (microsecond latency)
Write-heavy OLTP (>100K writes/sec)
Complex JOIN-heavy queries
Very small datasets (<1GB)

Use Cases

SaaS with Audit Requirements

Challenge: Healthcare app needs HIPAA-compliant audit trails.

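Solution: Time travel provides the complete history of all data changes, so you can reconstruct exactly what a record contained at the moment it was accessed, without maintaining separate audit tables. Time travel records data changes, not reads, so who accessed what and when still has to be logged by the application.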

# Compliance report: Show patient record at time of access
patient_at_access = Patient.objects.as_of(access_timestamp).get(pk=patient_id)

E-Commerce Analytics

Challenge: Daily sales reports on millions of orders.

Solution: Polars aggregates columnar data much faster than row-oriented GROUP BY queries (see the Performance table above for estimated speedups).

import polars as pl

# Daily sales summary in seconds, not minutes
df = Order.objects.values('created_at', 'total').to_polars().sort('created_at')
daily_sales = df.group_by_dynamic('created_at', every='1d').agg(pl.col('total').sum())
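A daily window with every='1d' assigns each row to the calendar day its timestamp falls in and aggregates per day. The plain-Python version below computes the same kind of daily totals to make those semantics explicit. daily_totals is a hypothetical helper, and it assumes timestamps are already in the time zone you want to report in.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal


def daily_totals(orders):
    """Sum order totals per calendar day, like group_by_dynamic(every='1d')."""
    totals = defaultdict(Decimal)           # Decimal() starts each day at 0
    for created_at, total in orders:
        totals[created_at.date()] += total  # truncate the timestamp to its day
    return dict(sorted(totals.items()))     # days in ascending order


orders = [
    (datetime(2026, 3, 1, 9, 30), Decimal("20.00")),
    (datetime(2026, 3, 1, 17, 5), Decimal("5.50")),
    (datetime(2026, 3, 2, 8, 0), Decimal("12.00")),
]
print(daily_totals(orders))
# {datetime.date(2026, 3, 1): Decimal('25.50'), datetime.date(2026, 3, 2): Decimal('12.00')}
```

Like group_by_dynamic, this only emits days that have at least one order; days with no sales do not appear as zero rows.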

Multi-Tenant Application

Challenge: Isolated data per customer, efficient analytics per tenant.

Solution: Partition by tenant_id, scale compute independently from storage.

# Each tenant's data physically partitioned in Iceberg
class TenantData(models.Model):
    tenant_id = models.IntegerField(db_index=True)
    # ... other fields

# Fast per-tenant queries via Iceberg partition pruning
TenantData.objects.filter(tenant_id=42).count()  # Scans only tenant 42's files
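Partition pruning works because Iceberg manifests record, for every data file, the partition value it holds and its row count, so a filter on the partition column can skip whole files without opening them. A plain-Python sketch of that planning step, assuming an identity partition on tenant_id; the DataFile record, plan_scan function and file paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    tenant_id: int     # partition value recorded in the manifest
    record_count: int  # row count, also recorded in manifest metadata


def plan_scan(files, tenant_id):
    """Return only the files whose partition value matches the filter."""
    return [f for f in files if f.tenant_id == tenant_id]


files = [
    DataFile("s3://bucket/tenant_data/tenant_id=41/00000.parquet", 41, 1_000),
    DataFile("s3://bucket/tenant_data/tenant_id=42/00000.parquet", 42, 2_500),
    DataFile("s3://bucket/tenant_data/tenant_id=42/00001.parquet", 42, 500),
    DataFile("s3://bucket/tenant_data/tenant_id=43/00000.parquet", 43, 7_000),
]

selected = plan_scan(files, 42)
print(len(selected), "of", len(files), "files scanned")  # 2 of 4 files scanned
# With no other filters, a count can in principle come from metadata alone:
print(sum(f.record_count for f in selected))              # 3000
```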

Deployment

Local Development

# Use SQLite catalog and local filesystem
DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "data/warehouse",
        "CATALOG_URI": "sqlite:///data/catalog.db",
        "NAMESPACE": "default",
    }
}

Production on AWS

import os

DATABASES = {
    "default": {
        "ENGINE": "polars_iceberg.backend",
        "WAREHOUSE": "s3://prod-data-lake/warehouse",
        "CATALOG_URI": "https://catalog.prod.example.com",  # REST catalog
        "NAMESPACE": "production",
        "OPTIONS": {
            "s3.region": "us-east-1",
            "s3.access-key-id": os.environ["AWS_ACCESS_KEY_ID"],
            "s3.secret-access-key": os.environ["AWS_SECRET_ACCESS_KEY"],
        }
    }
}

Docker Compose

services:
  django:
    build: .
    environment:
      - DATABASE_WAREHOUSE=s3://my-bucket/warehouse
      - DATABASE_CATALOG_URI=http://catalog:8080
    depends_on:
      - catalog

  catalog:
    image: apache/iceberg-rest-catalog:latest
    ports:
      - "8080:8080"
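The Compose file passes DATABASE_WAREHOUSE and DATABASE_CATALOG_URI to the Django container, so settings.py has to read them. One way to do that, falling back to the local-development values; the build_databases helper and the DATABASE_NAMESPACE variable are assumptions, not part of the package.

```python
import os


def build_databases(env=os.environ):
    """Build Django's DATABASES setting from environment variables.

    Falls back to the local SQLite catalog and filesystem warehouse used in
    development when the variables are not set.
    """
    return {
        "default": {
            "ENGINE": "polars_iceberg.backend",
            "WAREHOUSE": env.get("DATABASE_WAREHOUSE", "data/warehouse"),
            "CATALOG_URI": env.get("DATABASE_CATALOG_URI", "sqlite:///data/catalog.db"),
            "NAMESPACE": env.get("DATABASE_NAMESPACE", "default"),  # hypothetical variable
        }
    }


# settings.py
DATABASES = build_databases()
```

S3 credentials can be added under OPTIONS in the same way as in the Production on AWS example.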

Comparison with Traditional Databases

Feature	PostgreSQL	MySQL	MongoDB	Django Iceberg
Django ORM Support	✅ Full	✅ Full	⚠️ Via ODM	✅ Full
Time Travel	❌	❌	⚠️ Change streams	✅ Native
Cloud-Native	⚠️ Partial	⚠️ Partial	⚠️ Partial	✅ Full
Schema Evolution	🐢 Slow	🐢 Slow	✅ Fast	✅ Instant
Analytics Performance	🐢 Moderate	🐢 Moderate	🐢 Moderate	⚡ 10-100x
Storage Cost	💰 High	💰 High	💰 High	💰 10x Lower
Open Format	❌	❌	❌	✅ Iceberg
Multi-Cloud	❌	❌	⚠️ Atlas only	✅ Any cloud

Limitations

Django Iceberg is production-ready for many use cases, but has known limitations:

No database-level JOINs: joins are resolved in Python rather than by a query engine (standard Django backends compile select_related() and cross-relation filters into SQL JOINs)
Full table scans for UPDATE/DELETE: Efficient for small-medium datasets, not for billions of rows
Write throughput: Optimized for <10K writes/sec per table (sufficient for most apps)
Transaction scope: Per-table only (multi-table transactions not yet supported)
Django features: Some advanced features disabled (see features.py)
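The UPDATE/DELETE limitation follows from the fact that data files are immutable. Assuming a copy-on-write strategy (the README does not say which strategy the backend uses), every file that might contain a matching row is read, and every file that does contain one is rewritten as a new file in the next snapshot. A plain-Python sketch with data files modeled as lists of row dicts; copy_on_write_delete is a hypothetical name.

```python
def copy_on_write_delete(files, predicate):
    """Delete rows matching `predicate` from immutable data files.

    `files` maps file name -> list of rows. Returns the file map for the new
    snapshot plus the number of files read and rewritten.
    """
    new_files, files_read, files_rewritten = {}, 0, 0
    for name, rows in files.items():
        files_read += 1                       # this sketch scans every file
        kept = [r for r in rows if not predicate(r)]
        if len(kept) == len(rows):
            new_files[name] = rows            # untouched file is reused as-is
        else:
            files_rewritten += 1
            if kept:                          # surviving rows go to a new file
                new_files[f"{name}-rewritten"] = kept
    return new_files, files_read, files_rewritten


files = {
    "part-0": [{"id": 1}, {"id": 2}],
    "part-1": [{"id": 3}, {"id": 4}],
}
new_files, read, rewritten = copy_on_write_delete(files, lambda row: row["id"] == 3)
print(read, rewritten)    # 2 1
print(sorted(new_files))  # ['part-0', 'part-1-rewritten']
```

Deleting one row costs a full read and rewrite of its file, which is why this approach suits small-to-medium tables better than billions of rows. The original files stay in place for older snapshots, so time travel keeps working.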

See polars_iceberg/CLAUDE.md for detailed limitations.

Roadmap

Multi-table transactions
Query caching layer
Background compaction scheduler
DuckDB integration for complex queries
GraphQL subscriptions with time travel
Django Admin integration for snapshot browsing
Prometheus metrics exporter
Terraform module for AWS deployment

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Ways to contribute:

Report bugs and request features via GitHub Issues
Improve documentation
Add tests
Submit pull requests

Community

GitHub Discussions: Ask questions and share ideas
Discord: Join our community server (link TBD)
Twitter: Follow @djangoiceberg for updates

License

MIT License - see LICENSE for details.

Acknowledgments

Django Iceberg stands on the shoulders of giants:

Apache Iceberg: Netflix, Apple, LinkedIn, and the open source community
Polars: Ritchie Vink and contributors
Django: Django Software Foundation
PyArrow: Apache Arrow community

Credits

Created by the Django Iceberg team. Powered by modern data infrastructure.

Built with: Apache Iceberg 🧊 | Polars 🐻‍❄️ | Django 🦄 | PyArrow 🏹

Ready to join the database revolution? Get started now or read why this is the future.

Project details

Release history

This version: 0.1.0 (Mar 15, 2026)

Download files

Source Distribution

django_iceberg-0.1.0.tar.gz (49.0 kB)

Uploaded Mar 15, 2026 Source

Built Distribution

django_iceberg-0.1.0-py3-none-any.whl (44.1 kB)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file django_iceberg-0.1.0.tar.gz.

File metadata

Download URL: django_iceberg-0.1.0.tar.gz
Upload date: Mar 15, 2026
Size: 49.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for django_iceberg-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ec2e33da1ea76887a33ce38d834a7a22c8abad65d740863c4bfc161072282fbd`
MD5	`f449ca2300e6809e5de196559861876b`
BLAKE2b-256	`7fde72d2a2ef78d25d7c271766bfaa74861b903e6faa5dee1cf31efbfa77fbd6`

Provenance

The following attestation bundles were made for django_iceberg-0.1.0.tar.gz:

Publisher: publish.yml on theserverkid/django-iceberg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: django_iceberg-0.1.0.tar.gz
- Subject digest: ec2e33da1ea76887a33ce38d834a7a22c8abad65d740863c4bfc161072282fbd
- Sigstore transparency entry: 1108179571
- Sigstore integration time: Mar 15, 2026
Source repository:
- Permalink: theserverkid/django-iceberg@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20
- Branch / Tag: refs/heads/master
- Owner: https://github.com/theserverkid
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20
- Trigger Event: push

File details

Details for the file django_iceberg-0.1.0-py3-none-any.whl.

File metadata

Download URL: django_iceberg-0.1.0-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 44.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for django_iceberg-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aa8d007024a496866ad423d00584cf4be6781fcfb2a33f0c4bf16a5b6dc6f607`
MD5	`e5e7d591f0820511612d5616ed4ea8f7`
BLAKE2b-256	`22766b43918ecea420d1db2911dd01fd06449c3cc9984e8dbc3ac7c4daf6d568`

Provenance

The following attestation bundles were made for django_iceberg-0.1.0-py3-none-any.whl:

Publisher: publish.yml on theserverkid/django-iceberg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: django_iceberg-0.1.0-py3-none-any.whl
- Subject digest: aa8d007024a496866ad423d00584cf4be6781fcfb2a33f0c4bf16a5b6dc6f607
- Sigstore transparency entry: 1108179575
- Sigstore integration time: Mar 15, 2026
Source repository:
- Permalink: theserverkid/django-iceberg@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20
- Branch / Tag: refs/heads/master
- Owner: https://github.com/theserverkid
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20
- Trigger Event: push
