Django database backend powered by Apache Iceberg and Polars - bringing time travel, cloud-native storage, and blazing-fast analytics to Django
Project description
🚀 Django Iceberg: Django Database Backend for Apache Iceberg + Polars
A revolutionary Django database backend that replaces traditional SQL databases with Apache Iceberg and Polars, bringing time travel, cloud-native storage, and blazing-fast analytics to your Django applications.
# Write normal Django code
class Article(TimeTravelMixin, models.Model):
title = models.CharField(max_length=200)
content = models.TextField()
published_at = models.DateTimeField()
# Get superpowers: Query data from last week
Article.objects.as_of(timezone.now() - timedelta(days=7)).all()
# See complete history of any record
article.history()
# Run lightning-fast analytics with Polars
df = Article.objects.values('published_at', 'views').to_polars()
summary = df.groupby_dynamic('published_at', every='1w').agg(pl.col('views').sum())
Why Django Iceberg?
Traditional databases (PostgreSQL, MySQL) were designed 40+ years ago. Django Iceberg leverages modern data infrastructure to give Django apps:
- 🕰️ Time Travel Queries - Query data as it existed at any point in history
- ☁️ Cloud-Native Storage - Store your database on S3/GCS/Azure at 1/10th the cost
- ⚡ 10-100x Faster Analytics - Polars makes aggregations blazingly fast
- 🔄 Zero-Downtime Schema Changes - Iceberg schema evolution without locks
- 🔒 ACID on Object Storage - Full transactional guarantees on cloud storage
- 📊 Open Table Format - Your data works with Spark, DuckDB, Trino, etc.
- 🌍 Multi-Cloud Portable - No vendor lock-in, run anywhere
See WHY.md for the full story of why this is the future of databases.
Quick Start
Installation
pip install django-iceberg
Configuration
In your Django settings.py:
DATABASES = {
"default": {
"ENGINE": "polars_iceberg.backend",
"WAREHOUSE": "data/warehouse", # Local path or s3://bucket/path
"CATALOG_URI": "sqlite:///data/catalog.db", # SQLite for local, REST for prod
"NAMESPACE": "default",
}
}
Run Migrations
python manage.py migrate
Start Building
# models.py
from django.db import models
from polars_iceberg.timetravel import TimeTravelMixin
class Order(TimeTravelMixin, models.Model):
customer_email = models.EmailField()
total = models.DecimalField(max_digits=10, decimal_places=2)
created_at = models.DateTimeField(auto_now_add=True)
# Use it like any Django model
Order.objects.create(customer_email="alice@example.com", total=99.99)
# Plus time travel
orders_yesterday = Order.objects.as_of(timezone.now() - timedelta(days=1))
Features
Time Travel Queries
Query historical data without complex triggers or audit tables:
# As of specific timestamp
User.objects.as_of(datetime(2026, 1, 1)).filter(is_active=True)
# List all snapshots
snapshots = User.objects.snapshots()
for snap in snapshots:
print(f"Snapshot {snap.snapshot_id} at {snap.committed_at}")
# Complete history of a record
user = User.objects.get(pk=123)
for version in user.history():
print(f"{version.snapshot_timestamp}: {version.email}")
Cloud-Native Storage
Deploy with infinitely scalable object storage:
DATABASES = {
"default": {
"ENGINE": "polars_iceberg.backend",
"WAREHOUSE": "s3://my-bucket/warehouse",
"CATALOG_URI": "https://catalog.example.com", # REST catalog
"NAMESPACE": "production",
"OPTIONS": {
"s3.access-key-id": "AKIAIOSFODNN7EXAMPLE",
"s3.secret-access-key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
"s3.region": "us-west-2",
}
}
}
Supported backends:
- AWS S3 / S3 Express One Zone
- Google Cloud Storage
- Azure Blob Storage
- MinIO
- Local filesystem (development)
Blazing Fast Analytics
Polars is 10-100x faster than Pandas for DataFrame operations:
# Export to Polars DataFrame
orders_df = Order.objects.values('customer_email', 'total', 'created_at').to_polars()
# Run complex aggregations at lightning speed
summary = (
orders_df
.groupby('customer_email')
.agg([
pl.col('total').sum().alias('total_spent'),
pl.col('total').count().alias('order_count'),
pl.col('created_at').max().alias('last_order'),
])
.sort('total_spent', descending=True)
)
Schema Evolution
Add, remove, or change columns without downtime:
# Django migrations just work
class Migration(migrations.Migration):
operations = [
migrations.AddField(
model_name='article',
name='view_count',
field=models.IntegerField(default=0),
),
]
Iceberg applies schema changes instantly without rewriting data.
ACID Transactions
Full ACID guarantees on object storage:
from django.db import transaction
with transaction.atomic():
account.balance -= 100
account.save()
other_account.balance += 100
other_account.save()
# Both updates commit atomically via Iceberg snapshot
Application-Level Constraints
Foreign keys, unique constraints, and NOT NULL validation:
class Order(models.Model):
customer = models.ForeignKey(Customer, on_delete=models.PROTECT)
order_number = models.CharField(max_length=20, unique=True)
total = models.DecimalField(max_digits=10, decimal_places=2)
# Constraints enforced before writes
Order.objects.create(
customer_id=999, # Raises IntegrityError if customer doesn't exist
order_number="ORD-123", # Raises IntegrityError if duplicate
total=None, # Raises IntegrityError if NOT NULL
)
Architecture
Django ORM
↓
SQL Query (with parameters)
↓
QueryCompiler (SQL → Polars)
↓
Polars DataFrames (in-memory operations)
↓
IcebergManager (catalog + table operations)
↓
Apache Iceberg (ACID transactions)
↓
Parquet Files (S3 / GCS / Azure / Local)
Key Components:
- query_compiler.py (713 lines) - Translates Django SQL to Polars operations
- iceberg_manager.py (534 lines) - Manages Iceberg catalog and table I/O
- base.py (521 lines) - Django database wrapper and cursor implementation
- schema.py (431 lines) - Handles Django migrations and schema evolution
- constraints.py (425 lines) - Application-level FK, unique, and NOT NULL validation
- timetravel.py (275 lines) - Time travel QuerySet and Manager API
See polars_iceberg/CLAUDE.md for detailed architecture documentation.
Performance
| Operation | PostgreSQL | Django Iceberg | Speedup |
|---|---|---|---|
| SELECT (indexed) | 5ms | 3-8ms | ~1x |
| Aggregation (10M rows) | 2,000ms | 50-200ms | 10-40x |
| Time travel query | N/A | 10ms | ∞ |
| Schema change | 5,000ms (locks) | <1ms (instant) | 5000x |
| Storage cost/TB/month | $200 | $20 | 10x savings |
Best for:
- Read-heavy workloads with analytics
- Time-series data (logs, events, metrics)
- Compliance requiring audit trails
- Multi-tenant SaaS applications
- Cloud-native architectures
Not ideal for:
- High-frequency trading (microsecond latency)
- Write-heavy OLTP (>100K writes/sec)
- Complex JOIN-heavy queries
- Very small datasets (<1GB)
Use Cases
SaaS with Audit Requirements
Challenge: Healthcare app needs HIPAA-compliant audit trails.
Solution: Time travel provides complete history of all data changes. Query who accessed what and when without performance overhead.
# Compliance report: Show patient record at time of access
patient_at_access = Patient.objects.as_of(access_timestamp).get(pk=patient_id)
E-Commerce Analytics
Challenge: Daily sales reports on millions of orders.
Solution: Polars aggregates data 50x faster than traditional GROUP BY queries.
# Daily sales summary in seconds, not minutes
df = Order.objects.values('created_at', 'total').to_polars()
daily_sales = df.groupby_dynamic('created_at', every='1d').agg(pl.col('total').sum())
Multi-Tenant Application
Challenge: Isolated data per customer, efficient analytics per tenant.
Solution: Partition by tenant_id, scale compute independently from storage.
# Each tenant's data physically partitioned in Iceberg
class TenantData(models.Model):
tenant_id = models.IntegerField(db_index=True)
# ... other fields
# Fast per-tenant queries via Iceberg partition pruning
TenantData.objects.filter(tenant_id=42).count() # Scans only tenant 42's files
Deployment
Local Development
# Use SQLite catalog and local filesystem
DATABASES = {
"default": {
"ENGINE": "polars_iceberg.backend",
"WAREHOUSE": "data/warehouse",
"CATALOG_URI": "sqlite:///data/catalog.db",
"NAMESPACE": "default",
}
}
Production on AWS
DATABASES = {
"default": {
"ENGINE": "polars_iceberg.backend",
"WAREHOUSE": "s3://prod-data-lake/warehouse",
"CATALOG_URI": "https://catalog.prod.example.com", # REST catalog
"NAMESPACE": "production",
"OPTIONS": {
"s3.region": "us-east-1",
"s3.access-key-id": os.environ["AWS_ACCESS_KEY_ID"],
"s3.secret-access-key": os.environ["AWS_SECRET_ACCESS_KEY"],
}
}
}
Docker Compose
version: '3.8'
services:
django:
build: .
environment:
- DATABASE_WAREHOUSE=s3://my-bucket/warehouse
- DATABASE_CATALOG_URI=http://catalog:8080
depends_on:
- catalog
catalog:
image: apache/iceberg-rest-catalog:latest
ports:
- "8080:8080"
Comparison with Traditional Databases
| Feature | PostgreSQL | MySQL | MongoDB | Django Iceberg |
|---|---|---|---|---|
| Django ORM Support | ✅ Full | ✅ Full | ⚠️ Via ODM | ✅ Full |
| Time Travel | ❌ | ❌ | ⚠️ Change streams | ✅ Native |
| Cloud-Native | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial | ✅ Full |
| Schema Evolution | 🐢 Slow | 🐢 Slow | ✅ Fast | ✅ Instant |
| Analytics Performance | 🐢 Moderate | 🐢 Moderate | 🐢 Moderate | ⚡ 10-100x |
| Storage Cost | 💰 High | 💰 High | 💰 High | 💰 10x Lower |
| Open Format | ❌ | ❌ | ❌ | ✅ Iceberg |
| Multi-Cloud | ❌ | ❌ | ⚠️ Atlas only | ✅ Any cloud |
Limitations
Django Iceberg is production-ready for many use cases, but has known limitations:
- No database-level JOINs: Django handles joins in Python (ORM does this anyway)
- Full table scans for UPDATE/DELETE: Efficient for small-medium datasets, not for billions of rows
- Write throughput: Optimized for <10K writes/sec per table (sufficient for most apps)
- Transaction scope: Per-table only (multi-table transactions not yet supported)
- Django features: Some advanced features disabled (see features.py)
See polars_iceberg/CLAUDE.md for detailed limitations.
Roadmap
- Multi-table transactions
- Query caching layer
- Background compaction scheduler
- DuckDB integration for complex queries
- GraphQL subscriptions with time travel
- Django Admin integration for snapshot browsing
- Prometheus metrics exporter
- Terraform module for AWS deployment
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Ways to contribute:
- Report bugs and request features via GitHub Issues
- Improve documentation
- Add tests
- Submit pull requests
Community
- GitHub Discussions: Ask questions and share ideas
- Discord: Join our community server (link TBD)
- Twitter: Follow @djangoiceberg for updates
License
MIT License - see LICENSE for details.
Acknowledgments
Django Iceberg stands on the shoulders of giants:
- Apache Iceberg: Netflix, Apple, LinkedIn, and the open source community
- Polars: Ritchie Vink and contributors
- Django: Django Software Foundation
- PyArrow: Apache Arrow community
Credits
Created by the Django Iceberg team. Powered by modern data infrastructure.
Built with: Apache Iceberg 🧊 | Polars 🐻❄️ | Django 🦄 | PyArrow 🏹
Ready to join the database revolution? Get started now or read why this is the future.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file django_iceberg-0.1.0.tar.gz.
File metadata
- Download URL: django_iceberg-0.1.0.tar.gz
- Upload date:
- Size: 49.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ec2e33da1ea76887a33ce38d834a7a22c8abad65d740863c4bfc161072282fbd
|
|
| MD5 |
f449ca2300e6809e5de196559861876b
|
|
| BLAKE2b-256 |
7fde72d2a2ef78d25d7c271766bfaa74861b903e6faa5dee1cf31efbfa77fbd6
|
Provenance
The following attestation bundles were made for django_iceberg-0.1.0.tar.gz:
Publisher:
publish.yml on theserverkid/django-iceberg
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
django_iceberg-0.1.0.tar.gz -
Subject digest:
ec2e33da1ea76887a33ce38d834a7a22c8abad65d740863c4bfc161072282fbd - Sigstore transparency entry: 1108179571
- Sigstore integration time:
-
Permalink:
theserverkid/django-iceberg@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20 -
Branch / Tag:
refs/heads/master - Owner: https://github.com/theserverkid
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20 -
Trigger Event:
push
-
Statement type:
File details
Details for the file django_iceberg-0.1.0-py3-none-any.whl.
File metadata
- Download URL: django_iceberg-0.1.0-py3-none-any.whl
- Upload date:
- Size: 44.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aa8d007024a496866ad423d00584cf4be6781fcfb2a33f0c4bf16a5b6dc6f607
|
|
| MD5 |
e5e7d591f0820511612d5616ed4ea8f7
|
|
| BLAKE2b-256 |
22766b43918ecea420d1db2911dd01fd06449c3cc9984e8dbc3ac7c4daf6d568
|
Provenance
The following attestation bundles were made for django_iceberg-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on theserverkid/django-iceberg
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
django_iceberg-0.1.0-py3-none-any.whl -
Subject digest:
aa8d007024a496866ad423d00584cf4be6781fcfb2a33f0c4bf16a5b6dc6f607 - Sigstore transparency entry: 1108179575
- Sigstore integration time:
-
Permalink:
theserverkid/django-iceberg@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20 -
Branch / Tag:
refs/heads/master - Owner: https://github.com/theserverkid
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@ce91a4efdb3ab8a7fad6ae62eef73cc1934c8c20 -
Trigger Event:
push
-
Statement type: