A utility to export DuckLake database metadata to Delta Lake transaction logs.
DuckLake Delta Exporter
A Python package for exporting DuckLake snapshots as Delta Lake checkpoint files, enabling compatibility with Delta Lake readers. It supports local paths, S3, and GCS. For OneLake, use mounted storage, because Azure storage is not supported.
This is just a fun project. Please vote for proper support in DuckDB: https://github.com/duckdb/duckdb-delta/issues/218
Repository
https://github.com/djouallah/ducklake_delta_exporter
Installation
pip install ducklake-delta-exporter
Usage
from ducklake_delta_exporter import generate_latest_delta_log
# Export all tables from a DuckLake database
generate_latest_delta_log("/path/to/ducklake.db")
# Specify a custom data root directory
generate_latest_delta_log("/path/to/ducklake.db", data_root="/custom/data/path")
What it does
This package converts DuckLake table snapshots into Delta Lake format by:
- Reading DuckLake metadata - Extracts table schemas, file paths, and snapshot information
- Creating Delta checkpoint files - Generates `.checkpoint.parquet` files with Delta Lake metadata
- Writing JSON transaction logs - Creates minimal `.json` log files for Spark compatibility
- Mapping data types - Converts DuckDB types to Spark SQL equivalents
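To make the "minimal JSON transaction log" idea concrete, here is a rough sketch of what a minimal Delta commit can look like according to the Delta protocol: one `protocol` action, one `metaData` action, and one `add` action per data file, each on its own line. The function name `build_minimal_commit`, the protocol versions, and the sample schema are assumptions for illustration; this is not necessarily what the exporter writes.

```python
import json
import time
import uuid


def build_minimal_commit(schema_fields, data_files, table_id=None):
    """Return newline-delimited JSON actions for a minimal Delta commit.

    Hypothetical helper for illustration; not part of ducklake_delta_exporter.
    """
    # Delta stores the table schema as a JSON string inside the metaData action.
    schema_string = json.dumps({"type": "struct", "fields": schema_fields})
    now_ms = int(time.time() * 1000)
    actions = [
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"metaData": {
            "id": table_id or str(uuid.uuid4()),
            "format": {"provider": "parquet", "options": {}},
            "schemaString": schema_string,
            "partitionColumns": [],
            "configuration": {},
            "createdTime": now_ms,
        }},
    ]
    # One "add" action per Parquet data file that belongs to the snapshot.
    for path, size in data_files:
        actions.append({"add": {
            "path": path,
            "partitionValues": {},
            "size": size,
            "modificationTime": now_ms,
            "dataChange": True,
        }})
    return "\n".join(json.dumps(action) for action in actions)


# Sample schema using Spark SQL type names (assumed columns).
fields = [
    {"name": "id", "type": "long", "nullable": True, "metadata": {}},
    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
]
commit_text = build_minimal_commit(
    fields, [("data_file_1.parquet", 1024)], table_id="demo-table"
)
print(commit_text)
```

A checkpoint file carries the same kinds of actions in Parquet form, which is why both files can describe the same snapshot.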
Features
- ✅ Spark Compatible - Generated Delta files can be read by Spark and other Delta Lake tools
- ✅ Type Mapping - Automatic conversion between DuckDB and Spark data types
- ✅ Batch Processing - Exports all tables in a DuckLake database
- ✅ Error Handling - Graceful handling of missing snapshots and other issues
- ✅ Progress Reporting - Clear feedback on export progress and results
Requirements
- Python 3.8+
- DuckDB
File Structure
After running the exporter, your Delta tables will have the following structure:
your_table/
├── data_file_1.parquet
├── data_file_2.parquet
└── _delta_log/
    ├── 00000000000000000000.json
    ├── 00000000000000000000.checkpoint.parquet
    └── _last_checkpoint
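The file names in `_delta_log/` follow the Delta protocol convention of a table version zero-padded to 20 digits, and `_last_checkpoint` is a small JSON file that points readers at the newest checkpoint. The sketch below shows that convention; the helper names `delta_log_names` and `last_checkpoint_body` are hypothetical and not taken from the package.

```python
import json


def delta_log_names(version):
    """Return the commit and checkpoint file names for a table version."""
    prefix = f"{version:020d}"  # zero-pad the version to 20 digits
    return {
        "commit": f"{prefix}.json",
        "checkpoint": f"{prefix}.checkpoint.parquet",
    }


def last_checkpoint_body(version, num_actions):
    """Return the JSON text of a _last_checkpoint file.

    "size" is the number of actions stored in the checkpoint.
    """
    return json.dumps({"version": version, "size": num_actions})


names = delta_log_names(0)
print(names["commit"])
print(names["checkpoint"])
print(last_checkpoint_body(0, 4))
```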
Type Mapping
The exporter automatically maps DuckDB types to Spark SQL types:
| DuckDB Type | Spark Type |
|---|---|
| INTEGER | integer |
| BIGINT | long |
| FLOAT | double |
| DOUBLE | double |
| DECIMAL | decimal(10,0) |
| BOOLEAN | boolean |
| TIMESTAMP | timestamp |
| DATE | date |
| VARCHAR | string |
| Others | string |
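The table above can be expressed as a simple lookup with a `string` fallback. This is a minimal sketch that mirrors the table; the names `DUCKDB_TO_SPARK` and `map_type` are assumptions, and the exporter's real implementation may differ (for example in how it treats parameterized types such as `DECIMAL(18,2)`, which this sketch does not parse).

```python
# Lookup table copied from the mapping above (illustrative only).
DUCKDB_TO_SPARK = {
    "INTEGER": "integer",
    "BIGINT": "long",
    "FLOAT": "double",
    "DOUBLE": "double",
    "DECIMAL": "decimal(10,0)",
    "BOOLEAN": "boolean",
    "TIMESTAMP": "timestamp",
    "DATE": "date",
    "VARCHAR": "string",
}


def map_type(duckdb_type):
    """Map a DuckDB type name to a Spark SQL type name.

    Any type not listed in the table falls back to "string".
    """
    return DUCKDB_TO_SPARK.get(duckdb_type.strip().upper(), "string")


for name in ("BIGINT", "float", "UUID"):
    print(name, "->", map_type(name))
```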
Error Handling
The exporter handles various error conditions:
- Missing snapshots - Skips tables with no data
- Existing checkpoints - Avoids overwriting existing files
- Schema changes - Uses the latest schema for each table
- File system errors - Reports and continues with other tables
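The behaviors listed above amount to a per-table loop that skips, reports, and continues rather than aborting. The following is a hedged sketch of that pattern, not the package's code: `export_all`, `checkpoint_path`, the `snapshot_version` key, and the `fake_export` callback are all hypothetical names used only to demonstrate the control flow.

```python
import tempfile
from pathlib import Path


def checkpoint_path(delta_root, table_name, version):
    """Location of the checkpoint file for a table version."""
    return (Path(delta_root) / table_name / "_delta_log"
            / f"{version:020d}.checkpoint.parquet")


def export_all(tables, export_one, delta_root):
    """Export each table, skipping or recording problems instead of stopping."""
    summary = {"exported": [], "skipped": [], "failed": []}
    for table in tables:
        name = table["name"]
        version = table.get("snapshot_version")
        if version is None:  # missing snapshot: nothing to export
            summary["skipped"].append((name, "no snapshot"))
            continue
        if checkpoint_path(delta_root, name, version).exists():
            summary["skipped"].append((name, "checkpoint exists"))
            continue
        try:
            export_one(table)
        except OSError as exc:  # file system error: report and move on
            summary["failed"].append((name, str(exc)))
            continue
        summary["exported"].append(name)
    return summary


def fake_export(table):
    """Stand-in for the real export step; fails for one table."""
    if table["name"] == "broken":
        raise OSError("disk full")


with tempfile.TemporaryDirectory() as root:
    # Pretend the "done" table was already exported earlier.
    existing = checkpoint_path(root, "done", 0)
    existing.parent.mkdir(parents=True)
    existing.touch()
    demo_tables = [
        {"name": "empty", "snapshot_version": None},
        {"name": "done", "snapshot_version": 0},
        {"name": "broken", "snapshot_version": 0},
        {"name": "orders", "snapshot_version": 0},
    ]
    summary = export_all(demo_tables, fake_export, root)

print(summary)
```

Collecting results in a summary rather than raising on the first failure is what lets one bad table be reported while the rest are still exported.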
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file ducklake_delta_exporter-0.3.0.tar.gz.
File metadata
- Download URL: ducklake_delta_exporter-0.3.0.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | acf06d9ddf36eb8ebc91514fe6d8957268a34057f2f68e2bf578303392c0ad77 |
| MD5 | d0b58f04dffbd2caf78625600345b057 |
| BLAKE2b-256 | ab7ae5da66febb8ed2400cd5e3f895a1067ce2c4debb06046f0ba4a1f27fbc92 |
File details
Details for the file ducklake_delta_exporter-0.3.0-py3-none-any.whl.
File metadata
- Download URL: ducklake_delta_exporter-0.3.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb6ff1377a2638fbeab0bae1a60c161aa2ac41eb7ba0e232b15741b1f99a340a |
| MD5 | b44849dae2000572de6df5c6acff572f |
| BLAKE2b-256 | fd3ecc64c1f4cedfece26f6a7a31d7959b2ef24c9bd64f1fe5727eb4184ee73b |