
A utility to export DuckLake database metadata to Delta Lake transaction logs.


DuckLake Delta Exporter

A Python package for exporting DuckLake snapshots as Delta Lake checkpoint files, enabling compatibility with Delta Lake readers. Local paths, S3, and GCS are supported; for OneLake, use mounted storage, since Azure storage is not supported directly.

This is just a fun project; please vote for proper support in DuckDB: https://github.com/duckdb/duckdb-delta/issues/218

Repository

https://github.com/djouallah/ducklake_delta_exporter

Installation

pip install ducklake-delta-exporter

Usage

from ducklake_delta_exporter import generate_latest_delta_log

# Export all tables from a DuckLake database
generate_latest_delta_log("/path/to/ducklake.db")

# Specify a custom data root directory
generate_latest_delta_log("/path/to/ducklake.db", data_root="/custom/data/path")

What it does

This package converts DuckLake table snapshots into Delta Lake format by:

  1. Reading DuckLake metadata - Extracts table schemas, file paths, and snapshot information
  2. Creating Delta checkpoint files - Generates .checkpoint.parquet files with Delta Lake metadata
  3. Writing JSON transaction logs - Creates minimal .json log files for Spark compatibility
  4. Mapping data types - Converts DuckDB types to Spark SQL equivalents
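To illustrate steps 2–3, a first Delta commit is a newline-delimited JSON file of actions: a protocol action, a metaData action, and one add action per data file. The sketch below builds such a commit (a hypothetical helper with illustrative names; the package's exact output may differ):

```python
import json
import time

def minimal_commit_lines(schema_json, parquet_files):
    """Build the newline-delimited JSON actions for a first Delta commit.

    schema_json: Spark-style table schema as a JSON string
    parquet_files: list of (relative_path, size_in_bytes) tuples
    Hypothetical helper; shown only to illustrate the log layout.
    """
    now_ms = int(time.time() * 1000)
    actions = [
        # Protocol action: minimum reader/writer versions for a basic table
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        # Metadata action: table id, format, and schema
        {"metaData": {
            "id": "00000000-0000-0000-0000-000000000000",
            "format": {"provider": "parquet", "options": {}},
            "schemaString": schema_json,
            "partitionColumns": [],
            "configuration": {},
            "createdTime": now_ms,
        }},
    ]
    # One "add" action per data file referenced by the snapshot
    for path, size in parquet_files:
        actions.append({"add": {
            "path": path,
            "size": size,
            "partitionValues": {},
            "modificationTime": now_ms,
            "dataChange": True,
        }})
    return "\n".join(json.dumps(a) for a in actions)
```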

Features

  • Spark Compatible - Generated Delta files can be read by Spark and other Delta Lake tools
  • Type Mapping - Automatic conversion between DuckDB and Spark data types
  • Batch Processing - Exports all tables in a DuckLake database
  • Error Handling - Graceful handling of missing snapshots and other issues
  • Progress Reporting - Clear feedback on export progress and results

Requirements

  • Python 3.8+
  • DuckDB

File Structure

After running the exporter, your Delta tables will have the following structure:

your_table/
├── data_file_1.parquet
├── data_file_2.parquet
└── _delta_log/
    ├── 00000000000000000000.json
    ├── 00000000000000000000.checkpoint.parquet
    └── _last_checkpoint
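The _last_checkpoint file is a small JSON pointer that lets Delta readers find the newest checkpoint without listing every log file. A minimal version of its contents can be sketched as:

```python
import json

def last_checkpoint_content(version, num_actions):
    # Points readers at the checkpoint for the given log version;
    # "size" is the number of actions stored in that checkpoint.
    return json.dumps({"version": version, "size": num_actions})
```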

Type Mapping

The exporter automatically maps DuckDB types to Spark SQL types:

DuckDB Type   Spark Type
-----------   ----------
INTEGER       integer
BIGINT        long
FLOAT         double
DOUBLE        double
DECIMAL       decimal(10,0)
BOOLEAN       boolean
TIMESTAMP     timestamp
DATE          date
VARCHAR       string
Others        string
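The table above can be expressed as a simple lookup with a string fallback (a sketch; the package's actual mapping code may differ):

```python
# DuckDB -> Spark SQL type names, per the table above
DUCKDB_TO_SPARK = {
    "INTEGER": "integer",
    "BIGINT": "long",
    "FLOAT": "double",
    "DOUBLE": "double",
    "DECIMAL": "decimal(10,0)",
    "BOOLEAN": "boolean",
    "TIMESTAMP": "timestamp",
    "DATE": "date",
    "VARCHAR": "string",
}

def to_spark_type(duckdb_type: str) -> str:
    # Unknown types fall back to string, as in the "Others" row
    return DUCKDB_TO_SPARK.get(duckdb_type.upper(), "string")
```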

Error Handling

The exporter handles various error conditions:

  • Missing snapshots - Skips tables with no data
  • Existing checkpoints - Avoids overwriting existing files
  • Schema changes - Uses the latest schema for each table
  • File system errors - Reports and continues with other tables
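The report-and-continue behavior described above can be sketched as follows (export_table is a hypothetical callable standing in for the per-table export; names are illustrative):

```python
def export_all(tables, export_table):
    """Export each table; report failures and keep going.

    tables: iterable of table names
    export_table: callable that raises on failure (illustrative stand-in)
    Returns (succeeded, failed) lists of table names.
    """
    succeeded, failed = [], []
    for name in tables:
        try:
            export_table(name)
            succeeded.append(name)
            print(f"exported {name}")
        except Exception as exc:
            # Report the error and continue with the remaining tables
            failed.append(name)
            print(f"skipped {name}: {exc}")
    return succeeded, failed
```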

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
