
DuckLake Delta Exporter

A Python package for exporting DuckLake snapshots as Delta Lake checkpoint files, enabling compatibility with Delta Lake readers. Local paths, S3, and GCS are supported; for OneLake, use mounted storage, since Azure storage is not supported directly.

This is just a fun project; please vote for proper support in DuckDB: https://github.com/duckdb/duckdb-delta/issues/218

Repository

https://github.com/djouallah/ducklake_delta_exporter

Installation

pip install ducklake-delta-exporter

Usage

from ducklake_delta_exporter import generate_latest_delta_log

# Export all tables from a DuckLake database
generate_latest_delta_log("/path/to/ducklake.db")

# Specify a custom data root directory
generate_latest_delta_log("/path/to/ducklake.db", data_root="/custom/data/path")

What it does

This package converts DuckLake table snapshots into Delta Lake format by:

  1. Reading DuckLake metadata - Extracts table schemas, file paths, and snapshot information
  2. Creating Delta checkpoint files - Generates .checkpoint.parquet files with Delta Lake metadata
  3. Writing JSON transaction logs - Creates minimal .json log files for Spark compatibility
  4. Mapping data types - Converts DuckDB types to Spark SQL equivalents
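To make the shape of the output concrete, here is a minimal sketch of what a Delta transaction log entry contains: one JSON action per line (protocol, metaData, add), as defined by the Delta Lake protocol. The function name, parameters, and helper structure are illustrative; the package's internal implementation may differ.

```python
import json
import os
import uuid

def write_minimal_delta_log(table_dir, schema_fields, data_files):
    """Write a minimal 00000000000000000000.json with protocol,
    metaData, and add actions (illustrative sketch, not the
    package's actual code)."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # Spark-style struct schema, serialized as a string per the Delta protocol
    schema = {
        "type": "struct",
        "fields": [{"name": n, "type": t, "nullable": True, "metadata": {}}
                   for n, t in schema_fields],
    }
    actions = [
        {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
        {"metaData": {"id": str(uuid.uuid4()),
                      "format": {"provider": "parquet", "options": {}},
                      "schemaString": json.dumps(schema),
                      "partitionColumns": [],
                      "configuration": {}}},
    ]
    # One "add" action per existing data file, referenced by relative path
    for path, size, mtime in data_files:
        actions.append({"add": {"path": path, "size": size,
                                "modificationTime": mtime,
                                "dataChange": True}})
    log_path = os.path.join(log_dir, "00000000000000000000.json")
    with open(log_path, "w") as f:
        f.write("\n".join(json.dumps(a) for a in actions) + "\n")
    return log_path
```

Each line of the resulting file is a standalone JSON object, which is what Spark's Delta reader expects when it replays the log.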

Features

  • Spark Compatible - Generated Delta files can be read by Spark and other Delta Lake tools
  • Type Mapping - Automatic conversion between DuckDB and Spark data types
  • Batch Processing - Exports all tables in a DuckLake database
  • Error Handling - Graceful handling of missing snapshots and other issues
  • Progress Reporting - Clear feedback on export progress and results

Requirements

  • Python 3.8+
  • DuckDB

File Structure

After running the exporter, your Delta tables will have the following structure:

your_table/
├── data_file_1.parquet
├── data_file_2.parquet
└── _delta_log/
    ├── 00000000000000000000.json
    ├── 00000000000000000000.checkpoint.parquet
    └── _last_checkpoint
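The _last_checkpoint file is a small JSON document that points readers at the newest checkpoint. Per the Delta Lake protocol it looks roughly like the following, where version matches the checkpoint file name and size is the number of actions in the checkpoint (the exact values depend on the table):

```json
{"version": 0, "size": 3}
```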

Type Mapping

The exporter automatically maps DuckDB types to Spark SQL types:

DuckDB Type   Spark Type
-----------   ----------
INTEGER       integer
BIGINT        long
FLOAT         double
DOUBLE        double
DECIMAL       decimal(10,0)
BOOLEAN       boolean
TIMESTAMP     timestamp
DATE          date
VARCHAR       string
Others        string
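The table above can be expressed as a simple lookup with a string fallback. This is an illustrative sketch mirroring the documented conversions; the package's internal mapping may differ.

```python
# Documented DuckDB -> Spark SQL type conversions
DUCKDB_TO_SPARK = {
    "INTEGER": "integer",
    "BIGINT": "long",
    "FLOAT": "double",
    "DOUBLE": "double",
    "DECIMAL": "decimal(10,0)",
    "BOOLEAN": "boolean",
    "TIMESTAMP": "timestamp",
    "DATE": "date",
    "VARCHAR": "string",
}

def map_duckdb_type(duckdb_type: str) -> str:
    """Map a DuckDB type name to its Spark SQL equivalent;
    any unlisted type falls back to string."""
    return DUCKDB_TO_SPARK.get(duckdb_type.upper(), "string")
```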

Error Handling

The exporter handles various error conditions:

  • Missing snapshots - Skips tables with no data
  • Existing checkpoints - Avoids overwriting existing files
  • Schema changes - Uses the latest schema for each table
  • File system errors - Reports and continues with other tables
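The "report and continue" behavior can be sketched as a per-table loop that catches failures without aborting the batch. The export_one callable and the result format here are hypothetical, shown only to illustrate the pattern.

```python
def export_tables(tables, export_one):
    """Illustrative batch loop: try each table, record the outcome,
    and continue with the remaining tables on failure.
    export_one is a hypothetical callable that exports one table."""
    results = {}
    for name in tables:
        try:
            export_one(name)
            results[name] = "ok"
        except FileNotFoundError as e:
            # e.g. a table with no snapshot / missing data files
            results[name] = f"skipped: {e}"
        except Exception as e:
            results[name] = f"error: {e}"
    return results
```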

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
