
DuckLake provider for Apache Airflow (based on DuckDB)

Project description

DuckLake Provider for Apache Airflow

This is a custom provider for integrating DuckLake (based on DuckDB) with Apache Airflow.

DuckLake Configuration

The DuckLakeHook reads Airflow connection fields and extras to configure the connection. The standard fields are repurposed as follows:

  • Host: Used for metadata host (e.g., Postgres/MySQL host) or file path (e.g., for DuckDB/SQLite metadata file).
  • Login: Username (for Postgres/MySQL).
  • Password: Password (for Postgres/MySQL).
  • Schema: Metadata schema (defaults to 'duckdb').
  • Extra: JSON dict for all other settings; engine, storage_type, and any engine- or storage-specific fields go here.

Example extras JSON (adjust based on engine and storage_type):

{
  "engine": "postgres",
  "dbname": "my_ducklake",
  "pgdbname": "dev_nophiml_db",
  "storage_type": "s3",
  "s3_bucket": "your-s3-bucket",
  "s3_path": "your/s3/path/",
  "aws_access_key_id": "your-access-key-id",
  "aws_secret_access_key": "your-secret-access-key",
  "aws_region": "us-east-1",
  "install_extensions": ["spatial"],
  "load_extensions": ["spatial"],
  "connect_stack": [
    "INSTALL httpfs;",
    "LOAD httpfs;",
    "INSTALL ducklake;",
    "LOAD ducklake;"
  ]
}

install_extensions and load_extensions are optional and inherited from the DuckDB provider. connect_stack is also optional; it overrides the default DuckLake install/load commands.
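
If you manage connections in code rather than through the UI, a minimal sketch of registering this connection follows. The conn_id and conn_type values below are assumptions; match them to whatever the provider registers in your deployment.

# A minimal sketch: registering a DuckLake connection programmatically.
# conn_id and conn_type are assumptions; adjust them to your deployment.
import json

from airflow.models import Connection
from airflow.settings import Session

conn = Connection(
    conn_id="ducklake_default",   # hypothetical id; any id your tasks reference
    conn_type="ducklake",         # assumed connection type name
    host="pg.example.com",        # metadata host (Postgres in this example)
    login="airflow",
    password="change-me",
    schema="duckdb",
    extra=json.dumps({
        "engine": "postgres",
        "pgdbname": "dev_nophiml_db",
        "storage_type": "s3",
        "s3_bucket": "your-s3-bucket",
        "s3_path": "your/s3/path/",
    }),
)

session = Session()
session.add(conn)
session.commit()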

Supported Engines (set in extras['engine'])

  • duckdb: Requires 'metadata_file' in extras, or the Host field set to a file path (see the file-based sketch after this list).
  • sqlite: Requires 'metadata_file' in extras, or the Host field set to a file path.
  • postgres: Requires Host, Login, Password, and 'pgdbname' in extras.
  • mysql: Requires Host, Login, Password, and 'mysqldbname' in extras.
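
A minimal extras sketch for a file-based engine, assuming only the keys documented in this list (the paths are placeholders):

# Hypothetical extras for a SQLite metadata catalog with local storage.
import json

extras = json.dumps({
    "engine": "sqlite",
    "metadata_file": "/data/ducklake/metadata.sqlite",  # metadata catalog file
    "storage_type": "local",
    "local_data_path": "/data/ducklake/files/",         # where data files live
})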

Supported Storage Types (set in extras['storage_type'], default 's3')

  • s3: Requires 's3_bucket', 's3_path'; optional AWS credentials.
  • azure: Requires 'azure_account_name', 'azure_container', 'azure_path'; optional 'connection_string'.
  • gcs: Requires 'gcs_bucket', 'gcs_path'; optional 'service_account_key' (JSON string); see the sketch after this list.
  • local: Requires 'local_data_path'.
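
The same pattern applies to the other object stores. A sketch for GCS, using only the keys listed above (bucket, path, and key values are placeholders):

# Hypothetical extras for GCS storage with a Postgres metadata catalog.
import json

extras = json.dumps({
    "engine": "postgres",
    "pgdbname": "my_ducklake_db",
    "storage_type": "gcs",
    "gcs_bucket": "your-gcs-bucket",
    "gcs_path": "your/gcs/path/",
    "service_account_key": "<service-account-key-json>",  # optional, JSON string
})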

The UI shows the core fields; engine- and storage-specific settings go in extras. To switch engines or storage backends, change engine/storage_type in extras and supply the matching keys. If you need to customize the static DuckLake connection commands (for example, to install additional extensions), provide a connect_stack list in extras; commands that depend on runtime values (secrets, thread settings, attachments, etc.) are always appended automatically by the hook. A usage sketch follows.
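
A minimal sketch of using the hook inside a task. It assumes DuckLakeHook follows the usual Airflow DbApiHook convention of exposing get_conn(); the import path and the constructor keyword are assumptions, so verify them against the provider's package layout.

import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def ducklake_example():
    @task
    def answer():
        # Import path and kwarg name are hypothetical; check the installed package.
        from ducklake_provider.hooks.ducklake import DuckLakeHook

        hook = DuckLakeHook(ducklake_conn_id="ducklake_default")
        conn = hook.get_conn()  # assumed DbApiHook-style accessor
        return conn.execute("SELECT 42 AS answer").fetchall()

    answer()

ducklake_example()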

Download files

Download the file for your platform.

Source Distribution

airflow_provider_ducklake-0.0.9.tar.gz (9.7 kB)


Built Distribution


airflow_provider_ducklake-0.0.9-py3-none-any.whl (10.7 kB)


File details

Details for the file airflow_provider_ducklake-0.0.9.tar.gz.


File hashes

Hashes for airflow_provider_ducklake-0.0.9.tar.gz
SHA256: f2c71b49c228db62e70431319d7c127004fda9896ef4c8e4ee1de4a5147f7012
MD5: f24c9a58f1eb2a76a3309bca65cc1df2
BLAKE2b-256: 484d801bd366506dcefff99369b8629fbe7a635f4f84aab0d29e248e50945c78


File details

Details for the file airflow_provider_ducklake-0.0.9-py3-none-any.whl.


File hashes

Hashes for airflow_provider_ducklake-0.0.9-py3-none-any.whl
SHA256: f2a4d820a2e4ab1be46bf20070de7480f393f6eee85981591ef747485b7178ee
MD5: 96da9cea6de74751dc2ccb0c0cdc6f73
BLAKE2b-256: 1fa6fa9c6bbfee20b73798095770cec7315952c3e63bcbe2a221050b4d52d368

