
DuckLake Provider for Apache Airflow

This is a custom provider for integrating DuckLake (based on DuckDB) with Apache Airflow.

DuckLake Configuration

The DuckLakeHook uses Airflow connection fields and extras to configure the connection. Standard fields are relabeled for common use:

  • Host: Used for metadata host (e.g., Postgres/MySQL host) or file path (e.g., for DuckDB/SQLite metadata file).
  • Login: Username (for Postgres/MySQL).
  • Password: Password (for Postgres/MySQL).
  • Schema: Metadata schema (defaults to 'duckdb').
  • Extra: JSON dict for all other settings (required for engine, storage_type, and conditional fields).

Example extras JSON (adjust based on engine and storage_type):

    {
      "engine": "postgres",
      "dbname": "my_ducklake",
      "pgdbname": "dev_nophiml_db",
      "storage_type": "s3",
      "s3_bucket": "your-s3-bucket",
      "s3_path": "your/s3/path/",
      "aws_access_key_id": "your-access-key-id",
      "aws_secret_access_key": "your-secret-access-key",
      "aws_region": "us-east-1",
      "install_extensions": ["spatial"],
      "load_extensions": ["spatial"]
    }

The "install_extensions" and "load_extensions" keys are optional; "install_extensions" is inherited from the DuckDB provider. Note that JSON itself does not allow comments, so the Extra field must contain only the bare JSON object.
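As a sketch, the same extras can be assembled as a Python dict and serialized for the connection's Extra field (the values here are the placeholders from the example above; adjust per engine and storage_type):

```python
import json

# Placeholder values from the example above; adjust per engine/storage_type.
extras = {
    "engine": "postgres",
    "dbname": "my_ducklake",
    "pgdbname": "dev_nophiml_db",
    "storage_type": "s3",
    "s3_bucket": "your-s3-bucket",
    "s3_path": "your/s3/path/",
    "aws_access_key_id": "your-access-key-id",
    "aws_secret_access_key": "your-secret-access-key",
    "aws_region": "us-east-1",
    "install_extensions": ["spatial"],  # optional, inherited from DuckDB provider
    "load_extensions": ["spatial"],     # optional
}

# The Extra field takes a JSON string; json.dumps guarantees valid JSON
# (JSON has no comment syntax, so notes stay in Python comments here).
extra_json = json.dumps(extras)
```

Building the dict in Python and calling `json.dumps` avoids hand-editing JSON strings and catches syntax mistakes early.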

Supported Engines (set in extras['engine'])

  • duckdb: Requires 'metadata_file' in extras or host as file path.
  • sqlite: Requires 'metadata_file' in extras or host as file path.
  • postgres: Requires host, login, password, and 'pgdbname' in extras.
  • mysql: Requires host, login, password, and 'mysqldbname' in extras.
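The per-engine requirements above can be sketched as a small validation helper. This is illustrative only and not part of the provider; the key names simply mirror the list above:

```python
# Illustrative sketch of the per-engine requirements listed above
# (not part of the provider itself).
REQUIRED_BY_ENGINE = {
    "duckdb": {"metadata_file"},   # or Host used as the metadata file path
    "sqlite": {"metadata_file"},   # or Host used as the metadata file path
    "postgres": {"host", "login", "password", "pgdbname"},
    "mysql": {"host", "login", "password", "mysqldbname"},
}

def missing_engine_fields(engine: str, provided: set) -> set:
    """Return the required fields the connection is still missing."""
    if engine not in REQUIRED_BY_ENGINE:
        raise ValueError(f"unsupported engine: {engine}")
    # For the file-based engines, Host can stand in for 'metadata_file'.
    if engine in ("duckdb", "sqlite") and "host" in provided:
        return set()
    return REQUIRED_BY_ENGINE[engine] - provided
```

For example, a postgres connection configured with only a host and login would be reported as missing its password and 'pgdbname'.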

Supported Storage Types (set in extras['storage_type'], default 's3')

  • s3: Requires 's3_bucket', 's3_path'; optional AWS creds.
  • azure: Requires 'azure_account_name', 'azure_container', 'azure_path'; optional connection_string.
  • gcs: Requires 'gcs_bucket', 'gcs_path'; optional service_account_key (JSON string).
  • local: Requires 'local_data_path'.
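The storage-side requirements can be sketched the same way (again illustrative, not part of the provider; key names mirror the list above, and 's3' is the documented default):

```python
# Illustrative sketch of the per-storage-type required extras keys
# listed above (not part of the provider itself).
REQUIRED_BY_STORAGE = {
    "s3": {"s3_bucket", "s3_path"},
    "azure": {"azure_account_name", "azure_container", "azure_path"},
    "gcs": {"gcs_bucket", "gcs_path"},
    "local": {"local_data_path"},
}

def missing_storage_keys(extras: dict) -> set:
    """Return required extras keys missing for the chosen storage type."""
    storage = extras.get("storage_type", "s3")  # 's3' is the documented default
    if storage not in REQUIRED_BY_STORAGE:
        raise ValueError(f"unsupported storage_type: {storage}")
    return REQUIRED_BY_STORAGE[storage] - extras.keys()
```

A check like this could run before a DAG touches the lake, so misconfigured connections fail fast with a clear message rather than mid-task.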

The connection UI shows only the core fields; engine- and storage-specific settings go in extras. To configure a connection, select the engine and storage type in extras and provide the corresponding keys listed above.
