DuckLake provider for Apache Airflow (based on DuckDB)

Project description

DuckLake Provider for Apache Airflow

This is a custom provider for integrating DuckLake (based on DuckDB) with Apache Airflow.

DuckLake Configuration

The DuckLakeHook uses Airflow connection fields and extras to configure the connection. The standard fields are repurposed as follows:

  • Host: Used for metadata host (e.g., Postgres/MySQL host) or file path (e.g., for DuckDB/SQLite metadata file).
  • Login: Username (for Postgres/MySQL).
  • Password: Password (for Postgres/MySQL).
  • Schema: Metadata schema (defaults to 'duckdb').
  • Extra: JSON dict for all other settings (required for engine, storage_type, and conditional fields).
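The fields above can also be supplied as a JSON connection in an AIRFLOW_CONN_* environment variable, which Airflow 2.3+ accepts. A minimal sketch; the conn_type "ducklake" and the host/login values here are assumptions for illustration, not confirmed by the provider:

```python
import json
import os

# Sketch: define the connection as a JSON env var. The conn_type "ducklake"
# is an assumption -- check what this provider actually registers.
conn = {
    "conn_type": "ducklake",    # hypothetical conn_type
    "host": "pg.example.com",   # metadata host (Postgres/MySQL) or a file path
    "login": "airflow",
    "password": "secret",
    "schema": "duckdb",         # metadata schema; defaults to 'duckdb'
    "extra": {
        "engine": "postgres",
        "pgdbname": "my_ducklake",
        "storage_type": "s3",
        "s3_bucket": "your-s3-bucket",
        "s3_path": "your/s3/path/",
    },
}
# Airflow resolves AIRFLOW_CONN_<CONN_ID> to the connection "ducklake_default".
os.environ["AIRFLOW_CONN_DUCKLAKE_DEFAULT"] = json.dumps(conn)
```

Env-var connections are handy for containerized deployments, where baking credentials into the metadata database is undesirable.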

Example extras JSON (adjust the keys based on engine and storage_type; install_extensions and load_extensions are optional and inherited from the DuckDB provider):

{
  "engine": "postgres",
  "dbname": "my_ducklake",
  "pgdbname": "dev_nophiml_db",
  "storage_type": "s3",
  "s3_bucket": "your-s3-bucket",
  "s3_path": "your/s3/path/",
  "aws_access_key_id": "your-access-key-id",
  "aws_secret_access_key": "your-secret-access-key",
  "aws_region": "us-east-1",
  "install_extensions": ["spatial"],
  "load_extensions": ["spatial"]
}

Supported Engines (set in extras['engine'])

  • duckdb: Requires 'metadata_file' in extras or host as file path.
  • sqlite: Requires 'metadata_file' in extras or host as file path.
  • postgres: Requires host, login, password, and 'pgdbname' in extras.
  • mysql: Requires host, login, password, and 'mysqldbname' in extras.
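The per-engine requirements above can be checked before a DAG ever talks to the metadata store. A small validation sketch, mirroring the list (the helper is illustrative, not part of the provider):

```python
# Required keys per engine, taken from the list above. For duckdb/sqlite,
# either 'metadata_file' in extras or the connection's host (as a file path)
# is acceptable.
REQUIRED_BY_ENGINE = {
    "duckdb": ["metadata_file"],
    "sqlite": ["metadata_file"],
    "postgres": ["host", "login", "password", "pgdbname"],
    "mysql": ["host", "login", "password", "mysqldbname"],
}

def missing_engine_keys(engine: str, fields: dict) -> list:
    """Return required keys absent from the merged fields+extras dict."""
    required = REQUIRED_BY_ENGINE.get(engine)
    if required is None:
        raise ValueError(f"unsupported engine: {engine!r}")
    # duckdb/sqlite accept host as the metadata file path instead
    if engine in ("duckdb", "sqlite") and fields.get("host"):
        return []
    return [k for k in required if not fields.get(k)]
```

Running such a check at DAG-parse time surfaces misconfigured connections as import errors rather than mid-run task failures.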

Supported Storage Types (set in extras['storage_type'], default 's3')

  • s3: Requires 's3_bucket', 's3_path'; optional AWS creds.
  • azure: Requires 'azure_account_name', 'azure_container', 'azure_path'; optional connection_string.
  • gcs: Requires 'gcs_bucket', 'gcs_path'; optional service_account_key (JSON string).
  • local: Requires 'local_data_path'.
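The storage-type requirements can be validated the same way. A sketch that also applies the documented 's3' default (again illustrative, not the provider's own code):

```python
# Required extras keys per storage type, taken from the list above.
REQUIRED_BY_STORAGE = {
    "s3": ["s3_bucket", "s3_path"],
    "azure": ["azure_account_name", "azure_container", "azure_path"],
    "gcs": ["gcs_bucket", "gcs_path"],
    "local": ["local_data_path"],
}

def missing_storage_keys(extras: dict) -> list:
    """Return required storage keys missing from extras."""
    storage_type = extras.get("storage_type", "s3")  # 's3' is the default
    required = REQUIRED_BY_STORAGE.get(storage_type)
    if required is None:
        raise ValueError(f"unsupported storage_type: {storage_type!r}")
    return [k for k in required if not extras.get(k)]
```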

The connection UI exposes only the core fields; engine- and storage-specific settings go in extras. To switch engines or storage backends, set extras['engine'] and extras['storage_type'] and supply the corresponding keys.
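DuckLake catalogs are attached in DuckDB with an ATTACH 'ducklake:...' statement, and a hook like this one presumably assembles that statement from the connection. A sketch of what that assembly could look like for a Postgres catalog on S3; the exact string format follows DuckLake's documented ATTACH syntax, but its use here is an assumption about the provider's internals:

```python
def build_attach_sql(extras: dict, host: str, login: str, password: str) -> str:
    """Assemble an illustrative DuckLake ATTACH statement for a Postgres
    metadata catalog with S3 data storage. Not the provider's actual code."""
    meta = (
        f"ducklake:postgres:dbname={extras['pgdbname']} "
        f"host={host} user={login} password={password}"
    )
    data_path = f"s3://{extras['s3_bucket']}/{extras['s3_path']}"
    return f"ATTACH '{meta}' AS my_ducklake (DATA_PATH '{data_path}');"

sql = build_attach_sql(
    {"pgdbname": "my_ducklake", "s3_bucket": "your-s3-bucket",
     "s3_path": "your/s3/path/"},
    host="pg.example.com", login="airflow", password="secret",
)
```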

Download files


Source Distribution

airflow_provider_ducklake-0.0.1.tar.gz (9.0 kB)


Built Distribution


airflow_provider_ducklake-0.0.1-py3-none-any.whl (9.9 kB)


File details

Details for the file airflow_provider_ducklake-0.0.1.tar.gz.


File hashes

SHA256:      67bf71972824b14e634209638bfcc5ecd14ca062eec0425fa3a0b74a8bb089c6
MD5:         07c2e9a57ff7eb7a49e3c6db9489ae82
BLAKE2b-256: b860f1b902aad1558dbbd675c88da4872ef817b441523152cc01d0f1ebc82d0e


File details

Details for the file airflow_provider_ducklake-0.0.1-py3-none-any.whl.


File hashes

SHA256:      fd6743b014595c6a180eb7d476a9820417583eb104b4fb786ab40c2d256b8806
MD5:         d1e5915b8fa640acbc5d419f349589cd
BLAKE2b-256: 5e991e1f48babf3af47da0edb231c5dd7ed6793f0bd54660b459dc54bf47d688

