
DuckLake provider for Apache Airflow (based on DuckDB)

Project description

DuckLake Provider for Apache Airflow

This is a custom provider for integrating DuckLake (based on DuckDB) with Apache Airflow.

DuckLake Configuration

The DuckLakeHook uses Airflow connection fields and extras to configure the connection. Standard fields are relabeled for common use:

  • Host: The metadata host (e.g., a Postgres/MySQL host) or a file path (e.g., the DuckDB/SQLite metadata file).
  • Login: Username (for Postgres/MySQL).
  • Password: Password (for Postgres/MySQL).
  • Schema: Metadata schema (defaults to 'duckdb').
  • Extra: JSON dict for all other settings (required for engine, storage_type, and conditional fields).

Example extras JSON (adjust based on engine and storage_type):

    {
      "engine": "postgres",
      "dbname": "my_ducklake",
      "pgdbname": "dev_nophiml_db",
      "storage_type": "s3",
      "s3_bucket": "your-s3-bucket",
      "s3_path": "your/s3/path/",
      "aws_access_key_id": "your-access-key-id",
      "aws_secret_access_key": "your-secret-access-key",
      "aws_region": "us-east-1",
      "install_extensions": ["spatial"],
      "load_extensions": ["spatial"]
    }

The "install_extensions" and "load_extensions" keys are optional and are inherited from the DuckDB provider.

Supported Engines (set in extras['engine'])

  • duckdb: Requires 'metadata_file' in extras or host as file path.
  • sqlite: Requires 'metadata_file' in extras or host as file path.
  • postgres: Requires host, login, password, and 'pgdbname' in extras.
  • mysql: Requires host, login, password, and 'mysqldbname' in extras.
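
For example, a connection for the postgres engine could be defined programmatically. This is a minimal sketch: the conn_type value "ducklake" and the host/credential values are assumptions, while the extras keys are the ones documented above.

    # Sketch of a DuckLake connection for the postgres engine.
    # "ducklake" as conn_type and the host/credential values are assumptions;
    # the extras keys are the ones documented above.
    import json
    from airflow.models import Connection

    ducklake_conn = Connection(
        conn_id="ducklake_default",
        conn_type="ducklake",      # assumed provider connection type
        host="postgres.internal",  # metadata host for the postgres engine
        login="airflow",
        password="secret",
        schema="duckdb",           # metadata schema (defaults to 'duckdb')
        extra=json.dumps({
            "engine": "postgres",
            "pgdbname": "my_ducklake",
            "storage_type": "s3",
            "s3_bucket": "your-s3-bucket",
            "s3_path": "your/s3/path/",
        }),
    )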

Supported Storage Types (set in extras['storage_type'], default 's3')

  • s3: Requires 's3_bucket', 's3_path'; optional AWS creds.
  • azure: Requires 'azure_account_name', 'azure_container', 'azure_path'; optional connection_string.
  • gcs: Requires 'gcs_bucket', 'gcs_path'; optional service_account_key (JSON string).
  • local: Requires 'local_data_path'.
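
As a second sketch, extras for the sqlite engine with local storage might look like this (the keys are those listed above; the paths are placeholders):

    import json

    # Extras for a sqlite metadata catalog with data files on local disk.
    local_extras = json.dumps({
        "engine": "sqlite",
        "metadata_file": "/data/ducklake/metadata.sqlite",  # or set Host to this path
        "storage_type": "local",
        "local_data_path": "/data/ducklake/files/",
    })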

The connection UI exposes only the core fields; put engine- and storage-specific settings in the Extra JSON. Select the engine and storage type in extras and provide the corresponding keys, as in the sketch below.
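
The following sketch shows how the hook might be used inside a TaskFlow DAG. The import path, the ducklake_conn_id parameter name, and the get_conn() method are assumptions based on common Airflow hook conventions; check the provider's source for the actual API.

    import pendulum
    from airflow.decorators import dag, task

    @dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
    def ducklake_example():
        @task
        def query_ducklake():
            # Import path, parameter name, and get_conn() are assumptions.
            from airflow_provider_ducklake.hooks.ducklake import DuckLakeHook

            hook = DuckLakeHook(ducklake_conn_id="ducklake_default")
            conn = hook.get_conn()  # assumed to return a DuckDB connection with DuckLake attached
            return conn.execute("SELECT 42").fetchall()

        query_ducklake()

    ducklake_example()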

Download files

Source Distribution

  • airflow_provider_ducklake-0.0.2.tar.gz (9.0 kB)

Built Distribution

  • airflow_provider_ducklake-0.0.2-py3-none-any.whl (9.9 kB)

File hashes

Hashes for airflow_provider_ducklake-0.0.2.tar.gz:

  • SHA256: 4f01531e93d0fb21c2959f39f3d01703a8477733eb7073b5a1220127887b8de6
  • MD5: 051f8dd79b148b2274c85882211843aa
  • BLAKE2b-256: 97403e99607f94ab80b8928811e4e05752b43a6590d262794d9f1ab949e5da2a

Hashes for airflow_provider_ducklake-0.0.2-py3-none-any.whl:

  • SHA256: ff4d5c01f16d558ee6569b033fc5773c60ffda9740e08f4d04f2711ba578d472
  • MD5: b3a4d4bfaff51ccb854e885fe05b1922
  • BLAKE2b-256: 8f6dbe39ed5cc2fa09d7b2f8f13f23ab89ccee116d33857b63b29eb5c59a6d2d
