
The ClickZetta adapter plugin for dbt

dbt-clickzetta

The dbt adapter for ClickZetta Lakehouse.

See the examples/ directory for complete examples of each feature.

Installation

pip install dbt-clickzetta

Requires Python 3.8+ and dbt-core 1.8+.

Quickstart

1. Configure profiles.yml

my_project:
  target: dev
  outputs:
    dev:
      type: clickzetta
      service: cn-shanghai-alicloud.api.clickzetta.com
      instance: your_instance
      workspace: your_workspace
      username: your_username
      password: your_password
      schema: your_schema
      vcluster: default_ap

2. Test connection

dbt debug

3. Run your project

dbt run
dbt test
dbt docs generate

Supported Features

Feature Supported
table materialization
view materialization
incremental materialization
ephemeral materialization
snapshot (SCD Type 2)
dynamic_table materialization
materialized_view materialization
dbt test (generic + singular)
dbt seed
dbt docs generate ✅ (includes row count, size, and last-modified time)
dbt source freshness
persist_docs (relation + columns)
Partitioned tables
Clustered tables
Python models
on_schema_change ✅ (append_new_columns, sync_all_columns)
grants
clone materialization ✅ (zero-copy clone + Time Travel clone)
Indexes (Bloomfilter / Inverted / Vector) ✅ (created automatically via the indexes config)
Table Stream as source ✅ (declared in sources.yml, referenced with source())
Per-model VCluster switching ✅ (via the vcluster config)
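
Several of the features listed above are driven by standard dbt model configs. As a minimal sketch, assuming the adapter accepts the usual dbt-core shapes for persist_docs and grants (the grantee name reporter and the model stg_orders are hypothetical), a model could combine them like this:

```sql
{{ config(
    materialized='table',
    persist_docs={'relation': true, 'columns': true},
    grants={'select': ['reporter']}
) }}

-- stg_orders is a hypothetical upstream model
select id, name, amount
from {{ ref('stg_orders') }}
```

Whether a grantee refers to a user or a role on ClickZetta is not stated here, so check that against your workspace before relying on it.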

Incremental Strategies

Strategy Description
merge (default) MERGE INTO with unique_key
append INSERT INTO without deduplication
insert_overwrite INSERT OVERWRITE with dynamic partition mode
delete+insert DELETE matching keys then INSERT, suitable for partition replacement without a primary key
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}
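
The other strategies are selected the same way. The following is a sketch of delete+insert used for partition-style replacement, combined with on_schema_change. The model name stg_orders and the columns order_id, order_date, and amount are hypothetical; is_incremental() and {{ this }} are standard dbt.

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='order_date',
    on_schema_change='append_new_columns'
) }}

select order_id, order_date, amount
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Reprocess only the most recent date onward; target rows with a matching
  -- order_date are deleted first, then the fresh rows are inserted.
  where order_date >= (select max(order_date) from {{ this }})
{% endif %}
```

Using the date column as unique_key means each run replaces whole days of data, which is why no row-level primary key is needed.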

Indexes

Bloomfilter, Inverted, and Vector indexes are supported and are created automatically after the table is built:

{{ config(
    materialized='table',
    indexes=[
        {'type': 'bloomfilter', 'columns': ['order_id']},
        {'type': 'inverted', 'columns': ['status'], 'analyzer': 'unicode'},
        {'type': 'vector', 'columns': ['embedding'], 'distance_function': 'cosine_distance', 'scalar_type': 'f32'}
    ]
) }}

VCluster per-model

Assign a compute cluster to an individual model to isolate resources between large and small models:

{# This model runs on the large_ap cluster #}
{{ config(
    materialized='table',
    vcluster='large_ap'
) }}

Utility Macros

Operational macros invoked via dbt run-operation:

# Compact small files (use after high-frequency incremental writes)
dbt run-operation optimize_table --args '{relation: my_schema.my_table}'
dbt run-operation optimize_table --args '{relation: my_schema.my_table, where: "dt >= current_date() - interval 7 days"}'

# Switch VCluster
dbt run-operation use_vcluster --args '{vcluster: large_ap}'

# List dropped objects that can still be restored
dbt run-operation show_tables_history --args '{schema: my_schema}'

# Restore an accidentally dropped object (supports regular tables, dynamic tables, materialized views, and Table Streams)
dbt run-operation undrop --args '{relation: my_schema.my_table}'

# Drop an object (type: table | view | dynamic_table | materialized_view | stream)
dbt run-operation drop_relation --args '{relation: my_schema.my_table, type: table}'

# Manually refresh a dynamic table
dbt run-operation refresh_dynamic_table --args '{model_name: my_dynamic_table}'

Dynamic Table

{{ config(
    materialized='dynamic_table',
    refresh_interval='5 minutes',
    refresh_vc='default_ap'
) }}
select id, name, amount
from {{ ref('orders') }}

After creation, the table is automatically refreshed once (equivalent to Snowflake's initialize=ON_CREATE). Subsequent refreshes run on the configured interval.

Snapshot

Snapshots use standard dbt SCD Type 2 via MERGE INTO on regular tables (no delta/iceberg required).

{% snapshot orders_snapshot %}
{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at'
) }}
select * from {{ source('raw', 'orders') }}
{% endsnapshot %}
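
When the source has no reliable updated_at column, standard dbt also offers the check strategy. The sketch below assumes the adapter handles it the same way as the timestamp strategy, which is not confirmed here; the snapshot name and the columns in check_cols are hypothetical.

```sql
{% snapshot orders_snapshot_check %}
{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='check',
    check_cols=['status', 'amount']
) }}
-- A new snapshot row is recorded whenever status or amount changes for an id
select * from {{ source('raw', 'orders') }}
{% endsnapshot %}
```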

Connection Parameters

Parameter Required Description
type Must be clickzetta
service API endpoint, e.g. cn-shanghai-alicloud.api.clickzetta.com
instance Instance name
workspace Workspace name
username Username
password Password
schema Default schema
vcluster VCluster name, e.g. default_ap
connect_retries Connection retry count (default: 3)

Known Limitations

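Limitation Description
HAVING without GROUP BY: ClickZetta supports HAVING without GROUP BY, but the SELECT list must contain an aggregate function. A SELECT that contains only constants or plain columns raises an error. When writing dbt tests, use a subquery + WHERE instead.
SHOW GRANTS is unavailable in dbt generic tests: dbt generic tests wrap the SQL in select count(*) from (...), and SHOW GRANTS cannot be wrapped this way. To verify grants, use a singular test built on run_query + {% if execute %}. Note: most ClickZetta SHOW commands can be used in a subquery; SHOW GRANTS is the exception.
Dynamic tables do not support changing the SQL definition: ALTER DYNAMIC TABLE supports suspend / resume / rename column / set comment, but not changing the query SQL or the refresh interval. To change the definition, rebuild with dbt run --full-refresh.
CREATE OR REPLACE on materialized views is restricted: CREATE OR REPLACE MATERIALIZED VIEW cannot be used directly; it only works with a specific combination of parameters. dbt handles this with DROP followed by CREATE, so the view is briefly unqueryable in between.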
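
The first two limitations can be worked around in singular tests. As a sketch of the subquery + WHERE substitute for HAVING (the file name and the orders model are hypothetical), the aggregate is computed in an inner query and filtered in the outer one:

```sql
-- tests/assert_orders_not_empty.sql (hypothetical singular test)
-- Returns one row, and therefore fails, when the model has no rows.
select row_count
from (
    select count(*) as row_count
    from {{ ref('orders') }}
) as counted
where row_count = 0
```

For grants, a rough sketch of the run_query + {% if execute %} approach follows. The exact SHOW GRANTS syntax, the user name reporter, and the shape of the result set are all assumptions; adjust them to what ClickZetta actually returns.

```sql
-- tests/assert_reporter_has_select.sql (hypothetical singular test)
{% set found = [] %}
{% if execute %}
    {# Assumed statement; check the ClickZetta docs for the exact syntax #}
    {% set grants = run_query("show grants to user reporter") %}
    {% for row in grants.rows %}
        {# Crude match: look for 'select' anywhere in the row #}
        {% if 'select' in (row.values() | join(' ') | lower) %}
            {% do found.append(1) %}
        {% endif %}
    {% endfor %}
{% endif %}

{% if execute and found | length == 0 %}
    -- One row returned: the test fails
    select 'missing select grant' as failure
{% else %}
    -- No rows returned: the test passes
    select * from (select 1 as ok) as t where ok = 0
{% endif %}
```

The SHOW GRANTS statement runs on its own through run_query, so it is never wrapped in a subquery; only the final select is wrapped by dbt.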

Development

# Clone
git clone https://github.com/clickzetta/dbt-clickzetta.git
cd dbt-clickzetta

# Install in editable mode
pip install -e .

# Run unit tests
pip install pytest
pytest tests/unit/

# Run functional tests (requires a real Lakehouse connection)
cp test.env.example test.env
# Fill in test.env with your connection details
pytest tests/functional/

License

Apache 2.0
