The ClickZetta adapter plugin for dbt

dbt-clickzetta

The dbt adapter for ClickZetta Lakehouse.

See the examples/ directory for complete examples of each feature.

Installation

pip install dbt-clickzetta

Requires Python 3.8+ and dbt-core 1.8+.

Quickstart

1. Configure profiles.yml

my_project:
  target: dev
  outputs:
    dev:
      type: clickzetta
      service: cn-shanghai-alicloud.api.clickzetta.com
      instance: your_instance
      workspace: your_workspace
      username: your_username
      password: your_password
      schema: your_schema
      vcluster: default_ap

2. Test connection

dbt debug

3. Run your project

dbt run
dbt test
dbt docs generate

Supported Features

Feature Supported
table materialization
view materialization
incremental materialization
ephemeral materialization
snapshot (SCD Type 2)
dynamic_table materialization
materialized_view materialization
dbt test (generic + singular)
dbt seed
dbt docs generate ✅ (includes row count, size, and last-modified time)
dbt source freshness
persist_docs (relation + columns)
Partitioned tables
Clustered tables
Python models
on_schema_change ✅ (append_new_columns, sync_all_columns)
grants
clone materialization ✅ (zero-copy clone + Time Travel clone)
Indexes (Bloomfilter / Inverted / Vector) ✅ (created automatically via the indexes config)
Table Stream as source ✅ (declared in sources.yml, referenced with source())
Per-model VCluster switching ✅ (via the vcluster config)
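
The on_schema_change row in the table above can be illustrated with a short incremental model. This is a minimal sketch that assumes the adapter follows standard dbt semantics for on_schema_change; the orders model and its columns are reused from the other examples in this README as placeholders, not taken from a real project.

```sql
-- Hypothetical incremental model; model and column names are placeholders.
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id',
    on_schema_change='append_new_columns'
) }}

-- If a later version of this query selects an extra column, the next
-- incremental run adds that column to the existing table instead of
-- failing or silently ignoring it.
select id, name, amount
from {{ ref('orders') }}
```

In standard dbt, append_new_columns only adds columns and does not backfill existing rows, while sync_all_columns also removes columns that no longer appear in the query.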

Incremental Strategies

Strategy | Description
merge (default) | MERGE INTO with unique_key
append | INSERT INTO without deduplication
insert_overwrite | INSERT OVERWRITE with dynamic partition mode
delete+insert | DELETE matching keys, then INSERT; suitable for partition replacement without a primary key

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}
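
For comparison with the merge configuration above, the following sketch shows how delete+insert might be used for partition-style replacement. It assumes the adapter follows the usual dbt behavior, where target rows whose unique_key values appear in the new batch are deleted before the insert. The events model, the dt column, and the 3-day window are hypothetical; the interval expression copies the form used in the optimize_table example in this README.

```sql
-- Hypothetical model: `events` and its columns are assumed names.
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='dt'
) }}

select dt, user_id, event_type, amount
from {{ ref('events') }}

{% if is_incremental() %}
  -- Reprocess only the last 3 days. Target rows with these dt values
  -- are deleted first, then the freshly selected rows are inserted.
  where dt >= current_date() - interval 3 days
{% endif %}
```

On the first run (or with --full-refresh) is_incremental() is false, so the filter is skipped and the whole table is built.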

Indexes

Three index types are supported: Bloomfilter, Inverted, and Vector. Indexes are created automatically after the table is built:

{{ config(
    materialized='table',
    indexes=[
        {'type': 'bloomfilter', 'columns': ['order_id']},
        {'type': 'inverted', 'columns': ['status'], 'analyzer': 'unicode'},
        {'type': 'vector', 'columns': ['embedding'], 'distance_function': 'cosine_distance', 'scalar_type': 'f32'}
    ]
) }}

VCluster per-model

Assign a compute cluster to an individual model so that large and small models run with isolated resources:

{# This model runs on the large_ap cluster #}
{{ config(
    materialized='table',
    vcluster='large_ap'
) }}

Utility Macros

Operational macros invoked through dbt run-operation:

# Compact small files (use after high-frequency incremental writes)
dbt run-operation optimize_table --args '{relation: my_schema.my_table}'
dbt run-operation optimize_table --args '{relation: my_schema.my_table, where: "dt >= current_date() - interval 7 days"}'

# Switch VCluster
dbt run-operation use_vcluster --args '{vcluster: large_ap}'

# List dropped objects that can still be restored
dbt run-operation show_tables_history --args '{schema: my_schema}'

# Restore an accidentally dropped object (regular tables, dynamic tables, materialized views, and Table Streams are supported)
dbt run-operation undrop --args '{relation: my_schema.my_table}'

# Drop an object (type: table | view | dynamic_table | materialized_view | stream)
dbt run-operation drop_object --args '{relation: my_schema.my_table, type: table}'

# Manually refresh a dynamic table
dbt run-operation refresh_dynamic_table --args '{model_name: my_dynamic_table}'

Dynamic Table

{{ config(
    materialized='dynamic_table',
    refresh_interval='5 minutes',
    refresh_vc='default_ap'
) }}
select id, name, amount
from {{ ref('orders') }}

After creation, the table is automatically refreshed once (equivalent to Snowflake's initialize=ON_CREATE). Subsequent refreshes run on the configured interval.

Snapshot

Snapshots use standard dbt SCD Type 2 via MERGE INTO on regular tables (no delta/iceberg required).

{% snapshot orders_snapshot %}
{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at'
) }}
select * from {{ source('raw', 'orders') }}
{% endsnapshot %}
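
Once the snapshot has run, downstream models usually need either the current version of each row or its full history. The query below is a sketch that assumes the standard dbt snapshot metadata columns (dbt_valid_from, dbt_valid_to) with their default behavior, where the open-ended current record has a null dbt_valid_to.

```sql
-- Hypothetical downstream model: current version of each order.
select *
from {{ ref('orders_snapshot') }}
-- In standard dbt snapshots the latest version of a row has no end date.
where dbt_valid_to is null
```

Dropping the where clause returns every historical version, bounded by dbt_valid_from and dbt_valid_to.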

Connection Parameters

Parameter | Description
type | Must be clickzetta
service | API endpoint, e.g. cn-shanghai-alicloud.api.clickzetta.com
instance | Instance name
workspace | Workspace name
username | Username
password | Password
schema | Default schema
vcluster | VCluster name, e.g. default_ap
connect_retries | Connection retry count (default: 3)

Known Limitations

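Limitation | Details
HAVING without GROUP BY | ClickZetta supports HAVING without GROUP BY, but the SELECT must contain an aggregate function. A SELECT with only constants or plain columns raises an error. In dbt tests, use a subquery + WHERE instead.
SHOW GRANTS is not usable in dbt generic tests | dbt generic tests wrap the SQL in select count(*) from (...), and SHOW GRANTS cannot be wrapped this way. To verify grants, use a singular test built on run_query + {% if execute %}. Note: most ClickZetta SHOW commands can be used in a subquery; SHOW GRANTS is the exception.
Dynamic tables do not support changing the SQL definition | ALTER DYNAMIC TABLE supports suspend / resume / rename column / set comment, but not changing the query SQL or the refresh interval. To change the definition, rebuild with dbt run --full-refresh.
CREATE OR REPLACE is restricted for materialized views | CREATE OR REPLACE MATERIALIZED VIEW cannot be used directly; it only works with a specific combination of parameters. dbt handles this by running DROP and then CREATE, so the view is briefly unqueryable in between.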
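
The subquery + WHERE workaround for the HAVING limitation can be sketched as a singular test. This is an illustrative example only: the test intent (fail when the table is empty) is hypothetical, and the orders model is reused from the other examples in this README.

```sql
-- Hypothetical singular test: fails (returns a row) when orders is empty.
-- A query such as `select 1 from orders having count(*) = 0` would hit the
-- limitation, because its SELECT list contains no aggregate function.
select row_count
from (
    -- Compute the aggregate in a subquery...
    select count(*) as row_count
    from {{ ref('orders') }}
) as counted
-- ...and filter on it with WHERE instead of HAVING.
where row_count = 0
```

Because dbt treats any returned row as a test failure, the outer WHERE clause plays the role that HAVING would otherwise have played.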

Development

# Clone
git clone https://github.com/clickzetta/dbt-clickzetta.git
cd dbt-clickzetta

# Install in editable mode
pip install -e .

# Run unit tests
pip install pytest
pytest tests/unit/

# Run functional tests (requires a real Lakehouse connection)
cp test.env.example test.env
# Fill in test.env with your connection details
pytest tests/functional/

License

Apache 2.0
