
Curate a Semantic Model for ClickZetta Lakehouse


semantic-model-generator

The ClickZetta Semantic Model Generator is a Streamlit companion app built for ClickZetta teams. Use it to explore Lakehouse metadata, author and refine semantic YAML, and plug into partner workflows. By default, everything runs against ClickZetta's Lakehouse APIs and Lakehouse-backed volumes.

Requirements

  • Python 3.11
  • Access to a ClickZetta workspace (service URL, instance, workspace, schema, vcluster, username, password)
  • A connections.json file in one of the standard ClickZetta locations (~/.clickzetta/connections.json, config/connections.json, config/lakehouse_connection/connections.json, or /app/.clickzetta/lakehouse_connection/connections.json). The structure matches the template from mcp-clickzetta-server. Set "is_default": true for the connection the app should use.
{
  "system_config": {
    "embedding": {
      "provider": "dashscope",
      "dashscope": {
        "api_key": "dashscope_api_key",
        "model": "qwen-plus-latest"
      }
    }
  },
  "connections": [
    {
      "connection_name": "dev",
      "is_default": true,
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "instance": "your_instance",
      "workspace": "quick_start",
      "schema": "PUBLIC",
      "username": "user",
      "password": "password",
      "vcluster": "default_ap"
    }
  ]
}

Environment variables such as CLICKZETTA_SERVICE, CLICKZETTA_USERNAME, etc. override the JSON values when present.
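The precedence rule can be sketched as follows. This is an illustrative helper, not the app's actual loader; the source only names CLICKZETTA_SERVICE and CLICKZETTA_USERNAME explicitly, so the other variable names here are assumptions.

```python
import os

# Hypothetical mapping from connection fields to environment variables.
# CLICKZETTA_SERVICE and CLICKZETTA_USERNAME come from the docs above;
# the remaining names are assumed to follow the same pattern.
ENV_KEYS = {
    "service": "CLICKZETTA_SERVICE",
    "username": "CLICKZETTA_USERNAME",
    "password": "CLICKZETTA_PASSWORD",
}

def resolve_setting(connection: dict, field: str) -> str:
    """Return the env-var value when set, otherwise the JSON value."""
    env_var = ENV_KEYS.get(field)
    if env_var and env_var in os.environ:
        return os.environ[env_var]
    return connection[field]

connection = {
    "service": "cn-shanghai-alicloud.api.clickzetta.com",
    "username": "user",
}
os.environ["CLICKZETTA_USERNAME"] = "alice"
print(resolve_setting(connection, "username"))  # env var wins
print(resolve_setting(connection, "service"))   # falls back to JSON
```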

App Overview

The Streamlit homepage highlights how to use this toolkit alongside the ClickZetta platform:

  • Local companion for semantic modeling. Iterate quickly on YAML, inspect metadata, and validate changes before promoting them back to your Lakehouse.
  • Keep production work in ClickZetta. Build and manage canonical models in the ClickZetta console, then switch to this app when you need richer editing, partner integrations, or AI enrichment—both share the same volumes for frictionless workflows.
  • Why semantics matter. A curated semantic layer standardizes measures, joins, and business logic so LLMs understand context, avoid hallucinations, and deliver consistent analytics for data teams and business users.
  • Typical workflows covered: author and refine models from table metadata, safely edit existing YAML with ClickZetta validation, generate/test SQL through the chat assistant, and auto-enrich documentation via DashScope.
  • Use it as a sandbox. Pull models from a volume, experiment with the editor and chat assistant, then push the refined YAML back once it passes validation.

Semantic model generator architecture

Installation from Docker

If you prefer not to install Python dependencies locally, pull the published Docker image:

docker pull czqiliang/semantic-model-generator:latest
docker run --rm -p 8501:8501 \
  -v $(pwd)/connections.json:/app/.clickzetta/connections.json \
  czqiliang/semantic-model-generator:latest

Mount your connections.json (see the example above) so the container can pick up ClickZetta and DashScope credentials.

The Streamlit UI will be available at http://localhost:8501.

Docker Compose example (docker-compose.yml):

version: "3.9"
services:
  app:
    image: czqiliang/semantic-model-generator:latest
    ports:
      - "8501:8501"
    volumes:
      - ~/.clickzetta:/app/.clickzetta         # default config mount on macOS

Run it with docker compose up.

Linux hosts usually keep the ClickZetta config under /opt/clickzetta, so the Compose volume mapping can be changed to:

    volumes:
      - /opt/clickzetta:/app/.clickzetta

Installation from source code

# optional: conda env using environment.yml
conda env create -f environment.yml
conda activate clickzetta_env

# or install via poetry/pip
poetry install
# pip install .

The app depends on clickzetta-connector-python and clickzetta-zettapark-python; ensure they are installed via the commands above.

Running the Streamlit app

# inside the Poetry environment
poetry run streamlit run app.py

# or, after activating the env, run:
python -m streamlit run app.py

When the app launches it will:

  1. Load credentials from the ClickZetta connection config or environment.
  2. Default file operations to volume:user://~/semantic_models/ inside your user volume.
  3. Provide workflows for generating semantic YAML, editing YAML, validating (basic checks), and importing partner specs (dbt, etc.).
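Step 1 above can be sketched as reading connections.json and selecting the entry flagged "is_default": true. The function below is illustrative only and mirrors the config format shown earlier; it is not the app's actual loader.

```python
# Minimal sketch: pick the default connection from a parsed
# connections.json structure (see the example config above).
def default_connection(config: dict) -> dict:
    for conn in config["connections"]:
        if conn.get("is_default"):
            return conn
    # Fall back to the first connection if none is flagged.
    return config["connections"][0]

config = {
    "connections": [
        {"connection_name": "prod", "is_default": False},
        {"connection_name": "dev", "is_default": True},
    ]
}
print(default_connection(config)["connection_name"])  # → dev
```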

DashScope usage notes

  • Semantic enrichment calls the official DashScope SDK's default endpoint; there is no need, and no way, to override it via base_url.
  • Even if you set DASHSCOPE_BASE_URL or a compatible endpoint in connections.json or environment variables, the app will not use those values.
  • You still need to provide DASHSCOPE_API_KEY and a model name (e.g. qwen-plus); leaving the other parameters at their defaults avoids the common InvalidParameter: url error.
  • Use an OpenAI-compatible endpoint only when you explicitly need compatibility mode; the current Streamlit app does not support compatible endpoints.

Key behaviours

  • Volume-first uploads: YAML import/export uses the user volume path volume:user://~/semantic_models/ unless a different volume/stage is selected.
  • Metadata discovery: Workspace metadata (catalogs, schemas, tables) is fetched via ClickZetta INFORMATION_SCHEMA queries. Sample values and comments are collected using ClickZetta sessions.
  • Partner integrations: dbt helpers read YAML from the chosen volume/stage, merge metadata, and reuse ClickZetta credentials.
  • Chat/validation placeholders: Cortex-specific validation and chat calls are not yet available in ClickZetta mode; the UI will display placeholders instead of calling external services.
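The volume-first convention can be illustrated with a small path helper. The helper name and suffix handling are hypothetical; only the volume:user://~/semantic_models/ prefix comes from the app's documented default.

```python
# Illustrative helper: build the full path of a model file under the
# default user volume prefix documented above.
DEFAULT_VOLUME_PREFIX = "volume:user://~/semantic_models/"

def model_volume_path(filename: str, prefix: str = DEFAULT_VOLUME_PREFIX) -> str:
    """Append a .yaml suffix when missing and prepend the volume prefix."""
    if not filename.endswith((".yaml", ".yml")):
        filename += ".yaml"
    return prefix + filename

print(model_volume_path("sales_model"))
# volume:user://~/semantic_models/sales_model.yaml
```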

Development scripts

Useful commands while iterating:

make setup        # install dependencies
make run_admin_app
make fmt_lint     # format + lint
make test         # execute pytest suite
make docker-buildx       # build multi-arch Docker image (linux/amd64, linux/arm64)
make docker-buildx-push  # build and push multi-arch image

License

Apache 2.0 / BSD (dual license) – see LICENSE and LEGAL files for details.
