Curate a Semantic Model for ClickZetta Lakehouse

semantic-model-generator

The ClickZetta Semantic Model Generator is a Streamlit companion app built for ClickZetta teams. Use it to explore Lakehouse metadata, author and refine semantic YAML, and plug into partner workflows; by default, everything runs against ClickZetta's Lakehouse APIs and the volumes that back them.

Requirements

  • Python 3.11
  • Access to a ClickZetta workspace (service URL, instance, workspace, schema, vcluster, username, password)
  • A connections.json file in one of the standard ClickZetta locations (~/.clickzetta/connections.json, config/connections.json, config/lakehouse_connection/connections.json, or /app/.clickzetta/lakehouse_connection/connections.json). The structure matches the template from mcp-clickzetta-server. Set "is_default": true for the connection the app should use.
{
  "system_config": {
    "embedding": {
      "provider": "dashscope",
      "dashscope": {
        "api_key": "dashscope_api_key",
        "model": "qwen-plus-latest"
      }
    }
  },
  "connections": [
    {
      "connection_name": "dev",
      "is_default": true,
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "instance": "your_instance",
      "workspace": "quick_start",
      "schema": "PUBLIC",
      "username": "user",
      "password": "password",
      "vcluster": "default_ap"
    }
  ]
}

Environment variables such as CLICKZETTA_SERVICE, CLICKZETTA_USERNAME, etc. override the JSON values when present.
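The precedence rule can be sketched as follows. This is a minimal illustration only; `load_connection` and the exact environment-variable-to-key mapping are assumptions, not the app's real internals:

```python
import json
import os

# Assumed mapping of environment variables to connections.json keys.
ENV_OVERRIDES = {
    "CLICKZETTA_SERVICE": "service",
    "CLICKZETTA_INSTANCE": "instance",
    "CLICKZETTA_WORKSPACE": "workspace",
    "CLICKZETTA_SCHEMA": "schema",
    "CLICKZETTA_USERNAME": "username",
    "CLICKZETTA_PASSWORD": "password",
    "CLICKZETTA_VCLUSTER": "vcluster",
}

def load_connection(path: str) -> dict:
    """Pick the default connection from connections.json, then apply env overrides."""
    with open(path) as fh:
        config = json.load(fh)
    connections = config.get("connections", [])
    # Prefer the entry flagged "is_default": true, else fall back to the first one.
    conn = next((c for c in connections if c.get("is_default")), connections[0])
    for env_name, key in ENV_OVERRIDES.items():
        value = os.environ.get(env_name)
        if value:  # the environment wins over the JSON value when present
            conn[key] = value
    return conn
```

With this resolution order you can keep shared defaults in `connections.json` and inject per-machine secrets (e.g. `CLICKZETTA_PASSWORD`) through the environment.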

App Overview

The Streamlit homepage highlights how to use this toolkit alongside the ClickZetta platform:

  • Local companion for semantic modeling. Iterate quickly on YAML, inspect metadata, and validate changes before promoting them back to your Lakehouse.
  • Keep production work in ClickZetta. Build and manage canonical models in the ClickZetta console, then switch to this app when you need richer editing, partner integrations, or AI enrichment—both share the same volumes for frictionless workflows.
  • Why semantics matter. A curated semantic layer standardizes measures, joins, and business logic so LLMs understand context, avoid hallucinations, and deliver consistent analytics for data teams and business users.
  • Typical workflows covered: author and refine models from table metadata, safely edit existing YAML with ClickZetta validation, generate/test SQL through the chat assistant, and auto-enrich documentation via DashScope.
  • Use it as a sandbox. Pull models from a volume, experiment with the editor and chat assistant, then push the refined YAML back once it passes validation.

Semantic model generator architecture

Installation from Docker

If you prefer not to install Python dependencies locally, pull the published Docker image:

docker pull czqiliang/semantic-model-generator:latest
docker run --rm -p 8501:8501 \
  -v $(pwd)/connections.json:/app/.clickzetta/connections.json \
  czqiliang/semantic-model-generator:latest

Mount your connections.json (see the example above) so the container can pick up ClickZetta and DashScope credentials.

The Streamlit UI will be available at http://localhost:8501.

Docker Compose example (docker-compose.yml):

version: "3.9"
services:
  app:
    image: czqiliang/semantic-model-generator:latest
    ports:
      - "8501:8501"
    volumes:
      - ~/.clickzetta:/app/.clickzetta         # default macOS config mount

Run it with docker compose up.

Linux hosts typically keep the ClickZetta config under /opt/clickzetta, in which case the Compose volume can be changed to:

    volumes:
      - /opt/clickzetta:/app/.clickzetta

Installation from source

# optional: conda env using environment.yml
conda env create -f environment.yml
conda activate clickzetta_env

# or install via poetry/pip
poetry install
# pip install .

The app depends on clickzetta-connector-python and clickzetta-zettapark-python; ensure they are installed via the commands above.

Running the Streamlit app

# inside the Poetry environment
poetry run streamlit run app.py

# or, after activating the env, run:
python -m streamlit run app.py

When the app launches it will:

  1. Load credentials from the ClickZetta connection config or environment.
  2. Default file operations to volume:user://~/semantic_models/ inside your user volume.
  3. Provide workflows for generating semantic YAML, editing YAML, validating (basic checks), and importing partner specs (dbt, etc.).
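The default path in step 2 has the shape volume:user://~/semantic_models/. Splitting such a URI into its parts might look like the sketch below; `parse_volume_uri` is a hypothetical helper for illustration, not part of the app:

```python
from typing import NamedTuple

class VolumeURI(NamedTuple):
    scheme: str   # e.g. "volume"
    kind: str     # e.g. "user"
    path: str     # e.g. "~/semantic_models/"

def parse_volume_uri(uri: str) -> VolumeURI:
    """Split a ClickZetta-style volume URI such as volume:user://~/semantic_models/."""
    scheme, rest = uri.split(":", 1)   # "volume" | "user://~/semantic_models/"
    kind, path = rest.split("://", 1)  # "user"   | "~/semantic_models/"
    return VolumeURI(scheme, kind, path)
```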

DashScope usage notes

  • Semantic enrichment calls the official DashScope SDK at its default endpoint; the endpoint cannot be overridden via base_url.
  • Even if DASHSCOPE_BASE_URL or a compatible endpoint is set in connections.json or in environment variables, the app will not use those values.
  • You still need to provide DASHSCOPE_API_KEY and a model name (e.g. qwen-plus); keeping the other parameters at their defaults avoids the common InvalidParameter: url error.
  • Use an OpenAI-compatible endpoint only if you explicitly need compatibility mode; the current Streamlit app does not support compatible endpoints.
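For reference, resolving the key and model might look like the following sketch. It assumes the key lives under system_config.embedding.dashscope in connections.json as in the example above; `resolve_dashscope` is illustrative, not the app's actual code:

```python
import json
import os

def resolve_dashscope(path: str) -> tuple[str, str]:
    """Return (api_key, model), letting DASHSCOPE_API_KEY in the environment win."""
    with open(path) as fh:
        cfg = json.load(fh)
    ds = cfg.get("system_config", {}).get("embedding", {}).get("dashscope", {})
    api_key = os.environ.get("DASHSCOPE_API_KEY") or ds.get("api_key", "")
    model = ds.get("model", "qwen-plus")
    # Deliberately no base_url here: the app always uses the SDK's default endpoint.
    return api_key, model
```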

Key behaviours

  • Volume-first uploads: YAML import/export uses the user volume path volume:user://~/semantic_models/ unless a different volume/stage is selected.
  • Metadata discovery: Workspace metadata (catalogs, schemas, tables) is fetched via ClickZetta INFORMATION_SCHEMA queries. Sample values and comments are collected using ClickZetta sessions.
  • Partner integrations: dbt helpers read YAML from the chosen volume/stage, merge metadata, and reuse ClickZetta credentials.
  • Chat/validation placeholders: Cortex-specific validation and chat calls are not yet available in ClickZetta mode; the UI will display placeholders instead of calling external services.
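Metadata discovery of the kind described above boils down to plain INFORMATION_SCHEMA queries. A minimal sketch, assuming ClickZetta's INFORMATION_SCHEMA follows the standard table/column layout (`list_tables_sql` is illustrative only, not the app's query builder):

```python
def list_tables_sql(schema: str) -> str:
    """Build a standard INFORMATION_SCHEMA query listing tables in one schema."""
    safe_schema = schema.replace("'", "''")  # naive quoting for the string literal
    return (
        "SELECT table_schema, table_name, table_type "
        "FROM information_schema.tables "
        f"WHERE table_schema = '{safe_schema}' "
        "ORDER BY table_name"
    )
```

The app would run such a statement through a ClickZetta session and feed the resulting rows into the generated semantic YAML.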

Development scripts

Useful commands while iterating:

make setup        # install dependencies
make run_admin_app
make fmt_lint     # format + lint
make test         # execute pytest suite
make docker-buildx       # build multi-arch Docker image (linux/amd64, linux/arm64)
make docker-buildx-push  # build and push multi-arch image

License

Apache 2.0 / BSD (dual license) – see LICENSE and LEGAL files for details.
