Curate a Semantic Model for ClickZetta Lakehouse

semantic-model-generator

The ClickZetta Semantic Model Generator is a Streamlit companion app built for ClickZetta teams. Use it to explore Lakehouse metadata, author and refine semantic YAML, and plug into partner workflows; everything runs against ClickZetta’s Lakehouse APIs and volume-backed storage by default.

Requirements

  • Python 3.11
  • Access to a ClickZetta workspace (service URL, instance, workspace, schema, vcluster, username, password)
  • A connections.json file in one of the standard ClickZetta locations (~/.clickzetta/connections.json, config/connections.json, config/lakehouse_connection/connections.json, or /app/.clickzetta/lakehouse_connection/connections.json). The structure matches the template from mcp-clickzetta-server. Set "is_default": true for the connection the app should use.
{
  "system_config": {
    "embedding": {
      "provider": "dashscope",
      "dashscope": {
        "api_key": "dashscope_api_key",
        "model": "qwen-plus-latest"
      }
    }
  },
  "connections": [
    {
      "connection_name": "dev",
      "is_default": true,
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "instance": "your_instance",
      "workspace": "quick_start",
      "schema": "PUBLIC",
      "username": "user",
      "password": "password",
      "vcluster": "default_ap"
    }
  ]
}
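As a sketch (a hypothetical helper, not the app's actual loader), picking the connection the app should use out of a parsed connections.json might look like:

```python
def default_connection(config: dict) -> dict:
    """Return the connection marked "is_default": true, falling back to
    the first entry if none is marked (empty dict if there are none)."""
    conns = config.get("connections", [])
    for conn in conns:
        if conn.get("is_default"):
            return conn
    return conns[0] if conns else {}
```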

Environment variables such as CLICKZETTA_SERVICE, CLICKZETTA_USERNAME, etc. override the JSON values when present.
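The override behaviour can be sketched as a small layering step; this is illustrative code under assumed variable-to-key naming, not the app's implementation:

```python
import os

# Assumed mapping of environment variable -> connections.json key.
ENV_OVERRIDES = {
    "CLICKZETTA_SERVICE": "service",
    "CLICKZETTA_USERNAME": "username",
    "CLICKZETTA_PASSWORD": "password",
    "CLICKZETTA_WORKSPACE": "workspace",
}

def resolve_connection(json_conn: dict) -> dict:
    """Return the connection dict with environment variables taking
    precedence over the JSON values when they are set."""
    resolved = dict(json_conn)
    for env_name, key in ENV_OVERRIDES.items():
        value = os.environ.get(env_name)
        if value:  # only override when the variable is actually present
            resolved[key] = value
    return resolved
```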

App Overview

The Streamlit homepage highlights how to use this toolkit alongside the ClickZetta platform:

  • Local companion for semantic modeling. Iterate quickly on YAML, inspect metadata, and validate changes before promoting them back to your Lakehouse.
  • Keep production work in ClickZetta. Build and manage canonical models in the ClickZetta console, then switch to this app when you need richer editing, partner integrations, or AI enrichment—both share the same volumes for frictionless workflows.
  • Why semantics matter. A curated semantic layer standardizes measures, joins, and business logic so LLMs understand context, avoid hallucinations, and deliver consistent analytics for data teams and business users.
  • Typical workflows covered: author and refine models from table metadata, safely edit existing YAML with ClickZetta validation, generate/test SQL through the chat assistant, and auto-enrich documentation via DashScope.
  • Use it as a sandbox. Pull models from a volume, experiment with the editor and chat assistant, then push the refined YAML back once it passes validation.

Semantic model generator architecture

Installation from Docker

If you prefer not to install Python dependencies locally, pull the published Docker image:

docker pull czqiliang/semantic-model-generator:latest
docker run --rm -p 8501:8501 \
  -v $(pwd)/connections.json:/app/.clickzetta/connections.json \
  czqiliang/semantic-model-generator:latest

Mount your connections.json (see the example above) so the container can pick up ClickZetta and DashScope credentials.

The Streamlit UI will be available at http://localhost:8501.

Docker Compose example (docker-compose.yml):

version: "3.9"
services:
  app:
    image: czqiliang/semantic-model-generator:latest
    ports:
      - "8501:8501"
    volumes:
      - ~/.clickzetta:/app/.clickzetta         # default config location on macOS

Run it with docker compose up.

Linux hosts usually keep the ClickZetta config under /opt/clickzetta, in which case the Compose mapping can be changed to:

    volumes:
      - /opt/clickzetta:/app/.clickzetta

Installation from source code

# optional: conda env using environment.yml
conda env create -f environment.yml
conda activate clickzetta_env

# or install via poetry/pip
poetry install
# pip install .

The app depends on clickzetta-connector-python and clickzetta-zettapark-python; ensure they are installed via the commands above.

Running the Streamlit app

# inside the Poetry environment
poetry run streamlit run app.py

# or, after activating the env, run:
python -m streamlit run app.py

When the app launches it will:

  1. Load credentials from the ClickZetta connection config or environment.
  2. Default file operations to volume:user://~/semantic_models/ inside your user volume.
  3. Provide workflows for generating semantic YAML, editing YAML, validating (basic checks), and importing partner specs (dbt, etc.).
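The default volume path in step 2 can be combined with a model name along these lines (a hypothetical convenience helper, shown only to illustrate the path layout):

```python
DEFAULT_VOLUME_PREFIX = "volume:user://~/semantic_models/"

def model_path(name: str, prefix: str = DEFAULT_VOLUME_PREFIX) -> str:
    """Build the full volume path for a semantic model YAML file,
    appending a .yaml suffix if the name has none."""
    if not name.endswith((".yaml", ".yml")):
        name += ".yaml"
    return prefix + name
```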

DashScope usage notes

  • Semantic enrichment calls the official DashScope SDK at its default endpoint; the base_url neither needs to be, nor can be, overridden.
  • Even if DASHSCOPE_BASE_URL or a compatible endpoint is set in connections.json or via environment variables, the app will not use those values.
  • You still need to provide DASHSCOPE_API_KEY and a model name (such as qwen-plus); keeping the other parameters at their defaults avoids the common "InvalidParameter: url error".
  • Use an OpenAI-compatible endpoint only when you explicitly need OpenAI compatibility mode; the current Streamlit app does not support compatible endpoints.
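Reading the DashScope settings out of the system_config block shown in the connections.json example is a plain dictionary walk; this helper is an illustrative sketch, not the app's actual code:

```python
def dashscope_settings(config: dict) -> tuple:
    """Extract (api_key, model) from the system_config.embedding block
    of a parsed connections.json (structure as in the example above)."""
    emb = config.get("system_config", {}).get("embedding", {})
    provider = emb.get("provider", "dashscope")
    section = emb.get(provider, {})
    # "qwen-plus" as a fallback model name is an assumption for this sketch.
    return section.get("api_key", ""), section.get("model", "qwen-plus")
```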

Key behaviours

  • Volume-first uploads: YAML import/export uses the user volume path volume:user://~/semantic_models/ unless a different volume/stage is selected.
  • Metadata discovery: Workspace metadata (catalogs, schemas, tables) is fetched via ClickZetta INFORMATION_SCHEMA queries. Sample values and comments are collected using ClickZetta sessions.
  • Partner integrations: dbt helpers read YAML from the chosen volume/stage, merge metadata, and reuse ClickZetta credentials.
  • Chat/validation placeholders: Cortex-specific validation and chat calls are not yet available in ClickZetta mode; the UI will display placeholders instead of calling external services.
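The metadata discovery described above boils down to INFORMATION_SCHEMA queries; a sketch of the kind of SQL involved (INFORMATION_SCHEMA.TABLES is a standard view, but the helper itself is illustrative and a real implementation should use bound parameters rather than string interpolation):

```python
def list_tables_sql(schema: str) -> str:
    """SQL to enumerate tables in a schema via INFORMATION_SCHEMA."""
    return (
        "SELECT table_name, table_type "
        "FROM INFORMATION_SCHEMA.TABLES "
        f"WHERE table_schema = '{schema}'"
    )
```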

Development scripts

Useful commands while iterating:

make setup        # install dependencies
make run_admin_app
make fmt_lint     # format + lint
make test         # execute pytest suite
make docker-buildx       # build multi-arch Docker image (linux/amd64, linux/arm64)
make docker-buildx-push  # build and push multi-arch image

License

Apache 2.0 / BSD (dual license) – see LICENSE and LEGAL files for details.

Download files

Source Distribution

clickzetta_semantic_model_generator-1.0.54.tar.gz (127.5 kB)

Built Distribution

clickzetta_semantic_model_generator-1.0.54-py3-none-any.whl

File details

Hashes for clickzetta_semantic_model_generator-1.0.54.tar.gz:

  Algorithm     Hash digest
  SHA256        2f8b067a2e020a95fcd25600bbe67f252e8aa5925d2906e0c268d2b1ba763a9b
  MD5           cd29a9a9c9517cd5e062471eff567ac5
  BLAKE2b-256   35ae19b2826969e05d37f7d2c73529508aca7831bc985e4d8fe913f6a0066ec9

Hashes for clickzetta_semantic_model_generator-1.0.54-py3-none-any.whl:

  Algorithm     Hash digest
  SHA256        ba5bb27c77c4d806dd90055f93be06bc74a14e7539b26feebc1fb4ca8a5e0833
  MD5           91ae925dd7ad45a29303d550001936d1
  BLAKE2b-256   5fd6a3e85cf3fac512e241317771813d09b839159d5446cd61420a99856b4c65
