Curate a Semantic Model for ClickZetta Lakehouse
semantic-model-generator
The ClickZetta Semantic Model Generator is a Streamlit companion built for ClickZetta teams. Use it to explore Lakehouse metadata, author and refine semantic YAML, and plug into partner workflows. Everything runs against ClickZetta's Lakehouse APIs and its volumes by default.
Requirements
- Python 3.11
- Access to a ClickZetta workspace (service URL, instance, workspace, schema, vcluster, username, password)
- A `connections.json` file in one of the standard ClickZetta locations (`~/.clickzetta/connections.json`, `config/connections.json`, `config/lakehouse_connection/connections.json`, or `/app/.clickzetta/lakehouse_connection/connections.json`). The structure matches the template from `mcp-clickzetta-server`. Set `"is_default": true` for the connection the app should use.
```json
{
  "system_config": {
    "embedding": {
      "provider": "dashscope",
      "dashscope": {
        "api_key": "dashscope_api_key",
        "model": "qwen-plus-latest"
      }
    }
  },
  "connections": [
    {
      "connection_name": "dev",
      "is_default": true,
      "service": "cn-shanghai-alicloud.api.clickzetta.com",
      "instance": "your_instance",
      "workspace": "quick_start",
      "schema": "PUBLIC",
      "username": "user",
      "password": "password",
      "vcluster": "default_ap"
    }
  ]
}
```
Environment variables such as CLICKZETTA_SERVICE, CLICKZETTA_USERNAME, etc. override the JSON values when present.
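As a minimal sketch of this precedence rule, the helper below loads the default connection and lets `CLICKZETTA_*` environment variables override individual fields; the function name and lookup order are illustrative, not the app's actual loader:

```python
import json
import os

def resolve_connection(config_path: str) -> dict:
    """Load the default connection from connections.json, then let
    CLICKZETTA_* environment variables override individual fields.
    (Illustrative helper, not the app's actual implementation.)"""
    with open(config_path) as f:
        config = json.load(f)
    # Pick the connection flagged "is_default": true, else the first one.
    connections = config["connections"]
    conn = next((c for c in connections if c.get("is_default")), connections[0])
    # Environment variables take precedence over JSON values when present.
    for field in ("service", "instance", "workspace", "schema",
                  "vcluster", "username", "password"):
        env_value = os.environ.get(f"CLICKZETTA_{field.upper()}")
        if env_value:
            conn[field] = env_value
    return conn
```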
App Overview
The Streamlit homepage highlights how to use this toolkit alongside the ClickZetta platform:
- Local companion for semantic modeling. Iterate quickly on YAML, inspect metadata, and validate changes before promoting them back to your Lakehouse.
- Keep production work in ClickZetta. Build and manage canonical models in the ClickZetta console, then switch to this app when you need richer editing, partner integrations, or AI enrichment—both share the same volumes for frictionless workflows.
- Why semantics matter. A curated semantic layer standardizes measures, joins, and business logic so LLMs understand context, avoid hallucinations, and deliver consistent analytics for data teams and business users.
- Typical workflows covered: author and refine models from table metadata, safely edit existing YAML with ClickZetta validation, generate/test SQL through the chat assistant, and auto-enrich documentation via DashScope.
- Use it as a sandbox. Pull models from a volume, experiment with the editor and chat assistant, then push the refined YAML back once it passes validation.
Installation from Docker
If you prefer not to install Python dependencies locally, pull the published Docker image:
```bash
docker pull czqiliang/semantic-model-generator:latest

docker run --rm -p 8501:8501 \
  -v $(pwd)/connections.json:/app/.clickzetta/connections.json \
  czqiliang/semantic-model-generator:latest
```
Mount your connections.json (see the example above) so the container can pick up ClickZetta and DashScope credentials.
The Streamlit UI will be available at http://localhost:8501.
Docker Compose example (docker-compose.yml):
```yaml
version: "3.9"
services:
  app:
    image: czqiliang/semantic-model-generator:latest
    ports:
      - "8501:8501"
    volumes:
      - ~/.clickzetta:/app/.clickzetta  # default config mount on macOS
```
Run it with `docker compose up`.
Linux hosts usually keep the ClickZetta config under /opt/clickzetta, so the Compose volume can be changed to:

```yaml
volumes:
  - /opt/clickzetta:/app/.clickzetta
```
Installation from source
```bash
# optional: conda env using environment.yml
conda env create -f environment.yml
conda activate clickzetta_env

# or install via poetry/pip
poetry install
# pip install .
```
The app depends on `clickzetta-connector-python` and `clickzetta-zettapark-python`; ensure they are installed via the commands above.
Running the Streamlit app
```bash
# inside the Poetry environment
poetry run streamlit run app.py

# or, after activating the env, run:
python -m streamlit run app.py
```
When the app launches it will:
- Load credentials from the ClickZetta connection config or environment.
- Default file operations to `volume:user://~/semantic_models/` inside your user volume.
- Provide workflows for generating semantic YAML, editing YAML, validating (basic checks), and importing partner specs (dbt, etc.).
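The default path above follows ClickZetta's `volume:<scope>://<path>` URI shape. A rough parser for that shape might look like the sketch below; the helper name and split logic are assumptions for illustration, not the app's real parsing:

```python
def split_volume_uri(uri: str) -> tuple[str, str]:
    """Split a 'volume:<scope>://<path>' URI into (scope, path).
    Illustrative only -- the app's actual parsing may differ."""
    if not uri.startswith("volume:"):
        raise ValueError(f"not a volume URI: {uri}")
    rest = uri[len("volume:"):]      # e.g. "user://~/semantic_models/"
    scope, _, path = rest.partition("://")
    return scope, path
```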
DashScope usage notes
- Semantic enrichment calls the official DashScope SDK at its default endpoint; there is no need (and no way) to override it via `base_url`.
- Even if `DASHSCOPE_BASE_URL` or a compatible endpoint is set in `connections.json` or in environment variables, the app will not use those values.
- You still need to provide `DASHSCOPE_API_KEY` and a model name (e.g. `qwen-plus`); keeping the other parameters at their defaults avoids the common `InvalidParameter: url error`.
- Use a compatible endpoint only if you explicitly need OpenAI-compatible mode; the current Streamlit app does not support compatible endpoints.
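To illustrate the endpoint advice, a call to DashScope should supply only the API key, model, and prompt, leaving endpoint parameters at their SDK defaults. The wrapper below is hypothetical and only assembles the arguments:

```python
def build_dashscope_call(api_key: str, model: str, prompt: str) -> dict:
    """Assemble keyword arguments for a DashScope generation call.
    Note: no base_url key -- the official SDK's default endpoint is used.
    (Hypothetical wrapper for illustration.)"""
    return {
        "api_key": api_key,
        "model": model,    # e.g. "qwen-plus"
        "prompt": prompt,
        # Leaving endpoint-related parameters unset avoids the common
        # "InvalidParameter: url error".
    }
```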
Key behaviours
- Volume-first uploads: YAML import/export uses the user volume path `volume:user://~/semantic_models/` unless a different volume/stage is selected.
- Metadata discovery: workspace metadata (catalogs, schemas, tables) is fetched via ClickZetta INFORMATION_SCHEMA queries. Sample values and comments are collected using ClickZetta sessions.
- Partner integrations: dbt helpers read YAML from the chosen volume/stage, merge metadata, and reuse ClickZetta credentials.
- Chat/validation placeholders: Cortex-specific validation and chat calls are not yet available in ClickZetta mode; the UI will display placeholders instead of calling external services.
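The metadata-discovery behaviour above boils down to queries against INFORMATION_SCHEMA views. A sketch of the kind of query involved follows; the exact views and columns the app uses are assumptions based on the SQL standard:

```python
def list_tables_query(schema: str) -> str:
    """Build a query listing tables in a schema via INFORMATION_SCHEMA.
    (Column and view names follow the SQL standard; the app's actual
    queries may differ.)"""
    # Basic identifier check to avoid interpolating arbitrary SQL.
    if not schema.replace("_", "").isalnum():
        raise ValueError(f"invalid schema name: {schema}")
    return (
        "SELECT table_name, table_type "
        "FROM information_schema.tables "
        f"WHERE table_schema = '{schema}' "
        "ORDER BY table_name"
    )
```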
Development scripts
Useful commands while iterating:
```bash
make setup                # install dependencies
make run_admin_app        # launch the admin Streamlit app
make fmt_lint             # format + lint
make test                 # execute pytest suite
make docker-buildx        # build multi-arch Docker image (linux/amd64, linux/arm64)
make docker-buildx-push   # build and push multi-arch image
```
License
Apache 2.0 / BSD (dual license) – see LICENSE and LEGAL files for details.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file clickzetta_semantic_model_generator-1.0.26.tar.gz.
File metadata
- Download URL: clickzetta_semantic_model_generator-1.0.26.tar.gz
- Upload date:
- Size: 113.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.13.3 Darwin/24.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e0be59756fbe071b5a2dd5a1bd4970496232c90599671ba5a9104ccc2cefd2e7` |
| MD5 | `383125963dcdbce15a2aac6c68e30984` |
| BLAKE2b-256 | `569843d9aea27aaac1ba5c291b262cd348fec3e16b35e615b13744b4e3fb86fa` |
File details
Details for the file clickzetta_semantic_model_generator-1.0.26-py3-none-any.whl.
File metadata
- Download URL: clickzetta_semantic_model_generator-1.0.26-py3-none-any.whl
- Upload date:
- Size: 124.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.13.3 Darwin/24.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c3aef43b5551863bccb86f6d1490a6c2bbe4db070cdbf1b89f0639170b915cac` |
| MD5 | `971650ac2f48265002b8fc1bc22d7a00` |
| BLAKE2b-256 | `d284daab727deb3eea6dba94d9bf7c2eabab867728ac727b7e5f886b1a911337` |