DashScope-powered Datus agent with ClickZetta integrations

🎯 Overview

Datus is an open-source data engineering agent that builds evolvable context for your data system.

Data engineering needs a shift from "building tables and pipelines" to "delivering scoped, domain-aware agents for analysts and business users."

Figure: Datus Architecture

  • Datus-CLI: An AI-powered command-line interface for data engineers—think "Claude Code for data engineers." Write SQL, build subagents, and construct context interactively.
  • Datus-Chat: A web chatbot providing multi-turn conversations with built-in feedback mechanisms (upvotes, issue reports, success stories) for data analysts.
  • Datus-API: APIs for other agents or applications that need stable, accurate data services.
  • Semantic model–aware orchestration: preload MetricFlow-compatible YAML from ClickZetta volumes or local files and switch between semantic context and live schema linking per task.

🚀 Key Features

🧩 Contextual Data Engineering

Automatically builds a living semantic map of your company’s data — combining metadata, metrics, SQL history, and external knowledge — so engineers and analysts collaborate through context instead of raw SQL.

💬 Agentic Chat

A Claude-Code-like CLI for data engineers.
Chat with your data, recall tables or metrics instantly, and run agentic actions — all in one terminal.

🧠 Subagents for Every Domain

Turn data domains into domain-aware chatbots.
Each subagent encapsulates the right context, tools, and rules — making data access accurate, reusable, and safe.

🔁 Continuous Learning Loop

Every query and feedback improves the model.
Datus learns from success stories and user corrections to evolve reasoning accuracy over time.

🛠️ Developer Quickstart

Set up a local environment that uses DashScope for LLM calls and ClickZetta as the data source:

  1. Clone and install dependencies

    git clone https://github.com/<your-org>/Datus-agent-clickzetta.git
    cd Datus-agent-clickzetta
    python3.11 -m venv .venv
    source .venv/bin/activate  # Windows: .venv\Scripts\activate
    pip install -r requirements.txt
    
  2. Create a .env file at the project root to store secrets:

    DASHSCOPE_API_KEY=your_dashscope_key
    DEEPSEEK_API_KEY=your_deepseek_key
    CLICKZETTA_SERVICE=your_clickzetta_service
    CLICKZETTA_USERNAME=your_clickzetta_username
    CLICKZETTA_PASSWORD=your_clickzetta_password
    CLICKZETTA_INSTANCE=your_clickzetta_instance
    CLICKZETTA_WORKSPACE=your_clickzetta_workspace
    CLICKZETTA_SCHEMA=your_clickzetta_schema
    CLICKZETTA_VCLUSTER=your_clickzetta_vcluster
    

    The entry points (datus-cli, python -m datus.main, datus/api/server.py) automatically load this file via python-dotenv, so no manual export is required. For shell-based workflows you can still run export $(grep -v '^#' .env | xargs) before launching the CLI.
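
    For reference, here is a minimal sketch of what that automatic loading amounts to; the required-variable check is illustrative and not part of the Datus codebase:

    # sketch: load .env via python-dotenv, as the entry points do,
    # then verify the DashScope/ClickZetta variables are present
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads .env from the current directory into os.environ

    required = [
        "DASHSCOPE_API_KEY", "CLICKZETTA_SERVICE", "CLICKZETTA_USERNAME",
        "CLICKZETTA_PASSWORD", "CLICKZETTA_INSTANCE", "CLICKZETTA_WORKSPACE",
        "CLICKZETTA_SCHEMA", "CLICKZETTA_VCLUSTER",
    ]
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")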

  3. Copy the ClickZetta configuration

    cp conf/agent.clickzetta.yml.example conf/agent.clickzetta.yml
    

    The example file ships with Dashscope/DeepSeek models, a clickzetta namespace, and a semantic_models block. Update that block to point at your preferred ClickZetta volume/directory (or disable allow_local_path if needed) so the agent knows where to pull YAML specs.

  4. Start the CLI (or API)

    mkdir -p .datus_home
    DATUS_HOME=$(pwd)/.datus_home python -m datus.cli.main --config conf/agent.clickzetta.yml --namespace clickzetta
    # optionally launch the API server
    DATUS_HOME=$(pwd)/.datus_home python -m datus.api.server --config conf/agent.clickzetta.yml --namespace clickzetta
    

    During !dastart you can now choose whether the workflow should load a semantic model (from the volume or a local file) or fall back to schema linking. Pick semantic_model for strict semantic prompting, auto for best-effort loading, or schema_linking if you only want live metadata.

  5. (Optional) Preload a semantic model for the run

    !lsm --dir semantic_models
    !dastart
    # Context source [auto|schema_linking|semantic_model]: semantic_model
    # Semantic model volume/stage: volume:user://~/
    # Semantic model directory (optional): semantic_models
    # Semantic model filename (.yaml/.yml): retail_finance.yaml
    

    After choosing an index, the semantic model is loaded for chat/SQL generation. The load_semantic_model node fetches the YAML before schema linking starts, injects measures/dimensions into the SQL prompt, and only falls back to raw metadata if you select auto.
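
    The control flow of that node is roughly the sketch below; every name except the three strategy values is an illustrative assumption, not the actual Datus internals:

    # illustrative sketch of the auto/semantic_model/schema_linking behaviour;
    # link_schema and the local file read are stand-ins for Datus internals
    from pathlib import Path

    PROMPT_MAX_LENGTH = 14000  # mirrors semantic_models.prompt_max_length

    def link_schema() -> str:
        """Stand-in for live schema linking against the warehouse."""
        return "-- live schema metadata --"

    def build_context(strategy: str, yaml_path: str) -> str:
        if strategy == "schema_linking":
            return link_schema()  # live metadata only, no YAML involved
        try:
            # volume reads would go through the ClickZetta client instead
            spec = Path(yaml_path).read_text()
        except OSError as err:
            if strategy == "auto":
                return link_schema()  # best-effort: fall back transparently
            # strategy == "semantic_model": stop early with a clear error
            raise RuntimeError(f"semantic model unavailable: {err}") from err
        return spec[:PROMPT_MAX_LENGTH]  # truncate long YAML before prompting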


📚 Semantic Model Workflow

  1. Configure defaults – in any agent config file include:
    semantic_models:
      default_strategy: auto          # auto | schema_linking | semantic_model
      default_volume: volume:user://~/  # base ClickZetta user volume
      default_directory: semantic_models  # folder within the user volume
      allow_local_path: true          # set false to forbid direct filesystem reads
      prompt_max_length: 14000        # truncate long YAML snippets before prompting
    
  2. Store YAML assets – upload either MetricFlow-style (semantic_models:) or Analyst-spec (tables:, relationships:, verified_queries:) semantic model files to your ClickZetta user volume, or keep them on disk when allow_local_path is enabled. The default volume is volume:user://~/ with semantic_models/ as the directory, so subfolders like finance/ work naturally. Use !list_semantic_models (alias !lsm) to browse and select the YAML you want to load for the current session; a minimal Analyst-spec file is sketched after this list.
  3. Pick the context source per task – the CLI (and API) honour semantic_model, schema_linking, or auto selection, giving you deterministic prompts when a curated semantic spec is available.
  4. Enjoy richer prompts – the SQL generator now includes a “Semantic Model Specification” section with logical tables, base table FQNs, dimensions, facts, table-level metrics, relationships, model metrics, and verified queries pulled directly from the YAML spec, reducing guesswork and improving query accuracy.
  5. Automatic fallback – when the chosen semantic model cannot be read and the strategy is auto, the workflow transparently falls back to schema linking; if you picked semantic_model, the run stops early with a clear error so you can fix the path or permissions.
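
For illustration, a minimal Analyst-spec file using only the keys listed above might look like the following; the table, column, and query names are hypothetical, so treat the semantic-model-generator output as the authoritative shape:

    # hypothetical minimal Analyst-spec semantic model; all names are illustrative
    name: retail_finance
    tables:
      - name: bank_failures
        base_table: duckdb-demo.main.bank_failures  # base table FQN
        dimensions:
          - name: bank_name
            expr: bank_name
        facts:
          - name: assets_lost
            expr: assets_lost
    verified_queries:
      - name: top_banks_by_assets_lost
        question: Which banks lost the most assets?
        sql: SELECT bank_name, assets_lost FROM bank_failures ORDER BY assets_lost DESC LIMIT 10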

🧰 Installation

Requirements: Python >= 3.9 and <= 3.11 (3.11 is verified).

pip install datus-agent-clickzetta

datus-agent-clickzetta init  # or use the compatible datus-agent init command

For detailed installation instructions, see the Quickstart Guide.

🧭 User Journey

1️⃣ Initial Exploration

A Data Engineer (DE) starts by chatting with the database using /chat. They run simple questions, test joins, and refine prompts using @table or @file. Each round of feedback (e.g., "Join table1 and table2 by PK") helps the model improve accuracy. For example:

    datus-cli --namespace demo /Check the top 10 banks by assets lost @Table duckdb-demo.main.bank_failures

Learn more: CLI Introduction

2️⃣ Building Context

The DE imports SQL history and semantic model YAMLs generated from the external toolchain (see semantic-model-generator). Using @subject they inspect or refine metrics, and /chat immediately benefits from the combined SQL history + semantic context.

Learn more: Knowledge Base Introduction

3️⃣ Creating a Subagent

When the context matures, the DE defines a domain-specific chatbot (Subagent):

.subagent add mychatbot

They describe its purpose, add rules, choose tools, and limit scope (e.g., 5 tables). Each subagent becomes a reusable, scoped assistant for a specific business area.

Learn more: Subagent Introduction

4️⃣ Delivering to Analysts

The Subagent is deployed to a web interface: http://localhost:8501/?subagent=mychatbot

Analysts chat directly, upvote correct answers, or report issues for feedback. Results can be saved via !export.

Learn more: Web Chatbot Introduction

5️⃣ Refinement & Iteration

Feedback from analysts loops back to improve the subagent: engineers fix SQL, add rules, and update context. Over time, the chatbot becomes more accurate, self-evolving, and domain-aware.

For detailed guidance, please follow our tutorial.
