ClickZetta database adapter for Datus

Datus ClickZetta Adapter

This package provides a ClickZetta Lakehouse adapter for Datus, so that Datus can connect to and query the ClickZetta analytics platform.

ClickZetta is developed by Singdata and Yunqi.

Installation

pip install datus-clickzetta

Dependencies

This adapter requires the following ClickZetta Python packages:

  • clickzetta-connector-python
  • clickzetta-zettapark-python

Configuration

Configure the ClickZetta connection in your Datus configuration. A complete example is available at examples/agent.clickzetta.yml.example.

namespace:
  clickzetta_prod:
    type: clickzetta
    service: "your-service-endpoint.clickzetta.com"
    username: "your-username"
    password: "your-password"
    instance: "your-instance-id"
    workspace: "your-workspace"
    schema: "PUBLIC"  # optional, defaults to PUBLIC
    vcluster: "DEFAULT_AP"  # optional, defaults to DEFAULT_AP
    secure: false  # optional

Configuration Parameters

Parameter   Type      Required   Default        Description
service     string    Yes        -              ClickZetta service endpoint
username    string    Yes        -              ClickZetta username
password    string    Yes        -              ClickZetta password
instance    string    Yes        -              ClickZetta instance identifier
workspace   string    Yes        -              ClickZetta workspace name
schema      string    No         "PUBLIC"       Default schema name
vcluster    string    No         "DEFAULT_AP"   Virtual cluster name
secure      boolean   No         null           Enable secure connection
hints       object    No         {}             Additional connection hints
extra       object    No         {}             Extra connection parameters
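
The table above can be read as a small set of validation rules. The following is a minimal sketch, assuming the namespace entry has already been parsed into a plain dictionary. `normalize_clickzetta_config` is a hypothetical helper written for illustration and is not part of the adapter:

```python
import copy

# Hypothetical helper, not part of datus-clickzetta. It checks a namespace
# entry against the parameter table and fills in the documented defaults.
REQUIRED = ("service", "username", "password", "instance", "workspace")
DEFAULTS = {
    "schema": "PUBLIC",
    "vcluster": "DEFAULT_AP",
    "secure": None,
    "hints": {},
    "extra": {},
}


def normalize_clickzetta_config(entry: dict) -> dict:
    """Return a copy of `entry` with defaults applied; raise if a required key is missing."""
    missing = [key for key in REQUIRED if not entry.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    config = copy.deepcopy(DEFAULTS)  # avoid sharing the mutable default dicts
    config.update(entry)
    return config


example = normalize_clickzetta_config({
    "type": "clickzetta",
    "service": "your-service-endpoint.clickzetta.com",
    "username": "your-username",
    "password": "your-password",
    "instance": "your-instance-id",
    "workspace": "your-workspace",
})
print(example["schema"], example["vcluster"])
```

Running a check like this before opening a connection can surface a missing parameter earlier than a failed login would.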

Features

  • Full SQL Support: Execute queries, DDL, DML operations
  • Metadata Discovery: Automatic discovery of databases, schemas, tables, and views
  • Volume Integration: Read files from ClickZetta volumes
  • Sample Data: Extract sample rows for data profiling
  • Connection Management: Automatic connection pooling and session management

Usage

Once installed and configured, use the ClickZetta adapter with Datus:

# Execute queries
result = agent.query("SELECT * FROM my_table LIMIT 10")

# Get table information
tables = agent.get_tables("my_schema")

Volume Operations

The adapter supports reading files from ClickZetta volumes:

# Read a file from a volume
content = connector.read_volume_file("volume:user://my_volume", "path/to/file.yaml")

# List files in a volume directory
files = connector.list_volume_files("volume:user://my_volume", "config/", suffixes=(".yaml", ".yml"))
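
The `suffixes` argument in the listing call above suggests extension-based filtering. How the adapter implements this is not shown here; the sketch below only illustrates the idea on a plain list of paths, using a hypothetical `filter_by_suffix` function. Case-insensitive matching is an assumption of the sketch:

```python
def filter_by_suffix(paths, suffixes=(".yaml", ".yml")):
    """Keep only the paths that end with one of the given suffixes.

    Illustrative only: matching is case-insensitive here, which is an
    assumption and may differ from the adapter's behavior.
    """
    wanted = tuple(suffix.lower() for suffix in suffixes)
    return [path for path in paths if path.lower().endswith(wanted)]


listing = ["config/a.yaml", "config/b.yml", "config/readme.md", "config/C.YAML"]
selected = filter_by_suffix(listing)
print(selected)
```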

Connection Hints

You can customize ClickZetta connection behavior using hints:

namespace:
  clickzetta_prod:
    type: clickzetta
    # ... other connection parameters
    hints:
      sdk.job.timeout: 600
      query_tag: "Datus Analytics Query"
      cz.storage.parquet.vector.index.read.memory.cache: "true"

Error Handling

The adapter provides comprehensive error handling with detailed error messages for common issues:

  • Connection failures
  • Authentication errors
  • Query execution errors
  • Schema/workspace switching limitations

Development

Development Mode Setup (Complete Guide)

This guide covers the complete setup from Datus agent installation to ClickZetta adapter development and testing.

Prerequisites

  • Python 3.11+ recommended
  • Git

Step 1: Setup Datus Agent Development Environment

# Clone the Datus agent repository
git clone https://github.com/Datus-ai/datus-agent.git
cd datus-agent

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install Datus agent in editable mode
pip install -e .

Step 2: Clone and Install ClickZetta Adapter

# From your development directory
git clone https://github.com/Datus-ai/datus-db-adapters.git
cd datus-db-adapters/datus-clickzetta

# Install ClickZetta adapter in editable mode (using the same virtual environment)
pip install -e .

# Verify installation
pip show datus-clickzetta

Step 3: Configure Environment Variables

Create a .env file or set environment variables:

# ClickZetta connection settings
export CLICKZETTA_SERVICE="your-service.clickzetta.com"
export CLICKZETTA_USERNAME="your-username"
export CLICKZETTA_PASSWORD="your-password"
export CLICKZETTA_INSTANCE="your-instance-id"
export CLICKZETTA_WORKSPACE="your-workspace"
export CLICKZETTA_SCHEMA="your-schema"
export CLICKZETTA_VCLUSTER="DEFAULT_AP"

# LLM API keys (optional for testing)
export DASHSCOPE_API_KEY="your-dashscope-key"
export DEEPSEEK_API_KEY="your-deepseek-key"

Step 4: Create ClickZetta Configuration

In your Datus agent directory, create a ClickZetta configuration file using the provided example:

# In datus-agent directory
cp ../datus-db-adapters/datus-clickzetta/examples/agent.clickzetta.yml.example conf/agent.clickzetta.yml

Update conf/agent.clickzetta.yml with ClickZetta settings:

agent:
  target: qwen_main  # or your preferred model
  home: .datus_home

  models:
    # Your model configurations here

  namespace:
    clickzetta:
      type: clickzetta
      service: ${CLICKZETTA_SERVICE}
      username: ${CLICKZETTA_USERNAME}
      password: ${CLICKZETTA_PASSWORD}
      instance: ${CLICKZETTA_INSTANCE}
      workspace: ${CLICKZETTA_WORKSPACE}
      schema: ${CLICKZETTA_SCHEMA}
      vcluster: ${CLICKZETTA_VCLUSTER}
      secure: false
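
The `${CLICKZETTA_SERVICE}`-style placeholders in this configuration refer to the environment variables from Step 3. How Datus resolves them internally is not shown here; the sketch below illustrates one way such substitution can work, using the standard library's `string.Template`. `expand_placeholders` is a hypothetical name:

```python
import os
from string import Template


def expand_placeholders(value, environ=None):
    """Recursively replace ${NAME} in strings with values from `environ`.

    Hypothetical helper for illustration. With string.Template, a literal
    dollar sign in a value has to be written as $$.
    """
    environ = os.environ if environ is None else environ
    if isinstance(value, dict):
        return {key: expand_placeholders(item, environ) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(item, environ) for item in value]
    if isinstance(value, str):
        try:
            return Template(value).substitute(environ)
        except KeyError as exc:
            # Fail loudly instead of passing an unresolved placeholder on.
            raise KeyError(f"environment variable {exc.args[0]} is not set") from None
    return value  # booleans, numbers, None pass through unchanged


namespace = {"type": "clickzetta", "service": "${CLICKZETTA_SERVICE}", "secure": False}
resolved = expand_placeholders(namespace, {"CLICKZETTA_SERVICE": "your-service.clickzetta.com"})
print(resolved["service"])
```

Raising on an unset variable is a design choice of the sketch: an empty or literal `${...}` value tends to produce a confusing connection error later.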

Step 5: Start Development and Testing

Test the adapter directly:

# From datus-clickzetta directory
python -c "
from datus_clickzetta.connector import ClickZettaConnector
import os

connector = ClickZettaConnector(
    service=os.getenv('CLICKZETTA_SERVICE'),
    username=os.getenv('CLICKZETTA_USERNAME'),
    password=os.getenv('CLICKZETTA_PASSWORD'),
    instance=os.getenv('CLICKZETTA_INSTANCE'),
    workspace=os.getenv('CLICKZETTA_WORKSPACE'),
    schema=os.getenv('CLICKZETTA_SCHEMA'),
    vcluster=os.getenv('CLICKZETTA_VCLUSTER'),
    secure=False
)

result = connector.execute('SHOW SCHEMAS')
print(f'Connected! Found {result.row_count} schemas')
"

Start Datus CLI with ClickZetta:

# From datus-agent directory
python -m datus.cli.main --config conf/agent.clickzetta.yml --namespace clickzetta

Step 6: Development Workflow

Making Changes:

  1. Edit code in datus-clickzetta/datus_clickzetta/connector.py
  2. Changes are immediately available (editable install)
  3. No need to reinstall the package

Testing Changes:

# Run adapter tests
cd datus-clickzetta
python test.py

# Test with Datus CLI
cd ../datus-agent
python -m datus.cli.main --config conf/agent.clickzetta.yml --namespace clickzetta

Commit and Push:

# From adapter directory
git add .
git commit -m "Your changes"
git push origin your-branch

# From agent directory (if you made agent changes)
git add .
git commit -m "Your agent changes"
git push origin your-branch

Directory Structure

your-dev-folder/
├── datus-agent/                    # Datus agent repository
│   ├── .venv/                     # Shared virtual environment
│   ├── conf/agent.clickzetta.yml  # ClickZetta configuration
│   └── ...
└── datus-db-adapters/             # Adapters repository
    └── datus-clickzetta/          # ClickZetta adapter
        ├── datus_clickzetta/
        │   └── connector.py       # Main connector code
        └── ...

Tips for Development

  • Editable Installs: Both packages are installed in editable mode, so code changes are immediate
  • Environment Variables: Use .env files for local development, environment variables for production
  • Testing: Always test both the adapter directly and through the Datus CLI
  • Debugging: Use logger.debug() statements; enable with DATUS_LOG_LEVEL=DEBUG
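
For standalone scripts, the debugging tip above can be approximated with the standard `logging` module. This is a sketch only: it reads `DATUS_LOG_LEVEL` itself, and whether Datus interprets the variable in exactly this way is not confirmed here. The logger name and `resolve_log_level` are hypothetical:

```python
import logging
import os


def resolve_log_level(environ=None) -> int:
    """Map DATUS_LOG_LEVEL (e.g. "DEBUG") to a logging level, defaulting to INFO."""
    environ = os.environ if environ is None else environ
    name = environ.get("DATUS_LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, logging.INFO)
    # Guard against names that exist in the module but are not levels.
    return level if isinstance(level, int) else logging.INFO


logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("clickzetta_dev")  # hypothetical logger name
logger.setLevel(resolve_log_level())
logger.debug("only shown when DATUS_LOG_LEVEL=DEBUG")
```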

Contributing Guidelines

  1. Clone the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: python test.py
  5. Ensure code style compliance
  6. Submit a pull request

Common Development Issues

Import Errors:

  • Ensure both packages are installed in editable mode
  • Check virtual environment is activated

Connection Issues:

  • Verify environment variables are set
  • Test connection with the direct connector test above

CLI Issues:

  • Check configuration file syntax
  • Verify namespace configuration matches your environment

Testing

This adapter includes comprehensive test coverage with multiple test types and execution modes.

Test Structure

tests/
├── unit/                     # Unit tests for individual components
├── integration/
│   ├── conftest.py           # TPC-H fixtures and test data
│   ├── test_connector_integration.py  # Connector integration tests
│   └── test_tpch.py          # TPC-H benchmark tests
├── run_tests.py              # Main test runner with multiple modes
├── comprehensive_test.py     # Real connection testing script
└── conftest.py               # Shared test fixtures and configuration

Running Tests

Quick Start (from project root):

# Run all tests
python test.py

# Run specific test types
python test.py --mode unit          # Unit tests only (fastest)
python test.py --mode integration   # Integration tests only
python test.py --mode all          # All tests
python test.py --mode coverage     # Tests with coverage report

Advanced Usage (from tests/ directory):

cd tests

# Basic test execution
python run_tests.py --mode unit
python run_tests.py --mode integration -v

# Real connection testing (requires credentials)
python comprehensive_test.py

# Direct pytest usage
pytest unit/                    # Unit tests
pytest integration/             # Integration tests
pytest -k "test_config"        # Specific test patterns

TPC-H Integration Tests

TPC-H integration tests use a simplified TPC-H dataset (5 tables: region, nation, customer, orders, supplier) to validate end-to-end query execution, JOIN operations, aggregations, and multi-format output.

# Set ClickZetta credentials
export CLICKZETTA_SERVICE="your-service.clickzetta.com"
export CLICKZETTA_USERNAME="your-username"
export CLICKZETTA_PASSWORD="your-password"
export CLICKZETTA_INSTANCE="your-instance"
export CLICKZETTA_WORKSPACE="your-workspace"

# Initialize TPC-H test data
uv run python scripts/init_tpch_data.py

# Run TPC-H integration tests
uv run pytest tests/integration/test_tpch.py -m integration -v

# Clean re-init (drop and recreate tables)
uv run python scripts/init_tpch_data.py --drop

TPC-H Tables:

Table           Rows   Description
tpch_region     5      Standard TPC-H regions
tpch_nation     25     Standard TPC-H nations
tpch_customer   10     Simplified customer data
tpch_orders     15     Simplified order data
tpch_supplier   5      Simplified supplier data
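
The row counts above make a convenient sanity check after initialization. The sketch below assumes a caller-supplied `count_rows(table)` callable, for example one that runs a `SELECT COUNT(*)` through the connector; that callable and `find_row_count_mismatches` are hypothetical and not part of the test suite:

```python
# Expected row counts, taken from the TPC-H table list above.
EXPECTED_ROWS = {
    "tpch_region": 5,
    "tpch_nation": 25,
    "tpch_customer": 10,
    "tpch_orders": 15,
    "tpch_supplier": 5,
}


def find_row_count_mismatches(count_rows, expected=EXPECTED_ROWS):
    """Return {table: (expected, actual)} for every table whose count differs."""
    mismatches = {}
    for table, want in expected.items():
        got = count_rows(table)
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


# Stand-in for a real query; one table is deliberately short by a row.
fake_counts = dict(EXPECTED_ROWS, tpch_orders=14)
report = find_row_count_mismatches(fake_counts.__getitem__)
print(report)
```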

Test Requirements

  • Unit Tests: No external dependencies, run with mocked components
  • Integration Tests: Mocked ClickZetta SDK, test connector logic
  • TPC-H Tests: Require actual ClickZetta credentials
  • Real Connection Tests: Require actual ClickZetta credentials
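
As an illustration of the mocked style described above, the following sketch tests a small function against a `MagicMock` stand-in instead of a live connection. `count_schemas` is a hypothetical function under test; the `execute` call and the `row_count` attribute mirror the direct connector test in Step 5:

```python
from unittest.mock import MagicMock


def count_schemas(connector):
    """Hypothetical function under test: run SHOW SCHEMAS and return the row count."""
    result = connector.execute("SHOW SCHEMAS")
    return result.row_count


def test_count_schemas_uses_show_schemas():
    # The mock replaces the connector, so no credentials or network are needed.
    connector = MagicMock()
    connector.execute.return_value.row_count = 3

    assert count_schemas(connector) == 3
    connector.execute.assert_called_once_with("SHOW SCHEMAS")


test_count_schemas_uses_show_schemas()
```

Under pytest the final call is unnecessary, since functions named `test_*` are collected automatically.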

Environment Variables:

Variable               Default      Description
CLICKZETTA_SERVICE     (required)   ClickZetta service endpoint
CLICKZETTA_USERNAME    (required)   ClickZetta username
CLICKZETTA_PASSWORD    (required)   ClickZetta password
CLICKZETTA_INSTANCE    (required)   ClickZetta instance ID
CLICKZETTA_WORKSPACE   (required)   ClickZetta workspace
CLICKZETTA_SCHEMA      PUBLIC       Default schema
CLICKZETTA_VCLUSTER    DEFAULT_AP   Virtual cluster
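
Tests that need real credentials can check for the required variables up front. A minimal sketch, with `missing_clickzetta_env` as a hypothetical helper that is not part of the test suite:

```python
import os

# The variables marked (required) in the table above.
REQUIRED_ENV = (
    "CLICKZETTA_SERVICE",
    "CLICKZETTA_USERNAME",
    "CLICKZETTA_PASSWORD",
    "CLICKZETTA_INSTANCE",
    "CLICKZETTA_WORKSPACE",
)


def missing_clickzetta_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV if not environ.get(name)]


missing = missing_clickzetta_env()
if missing:
    print("Skipping real-connection tests; not set:", ", ".join(missing))
```

With pytest, the same result could feed a `pytest.mark.skipif` condition so that credential-dependent tests are skipped rather than failed.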

Test Coverage

  • Configuration validation and error handling
  • SQL query execution and result processing
  • Metadata discovery (tables, views, schemas)
  • Connection management and lifecycle
  • Volume operations and file listing
  • Error handling and exception cases
  • TPC-H benchmark queries (JOINs, aggregations, multi-format output)

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Support

For issues and questions:
