An integration package connecting ClickZetta and LangChain

LangChain ClickZetta

🚀 Enterprise-grade LangChain integration for ClickZetta - Unlock the power of cloud-native lakehouse with AI-driven SQL queries, high-performance vector search, and intelligent full-text retrieval in a unified platform.

🚀 Why ClickZetta + LangChain?

🏆 Unique Advantages

1. Native Lakehouse Architecture

ClickZetta's cloud-native lakehouse delivers up to 10x the performance of traditional Spark-based architectures
Unified storage and compute for all data types (structured, semi-structured, unstructured)
Real-time incremental processing capabilities

2. True Hybrid Search in Single Table

Industry-first single-table hybrid search combining vector and full-text indexes
No complex joins or multiple tables needed - everything in one place
Atomic MERGE operations for consistent data updates

3. Enterprise-Grade Storage Services

Complete LangChain BaseStore implementation with sync/async support
Native Volume integration for binary file storage (models, embeddings)
SQL-queryable document storage with JSON metadata
Atomic UPSERT operations using ClickZetta's MERGE INTO

4. Advanced Chinese Language Support

Built-in Chinese text analyzers (IK, standard, keyword)
Optimized for bilingual (Chinese/English) AI applications
DashScope integration for state-of-the-art Chinese embeddings

5. Production-Ready Features

Connection pooling and query optimization
Comprehensive error handling and logging
Full test coverage (unit + integration)
Type-safe operations throughout

🛠️ Core Features

🧠 AI-Powered Query Interface

Natural Language to SQL: Convert questions to optimized ClickZetta SQL
Context-Aware: Understands table schemas and relationships
Bilingual Support: Works seamlessly with Chinese and English queries

🔍 Advanced Search Capabilities

Vector Search: High-performance embedding-based similarity search
Full-Text Search: Enterprise-grade inverted index with multiple analyzers
True Hybrid Search: Single-table combined vector + text search (industry first)
Metadata Filtering: Complex filtering with JSON metadata support

💾 Enterprise Storage Solutions

ClickZettaStore: High-performance key-value storage using SQL tables
ClickZettaDocumentStore: Structured document storage with queryable metadata
ClickZettaFileStore: Binary file storage using native ClickZetta Volume
ClickZettaVolumeStore: Direct Volume integration for maximum performance

🔄 Production-Grade Operations

Atomic UPSERT: MERGE INTO operations for data consistency
Batch Processing: Efficient bulk operations for large datasets
Connection Management: Pooling and automatic reconnection
Type Safety: Full type annotations and runtime validation
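
The atomic UPSERT behavior listed above can be illustrated without a ClickZetta instance. The sketch below uses SQLite's `INSERT ... ON CONFLICT DO UPDATE` as a stand-in for `MERGE INTO`; the table name `kv_store` and the helper `upsert` are hypothetical, and the SQL that the package actually issues is not shown in this README.

```python
import sqlite3

# SQLite stands in for ClickZetta here; its ON CONFLICT clause plays the role
# that MERGE INTO ... WHEN MATCHED / WHEN NOT MATCHED plays in the text above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv_store (key TEXT PRIMARY KEY, value BLOB)")


def upsert(conn, pairs):
    """Insert new keys and overwrite existing ones in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.executemany(
            "INSERT INTO kv_store (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            pairs,
        )


upsert(conn, [("k1", b"old"), ("k2", b"two")])
upsert(conn, [("k1", b"new")])  # k1 is updated, not duplicated
rows = conn.execute("SELECT key, value FROM kv_store ORDER BY key").fetchall()
```

After both calls, `rows` holds one row per key, with `k1` carrying the newer value.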

🎯 LangChain Compatibility

BaseStore Interface: 100% compatible with LangChain storage standards
Async Support: Full async/await pattern implementation
Chain Integration: Seamless integration with LangChain chains and agents
Memory Systems: Persistent chat history and conversation memory

Installation

From PyPI (Recommended)

pip install langchain-clickzetta

Development Installation

git clone https://github.com/yunqiqiliang/langchain-clickzetta.git
cd langchain-clickzetta/libs/clickzetta
pip install -e ".[dev]"

Local Installation from Source

git clone https://github.com/yunqiqiliang/langchain-clickzetta.git
cd langchain-clickzetta/libs/clickzetta
pip install .

Quick Start

Basic Setup

from langchain_clickzetta import ClickZettaEngine, ClickZettaSQLChain, ClickZettaVectorStore
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_community.llms import Tongyi

# Initialize ClickZetta engine
# ClickZetta requires exactly 7 connection parameters
engine = ClickZettaEngine(
    service="your-service",
    instance="your-instance",
    workspace="your-workspace",
    schema="your-schema",
    username="your-username",
    password="your-password",
    vcluster="your-vcluster"  
)

# Initialize embeddings (DashScope recommended for Chinese/English support)
embeddings = DashScopeEmbeddings(
    dashscope_api_key="your-dashscope-api-key",
    model="text-embedding-v4"
)

# Initialize LLM
llm = Tongyi(dashscope_api_key="your-dashscope-api-key")

SQL Queries

# Create SQL chain
sql_chain = ClickZettaSQLChain.from_engine(
    engine=engine,
    llm=llm,
    return_sql=True
)

# Ask questions in natural language
result = sql_chain.invoke({
    "query": "How many users do we have in the database?"
})

print(result["result"])  # Natural language answer
print(result["sql_query"])  # Generated SQL query

Vector Storage

from langchain_core.documents import Document

# Create vector store
vector_store = ClickZettaVectorStore(
    engine=engine,
    embeddings=embeddings,
    table_name="my_vectors",
    vector_element_type="float"  # Options: float, int, tinyint
)

# Add documents
documents = [
    Document(
        page_content="ClickZetta is a high-performance analytics database.",
        metadata={"category": "database", "type": "analytics"}
    ),
    Document(
        page_content="LangChain enables building applications with LLMs.",
        metadata={"category": "framework", "type": "ai"}
    )
]

vector_store.add_documents(documents)

# Search for similar documents
results = vector_store.similarity_search(
    "What is ClickZetta?",
    k=2
)

for doc in results:
    print(doc.page_content)

Full-text Search

from langchain_clickzetta.retrievers import ClickZettaFullTextRetriever

# Create full-text retriever
retriever = ClickZettaFullTextRetriever(
    engine=engine,
    table_name="my_documents",
    search_type="phrase",
    k=5
)

# Add documents to search index
retriever.add_documents(documents)

# Search documents
results = retriever.invoke("ClickZetta database")
for doc in results:
    print(f"Score: {doc.metadata.get('relevance_score', 'N/A')}")
    print(f"Content: {doc.page_content}")
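
To make the idea of an inverted index with phrase matching concrete, here is a minimal pure-Python sketch. It is not the package's implementation: the whitespace tokenizer, the function names, and the strict-adjacency rule for phrases are all assumptions for illustration, and this README does not state how `search_type="phrase"` is evaluated inside ClickZetta.

```python
from collections import defaultdict


def tokenize(text):
    # Whitespace tokenizer; an analyzer such as IK would segment Chinese text instead.
    return text.lower().split()


def build_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return ids of documents that contain the terms of `phrase` consecutively."""
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Every following term must appear at the next position in the same document.
            if all(start + i in index[t].get(doc_id, []) for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    "d1": "ClickZetta is a high-performance analytics database",
    "d2": "This database is not ClickZetta",
}
index = build_index(docs)
```

Under these assumptions, `phrase_search(index, "analytics database")` matches only `d1`, while a query whose words appear in a document but not side by side matches nothing.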

True Hybrid Search (Single Table)

from langchain_clickzetta import ClickZettaHybridStore, ClickZettaUnifiedRetriever

# Create true hybrid store (single table with both vector + inverted indexes)
hybrid_store = ClickZettaHybridStore(
    engine=engine,
    embeddings=embeddings,
    table_name="hybrid_docs",
    text_analyzer="ik",  # Chinese text analyzer
    distance_metric="cosine"
)

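# Add documents to hybrid store
# The first document is Chinese sample text for the "ik" analyzer. In English: "Yunqi Lakehouse is a
# new-generation cloud lakehouse developed entirely in-house by Yunqi Technology. Its incremental
# compute engine can reach up to 10x the performance of traditional open-source architectures such as
# Spark, enabling low-cost, real-time, end-to-end processing of massive data for the AI era."
documents = [
    Document(page_content="云器 Lakehouse 是由云器科技完全自主研发的新一代云湖仓。使用增量计算的数据计算引擎，性能可以提升至传统开源架构例如Spark的 10倍，实现了海量数据的全链路-低成本-实时化处理，为AI 创新提供了支持全类型数据整合、存储与计算的平台，帮助企业从传统的开源 Spark 体系升级到 AI 时代的数据基础设施。"),
    Document(page_content="LangChain enables building LLM applications")
]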
hybrid_store.add_documents(documents)

# Create unified retriever for hybrid search
retriever = ClickZettaUnifiedRetriever(
    hybrid_store=hybrid_store,
    search_type="hybrid",  # "vector", "fulltext", or "hybrid"
    alpha=0.5,  # Balance between vector and full-text search
    k=5
)

# Search using hybrid approach
results = retriever.invoke("analytics database")
for doc in results:
    print(f"Content: {doc.page_content}")
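
The `alpha` parameter above balances vector and full-text results. One common way to do this is a weighted sum of normalized scores, sketched below. The function name, the sample scores, and the convention that `alpha` weights the vector side are assumptions; the formula the retriever actually applies is not documented in this README.

```python
def blend_scores(vector_scores, text_scores, alpha=0.5):
    """Blend per-document scores: alpha weights vector, (1 - alpha) weights text."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(vector_scores) | set(text_scores)
    combined = {
        # A document missing from one result list contributes 0 for that side.
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1.0 - alpha) * text_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores, already normalized to the range [0, 1].
vector_scores = {"doc_a": 0.9, "doc_b": 0.4}
text_scores = {"doc_b": 1.0, "doc_c": 0.5}
ranking = blend_scores(vector_scores, text_scores, alpha=0.5)
```

With `alpha=0.5`, `doc_b` ranks first because it scores on both sides; with `alpha=1.0` the ranking follows the vector scores alone.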

Chat Message History

from langchain_clickzetta import ClickZettaChatMessageHistory
from langchain_core.messages import HumanMessage, AIMessage

# Create chat history
chat_history = ClickZettaChatMessageHistory(
    engine=engine,
    session_id="user_123",
    table_name="chat_sessions"
)

# Add messages
chat_history.add_message(HumanMessage(content="Hello!"))
chat_history.add_message(AIMessage(content="Hi there! How can I help you?"))

# Retrieve conversation history
messages = chat_history.messages
for message in messages:
    print(f"{message.__class__.__name__}: {message.content}")

Configuration

Environment Variables

You can configure the ClickZetta connection using environment variables:

export CLICKZETTA_SERVICE="your-service"
export CLICKZETTA_INSTANCE="your-instance"
export CLICKZETTA_WORKSPACE="your-workspace"
export CLICKZETTA_SCHEMA="your-schema"
export CLICKZETTA_USERNAME="your-username"
export CLICKZETTA_PASSWORD="your-password"
export CLICKZETTA_VCLUSTER="your-vcluster"  # Required
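
If you prefer to pass these values to the engine explicitly, a small helper can collect them and fail early when one is missing. This is a sketch: the helper name `load_clickzetta_settings` and its fail-fast behavior are assumptions, and only the seven variable names come from the list above.

```python
import os

# Suffixes of the seven CLICKZETTA_* variables listed above.
REQUIRED_VARS = ("SERVICE", "INSTANCE", "WORKSPACE", "SCHEMA",
                 "USERNAME", "PASSWORD", "VCLUSTER")


def load_clickzetta_settings(environ=os.environ):
    """Return ClickZettaEngine keyword arguments read from CLICKZETTA_* variables."""
    missing = [f"CLICKZETTA_{name}" for name in REQUIRED_VARS
               if not environ.get(f"CLICKZETTA_{name}")]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Lower-case the suffix so the keys match the constructor parameter names.
    return {name.lower(): environ[f"CLICKZETTA_{name}"] for name in REQUIRED_VARS}


# Usage (requires langchain-clickzetta and real credentials):
# engine = ClickZettaEngine(**load_clickzetta_settings())
```

Passing a plain dict as `environ` makes the helper easy to exercise in unit tests without touching the real environment.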

Connection Options

engine = ClickZettaEngine(
    service="your-service",
    instance="your-instance",
    workspace="your-workspace",
    schema="your-schema",
    username="your-username",
    password="your-password",
    vcluster="your-vcluster",  # Required parameter
    connection_timeout=30,      # Connection timeout in seconds
    query_timeout=300,         # Query timeout in seconds
    hints={                    # Custom query hints
        "sdk.job.timeout": 600,
        "query_tag": "My Application"
    }
)

Advanced Usage

Custom SQL Prompts

from langchain_core.prompts import PromptTemplate

custom_prompt = PromptTemplate(
    input_variables=["input", "table_info", "dialect"],
    template="""
    You are a ClickZetta SQL expert. Given the input question and table information,
    write a syntactically correct {dialect} query.

    Tables: {table_info}
    Question: {input}

    SQL Query:"""
)

sql_chain = ClickZettaSQLChain(
    engine=engine,
    llm=llm,
    sql_prompt=custom_prompt
)

Vector Store with Custom Distance Metrics

vector_store = ClickZettaVectorStore(
    engine=engine,
    embeddings=embeddings,
    distance_metric="euclidean",  # or "cosine", "manhattan"
    vector_dimension=1536,
    vector_element_type="float"  # or "int", "tinyint"
)
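
For reference, the three metric names above correspond to the following textbook definitions. This is a pure-Python sketch for intuition only; the database computes distances itself, and whether it reports cosine distance or cosine similarity is not stated in this README.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 1.0]
distances = {
    "cosine": cosine_distance(a, b),
    "euclidean": euclidean_distance(a, b),
    "manhattan": manhattan_distance(a, b),
}
```

Cosine distance ignores vector length, so it is a common default for text embeddings, while Euclidean and Manhattan distances are sensitive to magnitude.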

Metadata Filtering

# Search with metadata filters
results = vector_store.similarity_search(
    "machine learning",
    k=5,
    filter={"category": "tech", "year": 2024}
)

# Full-text search with metadata
retriever = ClickZettaFullTextRetriever(
    engine=engine,
    table_name="research_docs"
)
results = retriever.invoke(
    "artificial intelligence",
    filter={"type": "research"}
)
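
The `filter` dictionaries above read as "every listed key must equal the given value". The sketch below shows that semantics on plain dictionaries. It is a client-side illustration with a hypothetical helper name; the store presumably evaluates filters inside ClickZetta, and any support for richer operators is not described here.

```python
def matches_filter(metadata, filter_spec):
    """True when every key in filter_spec is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in filter_spec.items()
    )


# Hypothetical documents represented as plain dictionaries.
docs = [
    {"content": "Intro to ML", "metadata": {"category": "tech", "year": 2024}},
    {"content": "Older ML notes", "metadata": {"category": "tech", "year": 2021}},
    {"content": "Cooking basics", "metadata": {"category": "food", "year": 2024}},
]
selected = [
    d["content"] for d in docs
    if matches_filter(d["metadata"], {"category": "tech", "year": 2024})
]
```

Only the first document satisfies both conditions, so `selected` contains a single entry.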

Testing

Run the test suite:

# Navigate to package directory
cd libs/clickzetta

# Install test dependencies
pip install -e ".[dev]"

# Run unit tests
make test-unit

# Run integration tests
make test-integration

# Run all tests
make test

Integration Tests

To run integration tests against a real ClickZetta instance:

1. Configure a UAT connection in ~/.clickzetta/connections.json
2. Add your DashScope API key to the configuration
3. Run the integration tests:

cd libs/clickzetta
make integration
make integration-dashscope

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/yunqiqiliang/langchain-clickzetta.git
cd langchain-clickzetta/libs/clickzetta

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks (if configured)
pre-commit install

Code Quality

# Navigate to the package directory
cd libs/clickzetta

# Format code (auto-fixes many issues)
make format

# Linting (significantly improved)
make lint      # ✅ Reduced from 358 to 65 errors - 82% improvement!

# Core functionality testing
# Use project virtual environment for best results:
source .venv/bin/activate
make test-unit        # ✅ Core unit tests (LangChain compatibility verified)
make test-integration # Integration tests

# Type checking (in progress)
make typecheck # Some LangChain compatibility issues being resolved

Recent Improvements ✨:

✅ Ruff configuration updated to modern format
✅ 155 typing issues auto-fixed (Dict→dict, Optional→|None)
✅ Method signatures fixed for LangChain BaseStore compatibility
✅ Bare except clauses improved with proper exception handling
✅ Code formatting standardized with black

Current Status: Core functionality fully working with significantly improved code quality (82% reduction in lint errors). All LangChain BaseStore compatibility tests pass.

📦 Storage Services

LangChain ClickZetta provides comprehensive storage services that implement the LangChain BaseStore interface with enterprise-grade features:

🔑 Key Advantages of ClickZetta Storage

🚀 Performance Benefits

Up to 10x Faster: ClickZetta's optimized lakehouse architecture, compared with traditional Spark-based architectures
Atomic Operations: MERGE INTO for consistent UPSERT operations
Batch Processing: Efficient handling of large datasets
Connection Pooling: Optimized database connections
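
When loading a large dataset, it can help to split the input into fixed-size batches on the client before calling a store. The helper below is a sketch with a hypothetical name and batch size; whether the package already batches writes internally is not stated in this README.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Usage with a store from the Quick Start examples (requires a live engine):
# for batch in batched(documents, 500):
#     vector_store.add_documents(batch)
```

Because the helper is a generator, it works with any iterable and never materializes more than one batch at a time.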

🏗️ Architecture Benefits

Native Integration: Direct ClickZetta Volume support for binary data
SQL Queryability: Full SQL access to stored documents and metadata
Unified Storage: Single platform for all data types
Schema Evolution: Flexible metadata storage with JSON support

🔒 Enterprise Features

ACID Compliance: Full transaction support
Type Safety: Runtime validation and type checking
Error Handling: Comprehensive error recovery and logging
Monitoring: Built-in query performance tracking

Key-Value Store

from langchain_clickzetta import ClickZettaStore

# Basic key-value storage
store = ClickZettaStore(engine=engine, table_name="cache")
store.mset([("key1", b"value1"), ("key2", b"value2")])
values = store.mget(["key1", "key2"])
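
The `mset` and `mget` calls above follow LangChain's BaseStore contract, which also includes `mdelete` and `yield_keys`. The in-memory class below is a stand-in that mirrors those method names so the semantics can be tried without a database; `DictByteStore` is hypothetical and is not part of langchain-clickzetta.

```python
from typing import Iterator, Optional, Sequence


class DictByteStore:
    """In-memory stand-in that mirrors the mget/mset/mdelete/yield_keys methods."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def mset(self, key_value_pairs: Sequence[tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            self._data[key] = value  # a later write overwrites an earlier one

    def mget(self, keys: Sequence[str]) -> list[Optional[bytes]]:
        # One result per requested key, in order; missing keys yield None.
        return [self._data.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in self._data:
            if prefix is None or key.startswith(prefix):
                yield key


store = DictByteStore()
store.mset([("key1", b"value1"), ("key2", b"value2")])
values = store.mget(["key1", "missing", "key2"])
```

Note that `mget` returns one entry per requested key, so callers should expect `None` for keys that were never stored.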

Document Store

from langchain_clickzetta import ClickZettaDocumentStore

# Document storage with metadata
doc_store = ClickZettaDocumentStore(engine=engine, table_name="documents")
doc_store.store_document("doc1", "content", {"author": "user"})
content, metadata = doc_store.get_document("doc1")

File Store

from langchain_clickzetta import ClickZettaFileStore

# Binary file storage using ClickZetta Volume
file_store = ClickZettaFileStore(
    engine=engine,
    volume_type="user",
    subdirectory="models"
)
binary_data = b"\x00\x01\x02"  # placeholder bytes; load your real file contents here
file_store.store_file("model.bin", binary_data, "application/octet-stream")
content, mime_type = file_store.get_file("model.bin")

Volume Store (Native ClickZetta Volume)

from langchain_clickzetta import ClickZettaUserVolumeStore

# Native Volume integration
volume_store = ClickZettaUserVolumeStore(engine=engine, subdirectory="data")
volume_store.mset([("config.json", b'{"key": "value"}')])
config = volume_store.mget(["config.json"])[0]

📊 Comparison with Alternatives

ClickZetta vs. Traditional Vector Databases

Feature	ClickZetta + LangChain	Pinecone/Weaviate	Chroma/FAISS
Hybrid Search	✅ Single table	❌ Multiple systems	❌ Separate tools
SQL Queryability	✅ Full SQL support	❌ Limited	❌ No SQL
Lakehouse Integration	✅ Native	❌ External	❌ External
Chinese Language	✅ Optimized	⚠️ Basic	⚠️ Basic
Enterprise Features	✅ ACID, Transactions	⚠️ Limited	❌ Basic
Storage Services	✅ Full LangChain API	❌ Custom	❌ Limited
Performance	✅ 10x improvement	⚠️ Variable	⚠️ Memory limited

ClickZetta vs. Other LangChain Integrations

Integration	Vector Search	Full-Text	Hybrid	Storage API	SQL Queries
ClickZetta	✅	✅	✅	✅	✅
Elasticsearch	✅	✅	⚠️	❌	❌
PostgreSQL/pgvector	✅	⚠️	❌	⚠️	✅
MongoDB	✅	⚠️	❌	⚠️	❌
Redis	✅	❌	❌	✅	❌

Key Differentiators

🎯 Single Platform Solution

No need to manage multiple systems (vector DB + full-text + SQL + storage)
Unified data governance and security model
Simplified architecture and reduced operational complexity

🚀 Performance at Scale

ClickZetta's incremental computing engine
Optimized for both analytical and operational workloads
Native lakehouse storage with separation of compute and storage

🌏 Chinese Market Focus

Deep integration with Chinese AI ecosystem (DashScope, Tongyi)
Optimized text processing for Chinese language
Compliance with Chinese data regulations

Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes
4. Add tests for your changes
5. Ensure all tests pass (pytest)
6. Commit your changes (git commit -m 'Add amazing feature')
7. Push to the branch (git push origin feature/amazing-feature)
8. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Documentation: [Link to detailed docs]
Issues: GitHub Issues
Discussions: GitHub Discussions

Acknowledgments

LangChain for the foundational framework
ClickZetta for the powerful analytics lakehouse
