
GlassFlow ClickHouse ETL Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse


ClickHouse ETL Python SDK


A Python SDK for creating and managing data pipelines between Kafka and ClickHouse.

Features

  • Create and manage data pipelines between Kafka and ClickHouse
  • Deduplication of events within a configurable time window, keyed on a chosen field
  • Temporal joins between two topics on a common key within a given time window
  • Schema validation and configuration management
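The windowed deduplication idea can be sketched in plain Python. This is a simplified, client-side illustration only — the pipeline performs deduplication server-side on the Kafka stream, and the exact semantics (e.g. whether a dropped duplicate refreshes the window) follow the pipeline configuration, not this sketch:

```python
from datetime import datetime, timedelta

def deduplicate(events, id_field, window):
    """Keep the first event per id_field value within each time window.

    `events` is a list of (timestamp, event_dict) pairs.
    """
    last_seen = {}  # id value -> timestamp of the last event kept
    kept = []
    for ts, event in events:
        key = event[id_field]
        prev = last_seen.get(key)
        if prev is None or ts - prev > window:
            kept.append(event)
            last_seen[key] = ts  # only kept events refresh the window here
    return kept

events = [
    (datetime(2024, 1, 1, 10, 0),  {"event_id": "e1", "user_id": "u1"}),
    (datetime(2024, 1, 1, 10, 30), {"event_id": "e1", "user_id": "u1"}),  # duplicate inside the 1h window
    (datetime(2024, 1, 1, 11, 30), {"event_id": "e1", "user_id": "u1"}),  # same id, but outside the window
]
print(len(deduplicate(events, "event_id", timedelta(hours=1))))  # 2
```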

Installation

pip install glassflow-clickhouse-etl

Quick Start

Initialize client

from glassflow_clickhouse_etl import Client

# Initialize GlassFlow client
client = Client(host="your-glassflow-etl-url")

Create a pipeline

pipeline_config = {
    "pipeline_id": "deduplication-demo-pipeline",
    "source": {
      "type": "kafka",
      "provider": "confluent",
      "connection_params": {
        "brokers": [
          "kafka:9093"
        ],
        "protocol": "PLAINTEXT",
        "skip_auth": True
      },
      "topics": [
        {
          "consumer_group_initial_offset": "latest",
          "name": "users",
          "schema": {
            "type": "json",
            "fields": [
              {
                "name": "event_id",
                "type": "string"
              },
              {
                "name": "user_id",
                "type": "string"
              },
              {
                "name": "name",
                "type": "string"
              },
              {
                "name": "email",
                "type": "string"
              },
              {
                "name": "created_at",
                "type": "string"
              }
            ]
          },
          "deduplication": {
            "enabled": True,
            "id_field": "event_id",
            "id_field_type": "string",
            "time_window": "1h"
          }
        }
      ]
    },
    "join": {
      "enabled": False
    },
    "sink": {
      "type": "clickhouse",
      "provider": "localhost",
      "host": "clickhouse",
      "port": "9000",
      "database": "default",
      "username": "default",
      "password": "c2VjcmV0",
      "secure": False,
      "max_batch_size": 1000,
      "max_delay_time": "30s",
      "table": "users_dedup",
      "table_mapping": [
        {
          "source_id": "users",
          "field_name": "event_id",
          "column_name": "event_id",
          "column_type": "UUID"
        },
        {
          "source_id": "users",
          "field_name": "user_id",
          "column_name": "user_id",
          "column_type": "UUID"
        },
        {
          "source_id": "users",
          "field_name": "created_at",
          "column_name": "created_at",
          "column_type": "DateTime"
        },
        {
          "source_id": "users",
          "field_name": "name",
          "column_name": "name",
          "column_type": "String"
        },
        {
          "source_id": "users",
          "field_name": "email",
          "column_name": "email",
          "column_type": "String"
        }
      ]
    }
}

# Create a pipeline
pipeline = client.create_pipeline(pipeline_config)
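The Quick Start above disables joins. For the temporal-join feature, the `join` section of the configuration might look roughly like the sketch below; the topic names, join key, and exact schema keys here are illustrative assumptions — consult the GlassFlow docs for the authoritative join schema:

```python
# Hypothetical join section joining a "users" and an "orders" topic on
# "user_id" within a 1-hour window (field names are illustrative).
join_config = {
    "enabled": True,
    "type": "temporal",
    "sources": [
        {
            "source_id": "users",
            "join_key": "user_id",
            "time_window": "1h",
            "orientation": "left",
        },
        {
            "source_id": "orders",
            "join_key": "user_id",
            "time_window": "1h",
            "orientation": "right",
        },
    ],
}
```

This fragment would replace the `"join": {"enabled": False}` section of the pipeline configuration; both topics would also need entries under `source.topics`.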

Get pipeline

# Get a pipeline by ID
pipeline = client.get_pipeline("my-pipeline-id")

List pipelines

pipelines = client.list_pipelines()
for pipeline in pipelines:
    print(f"Pipeline ID: {pipeline['pipeline_id']}")
    print(f"Name: {pipeline['name']}")
    print(f"Transformation Type: {pipeline['transformation_type']}")
    print(f"Created At: {pipeline['created_at']}")
    print(f"State: {pipeline['state']}")
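Since `list_pipelines()` returns plain dicts with the fields printed above, they can be filtered with ordinary Python. The helper below is a small illustration; the `"Running"`/`"Paused"` state values in the sample data are assumptions, not documented constants:

```python
def pipelines_in_state(pipelines, state):
    """Filter pipeline summaries (as returned by list_pipelines) by state."""
    return [p for p in pipelines if p["state"] == state]

# Sample data mimicking the assumed shape of list_pipelines() output
sample = [
    {"pipeline_id": "p1", "state": "Running"},
    {"pipeline_id": "p2", "state": "Paused"},
]
print([p["pipeline_id"] for p in pipelines_in_state(sample, "Running")])  # ['p1']
```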

Delete pipeline

# Delete a pipeline
client.delete_pipeline("my-pipeline-id")

# Or delete via pipeline instance
pipeline.delete()

Update pipeline

# Update the sink table name
config_patch = {
  "sink": {
    "table": "new_table_name"
  }
}

pipeline = client.get_pipeline("my-pipeline-id")
pipeline.update(config_patch)

Pause / Resume pipeline

# Stops ingesting new messages and finishes processing the messages already inside the pipeline
pipeline.pause()

# Resumes ingestion
pipeline.resume()

Pipeline Configuration

For detailed information about pipeline configuration options, see the GlassFlow docs.

Tracking

The SDK includes anonymous usage tracking to help improve the product. Tracking is enabled by default but can be disabled in two ways:

  1. Using an environment variable:

     export GF_TRACKING_ENABLED=false

  2. Programmatically, using the disable_tracking method:

     from glassflow_clickhouse_etl import Client

     client = Client(host="my-glassflow-host")
     client.disable_tracking()

The tracking collects anonymous information about:

  • SDK version
  • Platform (operating system)
  • Python version
  • Pipeline ID
  • Whether joins or deduplication are enabled
  • Kafka security protocol, the auth mechanism used, and whether authentication is disabled
  • Errors during pipeline creation and deletion

Development

Setup

  1. Clone the repository
  2. Create and activate a virtual environment:

     uv venv
     source .venv/bin/activate

  3. Install dependencies:

     uv pip install -e .[dev]

Testing

pytest

Download files

Download the file for your platform.

Source Distribution

glassflow_clickhouse_etl-1.0.0.tar.gz (78.5 kB)

Built Distribution


glassflow_clickhouse_etl-1.0.0-py3-none-any.whl (19.9 kB)

File details

Details for the file glassflow_clickhouse_etl-1.0.0.tar.gz.


Hashes for glassflow_clickhouse_etl-1.0.0.tar.gz:

  • SHA256: 713a7ddb713d86e5eb33a71fcfe6ceaea9c0ead73f26616b88bbe3e97ea17c6e
  • MD5: 93b47a3b1285ebd24e506ad7d49ee05e
  • BLAKE2b-256: dae30b323f0bfad46d9ca00ac7c428f5ac42665d268ac6ba1d30ff5265d78a1f


File details

Details for the file glassflow_clickhouse_etl-1.0.0-py3-none-any.whl.


Hashes for glassflow_clickhouse_etl-1.0.0-py3-none-any.whl:

  • SHA256: 9b20e359b923fb10aa9bc09099d882014054d987e58d829962ee38cdca7929ad
  • MD5: 11cf0468f9a806143ab662c19f91a0ad
  • BLAKE2b-256: eefbf972ad8f2160ff99219c1364a8596087fd7f6412783c3a682570ce016a6a

