Skip to main content

GlassFlow Clickhouse ETL Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse

Project description

Clickhouse ETL Python SDK


A Python SDK for creating and managing data pipelines between Kafka and ClickHouse.

Features

  • Create and manage data pipelines between Kafka and ClickHouse
  • Deduplication of events during a time window based on a key
  • Temporal joins between topics based on a common key with a given time window
  • Schema validation and configuration management

Installation

pip install glassflow-clickhouse-etl

Quick Start

from glassflow_clickhouse_etl import Pipeline


pipeline_config = {
  "pipeline_id": "test-pipeline",
  "source": {
    "type": "kafka",
    "provider": "aiven",
    "connection_params": {
      "brokers": ["localhoust:9092"],
      "protocol": "SASL_SSL",
      "mechanism": "SCRAM-SHA-256",
      "username": "user",
      "password": "pass"
    }
    "topics": [
      {
        "consumer_group_initial_offset": "earliest",
        "id": "test-topic",
        "name": "test-topic",
        "schema": {
          "type": "json",
          "fields": [
            {"name": "id", "type": "string" },
            {"name": "email", "type": "string"}
          ]
        },
        "deduplication": {
          "id_field": "id",
          "id_field_type": "string",
          "time_window": "1h",
          "enabled": True
        }
      }
    ],
  },
  "sink": {
    "type": "clickhouse",
    "host": "localhost:8443",
    "port": 8443,
    "database": "test",
    "username": "default",
    "password": "pass",
    "table_mapping": [
      {
        "source_id": "test_table",
        "field_name": "id",
        "column_name": "user_id",
        "column_type": "UUID"
      },
      {
        "source_id": "test_table",
        "field_name": "email",
        "column_name": "email",
        "column_type": "String"
      }
    ]
  }
}

# Create a pipeline from a JSON configuration
pipeline = Pipeline(pipeline_config)

# Create the pipeline
pipeline.create()

Pipeline Configuration

For detailed information about the pipeline configuration, see GlassFlow docs.

Tracking

The SDK includes anonymous usage tracking to help improve the product. Tracking is enabled by default but can be disabled in two ways:

  1. Using an environment variable:
export GF_TRACKING_ENABLED=false
  1. Programmatically using the disable_tracking method:
pipeline = Pipeline(pipeline_config)
pipeline.disable_tracking()

The tracking collects anonymous information about:

  • SDK version
  • Platform (operating system)
  • Python version
  • Pipeline ID
  • Whether joins or deduplication are enabled
  • Kafka security protocol, auth mechanism used and whether authentication is disabled
  • Errors during pipeline creation and deletion

Development

Setup

  1. Clone the repository
  2. Create a virtual environment
  3. Install dependencies:
uv venv
source .venv/bin/activate
uv pip install -e .[dev]

Testing

pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

glassflow_clickhouse_etl-0.2.7.tar.gz (74.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

glassflow_clickhouse_etl-0.2.7-py3-none-any.whl (13.2 kB view details)

Uploaded Python 3

File details

Details for the file glassflow_clickhouse_etl-0.2.7.tar.gz.

File metadata

  • Download URL: glassflow_clickhouse_etl-0.2.7.tar.gz
  • Upload date:
  • Size: 74.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for glassflow_clickhouse_etl-0.2.7.tar.gz
Algorithm Hash digest
SHA256 e8f916a4b88395b8b34202e91d62f76426eae02617bb6c25d289f4bdc864f3d7
MD5 df36be932fb0b8619273514ac30ae0de
BLAKE2b-256 9f1c709305380b44e49189761b745f658fc4c59263c221cdb63396ac73cd5595

See more details on using hashes here.

File details

Details for the file glassflow_clickhouse_etl-0.2.7-py3-none-any.whl.

File metadata

File hashes

Hashes for glassflow_clickhouse_etl-0.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 86488426d6bc5d33704cc89adf8de6b619f35c77bdeeb4f16595cb47a3bbaebd
MD5 d783beabb3c0ba7476a4f40173ff12b0
BLAKE2b-256 2f008578eefad57ae3d05d96eba1f8ed976682786d2b07a5fd8479d70ed06d1c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page