GlassFlow Clickhouse ETL Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse

Clickhouse ETL Python SDK

A Python SDK for creating and managing data pipelines between Kafka and ClickHouse.

Features

  • Create and manage data pipelines between Kafka and ClickHouse
  • Deduplication of events during a time window based on a key
  • Temporal joins between topics based on a common key within a given time window
  • Schema validation and configuration management
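To illustrate what time-windowed deduplication means, here is a conceptual sketch in plain Python — this is not the SDK's implementation, just the idea: an event is dropped if the same key was already accepted within the window.

```python
from datetime import datetime, timedelta

def deduplicate(events, id_field="id", time_window=timedelta(hours=1)):
    """Conceptual sketch (not the SDK's code): keep an event only if its key
    was not accepted within the preceding time window."""
    last_seen = {}  # key -> timestamp of last accepted event
    out = []
    for ts, event in events:
        key = event[id_field]
        prev = last_seen.get(key)
        if prev is None or ts - prev > time_window:
            out.append(event)
            last_seen[key] = ts
    return out

t0 = datetime(2024, 1, 1)
events = [
    (t0, {"id": "a", "email": "a@x.com"}),
    (t0 + timedelta(minutes=30), {"id": "a", "email": "a@x.com"}),  # duplicate inside window: dropped
    (t0 + timedelta(hours=2), {"id": "a", "email": "a@x.com"}),     # outside window: kept
]
print(len(deduplicate(events)))  # → 2
```

The SDK performs this filtering inside the pipeline, driven by the `deduplication` section of the topic configuration shown in the Quick Start below.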

Installation

pip install glassflow-clickhouse-etl

Quick Start

from glassflow_clickhouse_etl import Pipeline


pipeline_config = {
  "pipeline_id": "test-pipeline",
  "source": {
    "type": "kafka",
    "provider": "aiven",
    "connection_params": {
      "brokers": ["localhost:9092"],
      "protocol": "SASL_SSL",
      "mechanism": "SCRAM-SHA-256",
      "username": "user",
      "password": "pass"
    },
    "topics": [
      {
        "consumer_group_initial_offset": "earliest",
        "id": "test-topic",
        "name": "test-topic",
        "schema": {
          "type": "json",
          "fields": [
            {"name": "id", "type": "string"},
            {"name": "email", "type": "string"}
          ]
        },
        "deduplication": {
          "id_field": "id",
          "id_field_type": "string",
          "time_window": "1h",
          "enabled": True
        }
      }
    ],
  },
  "sink": {
    "type": "clickhouse",
    "host": "localhost",
    "port": 8443,
    "database": "test",
    "username": "default",
    "password": "pass",
    "table_mapping": [
      {
        "source_id": "test-topic",
        "field_name": "id",
        "column_name": "user_id",
        "column_type": "UUID"
      },
      {
        "source_id": "test-topic",
        "field_name": "email",
        "column_name": "email",
        "column_type": "String"
      }
    ]
  }
}

# Instantiate a pipeline from the configuration dict
pipeline = Pipeline(pipeline_config)

# Create the pipeline
pipeline.create()
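The same configuration can also live in a JSON file instead of an inline dict. The sketch below round-trips a trimmed config through a file (the `pipeline.json` path is just an example); note that Python's `True`/`False` serialize to JSON `true`/`false` automatically.

```python
import json

# Trimmed config for brevity; the full dict from the Quick Start works the same.
pipeline_config = {
    "pipeline_id": "test-pipeline",
    "source": {"type": "kafka", "topics": []},
}

# Write the configuration to disk...
with open("pipeline.json", "w") as f:
    json.dump(pipeline_config, f, indent=2)

# ...and load it back before constructing the Pipeline.
with open("pipeline.json") as f:
    loaded = json.load(f)

print(loaded == pipeline_config)  # → True
```

With the full configuration in the file, `Pipeline(loaded)` followed by `pipeline.create()` behaves the same as the inline dict above.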

Configuration

For detailed information about the pipeline configuration, see CONFIGURATION.
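For instance, the temporal-join feature listed above is driven by its own configuration section. The sketch below is a guess at its shape — every key name here (`join`, `sources`, `source_id`, `join_key`, `time_window`, `orientation`) is an assumption, so consult CONFIGURATION for the actual schema before using it.

```python
# Hypothetical join section -- key names are assumptions, not the documented
# schema. It expresses: match events from two topics on a shared key when
# they arrive at most one hour apart.
join_config = {
    "enabled": True,
    "type": "temporal",
    "sources": [
        {"source_id": "orders", "join_key": "user_id", "time_window": "1h", "orientation": "left"},
        {"source_id": "users",  "join_key": "id",      "time_window": "1h", "orientation": "right"},
    ],
}
print(join_config["type"])  # → temporal
```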

Development

Setup

  1. Clone the repository
  2. Create a virtual environment
  3. Install dependencies:
uv venv
source .venv/bin/activate
uv pip install -e .[dev]

Testing

pytest

Download files

Source Distribution

glassflow_clickhouse_etl-0.1.8.tar.gz (74.3 kB)

Built Distribution

glassflow_clickhouse_etl-0.1.8-py3-none-any.whl (8.2 kB)
File details: glassflow_clickhouse_etl-0.1.8.tar.gz

  • Size: 74.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

Hashes for glassflow_clickhouse_etl-0.1.8.tar.gz

Algorithm    Hash digest
SHA256       0557ca7a33be3686924327b340622495aa7d0b5968ea40ba1e0dda9883bc9164
MD5          a114e0155d20d952d62808412a0d089e
BLAKE2b-256  0439b2415d49d8b76546503a287cbde7b11539f7158a57133248301b0cc29b07

File details: glassflow_clickhouse_etl-0.1.8-py3-none-any.whl

Hashes for glassflow_clickhouse_etl-0.1.8-py3-none-any.whl

Algorithm    Hash digest
SHA256       4e0c60c2bce62109ec7a54b8a4a7e8038e5b2eb4084ecef6815d351afb15cf0e
MD5          6ef454c4ecfb6ae338bfa89fafc049c3
BLAKE2b-256  033ccd43513084461e17e2a3ed0f6d5dd96a41ab2400de088c930d00c288966a
