
GlassFlow Clickhouse ETL Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse


Clickhouse ETL Python SDK


A Python SDK for creating and managing data pipelines between Kafka and ClickHouse.

Features

  • Create and manage data pipelines between Kafka and ClickHouse
  • Deduplication of events during a time window based on a key
  • Temporal joins between topics based on a common key with a given time window
  • Schema validation and configuration management
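The deduplication behaviour can be pictured in plain Python. The sketch below is illustrative only (it is not the SDK's implementation): it keeps the first event per key and drops repeats of that key that arrive within the configured time window, mirroring what the pipeline does when `deduplication` is enabled on a topic.

```python
from datetime import datetime, timedelta


def deduplicate(events, id_field, time_window):
    """Keep the first event per key; drop repeats within `time_window`."""
    last_seen = {}  # key value -> timestamp of the last emitted event
    kept = []
    for event in events:
        key = event[id_field]
        ts = event["timestamp"]
        prev = last_seen.get(key)
        # Emit only if this key is new or its window has expired
        if prev is None or ts - prev > time_window:
            kept.append(event)
            last_seen[key] = ts
    return kept


events = [
    {"id": "a", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"id": "a", "timestamp": datetime(2024, 1, 1, 10, 30)},  # duplicate within 1h
    {"id": "a", "timestamp": datetime(2024, 1, 1, 11, 30)},  # outside the 1h window
]
print(len(deduplicate(events, "id", timedelta(hours=1))))  # 2 events kept
```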

Installation

pip install glassflow-clickhouse-etl

Quick Start

from glassflow_clickhouse_etl import Pipeline


pipeline_config = {
  "pipeline_id": "test-pipeline",
  "source": {
    "type": "kafka",
    "provider": "aiven",
    "connection_params": {
      "brokers": ["localhost:9092"],
      "protocol": "SASL_SSL",
      "mechanism": "SCRAM-SHA-256",
      "username": "user",
      "password": "pass"
    },
    "topics": [
      {
        "consumer_group_initial_offset": "earliest",
        "id": "test-topic",
        "name": "test-topic",
        "schema": {
          "type": "json",
          "fields": [
            {"name": "id", "type": "string" },
            {"name": "email", "type": "string"}
          ]
        },
        "deduplication": {
          "id_field": "id",
          "id_field_type": "string",
          "time_window": "1h",
          "enabled": True
        }
      }
    ],
  },
  "sink": {
    "type": "clickhouse",
    "host": "localhost",
    "port": 8443,
    "database": "test",
    "username": "default",
    "password": "pass",
    "table_mapping": [
      {
        "source_id": "test-topic",
        "field_name": "id",
        "column_name": "user_id",
        "column_type": "UUID"
      },
      {
        "source_id": "test-topic",
        "field_name": "email",
        "column_name": "email",
        "column_type": "String"
      }
    ]
  }
}

# Create a pipeline from a JSON configuration
pipeline = Pipeline(pipeline_config)

# Create the pipeline
pipeline.create()
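
Note that every `field_name` in the sink's `table_mapping` must refer to a field declared in the corresponding topic's schema. The snippet below illustrates that kind of consistency check in plain Python; it is a rough sketch of what schema validation guards against, not the SDK's actual validation code.

```python
def check_mapping(topic_schema, table_mapping):
    """Raise if the mapping references a field missing from the topic schema."""
    known = {f["name"] for f in topic_schema["fields"]}
    missing = [m["field_name"] for m in table_mapping
               if m["field_name"] not in known]
    if missing:
        raise ValueError(f"table_mapping references unknown fields: {missing}")


schema = {"fields": [{"name": "id", "type": "string"},
                     {"name": "email", "type": "string"}]}
mapping = [{"field_name": "id", "column_name": "user_id"},
           {"field_name": "email", "column_name": "email"}]
check_mapping(schema, mapping)  # passes: every mapped field exists in the schema
```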

Configuration

For detailed information about the pipeline configuration, see CONFIGURATION.

Development

Setup

  1. Clone the repository
  2. Create a virtual environment
  3. Install dependencies:
uv venv
source .venv/bin/activate
uv pip install -e .[dev]

Testing

pytest
