
Python client SDK for the H2O Connector Service — create connectors, open connections, and stream extracted data


h2o-connector-service

Python client SDK for the H2O Connector Service. Provides a high-level API to create connectors, open connections, and stream extracted data from supported data sources (PostgreSQL, Snowflake, Hive, Delta Lake, Blob Storage, and more).

pip install h2o-connector-service

Quick Start (H2O Cloud Discovery)

The recommended way to connect when running on H2O AI Cloud:

from h2o_connector_service import ConnectorService

with ConnectorService.from_discovery("https://cloud.h2o.ai", "my-workspace") as svc:
    with svc.open_session("postgresql", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }, worker_name="pg-worker") as session:
        # Stream rows one-by-one (constant memory)
        for row in session.stream_records():
            print(row)
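
The example above hardcodes credentials, which is fine for a demo but risky in shared code. As a minimal sketch, the same configuration dictionary could be built from environment variables. The variable names used here (`PGHOST`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`, `PGPORT`) and the helper `pg_config_from_env` are assumptions for illustration, borrowed from common PostgreSQL conventions; they are not defined by this SDK.

```python
import os


def pg_config_from_env(env=None):
    """Build a PostgreSQL config dict from environment variables.

    The variable names are hypothetical; the SDK itself does not read them.
    """
    env = os.environ if env is None else env
    required = {
        "host": "PGHOST",
        "database": "PGDATABASE",
        "username": "PGUSER",
        "password": "PGPASSWORD",
    }
    # Fail early with a clear message instead of sending an incomplete config.
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    config = {key: env[var] for key, var in required.items()}
    # The examples in this README pass the port as a string, so keep it a string.
    config["port"] = env.get("PGPORT", "5432")
    return config
```

The returned dictionary has the same keys as the literal in the Quick Start, so it could be passed as the second argument to `svc.open_session("postgresql", ...)`.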

Quick Start (Manual / Legacy)

For direct connections without H2O Cloud Discovery (deprecated):

from h2o_connector_service import ConnectorService

with ConnectorService("http://localhost:8080", "<your-oidc-token>", "my-workspace") as svc:
    with svc.open_session("postgresql", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }, worker_name="pg-worker") as session:
        for row in session.stream_records():
            print(row)

Output Formats

Once you have a session, stream data into various formats:

# CSV file (memory-safe — rows written as they arrive)
session.stream_to_csv("output.csv")

# pandas DataFrame (requires: pip install h2o-connector-service[pandas])
df = session.stream_to_pandas()

# Parquet file (memory-safe, chunked row groups)
# requires: pip install h2o-connector-service[parquet]
session.stream_to_parquet("output.parquet")

# datatable Frame (memory-safe, chunked rbind)
# requires: pip install h2o-connector-service[datatable]
frame = session.stream_to_data_table()

# H2O Frame (requires running H2O cluster + h2o.init())
# requires: pip install h2o-connector-service[h2o]
h2o_frame = session.stream_to_h2o_frame()

# Collect all rows into a list of dicts
records = session.stream_to_records()
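
When none of the built-in output formats fit, rows can also be processed in fixed-size batches to keep memory bounded. The sketch below is a generic helper that works on any iterable; it assumes only that `session.stream_records()` is an ordinary Python iterator, as the Quick Start loop suggests. The name `batched_rows` is hypothetical and not part of the SDK.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls only batch_size items at a time, so the full
        # stream is never materialized in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A loop such as `for batch in batched_rows(session.stream_records(), 1000): ...` would then hand each batch to a bulk insert or a custom writer.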

Advanced Usage

For full control over the connector lifecycle, use the individual service clients:

from h2o_connector_service import (
    Client,
    ConnectorServiceClient,
    ConnectionServiceClient,
    ConnectorSession,
)

with Client.from_discovery("https://cloud.h2o.ai", "my-workspace") as client:
    connector_svc = ConnectorServiceClient(client)
    conn_svc = ConnectionServiceClient(client)

    # 1. Create a connector
    connector_svc.create_connector("my-workspace", {
        "metadata": {"name": "my-pg"},
        "data_source_type": "postgresql",
        "data_source_config": {"host": "db.example.com", "port": "5432", "database": "mydb"},
    })

    # 2. Create a connection (worker must be pre-provisioned by an admin)
    connection = conn_svc.create_connection("my-workspace", {
        "connector": "workspaces/my-workspace/connectors/my-pg",
        "worker": "workspaces/my-workspace/workers/pg-worker",
        "extraction": {"query": "SELECT * FROM my_table"},
    })

    # 3. Wait for the worker pod and stream data
    session = ConnectorSession(client, "my-workspace", connection["connection_id"])
    session.wait_for_worker_ready(timeout=300)
    session.stream_to_csv("output.csv")
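
The connection request above refers to the connector and worker by resource names of the form `workspaces/<workspace>/connectors/<name>` and `workspaces/<workspace>/workers/<name>`. A small sketch of helpers that build these strings follows; the format is inferred only from the example in this README, and the function names are hypothetical.

```python
def connector_resource(workspace, connector_id):
    """Resource name for a connector, following the pattern shown in this README."""
    return f"workspaces/{workspace}/connectors/{connector_id}"


def worker_resource(workspace, worker_id):
    """Resource name for a worker, following the pattern shown in this README."""
    return f"workspaces/{workspace}/workers/{worker_id}"


def connection_request(workspace, connector_id, worker_id, query):
    """Build a request dict shaped like the create_connection example above."""
    return {
        "connector": connector_resource(workspace, connector_id),
        "worker": worker_resource(workspace, worker_id),
        "extraction": {"query": query},
    }
```

Building the names in one place avoids typos when the same workspace appears in several calls.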

Optional Dependencies

Install extras for additional output format support:

pip install h2o-connector-service[pandas]       # pandas DataFrames
pip install h2o-connector-service[parquet]      # Parquet files (pyarrow)
pip install h2o-connector-service[datatable]    # datatable Frames
pip install h2o-connector-service[h2o]          # H2O Frames (pandas + pyarrow + h2o)

Supported Data Source Types

data_source_type  Display Name          Category  Worker Language
postgresql        PostgreSQL            Tabular   Go
snowflake         Snowflake             Tabular   Go
hive              Apache Hive           Tabular   Java
delta-lake        Delta Lake            Tabular   Rust
s3                Amazon S3             Blob      Go
gcs               Google Cloud Storage  Blob      Go
azure-blob        Azure Blob Storage    Blob      Go
minio             MinIO                 Blob      Go
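
The table can be mirrored in a small lookup to catch a misspelled `data_source_type` before any request is sent. This is a local convenience sketch only: the entries are copied from the table above, the service may support further types not listed here, and `data_source_category` is a hypothetical helper rather than an SDK function.

```python
# data_source_type -> (display name, category, worker language), copied from the table.
DATA_SOURCES = {
    "postgresql": ("PostgreSQL", "Tabular", "Go"),
    "snowflake": ("Snowflake", "Tabular", "Go"),
    "hive": ("Apache Hive", "Tabular", "Java"),
    "delta-lake": ("Delta Lake", "Tabular", "Rust"),
    "s3": ("Amazon S3", "Blob", "Go"),
    "gcs": ("Google Cloud Storage", "Blob", "Go"),
    "azure-blob": ("Azure Blob Storage", "Blob", "Go"),
    "minio": ("MinIO", "Blob", "Go"),
}


def data_source_category(data_source_type):
    """Return 'Tabular' or 'Blob' for a listed type, or raise ValueError."""
    try:
        return DATA_SOURCES[data_source_type][1]
    except KeyError:
        known = ", ".join(sorted(DATA_SOURCES))
        raise ValueError(
            f"unknown data_source_type {data_source_type!r}; listed types: {known}"
        ) from None
```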

