
h2o-connector-service

Python client SDK for the H2O Connector Service. Provides a high-level API to create connectors, open connections, and stream extracted data from supported data sources (PostgreSQL, Snowflake, Hive, Delta Lake, Blob Storage, and more).

pip install h2o-connector-service

Quick Start (H2O Cloud Discovery)

The recommended way to connect when running on H2O AI Cloud:

from h2o_connector_service import ConnectorService

with ConnectorService.from_discovery("https://cloud.h2o.ai", "my-workspace") as svc:
    with svc.open_session("CONNECTOR_TYPE_POSTGRESQL", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }) as session:
        # Stream rows one-by-one (constant memory)
        for row in session.stream_records():
            print(row)
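
Because stream_records yields one row at a time, aggregations can run in memory proportional to the result of the aggregation rather than the row count. A minimal sketch (count_by is a hypothetical helper, not part of the SDK; it assumes rows are dicts, consistent with what stream_to_records returns):

```python
from collections import Counter

def count_by(rows, key):
    # Incremental aggregation over a row stream: memory grows with the
    # number of distinct key values, not with the number of rows.
    counts = Counter()
    for row in rows:  # works for session.stream_records() or any iterable of dicts
        counts[row[key]] += 1
    return counts
```

With a live session this would be called as count_by(session.stream_records(), "some_column").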

Quick Start (Manual / Legacy)

For direct connections without H2O Cloud Discovery. This constructor-based path is deprecated; prefer from_discovery when running on H2O AI Cloud:

from h2o_connector_service import ConnectorService

with ConnectorService("http://localhost:8080", "<your-oidc-token>", "my-workspace") as svc:
    with svc.open_session("CONNECTOR_TYPE_POSTGRESQL", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }) as session:
        for row in session.stream_records():
            print(row)

Output Formats

Once you have a session, stream data into various formats:

# CSV file (memory-safe — rows written as they arrive)
session.stream_to_csv("output.csv")

# pandas DataFrame (requires: pip install h2o-connector-service[pandas])
df = session.stream_to_pandas()

# Parquet file (memory-safe, chunked row groups)
# requires: pip install h2o-connector-service[parquet]
session.stream_to_parquet("output.parquet")

# datatable Frame (memory-safe, chunked rbind)
# requires: pip install h2o-connector-service[datatable]
frame = session.stream_to_data_table()

# H2O Frame (requires running H2O cluster + h2o.init())
# requires: pip install h2o-connector-service[h2o]
h2o_frame = session.stream_to_h2o_frame()

# Collect all rows into a list of dicts
records = session.stream_to_records()
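
For intuition, the memory-safe behavior described for stream_to_csv can be pictured with the standard library's csv.DictWriter. This is a hypothetical sketch of the pattern, not the SDK's actual implementation:

```python
import csv

def rows_to_csv(rows, path):
    # Stream dict rows to disk one at a time; only the current row is
    # held in memory. The header comes from the keys of the first row.
    it = iter(rows)
    first = next(it, None)
    if first is None:
        return 0  # empty stream: write nothing
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(first))
        writer.writeheader()
        writer.writerow(first)
        written = 1
        for row in it:
            writer.writerow(row)
            written += 1
    return written
```

The same shape applies to the Parquet path, except rows are buffered into chunked row groups before flushing.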

Advanced Usage

For full control over the connector lifecycle, use the individual service clients:

from h2o_connector_service import (
    Client,
    ConnectorServiceClient,
    ConnectionServiceClient,
    ConnectorSession,
)

with Client.from_discovery("https://cloud.h2o.ai", "my-workspace") as client:
    connector_svc = ConnectorServiceClient(client)
    conn_svc = ConnectionServiceClient(client)

    # 1. Create a connector
    connector_svc.create_connector("my-workspace", {
        "metadata": {"name": "my-pg", "workspace_id": "my-workspace"},
        "spec": {
            "connector_type": "CONNECTOR_TYPE_POSTGRESQL",
            "config": {"host": "db.example.com", "port": "5432", "database": "mydb"},
        },
    })

    # 2. Create a connection
    connection = conn_svc.create_connection("my-workspace", {
        "metadata": {"workspace_id": "my-workspace"},
        "spec": {"connector_name": "workspaces/my-workspace/connectors/my-pg"},
    })

    # 3. Wait for the worker pod and stream data
    session = ConnectorSession(client, "my-workspace", connection["connection_id"])
    session.wait_for_worker_ready(timeout=300)
    session.stream_to_csv("output.csv")
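
wait_for_worker_ready blocks until the worker pod can accept a session, raising on timeout. The general polling pattern it represents looks roughly like this (wait_until is an illustrative helper under that assumption, not the SDK's code):

```python
import time

def wait_until(predicate, timeout=300.0, interval=2.0):
    # Poll predicate() until it returns True; raise once the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```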

Optional Dependencies

Install extras for additional output format support:

pip install h2o-connector-service[pandas]     # pandas DataFrames
pip install h2o-connector-service[parquet]    # Parquet files (pyarrow)
pip install h2o-connector-service[datatable]  # datatable Frames
pip install h2o-connector-service[h2o]        # H2O Frames (pandas + pyarrow + h2o)
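
Because extras only gate optional imports, you can probe for them at runtime before choosing an output format. has_extra is a hypothetical helper built on the standard library, not part of the SDK:

```python
from importlib import util

def has_extra(module_name):
    # True if the optional dependency backing an extra can be imported.
    return util.find_spec(module_name) is not None

# e.g. fall back to CSV output when pandas is absent:
# target = "pandas" if has_extra("pandas") else "csv"
```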

Supported Connector Types

Connector Type             Worker
CONNECTOR_TYPE_POSTGRESQL  worker-postgresql (Java/JDBC)
CONNECTOR_TYPE_SNOWFLAKE   worker-snowflake (Go)
CONNECTOR_TYPE_HIVE        worker-hive (Java/JDBC)
CONNECTOR_TYPE_DELTA_LAKE  worker-delta (Rust)
CONNECTOR_TYPE_S3          worker-blob (Go)
CONNECTOR_TYPE_AZURE_BLOB  worker-blob (Go)
CONNECTOR_TYPE_GCS         worker-blob (Go)
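
If client code routes on connector type, the table above collapses to a small lookup. The mapping below simply restates the table; the worker names are informational (server-side) and not something the SDK requires you to pass:

```python
# Worker implementation per connector type, per the table above.
WORKER_FOR_CONNECTOR = {
    "CONNECTOR_TYPE_POSTGRESQL": "worker-postgresql",
    "CONNECTOR_TYPE_SNOWFLAKE": "worker-snowflake",
    "CONNECTOR_TYPE_HIVE": "worker-hive",
    "CONNECTOR_TYPE_DELTA_LAKE": "worker-delta",
    "CONNECTOR_TYPE_S3": "worker-blob",
    "CONNECTOR_TYPE_AZURE_BLOB": "worker-blob",
    "CONNECTOR_TYPE_GCS": "worker-blob",
}
```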
