Python client SDK for the H2O Connector Service — create connectors, open connections, and stream extracted data

Maintainers: h2o

h2o-connector-service

Python client: https://pypi.org/project/h2o-connector-service/
Source: https://github.com/h2oai/connector-service

Python client SDK for the H2O Connector Service. Provides a high-level API to create connectors, open connections, and stream extracted data from supported data sources (PostgreSQL, Snowflake, Hive, Delta Lake, Blob Storage, and more).

Install from PyPI:

pip install h2o-connector-service

Quick Start (H2O Cloud Discovery)

When running on H2O AI Cloud, connecting through discovery is the recommended approach:

from h2o_connector_service import ConnectorService

with ConnectorService.from_discovery("https://cloud.h2o.ai", "my-workspace") as svc:
    with svc.open_session("postgresql", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }, worker_name="pg-worker") as session:
        # Stream rows one-by-one (constant memory)
        for row in session.stream_records():
            print(row)
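When each row needs more than a `print`, it is often convenient to process the stream in small batches while still never holding the full result in memory. The sketch below is one way to do that. It assumes each streamed row is a dict, and `fake_stream` is a hypothetical stand-in for `session.stream_records()` so the example runs without a service.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def batched_rows(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most batch_size rows without materialising the stream."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls only the next batch_size items from the stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_stream() -> Iterator[Row]:
    """Hypothetical stand-in for session.stream_records(); yields five rows."""
    for i in range(5):
        yield {"id": i, "name": f"row-{i}"}


batch_sizes = [len(batch) for batch in batched_rows(fake_stream(), batch_size=2)]
```

With a real session, `batched_rows(session.stream_records(), 1000)` would be the equivalent call, provided the rows behave as assumed here.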

Quick Start (Manual / Legacy)

For direct connections without H2O Cloud Discovery (deprecated):

from h2o_connector_service import ConnectorService

with ConnectorService("http://localhost:8080", "<your-oidc-token>", "my-workspace") as svc:
    with svc.open_session("postgresql", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }, worker_name="pg-worker") as session:
        for row in session.stream_records():
            print(row)
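Rather than pasting the OIDC token into source code, it can be read from the environment. This is a minimal sketch. The variable name `H2O_OIDC_TOKEN` and the helper `load_oidc_token` are hypothetical and not part of the SDK.

```python
import os
from typing import Mapping, Optional


def load_oidc_token(env: Optional[Mapping[str, str]] = None,
                    var_name: str = "H2O_OIDC_TOKEN") -> str:
    """Return the token stored in var_name, or raise if it is missing or blank.

    var_name is a hypothetical convention chosen for this example.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} to a valid OIDC token before connecting")
    return token
```

The result can then be passed as the second argument in the example above: `ConnectorService("http://localhost:8080", load_oidc_token(), "my-workspace")`.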

Output Formats

Once you have a session, stream data into various formats:

# CSV file (memory-safe — rows written as they arrive)
session.stream_to_csv("output.csv")

# pandas DataFrame (requires: pip install h2o-connector-service[pandas])
df = session.stream_to_pandas()

# Parquet file (memory-safe, chunked row groups)
# requires: pip install h2o-connector-service[parquet]
session.stream_to_parquet("output.parquet")

# datatable Frame (memory-safe, chunked rbind)
# requires: pip install h2o-connector-service[datatable]
frame = session.stream_to_data_table()

# H2O Frame (requires running H2O cluster + h2o.init())
# requires: pip install h2o-connector-service[h2o]
h2o_frame = session.stream_to_h2o_frame()

# Collect all rows into a list of dicts
records = session.stream_to_records()
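To illustrate what "memory-safe" means for the CSV case, the sketch below writes rows to a file-like object one at a time, so memory use does not grow with the number of rows. It is not the SDK's implementation. It assumes rows are dicts that all share the keys of the first row, and `write_rows_as_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def write_rows_as_csv(rows: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write rows to out as they arrive; return the number of data rows written."""
    writer = None
    count = 0
    for row in rows:
        if writer is None:
            # The header is taken from the keys of the first row.
            writer = csv.DictWriter(out, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count


# A generator stands in for a live stream of records.
buffer = io.StringIO()
written = write_rows_as_csv(({"id": i, "name": f"row-{i}"} for i in range(3)), buffer)
```

By contrast, `stream_to_records()` and `stream_to_pandas()` return the whole result at once, so they suit result sets that fit in memory.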

Advanced Usage

For full control over the connector lifecycle, use the individual service clients:

from h2o_connector_service import (
    Client,
    ConnectorServiceClient,
    ConnectionServiceClient,
    ConnectorSession,
)

with Client.from_discovery("https://cloud.h2o.ai", "my-workspace") as client:
    connector_svc = ConnectorServiceClient(client)
    conn_svc = ConnectionServiceClient(client)

    # 1. Create a connector
    connector_svc.create_connector("my-workspace", {
        "metadata": {"name": "my-pg"},
        "data_source_type": "postgresql",
        "data_source_config": {"host": "db.example.com", "port": "5432", "database": "mydb"},
    })

    # 2. Create a connection (worker must be pre-provisioned by an admin)
    connection = conn_svc.create_connection("my-workspace", {
        "connector": "workspaces/my-workspace/connectors/my-pg",
        "worker": "workspaces/my-workspace/workers/pg-worker",
        "extraction": {"query": "SELECT * FROM my_table"},
    })

    # 3. Wait for the worker pod and stream data
    session = ConnectorSession(client, "my-workspace", connection["connection_id"])
    session.wait_for_worker_ready(timeout=300)
    session.stream_to_csv("output.csv")
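The connection request above refers to the connector and worker by resource names of the form `workspaces/<workspace>/<collection>/<id>`. A small helper can build these consistently. The sketch below only mirrors the strings shown in the example. The helper names are hypothetical, and the rule that a segment must not contain `/` is an assumption.

```python
from typing import Dict


def resource_name(workspace: str, collection: str, resource_id: str) -> str:
    """Build a name such as workspaces/my-workspace/connectors/my-pg."""
    for part in (workspace, collection, resource_id):
        # Assumption: segments are non-empty and never contain a slash.
        if not part or "/" in part:
            raise ValueError(f"invalid resource name segment: {part!r}")
    return f"workspaces/{workspace}/{collection}/{resource_id}"


def build_connection_request(workspace: str, connector_id: str,
                             worker_id: str, query: str) -> Dict[str, object]:
    """Assemble the dict passed to create_connection in the example above."""
    return {
        "connector": resource_name(workspace, "connectors", connector_id),
        "worker": resource_name(workspace, "workers", worker_id),
        "extraction": {"query": query},
    }


request = build_connection_request(
    "my-workspace", "my-pg", "pg-worker", "SELECT * FROM my_table"
)
```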

Optional Dependencies

Install extras for additional output format support:

pip install h2o-connector-service[pandas]       # pandas DataFrames
pip install h2o-connector-service[parquet]      # Parquet files (pyarrow)
pip install h2o-connector-service[datatable]    # datatable Frames
pip install h2o-connector-service[h2o]          # H2O Frames (pandas + pyarrow + h2o)
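Before calling one of the optional output methods, it can help to check which extras are usable in the current environment. The sketch below uses only the standard library. The mapping from extra to importable modules is inferred from the comments above and is an assumption, not taken from the package metadata.

```python
from importlib.util import find_spec
from typing import Dict, List

# Assumed mapping of extras to the modules they make importable.
EXTRA_MODULES: Dict[str, List[str]] = {
    "pandas": ["pandas"],
    "parquet": ["pyarrow"],
    "datatable": ["datatable"],
    "h2o": ["pandas", "pyarrow", "h2o"],
}


def available_extras() -> Dict[str, bool]:
    """Report, per extra, whether all of its modules can be found."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


extras_status = available_extras()
```

`find_spec` locates a top-level module without importing it, so the check is cheap even for heavy packages.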

Supported Data Source Types

`data_source_type`	Display Name	Category	Worker Language
`postgresql`	PostgreSQL	Tabular	Go
`snowflake`	Snowflake	Tabular	Go
`hive`	Apache Hive	Tabular	Java
`delta-lake`	Delta Lake	Tabular	Rust
`s3`	Amazon S3	Blob	Go
`gcs`	Google Cloud Storage	Blob	Go
`azure-blob`	Azure Blob Storage	Blob	Go
`minio`	MinIO	Blob	Go
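The table can also be kept as a small lookup in client code, for example to validate a `data_source_type` before opening a session. The values below are copied from the table. The helper names are hypothetical, and the SDK may expose its own way to list supported types.

```python
from typing import Dict, List, Tuple

# data_source_type -> (display name, category, worker language), from the table.
DATA_SOURCES: Dict[str, Tuple[str, str, str]] = {
    "postgresql": ("PostgreSQL", "Tabular", "Go"),
    "snowflake": ("Snowflake", "Tabular", "Go"),
    "hive": ("Apache Hive", "Tabular", "Java"),
    "delta-lake": ("Delta Lake", "Tabular", "Rust"),
    "s3": ("Amazon S3", "Blob", "Go"),
    "gcs": ("Google Cloud Storage", "Blob", "Go"),
    "azure-blob": ("Azure Blob Storage", "Blob", "Go"),
    "minio": ("MinIO", "Blob", "Go"),
}


def category_of(data_source_type: str) -> str:
    """Return 'Tabular' or 'Blob', or raise ValueError for an unknown type."""
    try:
        return DATA_SOURCES[data_source_type][1]
    except KeyError:
        known = ", ".join(sorted(DATA_SOURCES))
        raise ValueError(
            f"unknown data_source_type {data_source_type!r}; expected one of: {known}"
        ) from None


def types_in_category(category: str) -> List[str]:
    """List the data_source_type values in a category, sorted by name."""
    return sorted(name for name, info in DATA_SOURCES.items() if info[1] == category)
```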

Release history

Version	Status	Release date
0.1.0a1	pre-release	May 5, 2026
0.1.0.dev11001	pre-release (this version)	Apr 26, 2026
0.1.0.dev10001	pre-release	Apr 25, 2026
0.1.0.dev9001	pre-release	Apr 20, 2026
0.1.0.dev8001	pre-release	Apr 9, 2026
0.1.0.dev7001	pre-release	Apr 9, 2026
0.1.0.dev6001	pre-release	Apr 9, 2026

Download files

Distribution	File	Size	Uploaded	Tags
Source Distribution	h2o_connector_service-0.1.0.dev11001.tar.gz	66.3 kB	Apr 26, 2026	Source
Built Distribution	h2o_connector_service-0.1.0.dev11001-py3-none-any.whl	94.7 kB	Apr 26, 2026	Python 3

File details

Details for the file h2o_connector_service-0.1.0.dev11001.tar.gz.

File metadata

Download URL: h2o_connector_service-0.1.0.dev11001.tar.gz
Upload date: Apr 26, 2026
Size: 66.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for h2o_connector_service-0.1.0.dev11001.tar.gz
Algorithm	Hash digest
SHA256	`cea549c6d20e2baa715cc1f8ba9f18cc35c7b7513529bdeac144f6e0e7feb62b`
MD5	`4412683464af8b36b935f0615fb5cc87`
BLAKE2b-256	`46fce6fd279d9f768fffea6ed3b1eb72d65ec1cac2569af925fcca93a556f7ee`

File details

Details for the file h2o_connector_service-0.1.0.dev11001-py3-none-any.whl.

File metadata

Download URL: h2o_connector_service-0.1.0.dev11001-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 94.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for h2o_connector_service-0.1.0.dev11001-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3d7ab74d8b2971aa23797c36c9b1ad7d43447b1c77a033f10ed045d9c8d1e92d`
MD5	`8f43e3dc4f4a1db8cfee2a7cf39ce965`
BLAKE2b-256	`942d589e0d156c134ef84037f6ddd487dde3f73e9d805849c1d24440149540f4`
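The published digests can be used to check a downloaded file before installing it. The sketch below computes a SHA256 digest with the standard library and compares it to an expected value. The function names are hypothetical, and the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare the file's digest with expected_hex, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the wheel listed above, the call would be `verify_sha256("h2o_connector_service-0.1.0.dev11001-py3-none-any.whl", "3d7ab74d8b2971aa23797c36c9b1ad7d43447b1c77a033f10ed045d9c8d1e92d")`.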
