h2o-connector-service

Python client SDK for the H2O Connector Service. Provides a high-level API to create connectors, open connections, and stream extracted data from supported data sources (PostgreSQL, Snowflake, Hive, Delta Lake, Blob Storage, and more).

- Python client: https://pypi.org/project/h2o-connector-service/
- Source: https://github.com/h2oai/connector-service
Installation

pip install h2o-connector-service
Quick Start (H2O Cloud Discovery)
The recommended way to connect when running on H2O AI Cloud:
from h2o_connector_service import ConnectorService

with ConnectorService.from_discovery("https://cloud.h2o.ai", "my-workspace") as svc:
    with svc.open_session("postgresql", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }, worker_name="pg-worker") as session:
        # Stream rows one-by-one (constant memory)
        for row in session.stream_records():
            print(row)
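Because stream_records() yields rows one at a time, large results can be reduced on the fly without ever holding the full result set in memory. A minimal sketch that would sit inside the open_session block above; the numeric column named "amount" is a hypothetical placeholder, not part of the API:

# Running aggregation over the stream (constant memory).
row_count = 0
total = 0.0
for row in session.stream_records():
    row_count += 1
    total += float(row.get("amount") or 0)  # "amount" is an assumed column name
print(f"{row_count} rows, sum(amount) = {total}")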
Quick Start (Manual / Legacy)
For direct connections without H2O Cloud Discovery (deprecated):
from h2o_connector_service import ConnectorService

with ConnectorService("http://localhost:8080", "<your-oidc-token>", "my-workspace") as svc:
    with svc.open_session("postgresql", {
        "host": "db.example.com",
        "port": "5432",
        "database": "mydb",
        "username": "user",
        "password": "pass",
    }, worker_name="pg-worker") as session:
        for row in session.stream_records():
            print(row)
Output Formats
Once you have a session, stream data into various formats:
# CSV file (memory-safe — rows written as they arrive)
session.stream_to_csv("output.csv")
# pandas DataFrame (requires: pip install h2o-connector-service[pandas])
df = session.stream_to_pandas()
# Parquet file (memory-safe, chunked row groups)
# requires: pip install h2o-connector-service[parquet]
session.stream_to_parquet("output.parquet")
# datatable Frame (memory-safe, chunked rbind)
# requires: pip install h2o-connector-service[datatable]
frame = session.stream_to_data_table()
# H2O Frame (requires running H2O cluster + h2o.init())
# requires: pip install h2o-connector-service[h2o]
h2o_frame = session.stream_to_h2o_frame()
# Collect all rows into a list of dicts
records = session.stream_to_records()
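For large extractions, stream_to_parquet() keeps memory flat by writing chunked row groups, and the resulting file can then be inspected lazily with pyarrow. A small sketch inside a session block, assuming the [parquet] extra (which provides pyarrow) is installed:

import pyarrow.parquet as pq

# Write the extraction to Parquet in chunks, then read only the file metadata.
session.stream_to_parquet("output.parquet")
meta = pq.ParquetFile("output.parquet").metadata
print(meta.num_rows, "rows in", meta.num_row_groups, "row groups")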
Advanced Usage
For full control over the connector lifecycle, use the individual service clients:
from h2o_connector_service import (
    Client,
    ConnectorServiceClient,
    ConnectionServiceClient,
    ConnectorSession,
)

with Client.from_discovery("https://cloud.h2o.ai", "my-workspace") as client:
    connector_svc = ConnectorServiceClient(client)
    conn_svc = ConnectionServiceClient(client)

    # 1. Create a connector
    connector_svc.create_connector("my-workspace", {
        "metadata": {"name": "my-pg"},
        "data_source_type": "postgresql",
        "data_source_config": {"host": "db.example.com", "port": "5432", "database": "mydb"},
    })

    # 2. Create a connection (worker must be pre-provisioned by an admin)
    connection = conn_svc.create_connection("my-workspace", {
        "connector": "workspaces/my-workspace/connectors/my-pg",
        "worker": "workspaces/my-workspace/workers/pg-worker",
        "extraction": {"query": "SELECT * FROM my_table"},
    })

    # 3. Wait for the worker pod and stream data
    session = ConnectorSession(client, "my-workspace", connection["connection_id"])
    session.wait_for_worker_ready(timeout=300)
    session.stream_to_csv("output.csv")
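wait_for_worker_ready(timeout=300) blocks until the worker pod is available; if it does not become ready within the timeout you will typically want to fail loudly rather than proceed. A defensive sketch around step 3 (the exact exception type raised on timeout is not documented here, so a broad except is used purely for illustration):

session = ConnectorSession(client, "my-workspace", connection["connection_id"])
try:
    session.wait_for_worker_ready(timeout=300)
except Exception as exc:
    # Surface a clear error with the connection id before giving up.
    raise RuntimeError(
        f"worker for connection {connection['connection_id']} was not ready in time"
    ) from exc
session.stream_to_csv("output.csv")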
Optional Dependencies
Install extras for additional output format support:
pip install h2o-connector-service[pandas] # pandas DataFrames
pip install h2o-connector-service[parquet] # Parquet files (pyarrow)
pip install h2o-connector-service[datatable] # datatable Frames
pip install h2o-connector-service[h2o] # H2O Frames (pandas + pyarrow + h2o)
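If an extra is not installed, the corresponding output method cannot import its dependency. Assuming that failure surfaces as an ImportError before any data is consumed, falling back to the dependency-free CSV writer might look like this sketch:

try:
    df = session.stream_to_pandas()    # needs the [pandas] extra
except ImportError:
    # pandas (or its extra) is missing; fall back to plain CSV output.
    session.stream_to_csv("output.csv")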
Supported Data Source Types
| data_source_type | Display Name | Category | Worker Language |
|---|---|---|---|
| postgresql | PostgreSQL | Tabular | Go |
| snowflake | Snowflake | Tabular | Go |
| hive | Apache Hive | Tabular | Java |
| delta-lake | Delta Lake | Tabular | Rust |
| s3 | Amazon S3 | Blob | Go |
| gcs | Google Cloud Storage | Blob | Go |
| azure-blob | Azure Blob Storage | Blob | Go |
| minio | MinIO | Blob | Go |
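The values in the data_source_type column are what open_session() expects as its first argument. As an illustration, a Snowflake session might look like the sketch below; the Snowflake configuration keys shown are assumptions for illustration only, not taken from this page:

with svc.open_session("snowflake", {
    "account": "xy12345",            # hypothetical config keys, for illustration only
    "warehouse": "COMPUTE_WH",
    "database": "MYDB",
    "username": "user",
    "password": "pass",
}, worker_name="snowflake-worker") as session:
    for row in session.stream_records():
        print(row)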
Download files
File details
Details for the file h2o_connector_service-0.1.0.dev11001.tar.gz.
File metadata
- Download URL: h2o_connector_service-0.1.0.dev11001.tar.gz
- Upload date:
- Size: 66.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cea549c6d20e2baa715cc1f8ba9f18cc35c7b7513529bdeac144f6e0e7feb62b |
| MD5 | 4412683464af8b36b935f0615fb5cc87 |
| BLAKE2b-256 | 46fce6fd279d9f768fffea6ed3b1eb72d65ec1cac2569af925fcca93a556f7ee |
File details
Details for the file h2o_connector_service-0.1.0.dev11001-py3-none-any.whl.
File metadata
- Download URL: h2o_connector_service-0.1.0.dev11001-py3-none-any.whl
- Upload date:
- Size: 94.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d7ab74d8b2971aa23797c36c9b1ad7d43447b1c77a033f10ed045d9c8d1e92d |
| MD5 | 8f43e3dc4f4a1db8cfee2a7cf39ce965 |
| BLAKE2b-256 | 942d589e0d156c134ef84037f6ddd487dde3f73e9d805849c1d24440149540f4 |