
Python interface to iomete (Hive)


py-hive-iomete is a collection of Python DB-API and SQLAlchemy interfaces for iomete (Hive).

Usage

DB-API

from pyhive import hive

connection = hive.connect(
    host="<data_plane_host>",
    port=<data_plane_port>,
    scheme="http", # or "https"
    lakehouse="<lakehouse_cluster_name>",
    database="default",
    username="<username>",
    password="<password>"
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM my_awesome_data LIMIT 10")

print(cursor.fetchone())
print(cursor.fetchall())
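
The fetch calls above return rows as plain tuples. Since the cursor follows the DB-API, column names should be available from cursor.description, so rows can be paired with their names. The following is a minimal sketch under that assumption: rows_as_dicts is a hypothetical helper that is not part of py-hive-iomete, and sqlite3 stands in for a live iomete connection only so that the snippet runs without a cluster.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return the remaining rows of a DB-API cursor as a list of dicts."""
    # Per the DB-API, description is a sequence of 7-item tuples;
    # item 0 of each tuple is the column name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in connection (assumption): sqlite3 replaces hive.connect(...) here.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE my_awesome_data (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO my_awesome_data VALUES (?, ?)", [(1, "a"), (2, "b")])
cursor.execute("SELECT * FROM my_awesome_data ORDER BY id LIMIT 10")

records = rows_as_dicts(cursor)
print(records)
```

With a real connection, only the cursor passed to rows_as_dicts changes; the helper itself relies on nothing beyond description and fetchall().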

DB-API (asynchronous)

from pyhive import hive
from TCLIService.ttypes import TOperationState

connection = hive.connect(
    host="<data_plane_host>",
    port=<data_plane_port>,
    scheme="http", # or "https"
    lakehouse="<lakehouse_cluster_name>",
    database="default",
    username="<username>",
    password="<password>"
)

cursor = connection.cursor()

cursor.execute("SELECT * FROM my_awesome_data LIMIT 10", async_=True)

status = cursor.poll().operationState

while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
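
The polling loop above runs for as long as the query does. One possible refinement is to bound the wait and cancel the query when the bound is exceeded. The sketch below assumes Python 3 and only the poll() and cancel() calls already shown; wait_for_query, FakeCursor and the string state values are hypothetical names introduced for illustration so the snippet runs without a cluster.

```python
import time


def wait_for_query(cursor, running_states, poll_interval=1.0, timeout=60.0):
    """Poll until the query leaves running_states; cancel and raise on timeout."""
    deadline = time.monotonic() + timeout
    status = cursor.poll().operationState
    while status in running_states:
        if time.monotonic() >= deadline:
            cursor.cancel()
            raise TimeoutError("query still running after %.1f seconds" % timeout)
        time.sleep(poll_interval)
        status = cursor.poll().operationState
    return status


class _FakeResponse:
    def __init__(self, state):
        self.operationState = state


class FakeCursor:
    """Hypothetical stand-in for a pyhive cursor, for demonstration only."""

    def __init__(self, states):
        self._states = list(states)
        self.cancelled = False

    def poll(self):
        # Report each state in turn, then keep repeating the last one.
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return _FakeResponse(state)

    def cancel(self):
        self.cancelled = True


# Placeholder state values (assumption); with a real cursor, pass
# (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE) instead.
INITIALIZED, RUNNING, FINISHED = "initialized", "running", "finished"

final_state = wait_for_query(
    FakeCursor([INITIALIZED, RUNNING, FINISHED]),
    running_states=(INITIALIZED, RUNNING),
    poll_interval=0.01,
)
print(final_state)
```

The running states are passed in as an argument so the helper does not depend on TCLIService and can be exercised with a stub.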

SQLAlchemy

First install this package to register it with SQLAlchemy (see setup.py).

from sqlalchemy.engine import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.schema import MetaData, Table

# Possible dialects (the hive and iomete dialects operate identically):
# hive+http
# hive+https
# iomete+http
# iomete+https
engine = create_engine(
    'iomete+https://<username>:<password>@<data_plane_host>:<data_plane_port>/<database>?lakehouse=<lakehouse_cluster_name>')

# Alternatively, the "hive" dialect can be used instead
# engine = create_engine(
#    'hive+https://<username>:<password>@<data_plane_host>:<data_plane_port>/<database>?lakehouse=<lakehouse_cluster_name>')

session = sessionmaker(bind=engine)()
records = session.query(Table('my_awesome_data', MetaData(bind=engine), autoload=True)) \
    .limit(10) \
    .all()
print(records)

Note: query generation functionality is not exhaustive or fully tested, but there should be no problem with raw SQL.
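
The connection URL passed to create_engine embeds the username and password, so characters such as "@" or "/" in the credentials need to be percent-encoded before they are placed in the URL. A minimal sketch of assembling the URL, assuming Python 3: build_iomete_url is a hypothetical helper that is not part of py-hive-iomete, and the host, credentials and lakehouse name are made-up placeholders.

```python
from urllib.parse import quote_plus


def build_iomete_url(username, password, host, port, database, lakehouse, scheme="https"):
    """Assemble an iomete SQLAlchemy URL, percent-encoding the credentials."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    template = "iomete+{scheme}://{user}:{password}@{host}:{port}/{database}?lakehouse={lakehouse}"
    return template.format(
        scheme=scheme,
        user=quote_plus(username),
        password=quote_plus(password),
        host=host,
        port=port,
        database=database,
        lakehouse=quote_plus(lakehouse),
    )


# Placeholder values (assumptions) chosen to show the encoding of "@" and "/".
url = build_iomete_url("alice", "p@ss/word", "example.invalid", 443, "default", "my-lakehouse")
print(url)
```

The resulting string can then be passed to create_engine in place of the hand-written URL.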

Requirements

Install using

  • pip install 'py-hive-iomete' for the DB-API interface

  • pip install 'py-hive-iomete[sqlalchemy]' for the SQLAlchemy interface

py-hive-iomete works with

  • Python 2.7 / Python 3

Changelog

See https://github.com/iomete/py-hive-iomete/releases.

Contributing

  • Changes must come with tests, with the exception of trivial things like fixing comments. See .travis.yml for the test environment setup.

  • Notes on project scope:

    • This project is intended to be a minimal iomete (hive) client that does that one thing and nothing else. Features that can be implemented on top of py-hive-iomete, such as integration with your favorite data analysis library, are likely out of scope.

    • We prefer having a small number of generic features over a large number of specialized, inflexible features.

Updating TCLIService

The TCLIService module is autogenerated from a TCLIService.thrift file. To update it, run generate.py: python generate.py <TCLIServiceURL>. When the URL is omitted, the version for Hive 2.3 is downloaded.

