Python interface to Hive
PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive.
Usage
DB-API
from pyhive import presto
cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
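Because the cursor follows the DB-API 2.0 (PEP 249) interface, a large result can also be consumed in batches with fetchmany() instead of loading everything at once with fetchall(). The sketch below uses the standard-library sqlite3 module as a stand-in for presto.connect('localhost') so that it runs without a Presto server; iter_batches is a hypothetical helper, not part of PyHive.

```python
import sqlite3


def iter_batches(cursor, size=1000):
    """Yield lists of rows from any DB-API cursor, `size` rows at a time."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:  # an empty list means the result set is exhausted
            break
        yield rows


# sqlite3 stands in for a PyHive connection so the sketch runs offline.
# The CREATE/INSERT setup is sqlite3-specific; with PyHive you would query an existing table.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE my_awesome_data (x INTEGER)')
cursor.executemany('INSERT INTO my_awesome_data VALUES (?)', [(i,) for i in range(25)])

cursor.execute('SELECT x FROM my_awesome_data ORDER BY x')
batch_sizes = [len(batch) for batch in iter_batches(cursor, size=10)]
print(batch_sizes)  # [10, 10, 5]
```

Only the cursor calls (execute, fetchmany) matter here; they are part of the generic DB-API surface rather than anything sqlite3-specific.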
DB-API (asynchronous)
from pyhive import hive
from TCLIService.ttypes import TOperationState
cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10', async=True)
status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())
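The polling loop above can be factored into a reusable function. The sketch below is one way to do it, assuming you want to pause between polls (the original loop polls as fast as it can). It only relies on the two cursor methods used above, poll().operationState and fetch_logs(). The state constants and FakeCursor are hypothetical stand-ins so the sketch runs without HiveServer2. With PyHive installed, pass the real TOperationState values from TCLIService.ttypes instead.

```python
import time

# Hypothetical stand-ins for TOperationState.INITIALIZED_STATE / RUNNING_STATE / FINISHED_STATE.
INITIALIZED_STATE, RUNNING_STATE, FINISHED_STATE = 0, 1, 2


def wait_for_query(cursor, running_states=(INITIALIZED_STATE, RUNNING_STATE),
                   interval=1.0, sleep=time.sleep):
    """Poll an asynchronous cursor until it leaves `running_states`.

    Returns the final operation state and every log line fetched while waiting.
    """
    logs = []
    state = cursor.poll().operationState
    while state in running_states:
        logs.extend(cursor.fetch_logs())
        sleep(interval)  # pause so the server is not flooded with poll requests
        state = cursor.poll().operationState
    return state, logs


class _Status(object):
    def __init__(self, state):
        self.operationState = state


class FakeCursor(object):
    """Hypothetical cursor whose query runs for two polls and then finishes."""

    def __init__(self):
        self._states = [INITIALIZED_STATE, RUNNING_STATE, FINISHED_STATE]
        self._log_batches = [['Compiling query'], ['Map 100%']]

    def poll(self):
        return _Status(self._states.pop(0))

    def fetch_logs(self):
        return self._log_batches.pop(0)


# A no-op sleep keeps the demonstration instant.
state, logs = wait_for_query(FakeCursor(), sleep=lambda seconds: None)
```

Injecting sleep as a parameter is a design choice that makes the loop easy to test. In real use the default time.sleep applies.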
SQLAlchemy
First install this package to register it with SQLAlchemy (see setup.py).
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
engine = create_engine('presto://localhost:8080/hive/default')
logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=logs).scalar())
Note: query generation functionality is not exhaustive or fully tested, but there should be no problem with raw SQL.
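In the Presto URL presto://localhost:8080/hive/default, we read the path after host:port as a catalog followed by a schema (here hive and default). The hypothetical helper presto_url below assembles such a URL from named parts so the pieces are explicit. Its defaults simply mirror the example, and the catalog/schema reading is our assumption.

```python
def presto_url(host, port=8080, catalog='hive', schema='default'):
    """Assemble a URL of the form presto://host:port/catalog/schema.

    Hypothetical helper; defaults mirror the SQLAlchemy example above.
    """
    return 'presto://{0}:{1}/{2}/{3}'.format(host, port, catalog, schema)


url = presto_url('localhost')
print(url)  # presto://localhost:8080/hive/default
# engine = create_engine(url)  # then use it as in the SQLAlchemy example
```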
Passing session configuration
# DB-API
hive.connect('localhost', configuration={'hive.exec.reducers.max': '123'})
presto.connect('localhost', session_props={'query_max_run_time': '1234m'})
# SQLAlchemy
create_engine(
    'hive://user@host:10000/database',
    connect_args={'configuration': {'hive.exec.reducers.max': '123'}},
)
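The configuration values in these examples are strings ('123', not 123), because HiveServer2 session settings are sent as string key/value pairs. If you prefer to keep settings as native Python values, a small hypothetical helper such as to_hive_configuration can convert them first. Lower-casing booleans to 'true'/'false' is our assumption about the form Hive expects.

```python
def to_hive_configuration(settings):
    """Convert native Python values to strings for the `configuration` argument.

    Hypothetical helper: booleans become 'true'/'false'; everything else uses str().
    """
    config = {}
    for key, value in settings.items():
        if isinstance(value, bool):  # check bool first; str(True) would give 'True'
            config[key] = 'true' if value else 'false'
        else:
            config[key] = str(value)
    return config


configuration = to_hive_configuration({
    'hive.exec.reducers.max': 123,
    'hive.exec.parallel': True,
})
# Pass the result exactly as in the examples above, e.g.
# hive.connect('localhost', configuration=configuration)
# create_engine('hive://user@host:10000/database',
#               connect_args={'configuration': configuration})
```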
Requirements
Install using:
- pip install pyhive[hive] for the Hive interface
- pip install pyhive[presto] for the Presto interface
PyHive works with:
- Python 2.7
- For Presto: a Presto installation
- For Hive: a HiveServer2 daemon
There’s also a third party Conda package.
Changelog
Testing
Run the following in an environment with Hive/Presto:
./scripts/make_test_tables.sh
virtualenv --no-site-packages env
source env/bin/activate
pip install -e .
pip install -r dev_requirements.txt
py.test
WARNING: This drops/creates tables named one_row, one_row_complex, and many_rows, plus a database called pyhive_test_database.