Python client for the Impala distributed query engine

# impyla

Python client for distributed query engines that implement HiveServer2
(e.g., Impala, Hive).

For higher-level Impala functionality, including a Pandas-like interface over
distributed data sets, see the [Ibis project][ibis].

### Features

* HiveServer2 compliant; works with Impala and Hive, including nested data

* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

* Works with Kerberos, LDAP, SSL

* [SQLAlchemy][sqlalchemy] connector

* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib]); but see the [Ibis project][ibis] for a richer
experience

### Dependencies


* Python 2.6+ or 3.3+

* `six`, `bitarray`

* `thrift` (on Python 2.x) or `thriftpy` (on Python 3.x)

For Hive and/or Kerberos support:

    pip install thrift_sasl
    pip install sasl


* `pandas` for conversion to `DataFrame` objects; but see the [Ibis project][ibis] instead

* `sqlalchemy` for the SQLAlchemy engine

* `pytest` for running tests; `unittest2` for testing on Python 2.6

### Installation

Install the latest release (`0.13.1`) with `pip`:

    pip install impyla

For the latest (dev) version, install directly from the repo:

    pip install git+

or clone the repo:

    git clone
    cd impyla
    python setup.py install

#### Running the tests

impyla uses the [pytest][pytest] toolchain, and depends on the following
environment variables:

    export IMPYLA_TEST_PORT=21050

To run the maximal set of tests, run

    cd path/to/impyla
    py.test --connect impyla

Leave out the `--connect` option to skip tests for DB API compliance.
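
As an illustration of how a test helper could pick up `IMPYLA_TEST_PORT`, here is a minimal sketch. `read_test_port` is a hypothetical name, not part of impyla, and the fallback to `21050` is an assumption taken from the port used in the examples in this README; impyla's own test suite may read the variable differently.

```python
import os


def read_test_port(environ=None, default=21050):
    """Return the port from IMPYLA_TEST_PORT, falling back to `default`.

    Hypothetical helper for illustration only; not part of impyla.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("IMPYLA_TEST_PORT")
    if raw is None or raw == "":
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError("IMPYLA_TEST_PORT out of range: %d" % port)
    return port


# Pass a plain dict instead of os.environ to try it without exporting anything.
print(read_test_port({"IMPYLA_TEST_PORT": "21050"}))
```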

### Usage

Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):

    from impala.dbapi import connect
    conn = connect(host='', port=21050)
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM mytable LIMIT 100')
    print(cursor.description)  # prints the result set's schema
    results = cursor.fetchall()
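
Because the interface follows PEP 249, the same cursor pattern should carry over between compliant drivers. The sketch below uses the standard library's `sqlite3` as a stand-in connection so that it runs without a cluster; `rows_as_dicts` is a hypothetical helper, not part of impyla. With impyla, one would presumably pass a connection obtained from `impala.dbapi.connect` instead, which is untested here.

```python
import sqlite3


def rows_as_dicts(conn, sql):
    """Run `sql` and return (column_names, rows), each row as a dict.

    Hypothetical helper, not part of impyla; relies only on PEP 249 calls
    (cursor(), execute(), description, fetchall(), close()).
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # Per PEP 249, description is a sequence of 7-item sequences;
        # the first item of each is the column name.
        names = [col[0] for col in cursor.description]
        rows = [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return names, rows


# Stand-in for an Impala connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])

names, rows = rows_as_dicts(conn, "SELECT * FROM mytable ORDER BY id LIMIT 100")
print(names)
print(rows)
```

Pairing each row with the names from `cursor.description` keeps downstream code independent of column order.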

The `Cursor` object also exposes the iterator interface, which is buffered
(controlled by `cursor.arraysize`):

    cursor.execute('SELECT * FROM mytable LIMIT 100')
    for row in cursor:
        print(row)  # replace with your own row handling

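To make the idea of buffered iteration concrete, the sketch below shows one way to yield rows one at a time while fetching them in batches through the PEP 249 `fetchmany` call. It illustrates the concept only and is not impyla's internal implementation; `iter_buffered` is a hypothetical name, and `sqlite3` again stands in for an Impala connection.

```python
import sqlite3


def iter_buffered(cursor, arraysize=1000):
    """Yield rows one at a time while fetching them `arraysize` at a time.

    Hypothetical helper illustrating buffered iteration with PEP 249 calls;
    it is not how impyla implements its iterator.
    """
    while True:
        batch = cursor.fetchmany(arraysize)
        if not batch:
            break  # an empty batch means the result set is exhausted
        for row in batch:
            yield row


# Stand-in for an Impala connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (n INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])

cursor = conn.cursor()
cursor.execute("SELECT n FROM mytable ORDER BY n LIMIT 100")
values = [row[0] for row in iter_buffered(cursor, arraysize=4)]
print(values)
```

A larger batch size generally means fewer round trips at the cost of holding more rows in memory at once.
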
You can also get back a pandas `DataFrame` object:

    from impala.util import as_pandas
    df = as_pandas(cursor)
    # carry df through scikit-learn, for example

