Python client for the Impala distributed query engine

# impyla

Python DB API 2.0 client for the Impala and Hive distributed query engines.

For higher-level Impala functionality, see the [Ibis project][ibis].

### Features

* Lightweight, `pip`-installable package for connecting to Impala and Hive
databases

* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

* Connects to HiveServer2; runs with Kerberos, LDAP, SSL

* [SQLAlchemy][sqlalchemy] connector

* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib])

### Dependencies

Required:

* Python 2.6+ or 3.3+

* `six`

* `thrift_sasl`

* `bitarray`

* `thrift` (on Python 2.x) or `thriftpy` (on Python 3.x)

Optional:

* `pandas` for conversion to `DataFrame` objects

* `python-sasl` for Kerberos support (for Python 3.x support, requires
laserson/python-sasl@cython)

* `sqlalchemy` for the SQLAlchemy engine

* `pytest` for running tests; `unittest2` for testing on Python 2.6


### Installation

Install the latest release (`0.12.0`) with `pip`:

```bash
pip install impyla
```

For the latest (dev) version, install directly from the repo:

```bash
pip install git+https://github.com/cloudera/impyla.git
```

or clone the repo:

```bash
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
```
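Note that the distribution is installed as `impyla`, while the Quickstart imports it as `impala`. As a minimal sketch (Python 3.4+ only, and with `is_importable` being a hypothetical helper rather than part of impyla), the standard library can confirm that the module is visible to the interpreter after installation:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(module_name) is not None


# The package is installed as "impyla" but imported as "impala".
print('impala importable:', is_importable('impala'))
```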

#### Running the tests

impyla uses the [pytest][pytest] toolchain, and depends on the following
environment variables:

```bash
export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
```
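The same variables can be reused to build connection arguments in your own scripts. The sketch below is an illustration only: `connection_kwargs_from_env` is a hypothetical helper, the fallback defaults are assumptions, and it does not claim to reproduce how the test suite itself reads these variables.

```python
import os


def connection_kwargs_from_env(env=None):
    """Build keyword arguments for impala.dbapi.connect from the test variables.

    The fallback values are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return {
        'host': env.get('IMPYLA_TEST_HOST', 'localhost'),
        'port': int(env.get('IMPYLA_TEST_PORT', '21050')),  # env values are strings
        'auth_mechanism': env.get('IMPYLA_TEST_AUTH_MECH', 'NOSASL'),
    }


# Use an explicit mapping here so the example does not depend on the shell.
example_env = {
    'IMPYLA_TEST_HOST': 'your.impalad.com',
    'IMPYLA_TEST_PORT': '21050',
    'IMPYLA_TEST_AUTH_MECH': 'NOSASL',
}
kwargs = connection_kwargs_from_env(example_env)
print(kwargs)
```

With a reachable server, the result could then be passed along as `connect(**kwargs)` after `from impala.dbapi import connect`.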

To run the maximal set of tests, run:

```bash
cd path/to/impyla
py.test --connect impyla
```

Leave out the `--connect` option to skip tests for DB API compliance.


### Quickstart

Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):

```python
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
```
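Per PEP 249, `cursor.description` is a sequence of 7-item entries whose first item is the column name. The sketch below shows one way to pair those names with fetched rows. So that it runs without a cluster, it uses the standard-library `sqlite3` module as a stand-in DB API 2.0 driver; the table contents are made up for illustration, and with impyla the connection would come from `impala.dbapi.connect` instead.

```python
import sqlite3  # stand-in DB API 2.0 driver for this sketch; not impyla

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Hypothetical sample data so the query below has something to return.
cursor.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
cursor.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])

cursor.execute('SELECT * FROM mytable ORDER BY id LIMIT 100')

# The first item of each description entry is the column name.
column_names = [col[0] for col in cursor.description]
rows_as_dicts = [dict(zip(column_names, row)) for row in cursor.fetchall()]
print(rows_as_dicts)

# PEP 249 defines close() on both cursors and connections.
cursor.close()
conn.close()
```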

The `Cursor` object also exposes the iterator interface, which is buffered
(controlled by `cursor.arraysize`):

```python
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
```
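When explicit control over batch size is preferable to plain iteration, PEP 249's `fetchmany` can be called in a loop. The following is a sketch rather than a description of impyla's internal buffering: `iter_batches` is a hypothetical helper, and `sqlite3` again stands in for a live Impala connection so the example is runnable on its own.

```python
import sqlite3  # stand-in DB API 2.0 driver for this sketch; not impyla


def iter_batches(cursor, batch_size):
    """Yield lists of rows of at most batch_size until the result set is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty sequence signals that no rows remain
            break
        yield batch


conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Hypothetical sample data: ten rows holding the integers 0 through 9.
cursor.execute('CREATE TABLE mytable (n INTEGER)')
cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(10)])

cursor.execute('SELECT n FROM mytable ORDER BY n')
batch_sizes = [len(batch) for batch in iter_batches(cursor, 4)]
print(batch_sizes)  # [4, 4, 2]

conn.close()
```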

You can also get back a pandas `DataFrame` object:

```python
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example
```


[pep249]: http://legacy.python.org/dev/peps/pep-0249/
[pandas]: http://pandas.pydata.org/
[sklearn]: http://scikit-learn.org/
[matplotlib]: http://matplotlib.org/
[madlib]: http://madlib.net/
[madlibport]: https://github.com/bitfort/madlibport
[numba]: http://numba.pydata.org/
[llvm]: http://llvm.org/
[pytest]: http://pytest.org/latest/
[sqlalchemy]: http://www.sqlalchemy.org/
[ibis]: http://www.ibis-project.org/

