Python client for the Impala distributed query engine

# impyla

Python client for the Impala and Hive distributed query engines.


### Features

* Lightweight, `pip`-installable package for connecting to Impala and Hive
databases

* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

* Connects to HiveServer2; runs with Kerberos, LDAP, SSL

* [SQLAlchemy][sqlalchemy] connector

* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib])


#### Deprecated functionality

These features will be removed in a future release.

* `BigDataFrame`

* beeswax support

* scikit-learn wrapper

* numba-compiled Python UDFs

See the [Ibis project][ibis] for continued development of these higher-level
features.


### Dependencies

Required:

* Python 2.6+ or 3.3+

* `six`

* `thrift_sasl`

* `bitarray`

* `thrift` (on Python 2.x) or `thriftpy` (on Python 3.x)

Optional:

* `pandas` for conversion to `DataFrame` objects

* `python-sasl` for Kerberos support (for Python 3.x support, requires
laserson/python-sasl@cython)

* `sqlalchemy` for the SQLAlchemy engine

* `pytest` for running tests; `unittest2` for testing on Python 2.6


### Installation

Install the latest release (`0.11.1`) with `pip`:

```bash
pip install impyla
```

For the latest (dev) version, install directly from the repo:

```bash
pip install git+https://github.com/cloudera/impyla.git
```

or clone the repo:

```bash
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
```
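
As a quick sanity check after installing, the sketch below tests whether a module can be located on the import path. The package is imported as `impala` (as in the Quickstart), while `is_importable` is a hypothetical helper name; the snippet assumes Python 3.4+ for `importlib.util.find_spec`.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# impyla installs under the module name `impala`.
print('impala importable:', is_importable('impala'))
```

If this prints `False`, the install most likely went into a different Python environment than the one running the check.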

#### Running the tests

impyla uses the [pytest][pytest] toolchain, and depends on the following
environment variables:

```bash
export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
```

To run the maximal set of tests, run:

```bash
cd path/to/impyla
py.test --connect impyla
```

Leave out the `--connect` option to skip tests for DB API compliance.
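
To illustrate how the environment variables above might be consumed, the sketch below collects them into a dictionary of connection settings. `load_test_settings` and its fallback values are assumptions made for illustration; they are not taken from impyla's test suite.

```python
import os


def load_test_settings(environ=None):
    """Collect the IMPYLA_TEST_* variables into a dict (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return {
        # Fallback values are illustrative assumptions, not impyla defaults.
        'host': env.get('IMPYLA_TEST_HOST', 'localhost'),
        'port': int(env.get('IMPYLA_TEST_PORT', '21050')),
        'auth_mechanism': env.get('IMPYLA_TEST_AUTH_MECH', 'NOSASL'),
    }


# Pass a plain dict instead of os.environ to try the helper in isolation.
settings = load_test_settings({'IMPYLA_TEST_HOST': 'your.impalad.com',
                               'IMPYLA_TEST_PORT': '21050'})
print(settings['host'], settings['port'], settings['auth_mechanism'])
```

Converting the port to `int` up front surfaces a malformed `IMPYLA_TEST_PORT` before any connection attempt is made.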


### Quickstart

Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):

```python
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
```
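
Under PEP 249, `cursor.description` is a sequence of 7-item entries whose first item is the column name. A minimal sketch of pairing those names with the fetched rows follows. `rows_as_dicts` is a hypothetical helper, and the standard-library `sqlite3` module stands in for an Impala connection so the snippet runs without a cluster.

```python
import sqlite3
from contextlib import closing


def rows_as_dicts(cursor):
    """Return fetched rows as dicts keyed by column name (hypothetical helper)."""
    # PEP 249: the first item of each description entry is the column name.
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# sqlite3 is only a stand-in; with impyla the connection would come from
# impala.dbapi.connect(...) instead.
with closing(sqlite3.connect(':memory:')) as conn:
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
    cursor.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
    cursor.execute('SELECT * FROM mytable ORDER BY id LIMIT 100')
    records = rows_as_dicts(cursor)

print(records)
```

Because the helper touches only `description` and `fetchall`, it should work with any DB API 2.0 cursor, though the exact column names a given backend reports may differ.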

The `Cursor` object also exposes the iterator interface, which is buffered
(controlled by `cursor.arraysize`):

```python
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
```
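
When rows should be handled in explicit batches rather than one at a time, PEP 249's `fetchmany` can be used in place of the iterator. The sketch below assumes only a generic DB API cursor. `iter_batches` is a hypothetical helper, and `sqlite3` stands in for an Impala connection so the snippet runs locally.

```python
import sqlite3
from contextlib import closing


def iter_batches(cursor, batch_size):
    """Yield lists of up to batch_size rows until the result set is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# sqlite3 is only a stand-in for a connection made with impala.dbapi.connect.
with closing(sqlite3.connect(':memory:')) as conn:
    cursor = conn.cursor()
    cursor.execute('CREATE TABLE mytable (id INTEGER)')
    cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(5)])
    cursor.execute('SELECT id FROM mytable ORDER BY id')
    batch_sizes = [len(batch) for batch in iter_batches(cursor, 2)]

print(batch_sizes)
```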

You can also get back a pandas `DataFrame` object:

```python
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example
```


[pep249]: http://legacy.python.org/dev/peps/pep-0249/
[pandas]: http://pandas.pydata.org/
[sklearn]: http://scikit-learn.org/
[matplotlib]: http://matplotlib.org/
[madlib]: http://madlib.net/
[madlibport]: https://github.com/bitfort/madlibport
[numba]: http://numba.pydata.org/
[llvm]: http://llvm.org/
[pytest]: http://pytest.org/latest/
[sqlalchemy]: http://www.sqlalchemy.org/
[ibis]: http://www.ibis-project.org/

