Python client for the Impala distributed query engine

# impyla

Python client for HiveServer2 implementations of distributed query engines
(e.g., Impala, Hive).

For higher-level Impala functionality, including a Pandas-like interface over
distributed data sets, see the [Ibis project][ibis].

### Features

* HiveServer2 compliant; works with Impala and Hive, including nested data

* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

* Works with Kerberos, LDAP, SSL

* [SQLAlchemy][sqlalchemy] connector

* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib]); but see the [Ibis project][ibis] for a richer
experience

### Dependencies

Required:

* Python 2.6+ or 3.3+

* `six`, `bitarray`

* `thrift` (on Python 2.x) or `thriftpy` (on Python 3.x)

For Hive and/or Kerberos support:

* `thrift_sasl`

* `python-sasl` (for Python 3.x support, requires
[cloudera/python-sasl@cython][python-sasl-cython] branch)

Optional:

* `pandas` for conversion to `DataFrame` objects; but see the [Ibis project][ibis] instead

* `sqlalchemy` for the SQLAlchemy engine

* `pytest` for running tests; `unittest2` for testing on Python 2.6
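
To see which optional packages are available in the current environment, a small check like the following may help. It is a minimal sketch for Python 3.4+: `missing_modules` is a hypothetical helper (not part of impyla), and the import names are assumed to match the package names listed above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import.

    Hypothetical helper, not part of impyla.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: each optional package is imported under the name shown here.
optional = ['pandas', 'sqlalchemy', 'pytest']
unavailable = missing_modules(optional)
print('Missing optional modules:', unavailable)
```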


### Installation

Install the latest release (`0.13.0`) with `pip`:

```bash
pip install impyla
```

For the latest (dev) version, install directly from the repo:

```bash
pip install git+https://github.com/cloudera/impyla.git
```

or clone the repo:

```bash
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
```

#### Running the tests

impyla uses the [pytest][pytest] toolchain, and depends on the following
environment variables:

```bash
export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
```

To run the maximal set of tests, run:

```bash
cd path/to/impyla
py.test --connect impyla
```

Leave out the `--connect` option to skip tests for DB API compliance.
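
The environment variables above can also be read from your own scripts. The sketch below collects them into keyword arguments; `connection_params` is a hypothetical helper, and the fallback values (`localhost`, `21050`, `NOSASL`) are assumptions chosen for illustration, not defaults defined by impyla's test suite.

```python
import os


def connection_params(environ=os.environ):
    """Build connection keyword arguments from the IMPYLA_TEST_* variables.

    Hypothetical helper, not part of impyla; fallback values are assumptions.
    """
    return {
        'host': environ.get('IMPYLA_TEST_HOST', 'localhost'),
        # Environment values are strings, so the port is converted to int.
        'port': int(environ.get('IMPYLA_TEST_PORT', '21050')),
        'auth_mechanism': environ.get('IMPYLA_TEST_AUTH_MECH', 'NOSASL'),
    }


# A plain dict stands in for the real environment here.
params = connection_params({
    'IMPYLA_TEST_HOST': 'your.impalad.com',
    'IMPYLA_TEST_PORT': '21050',
})
```

The resulting dictionary is intended to be passed on as `connect(**params)`.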


### Usage

Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):

```python
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
```
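
Because the cursor follows PEP 249, generic DB API helpers should apply to it unchanged. The sketch below pairs each row with the column names found in `cursor.description`. `rows_as_dicts` is a hypothetical helper, not part of impyla, and the standard library's `sqlite3` stands in for an Impala connection so the example runs without a cluster.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return all remaining rows as dicts keyed by column name.

    Hypothetical helper, not part of impyla. It relies only on the
    PEP 249 `description` attribute and `fetchall` method.
    """
    # Each description entry is a 7-item sequence; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# sqlite3 is used only as a stand-in DB API 2.0 driver.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
cursor.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
cursor.execute('SELECT id, name FROM mytable ORDER BY id')
records = rows_as_dicts(cursor)
```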

The `Cursor` object also exposes the iterator interface, which is buffered
(controlled by `cursor.arraysize`):

```python
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
```
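
When per-row iteration is not the right granularity, PEP 249's `fetchmany` gives explicit control over how many rows are pulled at a time. The following is a minimal sketch: `iter_batches` is a hypothetical helper, and `sqlite3` again stands in for an Impala connection. How impyla's internal buffering interacts with `fetchmany` is not covered here.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of up to `batch_size` rows until the result set is exhausted.

    Hypothetical helper, not part of impyla.
    """
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# sqlite3 is used only as a stand-in DB API 2.0 driver.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (n INTEGER)')
cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(5)])
cursor.execute('SELECT n FROM mytable ORDER BY n')
batch_sizes = [len(batch) for batch in iter_batches(cursor, 2)]
```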

You can also get back a pandas `DataFrame` object:

```python
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example
```


[pep249]: http://legacy.python.org/dev/peps/pep-0249/
[pandas]: http://pandas.pydata.org/
[sklearn]: http://scikit-learn.org/
[matplotlib]: http://matplotlib.org/
[madlib]: http://madlib.net/
[madlibport]: https://github.com/bitfort/madlibport
[numba]: http://numba.pydata.org/
[llvm]: http://llvm.org/
[pytest]: http://pytest.org/latest/
[sqlalchemy]: http://www.sqlalchemy.org/
[ibis]: http://www.ibis-project.org/
[python-sasl-cython]: https://github.com/laserson/python-sasl/tree/cython/sasl


