Python client for the Impala distributed query engine
# impyla
Python client for Impala/Hive distributed query engine.
### Features
* Lightweight, `pip`-installable package for connecting to Impala and Hive
databases
* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.
* Connects to HiveServer2; runs with Kerberos, LDAP, SSL
* [SQLAlchemy][sqlalchemy] connector
* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib])
#### Deprecated functionality
These features will be removed in a future release.
* `BigDataFrame`
* beeswax support
* scikit-learn wrapper
* numba-compiled Python UDFs
See the [Ibis project][ibis] for continued development of these higher-level
features.
### Dependencies
Required:
* Python 2.6+ or 3.3+
* `six`
* `thrift_sasl`
* `bit_array`
* `thrift` (on Python 2.x) or `thriftpy` (on Python 3.x)
Optional:
* `pandas` for conversion to `DataFrame` objects
* `python-sasl` for Kerberos support (for Python 3.x support, requires
laserson/python-sasl@cython)
* `sqlalchemy` for the SQLAlchemy engine
* `pytest` for running tests; `unittest2` for testing on Python 2.6
### Installation
Install the latest release (`0.11.1`) with `pip`:
```bash
pip install impyla
```
For the latest (dev) version, install directly from the repo:
```bash
pip install git+https://github.com/cloudera/impyla.git
```
or clone the repo:
```bash
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
```
#### Running the tests
impyla uses the [pytest][pytest] toolchain, and depends on the following
environment variables:
```bash
export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
```
To run the maximal set of tests, run
```bash
cd path/to/impyla
py.test --connect impyla
```
Leave out the `--connect` option to skip tests for DB API compliance.
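The same three variables can also be turned into keyword arguments for `connect`. The helper below is a minimal sketch and not part of impyla: the name `connect_kwargs_from_env` and the fallback defaults are assumptions, and only the variable names come from the section above. It also assumes that `host`, `port`, and `auth_mechanism` are the matching `connect` parameters.

```python
import os


def connect_kwargs_from_env(env=None):
    """Build keyword arguments for connect() from the IMPYLA_TEST_* variables.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    env = os.environ if env is None else env
    return {
        'host': env.get('IMPYLA_TEST_HOST', 'localhost'),
        # Environment values are strings, so the port must be converted.
        'port': int(env.get('IMPYLA_TEST_PORT', '21050')),
        'auth_mechanism': env.get('IMPYLA_TEST_AUTH_MECH', 'NOSASL'),
    }


# A plain dict stands in for os.environ so the example is reproducible.
kwargs = connect_kwargs_from_env({
    'IMPYLA_TEST_HOST': 'your.impalad.com',
    'IMPYLA_TEST_PORT': '21050',
})
print(kwargs['host'], kwargs['port'], kwargs['auth_mechanism'])
```

With impyla installed, the result could then be passed on as `connect(**kwargs)`.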
### Quickstart
Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):
```python
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
```
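PEP 249 defines `cursor.description` as a sequence of 7-item entries whose first item is the column name, so rows can be paired with column names without driver-specific code. The sketch below uses the standard library's `sqlite3` module as a stand-in DB API 2.0 driver so that it runs without an Impala cluster. The table `mytable`, its columns, and the helper `rows_as_dicts` are made up for illustration.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# sqlite3 stands in for impala.dbapi here; both follow PEP 249.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
cursor.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
cursor.execute('SELECT * FROM mytable ORDER BY id LIMIT 100')
records = rows_as_dicts(cursor)
print(records)
conn.close()
```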
The `Cursor` object also exposes the iterator interface, which is buffered
(controlled by `cursor.arraysize`):
```python
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)  # `process` is a placeholder for your own row handler
```
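For explicit control over batch size, PEP 249 also provides `fetchmany`, which returns up to `size` rows per call and an empty sequence once the result set is exhausted. A minimal sketch follows, using the standard library's `sqlite3` module as a stand-in DB API 2.0 driver. `iter_batches` and the sample table are hypothetical, and how a batch maps onto network round trips depends on the driver.

```python
import sqlite3


def iter_batches(cursor, size):
    """Yield lists of up to `size` rows until the result set is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield batch


# sqlite3 stands in for impala.dbapi here; both follow PEP 249.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (n INTEGER)')
cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(5)])
cursor.execute('SELECT n FROM mytable ORDER BY n')
batch_sizes = [len(batch) for batch in iter_batches(cursor, 2)]
print(batch_sizes)
conn.close()
```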
You can also get back a pandas `DataFrame` object:
```python
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example
```
[pep249]: http://legacy.python.org/dev/peps/pep-0249/
[pandas]: http://pandas.pydata.org/
[sklearn]: http://scikit-learn.org/
[matplotlib]: http://matplotlib.org/
[madlib]: http://madlib.net/
[madlibport]: https://github.com/bitfort/madlibport
[numba]: http://numba.pydata.org/
[llvm]: http://llvm.org/
[pytest]: http://pytest.org/latest/
[sqlalchemy]: http://www.sqlalchemy.org/
[ibis]: http://www.ibis-project.org/