Python client for the Impala distributed query engine

impyla

Python client for HiveServer2 implementations of distributed query engines (e.g., Impala, Hive).

For higher-level Impala functionality, including a Pandas-like interface over distributed data sets, see the Ibis project.

Features

  • HiveServer2 compliant; works with Impala and Hive, including nested data

  • Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

  • Works with Kerberos, LDAP, SSL

  • SQLAlchemy connector

  • Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience
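
As a rough illustration of the Kerberos, LDAP, and SSL support listed above, the sketch below builds keyword arguments for `impala.dbapi.connect`. The helper names `auth_kwargs` and `open_connection` are hypothetical, not part of impyla; the keyword names (`auth_mechanism`, `use_ssl`, `ca_cert`, `user`, `password`, `kerberos_service_name`) are assumed to match the `connect` signature of your installed version, so check its docstring before relying on them.

```python
def auth_kwargs(mechanism, user=None, password=None,
                kerberos_service_name="impala", use_ssl=False, ca_cert=None):
    """Build keyword arguments for impala.dbapi.connect (hypothetical helper)."""
    mechanism = mechanism.upper()
    if mechanism not in ("NOSASL", "PLAIN", "GSSAPI", "LDAP"):
        raise ValueError("unsupported auth mechanism: %s" % mechanism)

    kwargs = {"auth_mechanism": mechanism, "use_ssl": use_ssl}
    if ca_cert is not None:
        # Path to a CA certificate file used to verify the server when SSL is on.
        kwargs["ca_cert"] = ca_cert
    if mechanism == "GSSAPI":
        # Kerberos: credentials come from the ticket cache, not from arguments.
        kwargs["kerberos_service_name"] = kerberos_service_name
    elif mechanism in ("LDAP", "PLAIN"):
        kwargs["user"] = user
        kwargs["password"] = password
    return kwargs


def open_connection(host, port=21050, **kwargs):
    """Open a connection; impyla is imported lazily so auth_kwargs works without it."""
    from impala.dbapi import connect
    return connect(host=host, port=port, **kwargs)
```

With impyla and a SASL library installed, something like `open_connection('my.host.com', **auth_kwargs('GSSAPI'))` would pass these settings through to `connect`.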

Dependencies

Required:

  • Python 2.6+ or 3.3+

  • six, bitarray

  • thrift

Optional:

  • thrift_sasl>=0.2.1 for Hive and/or Kerberos support. This also requires a SASL library to be installed on your system; see System SASL below

  • pandas for conversion to DataFrame objects; but see the Ibis project instead

  • sqlalchemy for the SQLAlchemy engine

  • pytest for running tests; unittest2 for testing on Python 2.6

System SASL

Different systems require different packages to be installed to enable SASL support in Impyla. Some examples of how to install the packages on different distributions follow.

Ubuntu:

apt-get install libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit

RHEL/CentOS:

yum install cyrus-sasl-md5 cyrus-sasl-plain cyrus-sasl-gssapi cyrus-sasl-devel

Installation

Install the latest release with pip:

pip install impyla

For the latest (dev) version, install directly from the repo:

pip install git+https://github.com/cloudera/impyla.git

or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install

Running the tests

impyla uses the pytest toolchain, and the tests depend on the following environment variables:

export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL

To run the full set of tests:

cd path/to/impyla
py.test --connect impala

Leave out the --connect option to skip tests for DB API compliance.
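
To make the role of these variables concrete, here is a minimal sketch of how they could be turned into connection settings. The function name `connection_settings_from_env` and the fallback defaults for the port and auth mechanism are assumptions for illustration; impyla's own test suite may read the variables differently.

```python
import os


def connection_settings_from_env(environ=None):
    """Read the IMPYLA_TEST_* variables into connect()-style keyword arguments."""
    environ = os.environ if environ is None else environ
    return {
        # Required: raises KeyError if the variable is not set.
        "host": environ["IMPYLA_TEST_HOST"],
        # Assumed defaults, mirroring the example values shown above.
        "port": int(environ.get("IMPYLA_TEST_PORT", "21050")),
        "auth_mechanism": environ.get("IMPYLA_TEST_AUTH_MECH", "NOSASL"),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment.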

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
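
PEP 249 defines `cursor.description` as a sequence of 7-item sequences whose first two items are the column name and type code. The sketch below pulls those two fields out. It uses the standard library's `sqlite3` module purely as a stand-in DB API driver so that it runs without an Impala cluster; `column_names_and_types` is a hypothetical helper, and the type codes a real impyla cursor reports will differ from sqlite3's.

```python
import sqlite3


def column_names_and_types(cursor):
    """Return (name, type_code) pairs from a DB API cursor's description."""
    if cursor.description is None:  # no result set, e.g. after a DDL statement
        return []
    return [(col[0], col[1]) for col in cursor.description]


# Stand-in demo: sqlite3 instead of impala.dbapi.connect(...)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cursor.execute("SELECT * FROM mytable LIMIT 100")
schema = column_names_and_types(cursor)
print(schema)
```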

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    print(row)
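
In DB API terms, buffered iteration is roughly equivalent to calling `fetchmany` in a loop. The generator below is a sketch of that idea, not impyla's internal implementation; `iter_rows` is a hypothetical name, and `sqlite3` again stands in for a real Impala connection.

```python
import sqlite3


def iter_rows(cursor, batch_size=None):
    """Yield rows one at a time, fetching them in batches via fetchmany()."""
    size = batch_size or cursor.arraysize
    while True:
        batch = cursor.fetchmany(size)
        if not batch:  # an empty batch means the result set is exhausted
            break
        for row in batch:
            yield row


# Stand-in demo: sqlite3 instead of impala.dbapi.connect(...)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (n INTEGER)")
cursor.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(250)])
cursor.execute("SELECT n FROM mytable ORDER BY n")
rows = list(iter_rows(cursor, batch_size=100))
```

A larger batch size means fewer round trips at the cost of holding more rows in memory at once.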

The Cursor object also provides information about the columns returned by the query through cursor.description. This is useful, for example, for writing a header row when exporting your data as a CSV file:

import csv

cursor.execute('SELECT * FROM mytable LIMIT 100')
columns = [datum[0] for datum in cursor.description]
targetfile = '/tmp/foo.csv'

with open(targetfile, 'w', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(columns)
    for row in cursor:
        writer.writerow(row)

You can also get back a pandas DataFrame object:

from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example
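
When pandas is not installed, the column names in `cursor.description` can still be used to get labelled records. The following is a minimal sketch of that fallback; `rows_as_dicts` is a hypothetical helper rather than an impyla function, it is not how `as_pandas` works internally, and `sqlite3` stands in for an Impala connection.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return the remaining rows of a result set as dicts keyed by column name."""
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


# Stand-in demo: sqlite3 instead of impala.dbapi.connect(...)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cursor.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, "a"), (2, "b")])
cursor.execute("SELECT id, name FROM mytable ORDER BY id")
records = rows_as_dicts(cursor)
```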

How do I contribute code?

Before we can accept and redistribute your contribution, you need to sign and return an ICLA and CCLA. Submit these to CLA@cloudera.com. Once they are submitted, you are free to start contributing to impyla.

Find

We use GitHub issues to track bugs for this project. Find an issue that you would like to work on (or file one if you have discovered a new issue!). If no one is working on it, assign it to yourself only if you intend to work on it shortly.

It's a good idea to discuss your intended approach on the issue. You are much more likely to have your patch reviewed and committed if you've already got buy-in from the impyla community before you start.

Fix

Now start coding! As you are writing your patch, please keep the following things in mind:

First, please include tests with your patch. If your patch adds a feature or fixes a bug and does not include tests, it will generally not be accepted. If you are unsure how to write tests for a particular component, please ask on the issue for guidance.

Second, please keep your patch narrowly targeted to the problem described by the issue. It's better for everyone if we maintain discipline about the scope of each patch. In general, if you find a bug while working on a specific feature, file an issue for the bug, check whether you can assign it to yourself, and fix it independently of the feature. This helps us differentiate between bug fixes and features and allows us to build stable maintenance releases.

Finally, please write a good, clear commit message, with a short, descriptive title and a message that is exactly long enough to explain what the problem was, and how it was fixed.

Please create a pull request on GitHub with your patch.
