Supports Custom ML/Analytics Execution Inside Netezza

Project description

nzpyida

Accelerating Python Analytics by In-Database Processing

The nzpyida project provides a Python interface to the in-database data-manipulation algorithms provided by IBM Netezza:

  • It accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution, thereby benefitting from in-database performance-enhancing features, such as columnar storage and parallel processing.
  • It requires very little additional knowledge from Python developers, because it mirrors the well-known interface of the pandas library for data manipulation and of the scikit-learn library for machine learning.
  • It is compatible with Python 3.6.
  • It can connect to Netezza databases via nzpy, ODBC or JDBC.

nzpyida = NeteZza PYthon In-Database Analytics

The latest version of nzpyida is available on the Python Package Index and GitHub.

How nzpyida works

The nzpyida project translates pandas-like syntax into SQL and uses a driver library (such as pypyodbc or nzpy) to send that SQL to a database connected via ODBC, JDBC or nzpy for execution. The results are fetched and formatted into the corresponding data structure, for example a pandas.DataFrame or a pandas.Series.
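The translate-then-execute idea can be sketched in a few lines. This is only an illustration under stated assumptions, not nzpyida's implementation: the `MiniFrame` class and its methods are hypothetical, and an in-memory SQLite database stands in for Netezza so that the example runs anywhere.

```python
import sqlite3


class MiniFrame:
    """Toy database-backed frame (hypothetical, not part of nzpyida)."""

    def __init__(self, connection, table, columns=None):
        self._con = connection
        self._table = table
        self._columns = columns  # None means "all columns"

    def __getitem__(self, columns):
        # Column selection only records the projection; no query runs yet.
        return MiniFrame(self._con, self._table, list(columns))

    def to_sql(self, limit=None):
        # Translate the recorded operations into a single SQL statement.
        if self._columns:
            projection = ", ".join(f'"{c}"' for c in self._columns)
        else:
            projection = "*"
        sql = f'SELECT {projection} FROM "{self._table}"'
        if limit is not None:
            sql += f" LIMIT {int(limit)}"
        return sql

    def head(self, n=5):
        # Only now is SQL sent to the database, and only n rows come back.
        return self._con.execute(self.to_sql(limit=n)).fetchall()


# SQLite stands in for the remote database in this sketch.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "IRIS" (sepal_length REAL, sepal_width REAL, species TEXT)')
con.executemany(
    'INSERT INTO "IRIS" VALUES (?, ?, ?)',
    [(5.1, 3.5, "setosa"), (7.0, 3.2, "versicolor"), (6.3, 3.3, "virginica")],
)

frame = MiniFrame(con, "IRIS")
subset = frame[["sepal_length", "species"]]
print(subset.to_sql(limit=2))  # the SQL that would be sent
print(subset.head(2))          # the rows fetched back
```

The point of the design is that the data never leaves the database until a result is requested, and even then only the rows and columns that were asked for are transferred.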

The following scenario illustrates how nzpyida works.

Issue the following statements to connect via nzpy to the Netezza database server NETEZZA_HOSTNAME on port 5480, logging in as DATABASE_USER with the password PASSWORD. The database to use on that server is DATABASE.

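from nzpyida import IdaDataBase, IdaDataFrame

# Connection settings for the nzpy driver; replace the placeholders
# with the values for your own Netezza server.
nzpy_cfg = {
  'user': 'DATABASE_USER',
  'password': 'PASSWORD',
  'host': 'NETEZZA_HOSTNAME',
  'port': 5480,
  'database': 'DATABASE',
  'logLevel': 0,
  'securityLevel': 0
}

# Open the connection; this object is reused by every data frame below.
idadb = IdaDataBase(nzpy_cfg)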
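To keep credentials out of source code, the same dictionary could be assembled from environment variables. The following is a minimal sketch: the helper `build_nzpy_cfg` and the variable names `NZ_USER`, `NZ_PASSWORD`, `NZ_HOST`, `NZ_PORT` and `NZ_DATABASE` are chosen for illustration and are not something nzpyida reads by itself. Only the dictionary keys come from the example above.

```python
import os


def build_nzpy_cfg(env=os.environ):
    """Assemble the nzpy connection dictionary from a mapping of settings.

    The variable names used here are assumptions made for this sketch.
    """
    return {
        "user": env["NZ_USER"],
        "password": env["NZ_PASSWORD"],
        "host": env["NZ_HOST"],
        # Fall back to port 5480, the port used in the example above.
        "port": int(env.get("NZ_PORT", "5480")),
        "database": env["NZ_DATABASE"],
        "logLevel": 0,
        "securityLevel": 0,
    }


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "NZ_USER": "DATABASE_USER",
    "NZ_PASSWORD": "PASSWORD",
    "NZ_HOST": "NETEZZA_HOSTNAME",
    "NZ_DATABASE": "DATABASE",
}
nzpy_cfg = build_nzpy_cfg(example_env)
print(nzpy_cfg["host"], nzpy_cfg["port"])
```

The resulting `nzpy_cfg` has the same shape as the hand-written dictionary, so it can be passed to `IdaDataBase` in the same way.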

A few sample data sets are included in nzpyida for you to experiment with. First, we can load the IRIS table into this database instance.

from nzpyida.sampledata import iris
idadb.as_idadataframe(iris, "IRIS")

Next, we can create an IDA data frame that points to the table we just uploaded:

idadf = IdaDataFrame(idadb, 'IRIS')

Note that to create an IDA data frame with the IdaDataFrame class, we need to pass in our previously opened IdaDataBase object, because it holds the connection.

Next, we compute the correlation matrix:

idadf.corr()

In the background, nzpyida looks for numerical columns in the table and builds a SQL query that returns the correlation between each pair of columns.

The result fetched by nzpyida is a tuple containing all values of the matrix. This tuple is formatted back into a pandas.DataFrame and then returned:

               sepal_length  sepal_width   petal_length  petal_width
sepal_length      1.000000    -0.117570      0.871754     0.817941
sepal_width      -0.117570     1.000000     -0.428440    -0.366126
petal_length      0.871754    -0.428440      1.000000     0.962865
petal_width       0.817941    -0.366126      0.962865     1.000000
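The two steps described above, building one query for all column pairs and reshaping the flat result, can be sketched as follows. This is a hedged illustration rather than the SQL that nzpyida actually generates: the helpers `build_corr_query` and `to_matrix` are hypothetical, the sketch assumes the database offers a `CORR` aggregate, and column names are assumed to be trusted identifiers. The sample row reuses the values from the matrix above instead of querying a database.

```python
from itertools import combinations


def build_corr_query(table, numeric_columns):
    """Build one SELECT that computes CORR for every unordered column pair."""
    pairs = list(combinations(numeric_columns, 2))
    expressions = ", ".join(f"CORR({a}, {b})" for a, b in pairs)
    return f"SELECT {expressions} FROM {table}", pairs


def to_matrix(numeric_columns, pairs, row):
    """Reshape the flat result row into a symmetric nested dictionary."""
    matrix = {
        c: {d: (1.0 if c == d else None) for d in numeric_columns}
        for c in numeric_columns
    }
    for (a, b), value in zip(pairs, row):
        matrix[a][b] = value
        matrix[b][a] = value  # correlation is symmetric
    return matrix


columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
sql, pairs = build_corr_query("IRIS", columns)
print(sql)

# Flat tuple as it might come back from the database, in the order of `pairs`
# (values taken from the matrix shown above).
row = (-0.117570, 0.871754, 0.817941, -0.428440, -0.366126, 0.962865)
matrix = to_matrix(columns, pairs, row)
print(matrix["petal_length"]["petal_width"])
```

Computing only the unordered pairs keeps the query short: four columns need six `CORR` expressions, and the diagonal and the mirrored half are filled in on the client.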

Contributors

nzpyida is based on the ibmdbpy project, which was developed for IBM Db2 Warehouse. See https://github.com/ibmdbanalytics/ibmdbpy for details.

How to contribute

You want to contribute? That's great! There are many things you can do.

If you are a member of the ibmdbanalytics group, you can create branches and merge them to master. Otherwise, you can fork the project and open a pull request. You are very welcome to contribute to the code and to the documentation.

If you find bugs, have ideas for improvements, or need specific new features, please open a ticket. We do care about it.
