Skip to main content

IBM Netezza python driver - extended fork

Project description

nzpy_extended: High-performance IBM Netezza driver for Python

nzpy_extended is a hard fork of IBM nzpy — enriched with new features, major performance improvements via a C extension, and expanded platform support.

Key differences from upstream nzpy

Feature nzpy (IBM) nzpy_extended
Row parsing performance (mixed types) ~10 000 rows/s ~63 000 rows/s (no C ext) → ~93 000 rows/s (+ C ext)
Supported Python 3.5+ 3.12, 3.13, 3.14
Platform wheels ❌ None ✅ Linux x64, macOS ARM, Windows x64 (pre-built)
Async support ✅ Fully async API

Performance gains vary by data type. For mixed-type workloads (most representative of real-world queries), nzpy_extended reaches ~49 000 rows/s with C extension (vs. ~5‬400 rows/s for official nzpy — a ~9× improvement on WSL2) and ~33 000 rows/s without C extension (~6× improvement). Per-type benchmarks with 100 k rows:

Windows 11

Data type official nzpy nzpy_extended (no C ext) nzpy_extended (+ C ext) vs. ODBC
INTEGER ~18 k rows/s ~196 k rows/s ~256 k rows/s ≈ ODBC (~239 k)
NUMERIC ~10 k rows/s ~75 k rows/s ~97 k rows/s ≈ ODBC (~90 k)
STRING ~18 k rows/s ~75 k rows/s ~79 k rows/s ≈ ODBC (~69 k)
DATETIME ~18 k rows/s ~135 k rows/s ~255 k rows/s ≈ ODBC (~249 k)
BOOLEAN ~23 k rows/s ~243 k rows/s ~401 k rows/s ≈ ODBC (~359 k)
Mixed ~4 900 rows/s ~36 k rows/s ~45 k rows/s ≈ ODBC (~44 k)

WSL2 (Linux x86_64)

Data type official nzpy nzpy_extended (no C ext) nzpy_extended (+ C ext) vs. ODBC
INTEGER ~21 k rows/s ~203 k rows/s ~640 k rows/s ~92 k rows/s
NUMERIC ~11 k rows/s ~79 k rows/s ~150 k rows/s ~63 k rows/s
STRING ~22 k rows/s ~71 k rows/s ~160 k rows/s ~80 k rows/s
DATETIME ~21 k rows/s ~126 k rows/s ~407 k rows/s ~111 k rows/s
BOOLEAN ~25 k rows/s ~186 k rows/s ~418 k rows/s ~192 k rows/s
Mixed ~5.4 k rows/s ~33 k rows/s ~49 k rows/s ~20 k rows/s

Note: WSL2 results were obtained on the same Netezza server as Windows 11 (192.168.0.144) — the higher throughput reflects Linux network stack efficiency. ODBC results use a ctypes-based ANSI-ODBC fallback (because pyodbc's Unicode API is incompatible with the Netezza ODBC driver on Linux). This wrapper fetches every cell individually via SQLGetData, so ODBC numbers on WSL2 are understated vs. native pyodbc on Windows (which uses SQLBindCol batch fetching). The real ODBC-vs-native gap on Linux is likely smaller.

macOS ARM64 (Apple M4)

Benchmarks run on a Mac mini (Apple M4, 16 GB RAM) — same Netezza server at 192.168.0.144. 100 k rows per query.

Data type official nzpy nzpy_extended (no C ext) nzpy_extended (+ C ext)
INTEGER ~44 k rows/s ~356 k rows/s ~726 k rows/s
NUMERIC ~22 k rows/s ~122 k rows/s ~195 k rows/s
STRING ~39 k rows/s ~143 k rows/s ~183 k rows/s
DATETIME ~42 k rows/s ~253 k rows/s ~683 k rows/s
BOOLEAN ~53 k rows/s ~449 k rows/s ~786 k rows/s
Mixed ~10 k rows/s ~63 k rows/s ~93 k rows/s

The C extension accelerates integer, decimal, datetime, and boolean parsing by avoiding Python object allocations per field and struct.unpack overhead. String parsing is primarily network-bound, so the C extension offers minimal benefit there.

If the compiled extension is not available (unsupported platform or Python version), the driver gracefully falls back to pure Python with identical semantics.

Installation

pip install nzpy_extended

Pre-built wheels are provided for:

Platform Architecture Python
Linux x86_64 (manylinux) 3.12 / 3.13 / 3.14
macOS ARM64 (Apple Silicon) 3.12 / 3.13 / 3.14
Windows x86_64 3.12 / 3.13 / 3.14

For other platforms or Python versions, pip install will compile the C extension from source (requires a C compiler: GCC, Clang, or MSVC). On systems without a compiler the install will fail — use a supported platform or version.

Quick Start

Sync — scripts, ETL, Jupyter, Django

import nzpy_extended.sync as nzpy

conn = nzpy.connect(
    user="admin", password="secret",
    host="netezza-host", database="mydb",
)
with conn.cursor() as cur:
    cur.execute("SELECT id, name FROM users WHERE active = ?", (1,))
    for row in cur:
        print(row)

Async — FastAPI, asyncio

import asyncio
import nzpy_extended as nzpy

async def main():
    async with await nzpy.connect(
        user="admin", password="secret",
        host="netezza-host", database="mydb",
    ) as conn:
        async with conn.cursor() as cur:
            await cur.execute("SELECT id, name FROM users WHERE active = ?", (1,))
            return await cur.fetchall()

asyncio.run(main())

FastAPI with connection pool

from fastapi import FastAPI, Depends
import nzpy_extended as nzpy
import nzpy_extended.fastapi as nzpy_fastapi

pool = nzpy.NzPool(
    min_size=2, max_size=10,
    host="netezza-host", database="mydb",
    user="admin", password="secret",
)

app = FastAPI(lifespan=nzpy_fastapi.lifespan(pool))

@app.get("/users")
async def get_users(conn=Depends(nzpy_fastapi.get_connection)):
    async with conn.cursor() as cur:
        await cur.execute("SELECT * FROM users LIMIT 100")
        return await cur.fetchall()

Bulk data loading via external table protocol

load_data() inserts rows from a Python iterable into a Netezza table using the native external table protocol (REMOTESOURCE 'python'). It supports optional automatic table creation.

# --- Auto-infer: create table + load in one step ---
rows = [(1, "Alice", 100.50), (2, "Bob", 200.75)]
count = await nzpy.load_data(conn, "my_table", rows)
print(f"Inserted {count} rows")

# --- Mixed types with auto-infer (INT, VARCHAR, NUMERIC, BOOLEAN, DATE) ---
from decimal import Decimal
from datetime import date
rows = [
    (10, "item", Decimal("19.99"), True, date(2025, 1, 15)),
]
count = await conn.load_data("products", rows)
# Creates: col1 SMALLINT, col2 VARCHAR(255), col3 NUMERIC(4,2),
#          col4 BOOLEAN, col5 DATE

# --- Explicit columns (no auto-infer) ---
count = await conn.load_data(
    table_name="products",
    rows=[(101, "Widget", 9.99)],
    columns=[("id", "INT"), ("name", "VARCHAR(200)"), ("price", "NUMERIC(10,2)")],
)

# --- Generator for large datasets ---
def generate_rows(n):
    for i in range(n):
        yield (i, f"item_{i}")

count = await nzpy.load_data(conn, "my_table", rows=generate_rows(50000))

Parameters:

Parameter Default Description
conn / table_name / rows (required) Connection, target table, row iterable
columns None [(name, nz_type), ...] or None for auto-infer from data
delimiter '|' Field delimiter (pipe is safe; use , with escape_char)
encoding 'LATIN9' Text encoding (use 'UTF8' for NVARCHAR columns)
create_if_missing True Auto-create table if not exists
temporary False Create TEMP TABLE
distribute_on_random True Add DISTRIBUTE ON RANDOM to DDL
logdir temp dir Netezza log directory
escape_char '\\' Escape character for delimiter within values (None to disable)

Auto-infer column types: When columns is None and create_if_missing=True, the driver reads the first row and maps Python types to Netezza DDL: intSMALLINT/INT/BIGINT, floatFLOAT, strVARCHAR(255), DecimalNUMERIC(p,s), boolBOOLEAN, dateDATE, datetimeTIMESTAMP, bytesBYTEA. Column names default to col1, col2, etc.

Note on delimiters and escaping: Netezza external tables do not support standard CSV double-quote quoting. When a value contains the delimiter character, it must be escaped with the escape character (default: \). The driver does this automatically.

The function is available both as a standalone nzpy.load_data() and as conn.load_data().

Requirements

  • Python ≥ 3.12
  • CPython (PyPy not supported for C extension, pure-Python fallback only)

Documentation

Testing

Running the test suite

Tests require a running Netezza instance. Set the connection environment variables:

export NZ_DEV_HOST=your_netezza_host
export NZ_DEV_PORT=5480
export NZ_DEV_DB=JUST_DATA
export NZ_DEV_USER=admin
export NZ_DEV_PASSWORD=password

Run all tests:

pytest tests/ -v

C Extension / Pure Python parity

The C extension and pure-Python fallback must produce identical results for all data types. Parity tests verify this in two ways:

Unit tests (tests/test_c_python_parity_unit.py) — compare individual C parser functions against Python reference implementations byte-by-byte. No database required.

pytest tests/test_c_python_parity_unit.py -v

Integration tests (tests/test_c_python_parity_integration.py) — run real SQL queries through both code paths and verify results match. Requires a database.

pytest tests/test_c_python_parity_integration.py -v

Verification script — runs both test suites in C-extension and pure-Python modes side-by-side:

python tools/verify_c_python_parity.py

Disabling C extension at runtime:

Set the environment variable NZPY_EXTENDED_NO_CEXT=1 to force pure-Python mode even when the compiled extension is available. Useful for debugging or verifying fallback correctness.

NZPY_EXTENDED_NO_CEXT=1 pip install nzpy_extended
# or at runtime:
NZPY_EXTENDED_NO_CEXT=1 python -c "import nzpy_extended.core; print(nzpy_extended.core._HAVE_C_EXT)"  # False

Reproducing benchmark results

The per-type benchmark table above is generated by tools/examples/performance_test.py.

Prerequisites

  • A running Netezza instance with a table named JUST_DATA..FACTPRODUCTINVENTORY (or adjust SOURCE_TABLE in the script)
  • Python ≥ 3.12
  • Install the required packages:
pip install nzpy_extended

The official IBM driver is also tested for comparison (pip install nzpy).

Optional (for ODBC comparison):

  • pip install pyodbc — with NetezzaSQL ODBC driver installed (rows labeled pyodbc)
  • Set NZ_ODBC_DRIVER if your ODBC driver uses a different name (default: NetezzaSQL). On Linux/WSL2 the name is defined in /etc/odbcinst.ini, e.g.:
    export NZ_ODBC_DRIVER=NetezzaSQL  # Linux/WSL2
    
  • Linux/WSL2: pyodbc's Unicode API is incompatible with the Netezza ODBC driver. The benchmark auto-detects this and falls back to a ctypes-based ANSI-ODBC wrapper. No additional configuration needed.

Steps

  1. Set connection environment variables:
set NZ_HOST=your_netezza_host     # Windows
set NZ_PORT=5480
set NZ_USER=admin
set NZ_PASSWORD=password
set NZ_DATABASE=JUST_DATA

# or on Linux/macOS:
export NZ_HOST=your_netezza_host
export NZ_PORT=5480
export NZ_USER=admin
export NZ_PASSWORD=password
export NZ_DATABASE=JUST_DATA

All variables have defaults — only NZ_HOST is required if your setup differs from the defaults.

ODBC driver name can be set separately:

export NZ_ODBC_DRIVER=NetezzaSQL  # Linux/macOS — defaults to "NetezzaSQL"
  1. Run the benchmark:
python tools/examples/performance_test.py
  1. Adjust row count (default: 100 000):
set NZ_ROWS=100000     # Windows
export NZ_ROWS=100000  # Linux/macOS

What the script does

  1. Connects using each driver: official_nzpy (always), pyodbc (via ANSI-ODBC ctypes fallback on Linux), nzpy_extended (async + sync, with and without C extension).
  2. Runs six query categories: integer_types, numeric_types, string_types, datetime_types, boolean_types, and all_types (mixed).
  3. Prints per-query timing, a compact DRIVER × TYPE comparison table (matching the README layout), and visual bar charts.

Saving results to a TXT file

Use the --output / -o flag or the NZ_OUTPUT environment variable:

python tools/examples/performance_test.py -o benchmark_results.txt

# or via env var:
set NZ_OUTPUT=benchmark_results.txt
python tools/examples/performance_test.py

Force pure-Python mode

set NZPY_EXTENDED_NO_CEXT=1
python tools/examples/performance_test.py

Pytest benchmark (alternative)

A simpler pytest-based benchmark is also available (10 k rows, nzpy_extended only):

pytest tests/test_benchmark.py -v -m benchmark

License

Apache License 2.0 — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nzpy_extended-0.0.3.tar.gz (115.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nzpy_extended-0.0.3-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl (125.2 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64manylinux: glibc 2.5+ x86-64

File details

Details for the file nzpy_extended-0.0.3.tar.gz.

File metadata

  • Download URL: nzpy_extended-0.0.3.tar.gz
  • Upload date:
  • Size: 115.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for nzpy_extended-0.0.3.tar.gz
Algorithm Hash digest
SHA256 dd6505d13be04ac775ac8001e67aedad8c6a0dd5864c542724a2bc0541441e03
MD5 3961f1930d32059894b030e16ed15fba
BLAKE2b-256 c6a98d44b4a8bebb742c2f124a9bb4837b3afba40e7e59352730db685342f1bd

See more details on using hashes here.

Provenance

The following attestation bundles were made for nzpy_extended-0.0.3.tar.gz:

Publisher: publish.yml on KrzysztofDusko/nzpy_extended

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nzpy_extended-0.0.3-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl.

File metadata

File hashes

Hashes for nzpy_extended-0.0.3-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl
Algorithm Hash digest
SHA256 005956304b4b1fa2e79fc247919347d5c49db7811a1f7f5a56c735cbc6b0ccca
MD5 887940e554efd4ed5355b49df03c42b6
BLAKE2b-256 261874ccfb2958574265c6e08f9c87575535f7ba9fe4634cdbec9cdcd7605316

See more details on using hashes here.

Provenance

The following attestation bundles were made for nzpy_extended-0.0.3-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl:

Publisher: publish.yml on KrzysztofDusko/nzpy_extended

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page