A JDBC-first Python database bridge powered by JPype

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

jdbc4py

Python access to databases whose native drivers are weak, commercial, or missing — and a fast Arrow bulk-read path across any database with a JDBC driver.

jdbc4py embeds a JVM in-process via JPype, loads your JDBC driver JAR, and hands back a standard DBAPI 2.0 connection — plus near-zero-copy Arrow reads into pandas, Polars, and DuckDB.

When to use this

Reach for jdbc4py when:

Your database's native Python driver is weak, commercial, or nonexistent — SAP HANA, Sybase ASE/IQ, or any JDBC-only source.
You need to join or query across two different database systems in one SQL statement (e.g. PostgreSQL ⋈ MySQL) — see cross-database queries.
You query many different databases from one codebase and want one API (and one fast Arrow path) instead of a different driver per engine.
You need fast bulk reads into pandas / Arrow / Polars from a JDBC database.

Use something else when:

You're on PostgreSQL, MySQL, or SQLite — the native drivers (psycopg, etc.) are faster and need no JVM. Use them.
You need sub-second connect latency (serverless, high-frequency OLTP) — jdbc4py runs a JVM in-process, so the first connection pays ~0.5–1 s of JVM startup (use the pool to amortize it).
You can't run a JVM at all.

vs. the alternatives

Native driver (psycopg, hdbcli, …): faster for single-database, row-by-row access and connect latency. jdbc4py wins on bulk Arrow reads and on databases with no good native driver.
JayDeBeApi: the other JPype-based JDBC bridge. jdbc4py adds the Arrow bulk path, a connection pool, richer type mapping, a SQLAlchemy dialect, and an asyncio facade.
ADBC (Arrow Database Connectivity): the emerging Arrow-native standard, no JVM — the better choice if your database has an ADBC driver. jdbc4py covers the long tail of databases that only ship JDBC.

Features

Arrow bulk reads — results serialize on the JVM side and arrive as a near-zero-copy Arrow table; pandas/Polars/DuckDB reads beat native driver + SQLAlchemy + pandas by 1.8–4.8× on large analytical workloads (see BENCHMARKS.md).
Broad database reach — one API over any JDBC driver, including enterprise databases with thin Python support. See Supported Databases.
Cross-database queries — register several connections and run a single SQL query that joins tables across different database systems; jdbc4py fetches each source via Arrow and executes the join in an in-memory DuckDB.
DBAPI 2.0 + SQLAlchemy — a standard connection/cursor, plus a postgresql+jdbc4py://… dialect so SQLAlchemy, pandas, and ORMs work unchanged.
Write paths — insert Arrow, pandas, Polars, or DuckDB data directly into any JDBC table.
Server-side streaming — stream=True for huge results: instant time-to-first-batch, bounded memory.
Connection pool — idle eviction, broken-connection replacement, configurable timeouts.
Rich type mapping — numeric/decimal, temporal (nanosecond on the Arrow path), JSON, UUID, arrays, ranges, intervals, vectors/embeddings, and full Unicode.
asyncio — jdbc4py.aio runs blocking JDBC calls off the event loop.
Operational niceties — cancel a running query from any thread, uniform metadata helpers, and credential redaction in exceptions and logs.

Requirements

Python 3.11+
Java 11+ (JRE is enough; JDK only needed if building the helper JAR)

Installation

pip install jdbc4py                 # core DBAPI 2.0 (row-by-row fetch)
pip install "jdbc4py[dataframe]"    # + Arrow bulk reads into pandas / Polars / DuckDB

Requires a Java 11+ runtime (a JRE is enough — no JDK needed). The wheel bundles the prebuilt Arrow helper JAR, so there's no build step at install time. You still bring your own JDBC driver JAR for each database — see docs/installation.md for driver placement and environment configuration, and Building from Source to work on jdbc4py itself.

Quick Start

from jdbc4py import connect

with connect(
    engine="postgres",
    host="localhost",
    database="mydb",
    user="me",
    password="secret",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM users WHERE active = ?", [True])
        rows = cur.fetchall()

Bulk read into a DataFrame:

df = conn.read_sql("SELECT * FROM events", format="pandas")
# or "polars", "arrow", "duckdb"

Write a DataFrame back:

conn.write_polars(df, "staging", if_exists="replace")

Stream a very large result without buffering it client-side:

df = conn.read_sql("SELECT * FROM huge_table", format="pandas", stream=True)

Use it through SQLAlchemy (and therefore pandas, ORMs, Alembic):

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("postgresql+jdbc4py://me:secret@localhost:5432/mydb")
df = pd.read_sql("SELECT * FROM events", engine)

Cross-database queries

Join tables that live in different database systems in a single query. Register each connection under an alias, then reference it as alias.table; jdbc4py pulls each source over Arrow and runs the join in an in-memory DuckDB:

from jdbc4py import connect, FederatedSession

with FederatedSession() as fed:
    fed.add_connection("pg", connect(engine="postgres", host="pg-host",
                                     database="analytics", user="me", password="secret"))
    fed.add_connection("mysql", connect(engine="mysql", host="mysql-host",
                                        database="users", user="me", password="secret"))

    df = fed.execute(
        "SELECT c.name, SUM(o.total) AS spend "
        "FROM pg.customers c "
        "JOIN mysql.orders o ON c.id = o.customer_id "
        "GROUP BY c.name",
        format="pandas",
    )

Supported Databases

Database	Engine key	Notes
PostgreSQL	`postgres`	Full type coverage including arrays, ranges, hstore
MySQL	`mysql`
MariaDB	`mariadb`
SQL Server	`mssql`
Oracle	`oracle`
ClickHouse	`clickhouse`
Trino	`trino`
Presto	`presto`
Snowflake	`snowflake`	`connect_snowflake()` helper
Databricks	`databricks`	`connect_databricks()` helper
Google Cloud Spanner	`spanner`	`connect_spanner()` helper
Google BigQuery	`bigquery`	`connect_bigquery()` helper
Sybase ASE / SAP IQ	`sybase_ase` / `sybase_iq`	jConnect or jTDS driver
SAP HANA	`hana`	on-prem & HANA Cloud; `ngdbc.jar` driver
SQLite	`sqlite`	Useful for testing

Any database with a JDBC driver works via the jdbc_url= + driver= + jar= path.

Connection Pool

from jdbc4py import ConnectionPool

pool = ConnectionPool(
    engine="postgres",
    host="localhost",
    database="mydb",
    user="me",
    password="secret",
    min_size=2,
    max_connections=10,
)

with pool.connect() as conn:
    df = conn.read_sql("SELECT * FROM orders", format="pandas")

Benchmarks

On a 50K-row × 50-col read, jdbc4py's Arrow path is the fastest pandas loader on every local engine tested, beating the native driver + SQLAlchemy + pandas by ~1.8× on PostgreSQL, 2.3× MySQL, 3.2× MariaDB, 3.3× Oracle, and 4.8× on SQL Server. On Databricks it's 2.8× faster than databricks-sql-connector; on Snowflake the native connector wins (it reads Snowflake's native Arrow wire format directly).

The win is the Arrow bulk path, not row access: row-by-row fetchall() goes through the JVM and is slower than a native C driver (it creates Python objects one cell at a time), and the first connection pays JVM startup. Rule of thumb — use read_sql(format="pandas"/"arrow") for large reads, pool your connections, and reach for a native driver only if you need fast per-row tuple access.

Full numbers across PostgreSQL/MySQL/MariaDB/SQL Server/ClickHouse/Oracle plus deployed SAP HANA, Snowflake, and Databricks: BENCHMARKS.md

Documentation

Document	Contents
docs/installation.md	Java setup, driver JARs, environment variables
docs/architecture.md	How jdbc4py works internally
docs/performance.md	Arrow path, batching, pool tuning
docs/driver-management.md	Driver discovery, custom JARs, `JDBC4PY_DRIVER_PATHS`
docs/jdbc-internals.md	Cursor, type system, cancellation, pool internals
docs/design-decisions.md	Why JPype, why Arrow IPC, tradeoffs
docs/troubleshooting.md	Common errors and fixes
docs/faq.md	Frequently asked questions
docs/operator-guide.md	Production deployment and security

Building from Source

git clone https://github.com/ryuk-me/jdbc4py
cd jdbc4py
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,dataframe]"

# Build the Java helper JAR (enables Arrow bulk reads)
./java/jdbc4py-helper/build.sh

# Unit tests
pytest -q

# Integration tests (requires Docker)
docker compose --profile core up -d
JDBC4PY_RUN_INTEGRATION=1 pytest tests/integration/ -q

Contributing

Bug reports and pull requests are welcome. For large changes, open an issue first to discuss the approach. Tests are required for new features.

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

ryuk_me

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.1

Jun 23, 2026

This version

0.1.0

Jun 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

jdbc4py-0.1.0-py3-none-any.whl (5.4 MB view details)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file jdbc4py-0.1.0-py3-none-any.whl.

File metadata

Download URL: jdbc4py-0.1.0-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 5.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jdbc4py-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d44e3e9017fabd8bcff8e3a124368cad80077441b5a9774bacb6e025e8eaf2c5`
MD5	`14e61510229aa4e014c1a3926560d7d8`
BLAKE2b-256	`37132df684a5062a1fffd5003b7792c59ab884bb78643aab24b922c5b3977858`

See more details on using hashes here.

Provenance

The following attestation bundles were made for jdbc4py-0.1.0-py3-none-any.whl:

Publisher: release.yml on Ryuk-me/jdbc4py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jdbc4py-0.1.0-py3-none-any.whl
- Subject digest: d44e3e9017fabd8bcff8e3a124368cad80077441b5a9774bacb6e025e8eaf2c5
- Sigstore transparency entry: 1925559731
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: Ryuk-me/jdbc4py@74584d7e6f0b01d44121ef655f7b72b3936039a1
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Ryuk-me
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@74584d7e6f0b01d44121ef655f7b72b3936039a1
- Trigger Event: push

jdbc4py 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

jdbc4py

When to use this

vs. the alternatives

Features

Requirements

Installation

Quick Start

Cross-database queries

Supported Databases

Connection Pool

Benchmarks

Documentation

Building from Source

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes

Provenance