A JDBC-first Python database bridge powered by JPype
Project description
jdbc4py
Python access to databases whose native drivers are weak, commercial, or missing — and a fast Arrow bulk-read path across any database with a JDBC driver.
jdbc4py embeds a JVM in-process via JPype, loads your JDBC driver JAR, and hands back a standard DBAPI 2.0 connection — plus near-zero-copy Arrow reads into pandas, Polars, and DuckDB.
When to use this
Reach for jdbc4py when:
- Your database's native Python driver is weak, commercial, or nonexistent — SAP HANA, Sybase ASE/IQ, or any JDBC-only source.
- You need to join or query across two different database systems in one SQL statement (e.g. PostgreSQL ⋈ MySQL) — see cross-database queries.
- You query many different databases from one codebase and want one API (and one fast Arrow path) instead of a different driver per engine.
- You need fast bulk reads into pandas / Arrow / Polars from a JDBC database.
Use something else when:
- You're on PostgreSQL, MySQL, or SQLite — the native drivers (psycopg, etc.) are faster and need no JVM. Use them.
- You need sub-second connect latency (serverless, high-frequency OLTP) — jdbc4py runs a JVM in-process, so the first connection pays ~0.5–1 s of JVM startup (use the pool to amortize it).
- You can't run a JVM at all.
vs. the alternatives
- Native driver (psycopg, hdbcli, …): faster for single-database, row-by-row access and connect latency. jdbc4py wins on bulk Arrow reads and on databases with no good native driver.
- JayDeBeApi: the other JPype-based JDBC bridge. jdbc4py adds the Arrow bulk path, a connection pool, richer type mapping, a SQLAlchemy dialect, and an asyncio facade.
- ADBC (Arrow Database Connectivity): the emerging Arrow-native standard, no JVM — the better choice if your database has an ADBC driver. jdbc4py covers the long tail of databases that only ship JDBC.
Features
- Arrow bulk reads — results serialize on the JVM side and arrive as a near-zero-copy Arrow table; pandas/Polars/DuckDB reads beat native driver + SQLAlchemy + pandas by 1.8–4.8× on large analytical workloads (see BENCHMARKS.md).
- Broad database reach — one API over any JDBC driver, including enterprise databases with thin Python support. See Supported Databases.
- Cross-database queries — register several connections and run a single SQL query that joins tables across different database systems; jdbc4py fetches each source via Arrow and executes the join in an in-memory DuckDB.
- DBAPI 2.0 + SQLAlchemy — a standard connection/cursor, plus a
postgresql+jdbc4py://…dialect so SQLAlchemy, pandas, and ORMs work unchanged. - Write paths — insert Arrow, pandas, Polars, or DuckDB data directly into any JDBC table.
- Server-side streaming —
stream=Truefor huge results: instant time-to-first-batch, bounded memory. - Connection pool — idle eviction, broken-connection replacement, configurable timeouts.
- Rich type mapping — numeric/decimal, temporal (nanosecond on the Arrow path), JSON, UUID, arrays, ranges, intervals, vectors/embeddings, and full Unicode.
- asyncio —
jdbc4py.aioruns blocking JDBC calls off the event loop. - Operational niceties — cancel a running query from any thread, uniform metadata helpers, and credential redaction in exceptions and logs.
Requirements
- Python 3.11+
- Java 11+ (JRE is enough; JDK only needed if building the helper JAR)
Installation
pip install jdbc4py # core DBAPI 2.0 (row-by-row fetch)
pip install "jdbc4py[dataframe]" # + Arrow bulk reads into pandas / Polars / DuckDB
Requires a Java 11+ runtime (a JRE is enough — no JDK needed). The wheel bundles the prebuilt Arrow helper JAR, so there's no build step at install time. You still bring your own JDBC driver JAR for each database — see docs/installation.md for driver placement and environment configuration, and Building from Source to work on jdbc4py itself.
Quick Start
from jdbc4py import connect
with connect(
engine="postgres",
host="localhost",
database="mydb",
user="me",
password="secret",
) as conn:
with conn.cursor() as cur:
cur.execute("SELECT id, name FROM users WHERE active = ?", [True])
rows = cur.fetchall()
Bulk read into a DataFrame:
df = conn.read_sql("SELECT * FROM events", format="pandas")
# or "polars", "arrow", "duckdb"
Write a DataFrame back:
conn.write_polars(df, "staging", if_exists="replace")
Stream a very large result without buffering it client-side:
df = conn.read_sql("SELECT * FROM huge_table", format="pandas", stream=True)
Use it through SQLAlchemy (and therefore pandas, ORMs, Alembic):
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine("postgresql+jdbc4py://me:secret@localhost:5432/mydb")
df = pd.read_sql("SELECT * FROM events", engine)
Cross-database queries
Join tables that live in different database systems in a single query.
Register each connection under an alias, then reference it as alias.table;
jdbc4py pulls each source over Arrow and runs the join in an in-memory DuckDB:
from jdbc4py import connect, FederatedSession
with FederatedSession() as fed:
fed.add_connection("pg", connect(engine="postgres", host="pg-host",
database="analytics", user="me", password="secret"))
fed.add_connection("mysql", connect(engine="mysql", host="mysql-host",
database="users", user="me", password="secret"))
df = fed.execute(
"SELECT c.name, SUM(o.total) AS spend "
"FROM pg.customers c "
"JOIN mysql.orders o ON c.id = o.customer_id "
"GROUP BY c.name",
format="pandas",
)
Supported Databases
| Database | Engine key | Notes |
|---|---|---|
| PostgreSQL | postgres |
Full type coverage including arrays, ranges, hstore |
| MySQL | mysql |
|
| MariaDB | mariadb |
|
| SQL Server | mssql |
|
| Oracle | oracle |
|
| ClickHouse | clickhouse |
|
| Trino | trino |
|
| Presto | presto |
|
| Snowflake | snowflake |
connect_snowflake() helper |
| Databricks | databricks |
connect_databricks() helper |
| Google Cloud Spanner | spanner |
connect_spanner() helper |
| Google BigQuery | bigquery |
connect_bigquery() helper |
| Sybase ASE / SAP IQ | sybase_ase / sybase_iq |
jConnect or jTDS driver |
| SAP HANA | hana |
on-prem & HANA Cloud; ngdbc.jar driver |
| SQLite | sqlite |
Useful for testing |
Any database with a JDBC driver works via the jdbc_url= + driver= + jar= path.
Connection Pool
from jdbc4py import ConnectionPool
pool = ConnectionPool(
engine="postgres",
host="localhost",
database="mydb",
user="me",
password="secret",
min_size=2,
max_connections=10,
)
with pool.connect() as conn:
df = conn.read_sql("SELECT * FROM orders", format="pandas")
Benchmarks
On a 50K-row × 50-col read, jdbc4py's Arrow path is the fastest pandas loader on every local engine tested, beating the native driver + SQLAlchemy + pandas by ~1.8× on PostgreSQL, 2.3× MySQL, 3.2× MariaDB, 3.3× Oracle, and 4.8× on SQL Server. On Databricks it's 2.8× faster than databricks-sql-connector; on Snowflake the native connector wins (it reads Snowflake's native Arrow wire format directly).
The win is the Arrow bulk path, not row access: row-by-row fetchall() goes through the JVM and is slower than a native C driver (it creates Python objects one cell at a time), and the first connection pays JVM startup. Rule of thumb — use read_sql(format="pandas"/"arrow") for large reads, pool your connections, and reach for a native driver only if you need fast per-row tuple access.
Full numbers across PostgreSQL/MySQL/MariaDB/SQL Server/ClickHouse/Oracle plus deployed SAP HANA, Snowflake, and Databricks: BENCHMARKS.md
Documentation
| Document | Contents |
|---|---|
| docs/installation.md | Java setup, driver JARs, environment variables |
| docs/architecture.md | How jdbc4py works internally |
| docs/performance.md | Arrow path, batching, pool tuning |
| docs/driver-management.md | Driver discovery, custom JARs, JDBC4PY_DRIVER_PATHS |
| docs/jdbc-internals.md | Cursor, type system, cancellation, pool internals |
| docs/design-decisions.md | Why JPype, why Arrow IPC, tradeoffs |
| docs/troubleshooting.md | Common errors and fixes |
| docs/faq.md | Frequently asked questions |
| docs/operator-guide.md | Production deployment and security |
Building from Source
git clone https://github.com/ryuk-me/jdbc4py
cd jdbc4py
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,dataframe]"
# Build the Java helper JAR (enables Arrow bulk reads)
./java/jdbc4py-helper/build.sh
# Unit tests
pytest -q
# Integration tests (requires Docker)
docker compose --profile core up -d
JDBC4PY_RUN_INTEGRATION=1 pytest tests/integration/ -q
Contributing
Bug reports and pull requests are welcome. For large changes, open an issue first to discuss the approach. Tests are required for new features.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file jdbc4py-0.1.0-py3-none-any.whl.
File metadata
- Download URL: jdbc4py-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d44e3e9017fabd8bcff8e3a124368cad80077441b5a9774bacb6e025e8eaf2c5
|
|
| MD5 |
14e61510229aa4e014c1a3926560d7d8
|
|
| BLAKE2b-256 |
37132df684a5062a1fffd5003b7792c59ab884bb78643aab24b922c5b3977858
|
Provenance
The following attestation bundles were made for jdbc4py-0.1.0-py3-none-any.whl:
Publisher:
release.yml on Ryuk-me/jdbc4py
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
jdbc4py-0.1.0-py3-none-any.whl -
Subject digest:
d44e3e9017fabd8bcff8e3a124368cad80077441b5a9774bacb6e025e8eaf2c5 - Sigstore transparency entry: 1925559731
- Sigstore integration time:
-
Permalink:
Ryuk-me/jdbc4py@74584d7e6f0b01d44121ef655f7b72b3936039a1 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/Ryuk-me
-
Access:
private
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@74584d7e6f0b01d44121ef655f7b72b3936039a1 -
Trigger Event:
push
-
Statement type: