Use JDBC database drivers from Python 3 with a DB-API, accelerated with Apache Arrow.
JayDeBeApiArrow - High-Performance JDBC to Python DB-API Bridge
The JayDeBeApiArrow module allows you to connect from Python code to databases using Java JDBC. It provides a Python DB-API v2.0 interface to that database.
Note: This is a fork of the original JayDeBeApi project.
Key Differences in this Fork
- High Performance with Apache Arrow: The primary goal of this fork is to significantly improve data fetch performance. Instead of iterating through JDBC ResultSets row-by-row in Python (which has high overhead), this library uses a custom Java extension (arrow-jdbc-extension) to convert JDBC data into Apache Arrow record batches directly within the JVM. These batches are then efficiently transferred to Python.
- Modernization:
  - Python 3 Only: Support for Python 2 has been removed.
  - JPype Only: Support for Jython has been removed to focus on the CPython + JPype architecture.
  - Strict Typing: Enforces stricter typing for Decimal and temporal types.
It works on ordinary Python (CPython) using the JPype Java integration.
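The last step of that pipeline, turning column-oriented record batches back into the row tuples that DB-API fetch methods return, can be pictured with the plain-Python sketch below. It assumes a batch is modeled as a dict of equal-length column lists; the library itself works on Arrow data and its conversion code is not shown here, so batch_to_rows and fetch_all are hypothetical names.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an Arrow record batch: column name -> column values.
Batch = Dict[str, Sequence[Any]]


def batch_to_rows(batch: Batch) -> List[Tuple[Any, ...]]:
    """Transpose a columnar batch into DB-API style row tuples."""
    columns = list(batch.values())
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns in a batch must have the same length")
    # zip(*columns) walks the columns in lockstep, yielding one tuple per row.
    return list(zip(*columns))


def fetch_all(batches: List[Batch]) -> List[Tuple[Any, ...]]:
    """Concatenate the rows of a stream of batches, like cursor.fetchall()."""
    rows: List[Tuple[Any, ...]] = []
    for batch in batches:
        rows.extend(batch_to_rows(batch))
    return rows


if __name__ == "__main__":
    batches = [
        {"CUST_ID": [1, 2], "NAME": ["John", "Jane"]},
        {"CUST_ID": [3], "NAME": ["Joe"]},
    ]
    print(fetch_all(batches))  # [(1, 'John'), (2, 'Jane'), (3, 'Joe')]
```

The transposition itself is cheap; the saving described above comes from moving whole columns out of the JVM at once instead of fetching each row through JPype.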
Install
You can get and install JayDeBeApiArrow with pip:
pip install JayDeBeApiArrow
Or you can clone the source from the JayDeBeApiArrow GitHub project and install it with:
uv sync
Make sure JPype is installed properly (uv sync installs it automatically).
Usage
Import the jaydebeapiarrow Python module and call its connect function. This gives you a DB-API-compliant connection to the database.
The first argument to connect is the name of the Java driver class. The second argument is a string with the JDBC connection URL. Third, you can optionally supply either a sequence consisting of user and password, or a dictionary of arguments that are passed internally as properties to the Java DriverManager.getConnection method. See the Javadoc of the DriverManager class for details.
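Both forms of the third argument carry the same credentials: the JDK's DriverManager.getConnection(url, user, password) files them under the standard "user" and "password" properties. The hypothetical helper to_connection_properties below (not part of jaydebeapiarrow) normalizes either form into one property dict, which can be handy when credentials and extra options come from configuration.

```python
from collections import abc
from typing import Dict, Mapping, Optional, Sequence, Union

DriverArgs = Optional[Union[Sequence[str], Mapping[str, str]]]


def to_connection_properties(driver_args: DriverArgs) -> Dict[str, str]:
    """Hypothetical helper: turn connect()'s third argument into JDBC properties."""
    if driver_args is None:
        return {}
    if isinstance(driver_args, abc.Mapping):
        # java.util.Properties stores strings, so coerce every key and value.
        return {str(k): str(v) for k, v in driver_args.items()}
    if isinstance(driver_args, str):
        # A bare string is also a sequence; this sketch rejects it explicitly.
        raise TypeError("pass credentials as a (user, password) sequence or a dict")
    # Unpacking raises ValueError unless there are exactly two items.
    user, password = driver_args
    return {"user": user, "password": password}


if __name__ == "__main__":
    print(to_connection_properties(["SA", ""]))  # {'user': 'SA', 'password': ''}
```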
The next parameter to connect is also optional and specifies the driver's JAR files, in case your classpath is not yet set up sufficiently. The classpath set in the CLASSPATH environment variable is honored as well.
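Exactly how the library merges the jar argument with CLASSPATH is internal; the sketch below only illustrates the usual end result, a single class path string joined with os.pathsep. The helper name build_classpath and the ordering (explicit jars first, then CLASSPATH entries, duplicates dropped) are assumptions for illustration.

```python
import os
from typing import List, Mapping, Optional, Sequence, Union


def build_classpath(
    jars: Union[str, Sequence[str], None],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: merge explicit driver jars with CLASSPATH entries."""
    env = os.environ if env is None else env
    if jars is None:
        entries: List[str] = []
    elif isinstance(jars, str):
        entries = [jars]  # a single jar may be given as a plain string
    else:
        entries = list(jars)
    # CLASSPATH uses the platform separator (":" on POSIX, ";" on Windows).
    for entry in env.get("CLASSPATH", "").split(os.pathsep):
        if entry and entry not in entries:
            entries.append(entry)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    print(build_classpath("/path/to/hsqldb.jar", env={"CLASSPATH": "test/jars/*"}))
```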
Here is an example:
import jaydebeapiarrow
conn = jaydebeapiarrow.connect(
    "org.hsqldb.jdbcDriver",
    "jdbc:hsqldb:mem:.",
    ["SA", ""],
    "/path/to/hsqldb.jar"
)
curs = conn.cursor()
curs.execute('create table CUSTOMER'
             '("CUST_ID" INTEGER not null,'
             ' "NAME" VARCHAR(50) not null,'
             ' primary key ("CUST_ID"))')
curs.execute("insert into CUSTOMER values (?, ?)", (1, 'John'))
curs.execute("select * from CUSTOMER")
print(curs.fetchall())
# Output: [(1, 'John')]
curs.close()
conn.close()
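Because the connection follows DB-API 2.0, the same sequence of calls runs against other DB-API drivers. For a quick comparison without a JVM, here is the example translated to Python's built-in sqlite3 module, which also uses ? (qmark) placeholders; only the connect() call differs. This is a stand-in for illustration, not jaydebeapiarrow itself.

```python
import sqlite3

# In-memory SQLite database via the standard library's DB-API 2.0 driver.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute('create table CUSTOMER'
             '("CUST_ID" INTEGER not null,'
             ' "NAME" VARCHAR(50) not null,'
             ' primary key ("CUST_ID"))')
# Same qmark parameter style as the JDBC example.
curs.execute("insert into CUSTOMER values (?, ?)", (1, 'John'))
curs.execute("select * from CUSTOMER")
rows = curs.fetchall()
print(rows)  # [(1, 'John')]
curs.close()
conn.close()
```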
If you have trouble getting this to work, check whether your JAVA_HOME environment variable is set correctly. For example:
JAVA_HOME=/usr/lib/jvm/java-8-openjdk python
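A quick look at the two prerequisites mentioned here (JPype importable, JAVA_HOME pointing at an existing directory) often narrows the problem down. check_java_environment is a hypothetical helper, not part of the package; it only inspects the environment and does not start a JVM.

```python
import importlib.util
import os
from typing import Dict, Mapping, Optional


def check_java_environment(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Hypothetical diagnostic: inspect prerequisites without starting a JVM."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    return {
        # JPype must be importable from the interpreter that runs your code.
        "jpype_installed": importlib.util.find_spec("jpype") is not None,
        # JAVA_HOME should name an existing JDK/JRE directory.
        "JAVA_HOME": java_home,
        "JAVA_HOME_exists": bool(java_home) and os.path.isdir(java_home),
    }


if __name__ == "__main__":
    for key, value in check_java_environment().items():
        print(f"{key}: {value}")
```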
An alternative way to establish a connection, using connection properties:
conn = jaydebeapiarrow.connect(
    "org.hsqldb.jdbcDriver",
    "jdbc:hsqldb:mem:.",
    {
        'user': "SA", 'password': "",
        'other_property': "foobar"
    },
    "/path/to/hsqldb.jar"
)
Connections and cursors can also be used with the with statement:
with jaydebeapiarrow.connect(
    "org.hsqldb.jdbcDriver",
    "jdbc:hsqldb:mem:.",
    ["SA", ""],
    "/path/to/hsqldb.jar"
) as conn:
    with conn.cursor() as curs:
        curs.execute("select count(*) from CUSTOMER")
        print(curs.fetchall())
        # Output: [(1,)]
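The with form relies on Python's context-manager protocol: __enter__ returns the object and __exit__ runs when the block ends, whether normally or through an exception. The toy ClosingResource class below is hypothetical and only stands in for a connection or cursor; it assumes exit means close() and does not reproduce jaydebeapiarrow's internals (for example, whether exiting also commits or rolls back).

```python
from types import TracebackType
from typing import List, Optional, Type


class ClosingResource:
    """Hypothetical stand-in for a DB-API connection or cursor."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log
        self.closed = False

    def close(self) -> None:
        self.closed = True
        self.log.append(f"closed {self.name}")

    def __enter__(self) -> "ClosingResource":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        self.close()
        return False  # do not swallow exceptions raised inside the block


log: List[str] = []
with ClosingResource("conn", log) as conn:
    with ClosingResource("curs", log) as curs:
        pass
print(log)  # ['closed curs', 'closed conn']
```

Nested blocks unwind innermost first, so the cursor is closed before its connection.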
Supported Databases
In theory, every database with a suitable JDBC driver should work. It has been confirmed to work with the following databases:
- SQLite
- Hypersonic SQL (HSQLDB)
- IBM DB2
- IBM DB2 for mainframes
- Oracle
- Teradata DB
- Netezza
- Mimer DB
- Microsoft SQL Server
- MySQL
- PostgreSQL
- ...and many more.
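Each database needs its own driver class name and URL format as the first two arguments to connect. The mapping below lists common values for a few of the databases above, taken from the drivers' own documentation; check them against the driver version you actually ship. JDBC_DRIVERS and driver_and_url are hypothetical names.

```python
from typing import Dict, Tuple

# Common driver class names and URL templates (verify against your driver version).
JDBC_DRIVERS: Dict[str, Tuple[str, str]] = {
    "postgresql": ("org.postgresql.Driver", "jdbc:postgresql://{host}:{port}/{database}"),
    "mysql": ("com.mysql.cj.jdbc.Driver", "jdbc:mysql://{host}:{port}/{database}"),
    "sqlserver": (
        "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "jdbc:sqlserver://{host}:{port};databaseName={database}",
    ),
    "sqlite": ("org.sqlite.JDBC", "jdbc:sqlite:{database}"),
    "hsqldb-mem": ("org.hsqldb.jdbcDriver", "jdbc:hsqldb:mem:{database}"),
}


def driver_and_url(kind: str, **params: object) -> Tuple[str, str]:
    """Return (driver class, JDBC URL), ready to pass to connect()."""
    jclassname, template = JDBC_DRIVERS[kind]
    # str.format ignores keyword arguments the template does not use.
    return jclassname, template.format(**params)


if __name__ == "__main__":
    print(driver_and_url("postgresql", host="localhost", port=5432, database="test_db"))
```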
Testing
Integration tests are located in test/. The test suite covers SQLite (in-memory), PostgreSQL, MySQL, and HSQLDB.
Build JARs and download drivers
uv run bash test/build.sh # Build arrow-jdbc-extension and MockDriver JARs
uv run bash test/download_jdbc_drivers.sh # Download PostgreSQL, MySQL, SQLite, HSQLDB JDBC drivers
Run tests
CLASSPATH="test/jars/*" uv run python -m unittest test.test_integration.HsqldbTest # HSQLDB
CLASSPATH="test/jars/*" uv run python -m unittest test.test_integration.SqliteXerialTest # SQLite
CLASSPATH="test/jars/*" uv run python -m unittest test.test_mock # Mock driver
External database tests
PostgreSQL and MySQL tests require running database instances. Docker Compose configs and helper scripts are provided in test/:
# Start both databases
bash test/start.sh
# Check status
bash test/status.sh
# Stop databases
bash test/stop.sh
Database connection defaults (overridable via environment variables):
| Database | Host | Port | DB | User | Password | Env prefix |
|---|---|---|---|---|---|---|
| PostgreSQL | localhost | 5432 | test_db | user | password | JY_PG_* |
| MySQL | localhost | 3306 | test_db | user | password | JY_MYSQL_* |
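Overrides of this kind are typically resolved by falling back to the defaults when a variable is unset. The sketch below shows that pattern with the defaults from the table; the variable suffixes (HOST, PORT, DB, USER, PASSWORD) are assumptions, since only the JY_PG_* and JY_MYSQL_* prefixes are documented here, so check the test code for the exact names. load_db_settings is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults from the table; the suffix names after each prefix are assumed.
DEFAULTS: Dict[str, Dict[str, str]] = {
    "JY_PG_": {"HOST": "localhost", "PORT": "5432", "DB": "test_db",
               "USER": "user", "PASSWORD": "password"},
    "JY_MYSQL_": {"HOST": "localhost", "PORT": "3306", "DB": "test_db",
                  "USER": "user", "PASSWORD": "password"},
}
URL_TEMPLATES: Dict[str, str] = {
    "JY_PG_": "jdbc:postgresql://{HOST}:{PORT}/{DB}",
    "JY_MYSQL_": "jdbc:mysql://{HOST}:{PORT}/{DB}",
}


def load_db_settings(prefix: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: read PREFIX+KEY variables, falling back to defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(prefix + key, default)
                for key, default in DEFAULTS[prefix].items()}
    settings["URL"] = URL_TEMPLATES[prefix].format(**settings)
    return settings


if __name__ == "__main__":
    print(load_db_settings("JY_PG_")["URL"])
```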
Benchmarks
This approach was inspired by Uwe Korn's work on pyarrow.jvm (Apache Drill) and Razvi Noorul's Trino benchmarks, both demonstrating 100x+ speedups by using Arrow to bypass JPype's row-by-row serialization.
Our benchmarks (local PostgreSQL, 5M rows, 4 columns) show a ~21x speedup over plain jaydebeapi with the native Arrow API, and ~7.7x when the library is used as a drop-in replacement. The smaller multiplier reflects methodology rather than a slower Arrow path: both posts tested against distributed query engines (Drill, Trino) over network connections, which have much higher per-row JDBC overhead. PostgreSQL's JDBC driver retrieves rows much faster, so the row-by-row baseline is already quicker and leaves less headroom for a large multiplier. The absolute Arrow throughput is comparable across all three setups.
| Method | 5M rows | Throughput | vs jaydebeapi |
|---|---|---|---|
| jaydebeapi (baseline) | 198.66s | 25K rows/s | — |
| Drop-in replacement | 25.82s | 194K rows/s | 7.7x |
| Native Arrow API | 9.38s | 542K rows/s | 21.2x |
| Psycopg2 (native driver) | 7.34s | 682K rows/s | 27x |
See benchmark/ for scripts to reproduce these results.
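The throughput and speedup columns follow directly from the measured times: throughput is rows divided by seconds, and the speedup is the baseline time divided by each method's time. The short script below recomputes them from the published timings; it is arithmetic only, not the harness in benchmark/. It reproduces the table to within rounding, except that 9.38 s for the Native Arrow API works out to about 533K rows/s rather than the 542K shown.

```python
from typing import Dict, Tuple

ROWS = 5_000_000
# Wall-clock seconds for 5M rows, copied from the benchmark table.
TIMINGS: Dict[str, float] = {
    "jaydebeapi (baseline)": 198.66,
    "Drop-in replacement": 25.82,
    "Native Arrow API": 9.38,
    "Psycopg2 (native driver)": 7.34,
}


def summarize(timings: Dict[str, float], rows: int,
              baseline_key: str) -> Dict[str, Tuple[float, float]]:
    """Return {method: (rows per second, speedup vs. baseline)}."""
    baseline = timings[baseline_key]
    return {method: (rows / secs, baseline / secs) for method, secs in timings.items()}


results = summarize(TIMINGS, ROWS, "jaydebeapi (baseline)")
for method, (throughput, speedup) in results.items():
    print(f"{method:26s} {throughput / 1000:5.0f}K rows/s {speedup:5.1f}x")
```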
Contributing
Please submit bugs and patches to the JayDeBeApiArrow issue tracker. All contributors will be acknowledged. Thanks!
License
JayDeBeApiArrow is released under the GNU Lesser General Public License (LGPL). See the files COPYING and COPYING.LESSER in the distribution for details.
Download files
File details
Details for the file jaydebeapiarrow-2.1.1.tar.gz.
File metadata
- Download URL: jaydebeapiarrow-2.1.1.tar.gz
- Upload date:
- Size: 9.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ce8d23b0ccd1573cd9afa54d437036a350aa04d4d8e947e2024af3b402cc2c7 |
| MD5 | ed5bb7b8041d447b95d44ac8911137a7 |
| BLAKE2b-256 | cff992c28792a112d4e564579e95cefb9a2ad2d48c59423789c0b89a6b08302c |
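To check a downloaded archive against the published SHA256 digest, hashing the file in chunks with the standard hashlib module is enough. The helper names sha256_of and verify_download are hypothetical; pip can also enforce digests itself in hash-checking mode, via --hash=sha256:... entries in a requirements file.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of jaydebeapiarrow-2.1.1.tar.gz (from the table).
EXPECTED_SDIST_SHA256 = "7ce8d23b0ccd1573cd9afa54d437036a350aa04d4d8e947e2024af3b402cc2c7"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare case-insensitively, since digests are sometimes shown in upper case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(verify_download(Path("jaydebeapiarrow-2.1.1.tar.gz")))
```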
Provenance
The following attestation bundles were made for jaydebeapiarrow-2.1.1.tar.gz:
Publisher: publish.yml on HenryNebula/jaydebeapiarrow
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jaydebeapiarrow-2.1.1.tar.gz
- Subject digest: 7ce8d23b0ccd1573cd9afa54d437036a350aa04d4d8e947e2024af3b402cc2c7
- Sigstore transparency entry: 1301580836
- Sigstore integration time:
- Permalink: HenryNebula/jaydebeapiarrow@d74e5d60983aae1e0a05eccc37260fec1d9f2e53
- Branch / Tag: refs/tags/v2.1.1
- Owner: https://github.com/HenryNebula
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d74e5d60983aae1e0a05eccc37260fec1d9f2e53
- Trigger Event: release
File details
Details for the file jaydebeapiarrow-2.1.1-py3-none-any.whl.
File metadata
- Download URL: jaydebeapiarrow-2.1.1-py3-none-any.whl
- Upload date:
- Size: 9.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd7239277a4a3324c23e7cd09e285c448f7fba2d5500d783db6cda6a0ef6c588 |
| MD5 | 2d9568030ce7975cba70779182edc81e |
| BLAKE2b-256 | d9e08f3a7aaa869e93320911e5bdd6b1f6f39dab8383e89c54c4749b90a9aecf |
Provenance
The following attestation bundles were made for jaydebeapiarrow-2.1.1-py3-none-any.whl:
Publisher: publish.yml on HenryNebula/jaydebeapiarrow
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jaydebeapiarrow-2.1.1-py3-none-any.whl
- Subject digest: fd7239277a4a3324c23e7cd09e285c448f7fba2d5500d783db6cda6a0ef6c588
- Sigstore transparency entry: 1301580956
- Sigstore integration time:
- Permalink: HenryNebula/jaydebeapiarrow@d74e5d60983aae1e0a05eccc37260fec1d9f2e53
- Branch / Tag: refs/tags/v2.1.1
- Owner: https://github.com/HenryNebula
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d74e5d60983aae1e0a05eccc37260fec1d9f2e53
- Trigger Event: release