Apache Arrow PostgreSQL connector
Pgeon 🐦
pgeon provides a C++ library and (very) simple Python bindings. Almost all PostgreSQL native types are supported (see below).
This project is similar to pg2arrow and is heavily inspired by it. The main differences are the use of COPY instead of FETCH and that our implementation uses the Arrow C++ API.
The goal of pgeon is to provide fast bulk data download from a PostgreSQL database into Apache Arrow tables. If you're looking to upload data, you might want to have a look at Arrow ADBC.
Usage
from pgeon import copy_query
db = "postgresql://postgres@localhost:5432/postgres"
query = "SELECT TIMESTAMP '2001-01-01 14:00:00'"
tbl = copy_query(db, query)
The actual query performed is COPY ({query}) TO STDOUT (FORMAT binary); see the PostgreSQL documentation for COPY for more information.
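To make the COPY step more concrete, here is a minimal pure-Python sketch of how a binary COPY stream is laid out, following the file format described in the PostgreSQL documentation. It is not pgeon's implementation (which is written in C++ against the Arrow API); the names copy_statement, read_rows and decode_timestamp are hypothetical helpers introduced only for illustration, and the payload is built by hand rather than fetched from a server.

```python
import struct
from datetime import datetime, timedelta

# Fixed 11-byte signature that opens every COPY BINARY stream.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"
# Binary timestamps count microseconds from this epoch.
PG_EPOCH = datetime(2000, 1, 1)


def copy_statement(query: str) -> str:
    """Wrap a query in the COPY statement quoted above (hypothetical helper)."""
    return f"COPY ({query}) TO STDOUT (FORMAT binary)"


def read_rows(payload: bytes):
    """Yield each row as a list of raw field values (None for NULL)."""
    if not payload.startswith(SIGNATURE):
        raise ValueError("not a COPY BINARY stream")
    pos = len(SIGNATURE)
    # Header: 32-bit flags, then the length of an optional extension area.
    _flags, ext_len = struct.unpack_from("!iI", payload, pos)
    pos += 8 + ext_len
    while True:
        (n_fields,) = struct.unpack_from("!h", payload, pos)
        pos += 2
        if n_fields == -1:  # trailer: a field count of -1 ends the data
            return
        row = []
        for _ in range(n_fields):
            (size,) = struct.unpack_from("!i", payload, pos)
            pos += 4
            if size == -1:  # NULL fields carry no data bytes
                row.append(None)
            else:
                row.append(payload[pos:pos + size])
                pos += size
        yield row


def decode_timestamp(raw: bytes) -> datetime:
    """Decode a binary timestamp: int64 microseconds since 2000-01-01."""
    (micros,) = struct.unpack("!q", raw)
    return PG_EPOCH + timedelta(microseconds=micros)


# Hand-built stream holding one row with one timestamp column.
micros = (datetime(2001, 1, 1, 14) - PG_EPOCH) // timedelta(microseconds=1)
payload = (
    SIGNATURE
    + struct.pack("!iI", 0, 0)           # flags, header extension length
    + struct.pack("!hiq", 1, 8, micros)  # field count, field size, value
    + struct.pack("!h", -1)              # trailer
)
rows = list(read_rows(payload))
decoded = decode_timestamp(rows[0][0])
```

A real decoder also has to map each column's type to the matching Arrow builder; the sketch only covers the framing and a single type.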
Installation
Building and running pgeon requires libpq to be available on your system.
Python
Install from source using pip:
git clone https://github.com/0x0L/pgeon.git
cd pgeon
pip install .
On Linux, if pyarrow is already installed as a conda package, you may want to use
CONDA_BUILD=1 pip install .
[optional] C++ library and tools
This requires cmake and ninja. In addition, you'll need to install libpq and the Arrow C++ libraries (e.g. arrow-cpp in conda).
mkdir build
cd build
cmake -GNinja ..
ninja
Performance
Elapsed time distributions of a query fetching 7 columns (1 timestamp, 2 ints, 4 reals) and around 4.5 million rows. The result is returned as a pandas.DataFrame in all cases.
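For readers who want to collect comparable timings on their own data, the following is a minimal sketch of a timing harness. It is not the benchmark script used for the measurements above; time_runs and summarize are hypothetical helpers, and the workload timed here is a stand-in so the example runs without a database.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call fn `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of durations to a few summary statistics."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Stand-in workload. With a live database one could instead time, for example,
#   lambda: copy_query(db, query).to_pandas()
# assuming copy_query returns a pyarrow.Table.
durations = time_runs(lambda: sum(range(100_000)), repeats=5)
summary = summarize(durations)
```

Keeping the conversion to pandas.DataFrame inside the timed callable matches the setup described above, where every method returns a DataFrame.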
Notes
- Queries using ROW (e.g. SELECT ROW('a', 1)) do not work (anonymous structs).
- SQL arrays are mapped to pyarrow.list_(...). Only 1D arrays are fully supported; higher-dimensional arrays are flattened.
- The output format of BitString types is not really helpful.
- tsvector types with letter weights are not supported.
- PostgreSQL range and domain types are not supported.
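To illustrate the note on multidimensional arrays, the sketch below shows in plain Python what flattening means for a value such as ARRAY[[1, 2], [3, 4]] and how the shape could be restored when the inner length is known. It assumes elements arrive in row-major order, which is how PostgreSQL orders array elements; whether pgeon preserves exactly that order is an assumption here, and flatten and unflatten are hypothetical helpers, not part of pgeon.

```python
def flatten(values):
    """Flatten nested lists depth-first, i.e. in row-major order."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def unflatten(flat, n_cols):
    """Rebuild a 2D list from a flat one, given the inner length."""
    if n_cols <= 0 or len(flat) % n_cols != 0:
        raise ValueError("length is not a multiple of n_cols")
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


nested = [[1, 2], [3, 4]]
flat = flatten(nested)
restored = unflatten(flat, 2)
```

Because the dimensions are lost in the flattened list, the inner length has to come from elsewhere, for example from array_length(col, 2) selected as an extra column.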