Apache Arrow PostgreSQL connector

Pgeon 🐦

License: MIT

pgeon provides a C++ library and (very) simple Python bindings. Almost all native PostgreSQL types are supported (see the notes below).

This project is similar to pg2arrow and heavily inspired by it. The main differences are that pgeon uses COPY instead of FETCH and that it is built on the Arrow C++ API.

The goal of pgeon is to provide fast bulk data download from a PostgreSQL database into Apache Arrow tables. If you're looking to upload data, you might want to have a look at Arrow ADBC.

Usage

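from pgeon import copy_query

# libpq connection string for the source database
db = "postgresql://postgres@localhost:5432/postgres"
query = "SELECT TIMESTAMP '2001-01-01 14:00:00'"
# Run the query and collect the result as an Arrow table
tbl = copy_query(db, query)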

Under the hood, pgeon runs COPY ({query}) TO STDOUT (FORMAT binary); see the PostgreSQL documentation for the COPY command for more information.
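To make the COPY approach more concrete, here is a minimal pure-Python sketch of the PostgreSQL binary COPY format that such a query streams back: an 11-byte signature, a short header, then one record per row consisting of a 16-bit field count followed by length-prefixed field values, and finally a -1 trailer. This is not pgeon's decoder (which is written in C++ against the Arrow API); `copy_statement` and `parse_copy_binary` are hypothetical helpers, and the stream is built by hand so the example runs without a database.

```python
from __future__ import annotations

import struct

# Every binary COPY stream starts with this 11-byte signature.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def copy_statement(query: str) -> str:
    """Wrap a query in the COPY statement pgeon issues (hypothetical helper)."""
    return f"COPY ({query}) TO STDOUT (FORMAT binary)"


def parse_copy_binary(buf: bytes) -> list[tuple[bytes | None, ...]]:
    """Split a binary COPY stream into rows of raw field bytes (None means NULL)."""
    if not buf.startswith(SIGNATURE):
        raise ValueError("not a binary COPY stream")
    pos = len(SIGNATURE)
    # Header: 32-bit flags word, then the 32-bit length of an extension area to skip.
    _flags, ext_len = struct.unpack_from("!iI", buf, pos)
    pos += 8 + ext_len
    rows = []
    while True:
        # Each tuple begins with a 16-bit field count; -1 marks the file trailer.
        (nfields,) = struct.unpack_from("!h", buf, pos)
        pos += 2
        if nfields == -1:
            break
        row = []
        for _ in range(nfields):
            # Each field is a 32-bit byte length (-1 for NULL) followed by its data.
            (length,) = struct.unpack_from("!i", buf, pos)
            pos += 4
            if length == -1:
                row.append(None)
            else:
                row.append(buf[pos:pos + length])
                pos += length
        rows.append(tuple(row))
    return rows


# Hand-built stream: one row holding an int4 (42) and a NULL, then the trailer.
stream = (
    SIGNATURE
    + struct.pack("!iI", 0, 0)   # flags, extension length
    + struct.pack("!h", 2)       # field count
    + struct.pack("!ii", 4, 42)  # int4 field: length 4, value 42
    + struct.pack("!i", -1)      # NULL field
    + struct.pack("!h", -1)      # trailer
)
rows = parse_copy_binary(stream)
value = struct.unpack("!i", rows[0][0])[0]  # int4 is a big-endian 32-bit integer
```

A full decoder would presumably go one step further and append each field to an Arrow builder chosen from the column's type; this sketch stops at raw bytes.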

Installation

Building and running pgeon requires libpq to be available on your system.

Python

Install from source using pip with

git clone https://github.com/0x0L/pgeon.git
cd pgeon
pip install .

On Linux, if pyarrow is already installed as a conda package, you may want to use

CONDA_BUILD=1 pip install .

[optional] C++ library and tools

This requires CMake and Ninja. In addition, you'll need to install libpq and the Arrow C++ libraries (e.g. the arrow-cpp conda package).

mkdir build
cd build
cmake -GNinja ..
ninja

Performance

Elapsed-time distributions for a query fetching 7 columns (1 timestamp, 2 ints, 4 reals) and about 4.5 million rows. In all cases the result is returned as a pandas.DataFrame.
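To collect a similar elapsed-time distribution on your own data, a minimal sketch might look like the following. `elapsed_times` and `summarize` are hypothetical helpers, and the commented pgeon call assumes `copy_query` returns a `pyarrow.Table` (so that `.to_pandas()` applies); a stand-in workload keeps the example runnable without a database.

```python
import statistics
import time
from typing import Callable


def elapsed_times(fetch: Callable[[], object], repeat: int = 10) -> list[float]:
    """Call fetch() `repeat` times and return each call's wall-clock time in seconds."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fetch()
        times.append(time.perf_counter() - start)
    return times


def summarize(times: list[float]) -> dict[str, float]:
    """Reduce raw timings to a few points of their distribution."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }


# With pgeon this would be something like (assumes copy_query returns a pyarrow.Table):
#     timings = elapsed_times(lambda: copy_query(db, query).to_pandas())
# Stand-in workload so the example runs without a database:
timings = elapsed_times(lambda: sum(range(100_000)), repeat=5)
stats = summarize(timings)
```

Timing the `.to_pandas()` conversion inside `fetch` matches the setup described above, where the result ends up as a pandas.DataFrame in every case.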

Notes

  • Queries using ROW constructors (e.g. SELECT ROW('a', 1)) do not work, because they produce anonymous record types.

  • SQL arrays are mapped to pyarrow.list_(...). Only 1D arrays are fully supported; higher-dimensional arrays are flattened.

  • The output format of bit string types is not very convenient.

  • tsvector values with letter weights are not supported.

  • PostgreSQL range and domain types are not supported.
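Because higher-dimensional arrays are flattened, their shape has to be restored on the client when it matters. Below is a minimal sketch for the 2-D case, assuming the elements arrive in PostgreSQL's row-major storage order and that you know the inner dimension yourself; `unflatten` is a hypothetical helper, not part of pgeon.

```python
from __future__ import annotations


def unflatten(values: list | None, inner: int) -> list[list] | None:
    """Regroup a flattened 2-D array into rows of `inner` elements.

    PostgreSQL requires multidimensional arrays to be rectangular, so every
    row has the same length; the caller must supply that length because the
    flattened list no longer records it.
    """
    if values is None:  # a NULL array stays NULL
        return None
    if inner <= 0 or len(values) % inner != 0:
        raise ValueError("array length is not a multiple of the inner dimension")
    # Assumes row-major order (rightmost subscript varies fastest), which is
    # how PostgreSQL stores array elements.
    return [values[i:i + inner] for i in range(0, len(values), inner)]


# ARRAY[[1, 2], [3, 4]] is expected to arrive flattened as [1, 2, 3, 4].
restored = unflatten([1, 2, 3, 4], 2)
```

Applied to a whole column, this could be `[unflatten(v, 2) for v in tbl.column(0).to_pylist()]`.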

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

pgeon-0.2.0a0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

pgeon-0.2.0a0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

pgeon-0.2.0a0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

File hashes

pgeon-0.2.0a0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3f42d3c305221bb3cc86be040c46e357fb40706e6ed4597efee7f0841ba56798
MD5 07647211e39379f1c10a7e5d26faa4df
BLAKE2b-256 a4ba3011f62a370a04158e6e7314ef4dfb53809f4632c941d48a3f0ba0438887

pgeon-0.2.0a0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ecda2e5e5ba3daed26178f6767999c6e17f3f15ca63b9a6a70201fb7be213a76
MD5 ce1bf84683c1db192b11471faa05f825
BLAKE2b-256 14f2261fcf280db926321655d66c20987520a9bc2773faac59ffbdf1e9418fbd

pgeon-0.2.0a0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9fee542d337ba4e3213c4b9268540de91a8d9d3ad40e3639e5d9f4f9d7b64757
MD5 a0941212083a2b98c525f5c5bc1cab89
BLAKE2b-256 abd60bf68c9e60e6455787eaa892a090538a5b3f51c9c19b98a3b2b5cbeee5c6
