
Fast and efficient loading of PostgreSQL data into numpy/pandas

Project description

flaco


Perhaps the fastest and most memory-efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

Have a gander at the initial benchmarks 🏋

flaco tends to use ~3-5x less memory than standard pandas.read_sql, and is about ~3x faster. However, it's probably 50x less stable at the moment. 😜

To whet your appetite, here's a memory profile comparing flaco and pandas.read_sql on a table of 2M rows with columns of various types (see test_benchmarks.py). *If the test data has null values, you can expect a ~3x saving instead of the ~5x you see here; therefore (hot tip 🔥), supply fill values in your queries where possible via coalesce.

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    99    140.5 MiB    140.5 MiB           1   @profile
   100                                         def memory_profile():
   101    140.5 MiB      0.0 MiB           1       stmt = "select * from test_table"
   102                                         
   103                                             
   104    140.9 MiB      0.4 MiB           1       with Database(DB_URI) as con:
   105    441.8 MiB    300.9 MiB           1           data = read_sql(stmt, con)
   106    441.8 MiB      0.0 MiB           1           _flaco_df = pd.DataFrame(data, copy=False)
   107                                         
   108                                             
   109    441.8 MiB      0.0 MiB           1       engine = create_engine(DB_URI)
   110   2091.5 MiB   1649.7 MiB           1       _pandas_df = pd.read_sql(stmt, engine)
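
Following the hot tip above, here's a minimal sketch of supplying fill values via coalesce (the column names and fill values are hypothetical; adapt them to your schema):

from flaco.io import read_sql, Database

DB_URI = "postgresql://postgres:postgres@localhost:5432/postgres"

# Coalesce nullable columns to concrete fill values so numeric arrays
# keep their native dtypes instead of falling back to dtype=object.
stmt = """
    select
        coalesce(int_col, 0)     as int_col,
        coalesce(float_col, 0.0) as float_col,
        coalesce(text_col, '')   as text_col
    from test_table
"""

with Database(DB_URI) as con:
    data = read_sql(stmt, con)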

Example

from flaco.io import read_sql, Database


uri = "postgresql://postgres:postgres@localhost:5432/postgres"
stmt = "select * from my_big_table"

with Database(uri) as con:
    data = read_sql(stmt, con)  # dict of column name to numpy array

# If you have pandas installed, you can create a DataFrame
# with zero copying like this:
import pandas as pd
df = pd.DataFrame(data, copy=False)

# If you know the _exact_ number of rows which will be returned,
# you can supply n_rows to perform a single array allocation,
# avoiding resizing during query reading.
with Database(uri) as con:
    data = read_sql(stmt, con, 1_000)

Notes

Is this a drop-in replacement for pandas.read_sql?

No. It differs in a few ways:

  • It returns a dict of str -> numpy.ndarray objects, but this can be passed with zero copies to pandas.DataFrame
  • When querying integer columns, if a null is encountered, the array is converted to dtype=object and PostgreSQL nulls become None; pandas instead converts the underlying array to a float dtype, where PostgreSQL nulls become numpy.nan (see the sketch after this list).
  • It lacks basically all of the options pandas.read_sql has.
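
To illustrate the integer-nulls point above, here's a rough sketch of the two resulting arrays (illustrative only, not flaco's actual output):

import numpy as np
import pandas as pd

# flaco: an integer column containing a null comes back as dtype=object,
# with Python None for the PostgreSQL nulls.
flaco_style = np.array([1, None, 3], dtype=object)

# pandas.read_sql: the same column is promoted to a float dtype,
# with PostgreSQL nulls becoming numpy.nan.
pandas_style = pd.Series([1, None, 3], dtype="float64").to_numpy()

print(flaco_style.dtype)   # object
print(pandas_style.dtype)  # float64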

Furthermore, while it's pretty neat that this lib allows faster and less resource-intensive use of numpy/pandas against PostgreSQL, it's in the early stages of development and you're likely to encounter some sharp edges, including but not limited to:

  • 📝 Poor/non-existent error messages
  • 💩 Core dumps
  • 🚰 Memory leaks (although I think most are handled now)
  • 🦖 Almost complete lack of exception handling from underlying Rust/C interface
  • 📍 The PostgreSQL numeric type should ideally be converted to decimal.Decimal, but uses f64 for now, potentially losing precision. Note, this is exactly what pandas.read_sql does.
  • ❗ Might not handle all or arbitrary custom PostgreSQL types. If you encounter such types, either convert them to a supported type like text/json/jsonb (i.e. select my_field::text ...; see the sketch below), or open an issue if a standard type is not supported.
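
For the last two points, a hedged sketch of the text-cast workaround (my_numeric_col and my_enum_col are hypothetical column names):

from decimal import Decimal

from flaco.io import read_sql, Database

uri = "postgresql://postgres:postgres@localhost:5432/postgres"

# Cast problematic columns to text:
#   - my_numeric_col: sidestep the lossy numeric -> f64 conversion
#   - my_enum_col:    a custom type which may not be supported natively
stmt = """
    select
        my_numeric_col::text as my_numeric_col,
        my_enum_col::text    as my_enum_col
    from my_big_table
"""

with Database(uri) as con:
    data = read_sql(stmt, con)

# Rebuild exact decimals from the text column yourself.
decimals = [None if v is None else Decimal(v) for v in data["my_numeric_col"]]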

License

Why did you choose such lax licensing? Could you change to a copyleft license, please?

...just kidding, no one would ask that. This is dual licensed under Unlicense and MIT.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See the tutorial on generating distribution archives.

Built Distributions

flaco-0.4.1-pp37-pypy37_pp73-win_amd64.whl (1.1 MB) - PyPy, Windows x86-64
flaco-0.4.1-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.0 MB) - PyPy, manylinux: glibc 2.12+ x86-64
flaco-0.4.1-cp310-cp310-win_amd64.whl (580.6 kB) - CPython 3.10, Windows x86-64
flaco-0.4.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB) - CPython 3.10, manylinux: glibc 2.12+ x86-64
flaco-0.4.1-cp39-cp39-win_amd64.whl (580.5 kB) - CPython 3.9, Windows x86-64
flaco-0.4.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB) - CPython 3.9, manylinux: glibc 2.12+ x86-64
flaco-0.4.1-cp38-cp38-win_amd64.whl (580.4 kB) - CPython 3.8, Windows x86-64
flaco-0.4.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB) - CPython 3.8, manylinux: glibc 2.12+ x86-64
flaco-0.4.1-cp37-cp37m-win_amd64.whl (579.5 kB) - CPython 3.7m, Windows x86-64
flaco-0.4.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB) - CPython 3.7m, manylinux: glibc 2.12+ x86-64
flaco-0.4.1-cp36-cp36m-win_amd64.whl (579.3 kB) - CPython 3.6m, Windows x86-64
flaco-0.4.1-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB) - CPython 3.6m, manylinux: glibc 2.12+ x86-64
