Skip to main content

Fast and Efficient PostgreSQL data into numpy/pandas

Project description

flaco

Code Style CI PyPI PyPI - Wheel Downloads

Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

Have a gander at the initial benchmarks 🏋

flaco tends to use nearly ~3-5x less memory than standard pandas.read_sql and about ~3x faster. However, it's probably 50x less stable at the moment. 😜

To whet your appetite, here's a memory profile between flaco and pandas.read_sql on a table with 2M rows with columns of various types. (see test_benchmarks.py) *If the test data has null values, you can expect a ~3x saving, instead of the ~5x you see here; therefore (hot tip 🔥), supply fill values in your queries where possible via coalesce.

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    99    140.5 MiB    140.5 MiB           1   @profile
   100                                         def memory_profile():
   101    140.5 MiB      0.0 MiB           1       stmt = "select * from test_table"
   102                                         
   103                                             
   104    140.9 MiB      0.4 MiB           1       with Database(DB_URI) as con:
   105    441.8 MiB    300.9 MiB           1           data = read_sql(stmt, con)
   106    441.8 MiB      0.0 MiB           1           _flaco_df = pd.DataFrame(data, copy=False)
   107                                         
   108                                             
   109    441.8 MiB      0.0 MiB           1       engine = create_engine(DB_URI)
   110   2091.5 MiB   1649.7 MiB           1       _pandas_df = pd.read_sql(stmt, engine)

Example

from flaco.io import read_sql, Database


uri = "postgresql://postgres:postgres@localhost:5432/postgres"
stmt = "select * from my_big_table"

with Database(uri) as con:
    data = read_sql(stmt, con)  # dict of column name to numpy array

# If you have pandas installed, you can create a DataFrame
# with zero copying like this:
import pandas as pd
df = pd.DataFrame(data, copy=False)

# If you know the _exact_ rows which will be returned
# you can supply n_rows to perform a single array 
# allocation without needing to resize during query reading.
with Database(uri) as con:
    data = read_sql(stmt, con, 1_000)

Notes

Is this a drop in replacement for pandas.read_sql?

No. It varies in a few ways:

  • It will return a dict of strnumpy.ndarray objects. But this can be passed with zero copies to pandas.DataFrame
  • When querying integer columns, if a null is encountered, the array will be converted to dtype=object and nulls from PostgreSQL will be None. Whereas pandas will convert the underlying array to a float type; where nulls from postgres are basically numpy.nan types.
  • It lacks basically all of the options pandas.read_sql has.

Furthermore, while it's pretty neat this lib can allow faster and less resource intensive use of numpy/pandas against PostgreSQL, it's in early stages of development and you're likely to encounter some sharp edges which include, but not limited to:

  • 📝 Poor/non-existant error messages
  • 💩 Core dumps
  • 🚰 Memory leaks (although I think most are handled now)
  • 🦖 Almost complete lack of exception handling from underlying Rust/C interface
  • 📍 PostgreSQL numeric type should ideally be converted to decimal.Decimal but uses f64 for now; potentially loosing precision. Note, this is exactly what pandas.read_sql does.
  • ❗ Might not handle all or custom arbitrary PostgreSQL types. If you encounter such types, either convert them to a supported type like text/json/jsonb (ie select my_field::text ...), or open an issue if a standard type is not supported.

License

Why did you choose such lax licensing? Could you change to a copy left license, please?

...just kidding, no one would ask that. This is dual licensed under Unlicense and MIT.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

flaco-0.4.1-pp37-pypy37_pp73-win_amd64.whl (1.1 MB view details)

Uploaded PyPy Windows x86-64

flaco-0.4.1-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.0 MB view details)

Uploaded PyPy manylinux: glibc 2.12+ x86-64

flaco-0.4.1-cp310-cp310-win_amd64.whl (580.6 kB view details)

Uploaded CPython 3.10 Windows x86-64

flaco-0.4.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.12+ x86-64

flaco-0.4.1-cp39-cp39-win_amd64.whl (580.5 kB view details)

Uploaded CPython 3.9 Windows x86-64

flaco-0.4.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

flaco-0.4.1-cp38-cp38-win_amd64.whl (580.4 kB view details)

Uploaded CPython 3.8 Windows x86-64

flaco-0.4.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

flaco-0.4.1-cp37-cp37m-win_amd64.whl (579.5 kB view details)

Uploaded CPython 3.7m Windows x86-64

flaco-0.4.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

flaco-0.4.1-cp36-cp36m-win_amd64.whl (579.3 kB view details)

Uploaded CPython 3.6m Windows x86-64

flaco-0.4.1-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

File details

Details for the file flaco-0.4.1-pp37-pypy37_pp73-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.1-pp37-pypy37_pp73-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: PyPy, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.1-pp37-pypy37_pp73-win_amd64.whl
Algorithm Hash digest
SHA256 dafb35db113a4db6799fdf81cf127d3dd86a827aaa66571382e60d8673c6caf4
MD5 92e3a7ba062bc9c39ca5ec53fcf66575
BLAKE2b-256 84531f58fc7a76b7db6e20e65c127facbfa35a7ba26df389997f533bdc498566

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.1-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3bb65a4d9a93d2af5bcf67e8c9f432d0c9e71b224c00789715e6d59b7d199e5b
MD5 3d6d399606ab400d41aaf78aa92a33f2
BLAKE2b-256 1d67b03546d907219e17b9392a8fd34e177ce4d2d11e1921494035f25b1ba49e

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.1-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 580.6 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 50e23943457b8ef18a1a7712695a8f3df17061ee76c0c4f6a07192816d2170d7
MD5 bce90cbdaf9fade78a12742c9444905f
BLAKE2b-256 6eb7495c78365da682aebd7fb29c5af08363c703622f9acaa6898a85ca34ed8d

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 e5f0a226f401f3f6c0547a859ab64b5f3f1b04ea0c1461e6d8fb2e470804a029
MD5 da4069ca89b600e1762478c963ac0681
BLAKE2b-256 0e01c3d3cc05e1c077577b032d7872d26bf02998204f5d2591efbe18f21e15bf

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.1-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 580.5 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 7bd326376f19260370eceadd8f35761d82477e08cae890de573ff65ee9661a8c
MD5 1f1ba254f353527aefdf0df77410922b
BLAKE2b-256 2eb0877199c6924452f1e6abe1235fc92c7ea2255748a4f6ea1e1c34cc1fea60

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.1-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 777a1870965093a2b00cfe040963a1f833acd55b9e1e4756184f120910c8e2ec
MD5 8a308879e24e55ab1e7b3e53ade1757b
BLAKE2b-256 2bf94afa950b23393acf744a2cfc101993b5023bd45acbb04336d3b69b41e62e

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.1-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 580.4 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 94ad5a399dcaeed377c87426ec98b4a4b99cb4fbfda0d4c470b18df83a0ac8d8
MD5 34166ccadc51a9a72c63821f729afa3d
BLAKE2b-256 8d8f82078b38f762439f8697fb0f26012d63cd36c4b45d4299fe2e2c75d281f4

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 2669d20d0e16901992f029aba45019047aa7d8d89560492afa1437ea47e4c60e
MD5 3e85aa4125798e27b199ed7b5d44d116
BLAKE2b-256 4f0fcaf90ee1af97c45207e619358d10b77f156e40d614117fc19d4e40bb862c

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 579.5 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 d75b1ee447c36f15309495d05b5fb70b41d3f3c6ddbb921d819b36e8bc1b0b51
MD5 747ffe534420601798f754d945775b31
BLAKE2b-256 544ad87a7d84148e1fb8ea0561df5d4ee49246a75441dbbb3282d6a4d1941af6

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 2e997b72ef968eebd55c687b7148696d112c5d6a6936925257173c7277ea50a3
MD5 ba115c4e545b6909feb4bd5103809c5c
BLAKE2b-256 8b9c1f2ab0b32fa4711ff1ef390ed92f25518875b353a4b1008d906d6191b8d5

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.1-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 579.3 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 922b7d3ece29667340790b475d65fea7e31f2eebfb525427d68c5d97f15447bd
MD5 b4ce491e46a0ff9fee34e36a3eb93dca
BLAKE2b-256 b7990342552c50c6822d8f737d88c7080209a8082c4ef825ae82d93963fce9d2

See more details on using hashes here.

File details

Details for the file flaco-0.4.1-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.1-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3ab8ea451c6e8ee5f90b95c59c92740a40b2750ac9e4416173c2660c92c56f79
MD5 636dc3d5ef0b968cb2a2f89c2b8a912a
BLAKE2b-256 1f2a4fab219d49aeebae3bb8db6dfcf1c3925a90488476418abdaf21b19dfe27

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page