Skip to main content

Fast and Efficient PostgreSQL data into numpy/pandas

Project description

flaco

Code Style CI PyPI PyPI - Wheel Downloads

Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

Have a gander at the initial benchmarks 🏋

flaco tends to use nearly ~3-5x less memory than standard pandas.read_sql and about ~3x faster. However, it's probably 50x less stable at the moment. 😜

To whet your appetite, here's a memory profile between flaco and pandas.read_sql on a table with 2M rows with columns of various types. (see test_benchmarks.py) *If the test data has null values, you can expect a ~3x saving, instead of the ~5x you see here; therefore (hot tip 🔥), supply fill values in your queries where possible via coalesce.

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    99    140.5 MiB    140.5 MiB           1   @profile
   100                                         def memory_profile():
   101    140.5 MiB      0.0 MiB           1       stmt = "select * from test_table"
   102                                         
   103                                             
   104    140.9 MiB      0.4 MiB           1       with Database(DB_URI) as con:
   105    441.8 MiB    300.9 MiB           1           data = read_sql(stmt, con)
   106    441.8 MiB      0.0 MiB           1           _flaco_df = pd.DataFrame(data, copy=False)
   107                                         
   108                                             
   109    441.8 MiB      0.0 MiB           1       engine = create_engine(DB_URI)
   110   2091.5 MiB   1649.7 MiB           1       _pandas_df = pd.read_sql(stmt, engine)

Example

from flaco.io import read_sql, Database


uri = "postgresql://postgres:postgres@localhost:5432/postgres"
stmt = "select * from my_big_table"

with Database(uri) as con:
    data = read_sql(stmt, con)  # dict of column name to numpy array

# If you have pandas installed, you can create a DataFrame
# with zero copying like this:
import pandas as pd
df = pd.DataFrame(data, copy=False)

# If you know the _exact_ rows which will be returned
# you can supply n_rows to perform a single array 
# allocation without needing to resize during query reading.
with Database(uri) as con:
    data = read_sql(stmt, con, 1_000)

Notes

Is this a drop in replacement for pandas.read_sql?

No. It varies in a few ways:

  • It will return a dict of strnumpy.ndarray objects. But this can be passed with zero copies to pandas.DataFrame
  • When querying integer columns, if a null is encountered, the array will be converted to dtype=object and nulls from PostgreSQL will be None. Whereas pandas will convert the underlying array to a float type; where nulls from postgres are basically numpy.nan types.
  • It lacks basically all of the options pandas.read_sql has.

Furthermore, while it's pretty neat this lib can allow faster and less resource intensive use of numpy/pandas against PostgreSQL, it's in early stages of development and you're likely to encounter some sharp edges which include, but not limited to:

  • 📝 Poor/non-existant error messages
  • 💩 Core dumps
  • 🚰 Memory leaks (although I think most are handled now)
  • 🦖 Almost complete lack of exception handling from underlying Rust/C interface
  • 📍 PostgreSQL numeric type should ideally be converted to decimal.Decimal but uses f64 for now; potentially loosing precision. Note, this is exactly what pandas.read_sql does.
  • ❗ Might not handle all or custom arbitrary PostgreSQL types. If you encounter such types, either convert them to a supported type like text/json/jsonb (ie select my_field::text ...), or open an issue if a standard type is not supported.

License

Why did you choose such lax licensing? Could you change to a copy left license, please?

...just kidding, no one would ask that. This is dual licensed under Unlicense and MIT.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

flaco-0.3.0-pp37-pypy37_pp73-win_amd64.whl (1.1 MB view details)

Uploaded PyPy Windows x86-64

flaco-0.3.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (975.2 kB view details)

Uploaded PyPy manylinux: glibc 2.12+ x86-64

flaco-0.3.0-cp310-cp310-win_amd64.whl (577.7 kB view details)

Uploaded CPython 3.10 Windows x86-64

flaco-0.3.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.12+ x86-64

flaco-0.3.0-cp39-cp39-win_amd64.whl (577.6 kB view details)

Uploaded CPython 3.9 Windows x86-64

flaco-0.3.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

flaco-0.3.0-cp38-cp38-win_amd64.whl (577.5 kB view details)

Uploaded CPython 3.8 Windows x86-64

flaco-0.3.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

flaco-0.3.0-cp37-cp37m-win_amd64.whl (576.5 kB view details)

Uploaded CPython 3.7m Windows x86-64

flaco-0.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

flaco-0.3.0-cp36-cp36m-win_amd64.whl (576.3 kB view details)

Uploaded CPython 3.6m Windows x86-64

flaco-0.3.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB view details)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

File details

Details for the file flaco-0.3.0-pp37-pypy37_pp73-win_amd64.whl.

File metadata

  • Download URL: flaco-0.3.0-pp37-pypy37_pp73-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: PyPy, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.3.0-pp37-pypy37_pp73-win_amd64.whl
Algorithm Hash digest
SHA256 cfa23161f69f552fc8aae7015917784c95c05800b05016c2d4efaa1d8eef3c5c
MD5 844272a0d2209486b26cbe7efedc91e5
BLAKE2b-256 b1891c6d67b772a311b8ccee0e4080239e79b641094fb968586252bc93c99696

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.3.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f7587d5264515c36a5a7322a1864e970d103b2f1948c76e14027c74113e1142d
MD5 4ad1a7e4213d8b52f13ed3e8866ba6ad
BLAKE2b-256 3956a56b0560450bca3eb3b2c8d481d6f588c667a3c03bc81910840a7ff5292c

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: flaco-0.3.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 577.7 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.3.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 24f92b8c2dc332be896862045132414fbf64d24f25b5750f84e65d0c654d8f63
MD5 798b5fef521e5e985bfc51a9f5722961
BLAKE2b-256 76c93bb76f3c2c7405dd5029271aa47155bada8420ab48e1162a70969b920612

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.3.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 250c79d84adc4b0731304eb83e706221649765302823c4f35ac6f161407d6deb
MD5 57923719e22b36717965871d529a9f00
BLAKE2b-256 372d7c12b67c5496bfa8e9c26fd605da0be854725c23c3f16991e312a4da8fe5

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: flaco-0.3.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 577.6 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.3.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 349606b46c0caf5b02dcbc1a6f82fa1453c8433562fad7acbb2b24525d7904c1
MD5 1112367d6cce4007d3fec7041c263b33
BLAKE2b-256 e4581f4a7b472149a02ba90e3d55bb2995758ca624f6160859d7c2732c8fa158

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.3.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 a26d3c4ab5d10395cbebee1729080b94a24841a8a057c5ef5db3551a5f46249e
MD5 af26a8f86d496cd4116608ce3adb67c5
BLAKE2b-256 fcc29d4fd496a8a9596fbc424a899bdc1b03341b2c017ea1ef372e7e794f1c3c

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: flaco-0.3.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 577.5 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.3.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 7e93600261e939a2c0c2931cbfda4f7f8f2c72250f1dad9feb34560692f3d690
MD5 8c39eefc1efa480f71ab7d89a591582f
BLAKE2b-256 1f55eb9bb522e52717ef53a20dad32b03b8a61c70cd69d35d1cd5f9425072dd2

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.3.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 a71e78b2d61c280adb05d91263c15b1d5b3483d309a17819bed3994db1da03a6
MD5 aab28f3032fda4eeef5cd48e7b3ab35f
BLAKE2b-256 68679db01c5e77125d9673e5cf02bf1ff33881a2b3664c853509193e4c320a64

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: flaco-0.3.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 576.5 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.3.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 0be6e7b0abd7a195284a8283fdf4ea31c6f342262e945bd97d4dda6ce8558a50
MD5 701ea6242d5b67b956bc4f2da3bb6d40
BLAKE2b-256 74dd761bd11a2844c33488567414376826869d68d589763fba90505adf7db6b5

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 14d06ff1238d6145798e07118d6a808be55e1cea1cbc48c5bb27dd7f35ca55aa
MD5 a0122d76b4d66e316e88dd3fd9f235c0
BLAKE2b-256 923163defec08e9dce8985e0e1bdadc248d56f072f53341280cf82e3a7503ff5

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: flaco-0.3.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 576.3 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.3.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 c654ebceaadcafe5f9f2c98ca8dfc4f0a3d19f16ddd93516eb782c933c04d16e
MD5 7f6de0e2f6a88969821ba1a3006f8e0d
BLAKE2b-256 4b4e8e2599a279aa17329b2aef1874f8e912a40a8c284227ab41fa371ad19d1e

See more details on using hashes here.

File details

Details for the file flaco-0.3.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.3.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 12fcf9f9e2d40bfceeb18482c7814b02f7bb72531fdae375a26b42c8ea259edd
MD5 fb92ec53108f29f4634f56f2cbb3bf67
BLAKE2b-256 f34d0b6a404bd65d7ab286b94def549cbe07ec5a16dc7a33be518e11bc7ca48c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page