Skip to main content

Fast and Efficient PostgreSQL data into numpy/pandas

Project description

flaco

Code Style CI PyPI PyPI - Wheel Downloads

Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀

Have a gander at the initial benchmarks 🏋

flaco tends to use nearly ~3-5x less memory than standard pandas.read_sql and about ~3x faster. However, it's probably 50x less stable at the moment. 😜

To whet your appetite, here's a memory profile between flaco and pandas.read_sql on a table with 2M rows with columns of various types. (see test_benchmarks.py) *If the test data has null values, you can expect a ~3x saving, instead of the ~5x you see here; therefore (hot tip 🔥), supply fill values in your queries where possible via coalesce.

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    99    140.5 MiB    140.5 MiB           1   @profile
   100                                         def memory_profile():
   101    140.5 MiB      0.0 MiB           1       stmt = "select * from test_table"
   102                                         
   103                                             
   104    140.9 MiB      0.4 MiB           1       with Database(DB_URI) as con:
   105    441.8 MiB    300.9 MiB           1           data = read_sql(stmt, con)
   106    441.8 MiB      0.0 MiB           1           _flaco_df = pd.DataFrame(data, copy=False)
   107                                         
   108                                             
   109    441.8 MiB      0.0 MiB           1       engine = create_engine(DB_URI)
   110   2091.5 MiB   1649.7 MiB           1       _pandas_df = pd.read_sql(stmt, engine)

Example

from flaco.io import read_sql, Database


uri = "postgresql://postgres:postgres@localhost:5432/postgres"
stmt = "select * from my_big_table"

with Database(uri) as con:
    data = read_sql(stmt, con)  # dict of column name to numpy array

# If you have pandas installed, you can create a DataFrame
# with zero copying like this:
import pandas as pd
df = pd.DataFrame(data, copy=False)

# If you know the _exact_ rows which will be returned
# you can supply n_rows to perform a single array 
# allocation without needing to resize during query reading.
with Database(uri) as con:
    data = read_sql(stmt, con, 1_000)

Notes

Is this a drop in replacement for pandas.read_sql?

No. It varies in a few ways:

  • It will return a dict of strnumpy.ndarray objects. But this can be passed with zero copies to pandas.DataFrame
  • When querying integer columns, if a null is encountered, the array will be converted to dtype=object and nulls from PostgreSQL will be None. Whereas pandas will convert the underlying array to a float type; where nulls from postgres are basically numpy.nan types.
  • It lacks basically all of the options pandas.read_sql has.

Furthermore, while it's pretty neat this lib can allow faster and less resource intensive use of numpy/pandas against PostgreSQL, it's in early stages of development and you're likely to encounter some sharp edges which include, but not limited to:

  • 📝 Poor/non-existant error messages
  • 💩 Core dumps
  • 🚰 Memory leaks (although I think most are handled now)
  • 🦖 Almost complete lack of exception handling from underlying Rust/C interface
  • 📍 PostgreSQL numeric type should ideally be converted to decimal.Decimal but uses f64 for now; potentially loosing precision. Note, this is exactly what pandas.read_sql does.
  • ❗ Might not handle all or custom arbitrary PostgreSQL types. If you encounter such types, either convert them to a supported type like text/json/jsonb (ie select my_field::text ...), or open an issue if a standard type is not supported.

License

Why did you choose such lax licensing? Could you change to a copy left license, please?

...just kidding, no one would ask that. This is dual licensed under Unlicense and MIT.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

flaco-0.4.0-pp37-pypy37_pp73-win_amd64.whl (1.1 MB view details)

Uploaded PyPy Windows x86-64

flaco-0.4.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.0 MB view details)

Uploaded PyPy manylinux: glibc 2.12+ x86-64

flaco-0.4.0-cp310-cp310-win_amd64.whl (580.9 kB view details)

Uploaded CPython 3.10 Windows x86-64

flaco-0.4.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.12+ x86-64

flaco-0.4.0-cp39-cp39-win_amd64.whl (580.8 kB view details)

Uploaded CPython 3.9 Windows x86-64

flaco-0.4.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

flaco-0.4.0-cp38-cp38-win_amd64.whl (580.7 kB view details)

Uploaded CPython 3.8 Windows x86-64

flaco-0.4.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

flaco-0.4.0-cp37-cp37m-win_amd64.whl (579.7 kB view details)

Uploaded CPython 3.7m Windows x86-64

flaco-0.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

flaco-0.4.0-cp36-cp36m-win_amd64.whl (579.5 kB view details)

Uploaded CPython 3.6m Windows x86-64

flaco-0.4.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

File details

Details for the file flaco-0.4.0-pp37-pypy37_pp73-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.0-pp37-pypy37_pp73-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: PyPy, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.0-pp37-pypy37_pp73-win_amd64.whl
Algorithm Hash digest
SHA256 f4087fd8e87eee2c8457fe69d5693cd2c0ae481dd34ebc28839a8e9a6c0bb7ae
MD5 179f89d35a2f9294a70512abb9ce5469
BLAKE2b-256 4a1ca82ded14edcd7eddf9382ab797d5fff2ad4f811f8a6eb007f9926b9ba5d1

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3eb05c3a0f7af98c6cfcf2acb1c0a6c6a24f20685819bd3cf3eefc57f9a79633
MD5 402ac2676c261bd15736a75e79fe1eec
BLAKE2b-256 660d4d67e5c697da01e21bfb6a76c1e84f3aea3bcc7938031bd593c7052814b4

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 580.9 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 272833b92a89a31ec42fbef63ec2633f625180daf066630824d26573c5c6914e
MD5 649b4debeb98b59c9b2981805a8d9598
BLAKE2b-256 27ac7247689242c14c06580c55468112ede3836d4a5a1345e2d5f895163b7501

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 d7e854fa5bba5c0b887123fcd1a0c711bbfe57fe3a5b8bbb41012097597ba97f
MD5 b85eb29a142d97d1984c54ed890add2a
BLAKE2b-256 ad6f8fd5be830b1c90d0726a90e7d9ed99197d1da374e6071f088e77b03ca996

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 580.8 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 ac14a5e2564d2f6d8d678058ad46cee2088b010368f17f5999e990e4d2f1dd34
MD5 6f10980098385469dc40ced3c12f7088
BLAKE2b-256 8002f594bc19271c12e90df876d391feadd6ee74715fda226e3d4d041ff2399a

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 2d87acec0332e12c95b1937efd4d0f97cfa086e573672b29af8e23831c7bbf74
MD5 1232f7675baec5e50d274fee6bf715b5
BLAKE2b-256 03741b422c5fbca79e333d1275d601dfae49f11e71d8367aa7296573a114594f

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 580.7 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 52b75d1485240de874c233fb73491891646112f5bb5af71f9bbe9d138a7189cf
MD5 df38754b29103974595031c8169643c8
BLAKE2b-256 ad735804e4c81ed2f8159e641cd7eff215f46d09d7538dfd14927bcb59c2ada2

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 66ac7e102a1dd517631695527929a4dfb6ce5ad2b27fbe424c070cf88c049629
MD5 331ace588955f79d4722759d3004151f
BLAKE2b-256 c7aee609f86808794d08106cc60106a57425b546adeb1970cacc9f1dcf26bc5c

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 579.7 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 54d3e98b228d3b33d35c3d256d762c2e79bd67c4f8b7fd67dfd11a27ab16573a
MD5 3032c313aa05318ee305c3e6f3792445
BLAKE2b-256 48b5347217e8b5e4eb4567c5db6affacb6315137e51cf1c4f486a5ca7811b4ab

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 6ff01c764d471abb06f736334630fbf231e8739b48531c1622903b5a24e3f38e
MD5 59d4b7081802e70128b35502d85c8e70
BLAKE2b-256 e05e5954a2157910990ef4eb0a13be5f68de89697184953cddefeb065503e539

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: flaco-0.4.0-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 579.5 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for flaco-0.4.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 a0c027b45d8c34b8e3f1876494ff66bb91385c2b2b2094e57b0656788d4ee8d9
MD5 4d73d5533c0936b9ce5449fb7f940e3b
BLAKE2b-256 758dca635c0ed0593fbea344094bbcfead0ce8d979563b875466f4bce21d8153

See more details on using hashes here.

File details

Details for the file flaco-0.4.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for flaco-0.4.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f8bd8323eef6edfd2591071fac438f0e046af58bcf2741f9695a27482274c52d
MD5 048be740185fc77063716bf6c034a5c1
BLAKE2b-256 f49f8092e5687862ed9a5f113e960663ebd47d56237a651d8d71953d46f54e07

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page