Fast and Efficient PostgreSQL data into numpy/pandas
Project description
flaco
Perhaps the fastest and most memory efficient way to pull data from PostgreSQL into pandas and numpy. 🚀
Have a gander at the initial benchmarks 🏋
flaco tends to use nearly ~3-5x less memory than standard pandas.read_sql
and about ~3x faster. However, it's probably 50x less stable at the moment. 😜
To whet your appetite, here's a memory profile between flaco and pandas.read_sql
on a table with 2M rows with columns of various types. (see test_benchmarks.py)
*If the test data has null values, you can expect a ~3x saving, instead of the ~5x
you see here; therefore (hot tip 🔥), supply fill values in your queries where possible via coalesce
.
Line # Mem usage Increment Occurences Line Contents
============================================================
99 140.5 MiB 140.5 MiB 1 @profile
100 def memory_profile():
101 140.5 MiB 0.0 MiB 1 stmt = "select * from test_table"
102
103
104 140.9 MiB 0.4 MiB 1 with Database(DB_URI) as con:
105 441.8 MiB 300.9 MiB 1 data = read_sql(stmt, con)
106 441.8 MiB 0.0 MiB 1 _flaco_df = pd.DataFrame(data, copy=False)
107
108
109 441.8 MiB 0.0 MiB 1 engine = create_engine(DB_URI)
110 2091.5 MiB 1649.7 MiB 1 _pandas_df = pd.read_sql(stmt, engine)
Example
from flaco.io import read_sql, Database
uri = "postgresql://postgres:postgres@localhost:5432/postgres"
stmt = "select * from my_big_table"
with Database(uri) as con:
data = read_sql(stmt, con) # dict of column name to numpy array
# If you have pandas installed, you can create a DataFrame
# with zero copying like this:
import pandas as pd
df = pd.DataFrame(data, copy=False)
# If you know the _exact_ rows which will be returned
# you can supply n_rows to perform a single array
# allocation without needing to resize during query reading.
with Database(uri) as con:
data = read_sql(stmt, con, 1_000)
Notes
Is this a drop in replacement for
pandas.read_sql
?
No. It varies in a few ways:
- It will return a
dict
ofstr
➡numpy.ndarray
objects. But this can be passed with zero copies topandas.DataFrame
- When querying integer columns, if a null is encountered, the array will be
converted to
dtype=object
and nulls from PostgreSQL will beNone
. Whereas pandas will convert the underlying array to a float type; where nulls from postgres are basicallynumpy.nan
types. - It lacks basically all of the options
pandas.read_sql
has.
Furthermore, while it's pretty neat this lib can allow faster and less resource intensive use of numpy/pandas against PostgreSQL, it's in early stages of development and you're likely to encounter some sharp edges which include, but not limited to:
- 📝 Poor/non-existant error messages
- 💩 Core dumps
- 🚰 Memory leaks (although I think most are handled now)
- 🦖 Almost complete lack of exception handling from underlying Rust/C interface
- 📍 PostgreSQL
numeric
type should ideally be converted todecimal.Decimal
but usesf64
for now; potentially loosing precision. Note, this is exactly whatpandas.read_sql
does. - ❗ Might not handle all or custom arbitrary PostgreSQL types. If you encounter
such types, either convert them to a supported type like text/json/jsonb
(ie
select my_field::text ...
), or open an issue if a standard type is not supported.
License
Why did you choose such lax licensing? Could you change to a copy left license, please?
...just kidding, no one would ask that. This is dual licensed under Unlicense and MIT.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
File details
Details for the file flaco-0.4.0-pp37-pypy37_pp73-win_amd64.whl
.
File metadata
- Download URL: flaco-0.4.0-pp37-pypy37_pp73-win_amd64.whl
- Upload date:
- Size: 1.1 MB
- Tags: PyPy, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f4087fd8e87eee2c8457fe69d5693cd2c0ae481dd34ebc28839a8e9a6c0bb7ae |
|
MD5 | 179f89d35a2f9294a70512abb9ce5469 |
|
BLAKE2b-256 | 4a1ca82ded14edcd7eddf9382ab797d5fff2ad4f811f8a6eb007f9926b9ba5d1 |
File details
Details for the file flaco-0.4.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
.
File metadata
- Download URL: flaco-0.4.0-pp37-pypy37_pp73-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload date:
- Size: 1.0 MB
- Tags: PyPy, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3eb05c3a0f7af98c6cfcf2acb1c0a6c6a24f20685819bd3cf3eefc57f9a79633 |
|
MD5 | 402ac2676c261bd15736a75e79fe1eec |
|
BLAKE2b-256 | 660d4d67e5c697da01e21bfb6a76c1e84f3aea3bcc7938031bd593c7052814b4 |
File details
Details for the file flaco-0.4.0-cp310-cp310-win_amd64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 580.9 kB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 272833b92a89a31ec42fbef63ec2633f625180daf066630824d26573c5c6914e |
|
MD5 | 649b4debeb98b59c9b2981805a8d9598 |
|
BLAKE2b-256 | 27ac7247689242c14c06580c55468112ede3836d4a5a1345e2d5f895163b7501 |
File details
Details for the file flaco-0.4.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.10, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d7e854fa5bba5c0b887123fcd1a0c711bbfe57fe3a5b8bbb41012097597ba97f |
|
MD5 | b85eb29a142d97d1984c54ed890add2a |
|
BLAKE2b-256 | ad6f8fd5be830b1c90d0726a90e7d9ed99197d1da374e6071f088e77b03ca996 |
File details
Details for the file flaco-0.4.0-cp39-cp39-win_amd64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp39-cp39-win_amd64.whl
- Upload date:
- Size: 580.8 kB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ac14a5e2564d2f6d8d678058ad46cee2088b010368f17f5999e990e4d2f1dd34 |
|
MD5 | 6f10980098385469dc40ced3c12f7088 |
|
BLAKE2b-256 | 8002f594bc19271c12e90df876d391feadd6ee74715fda226e3d4d041ff2399a |
File details
Details for the file flaco-0.4.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.9, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2d87acec0332e12c95b1937efd4d0f97cfa086e573672b29af8e23831c7bbf74 |
|
MD5 | 1232f7675baec5e50d274fee6bf715b5 |
|
BLAKE2b-256 | 03741b422c5fbca79e333d1275d601dfae49f11e71d8367aa7296573a114594f |
File details
Details for the file flaco-0.4.0-cp38-cp38-win_amd64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 580.7 kB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 52b75d1485240de874c233fb73491891646112f5bb5af71f9bbe9d138a7189cf |
|
MD5 | df38754b29103974595031c8169643c8 |
|
BLAKE2b-256 | ad735804e4c81ed2f8159e641cd7eff215f46d09d7538dfd14927bcb59c2ada2 |
File details
Details for the file flaco-0.4.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 66ac7e102a1dd517631695527929a4dfb6ce5ad2b27fbe424c070cf88c049629 |
|
MD5 | 331ace588955f79d4722759d3004151f |
|
BLAKE2b-256 | c7aee609f86808794d08106cc60106a57425b546adeb1970cacc9f1dcf26bc5c |
File details
Details for the file flaco-0.4.0-cp37-cp37m-win_amd64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp37-cp37m-win_amd64.whl
- Upload date:
- Size: 579.7 kB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 54d3e98b228d3b33d35c3d256d762c2e79bd67c4f8b7fd67dfd11a27ab16573a |
|
MD5 | 3032c313aa05318ee305c3e6f3792445 |
|
BLAKE2b-256 | 48b5347217e8b5e4eb4567c5db6affacb6315137e51cf1c4f486a5ca7811b4ab |
File details
Details for the file flaco-0.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6ff01c764d471abb06f736334630fbf231e8739b48531c1622903b5a24e3f38e |
|
MD5 | 59d4b7081802e70128b35502d85c8e70 |
|
BLAKE2b-256 | e05e5954a2157910990ef4eb0a13be5f68de89697184953cddefeb065503e539 |
File details
Details for the file flaco-0.4.0-cp36-cp36m-win_amd64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp36-cp36m-win_amd64.whl
- Upload date:
- Size: 579.5 kB
- Tags: CPython 3.6m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a0c027b45d8c34b8e3f1876494ff66bb91385c2b2b2094e57b0656788d4ee8d9 |
|
MD5 | 4d73d5533c0936b9ce5449fb7f940e3b |
|
BLAKE2b-256 | 758dca635c0ed0593fbea344094bbcfead0ce8d979563b875466f4bce21d8153 |
File details
Details for the file flaco-0.4.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
.
File metadata
- Download URL: flaco-0.4.0-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.6m, manylinux: glibc 2.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f8bd8323eef6edfd2591071fac438f0e046af58bcf2741f9695a27482274c52d |
|
MD5 | 048be740185fc77063716bf6c034a5c1 |
|
BLAKE2b-256 | f49f8092e5687862ed9a5f113e960663ebd47d56237a651d8d71953d46f54e07 |