Oh My Fast Postgres!
ohmyfpg is a Postgres client library for Python that aims to return data as columns. This is often needed when working with numerical data. Usually this is achieved by taking the output of the client library and then converting it into either numpy arrays or pandas DataFrames. When dealing with a large amount of data, this conversion is slow.
The goal of this library is to return data already as numpy arrays without sacrificing performance. The "Performance comparison" section goes more in-depth on this topic.
To squeeze out as much performance as possible, the underlying implementation is written in Rust. The Python layer on top is very thin.
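To make the row-to-column conversion mentioned above concrete, here is a minimal sketch using only the standard library. The sample rows, the `rows_to_columns` helper and the choice of `array.array` as a stand-in for numpy arrays are all assumptions made for illustration; this is not how ohmyfpg is implemented.

```python
from array import array

# Hypothetical row-oriented result, as a typical client library returns it.
rows = [(1, 10, 0.5), (2, 20, 1.5), (3, 30, 2.5)]


def rows_to_columns(rows, typecodes):
    """Copy row tuples into one typed array per column.

    Every value is visited once in Python, which is the cost a
    column-oriented client avoids by filling the arrays directly.
    """
    columns = [array(code) for code in typecodes]
    for row in rows:
        for column, value in zip(columns, row):
            column.append(value)
    return columns


# 'q' is a signed 64-bit integer, 'd' is a double.
ids, counts, ratios = rows_to_columns(rows, "qqd")
print(list(ids), list(counts), list(ratios))
```

With a million rows and six columns, a loop of this kind touches six million Python objects, which is why the conversion step can dominate the total time.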
Why ohmyfpg?
When working with Postgres at work, we repeatedly ran into performance issues. Most of the time our reactions were along the lines of: "OMG", "F*****g PG", etc. So ohmyfpg is kind of a mix of the two, but where the f now stands for fast.
(To be fair, most of the performance issues we faced with Postgres were caused by our inexperience with tuning the server configuration.)
Installation
pip install ohmyfpg
Quickstart
import asyncio
import ohmyfpg

DSN = 'postgres://postgres:postgres@postgres:5432/postgres'
QUERY = 'SELECT * FROM performance_test'

async def main():
    conn = await ohmyfpg.connect(DSN)
    print(await conn.fetch(QUERY))

if __name__ == '__main__':
    asyncio.run(main())
Performance comparison
The image below compares the performance of ohmyfpg with asyncpg. The 4 bars have the following meaning:
- ohmyfpg: plain fetch,
- asyncpg: plain fetch,
- ohmyfpg-pandas: plain fetch + conversion to pandas DataFrame,
- asyncpg-pandas: plain fetch + conversion to pandas DataFrame.
See details here, especially how the conversion to pandas DataFrame has been implemented.
The query is a SELECT * that has been run on a table with 1 million rows and the following schema:
(id INT, foo_bar_int2 INT2, foo_bar_int4 INT4, foo_bar_int8 INT8, foo_bar_float4 FLOAT4, foo_bar_float8 FLOAT8)
It has been run inside Docker with 8 CPUs and 8 GB of RAM allocated to the daemon, on a MacBook Pro with a 2.2 GHz 6-core Intel Core i7 and 16 GB of 2400 MHz DDR4 memory.
Detailed summary
Plain fetch:
--------------------------------------------------
ohmyfpg
avg: 1045.8ms
min: 898ms
p25: 960.75ms
median: 1041.5ms
p75: 1083.0ms
max: 1421ms
--------------------------------------------------
asyncpg
avg: 1194.9ms
min: 1037ms
p25: 1080.5ms
median: 1224.0ms
p75: 1259.75ms
max: 1567ms
--------------------------------------------------
Plain fetch + conversion to pandas DataFrame:
--------------------------------------------------
ohmyfpg-pandas
avg: 1212.3ms
min: 1131ms
p25: 1166.5ms
median: 1192.0ms
p75: 1220.75ms
max: 1724ms
--------------------------------------------------
asyncpg-pandas
avg: 4013.0333333333333ms
min: 3771ms
p25: 3841.25ms
median: 3912.5ms
p75: 4124.5ms
max: 4708ms
--------------------------------------------------
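To put the averages above into perspective, the short calculation below derives speedup ratios and the extra time spent on the pandas conversion. It uses only the numbers reported in the summary; the `speedup` and `conversion_overhead` helpers are names introduced here for illustration.

```python
# Average timings in milliseconds, copied from the summary above.
avg_ms = {
    "ohmyfpg": 1045.8,
    "asyncpg": 1194.9,
    "ohmyfpg-pandas": 1212.3,
    "asyncpg-pandas": 4013.0333333333333,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`, on average."""
    return avg_ms[baseline] / avg_ms[candidate]


def conversion_overhead(client):
    """Extra average time spent producing a DataFrame, in milliseconds."""
    return avg_ms[f"{client}-pandas"] - avg_ms[client]


print(f"plain fetch speedup:      {speedup('asyncpg', 'ohmyfpg'):.2f}x")
print(f"fetch + pandas speedup:   {speedup('asyncpg-pandas', 'ohmyfpg-pandas'):.2f}x")
print(f"ohmyfpg pandas overhead:  {conversion_overhead('ohmyfpg'):.1f}ms")
print(f"asyncpg pandas overhead:  {conversion_overhead('asyncpg'):.1f}ms")
```

On these figures the plain fetch is roughly 1.14x faster, while the fetch plus DataFrame conversion is roughly 3.31x faster, so most of the gap comes from the conversion step rather than from the fetch itself.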
Limitations
This library is highly experimental and has many limitations:
- no support for NULLs (the outcome is unpredictable),
- no support for non-numerical types,
- limited support for authentication,
- no proper logging,
- etc.
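Given the NULL limitation listed above, one possible workaround is to replace NULLs on the server side before the data reaches the client. The sketch below builds such a query with COALESCE. The `coalesce_query` helper and its `default` parameter are hypothetical and not part of ohmyfpg, and substituting a default value changes the meaning of the data, so this only fits cases where that is acceptable.

```python
def coalesce_query(table, columns, default=0):
    """Build a SELECT that replaces NULLs with `default` in every column.

    Names are interpolated as-is, so only pass trusted, hard-coded
    identifiers; this helper does no quoting or escaping.
    """
    select_list = ", ".join(
        f"COALESCE({column}, {default}) AS {column}" for column in columns
    )
    return f"SELECT {select_list} FROM {table}"


QUERY = coalesce_query("performance_test", ["id", "foo_bar_int2"])
print(QUERY)
```

Filtering with `WHERE column IS NOT NULL` is an alternative when dropping the affected rows is preferable to substituting a value.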
Development
How to run the performance comparison
docker compose build script
docker compose up -d postgres
docker compose exec -- postgres psql -U postgres
CREATE TABLE performance_test (id INT, foo_bar_int2 INT2, foo_bar_int4 INT4, foo_bar_int8 INT8, foo_bar_float4 FLOAT4, foo_bar_float8 FLOAT8);
INSERT INTO performance_test (
id,
foo_bar_int2,
foo_bar_int4,
foo_bar_int8,
foo_bar_float4,
foo_bar_float8
) VALUES (
generate_series(1, 1000000),
trunc(random() * (2*32768) - 32768),
trunc(random() * (2*2147483648) - 2147483648),
trunc(random() * (2*9223372036854775808) - 9223372036854775808),
trunc(random()),
trunc(random())
);
docker compose up script
docker compose cp script:/usr/src/app/performance-comparison.png ./performance
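The INSERT statement above scales random() into the signed range of each integer column. The sketch below reproduces that arithmetic in Python to check the bounds, assuming random() yields a double in [0, 1) and that Python floats behave like Postgres double precision values. The `scaled` helper is introduced here for illustration only.

```python
import math


def scaled(r, bits):
    """Mirror trunc(r * (2 * 2**(bits - 1)) - 2**(bits - 1)) for a given r."""
    half = 2 ** (bits - 1)
    return math.trunc(r * (2 * half) - half)


LOWEST = 0.0
HIGHEST = math.nextafter(1.0, 0.0)  # largest double below 1.0 (Python 3.9+)

for name, bits in (("INT2", 16), ("INT4", 32), ("INT8", 64)):
    low, high = scaled(LOWEST, bits), scaled(HIGHEST, bits)
    print(f"{name}: lowest={low} highest={high}")

# The float columns use trunc(random()), which truncates a value below 1.
print("trunc of the largest value below 1.0:", math.trunc(HIGHEST))
```

Under these assumptions the integer expressions stay inside the range of their column types, while trunc(random()) always evaluates to 0, so both float columns are filled with zeros.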
How to do basic benchmarking
docker run -p 5432:5432 --name rust-postgres -e POSTGRES_PASSWORD=postgres -d postgres -c log_min_messages=DEBUG5
Data preparation:
CREATE TABLE performance_test (id INT, foo_bar_int2 INT2, foo_bar_int4 INT4, foo_bar_int8 INT8, foo_bar_float4 FLOAT4, foo_bar_float8 FLOAT8);
INSERT INTO performance_test (
id,
foo_bar_int2,
foo_bar_int4,
foo_bar_int8,
foo_bar_float4,
foo_bar_float8
) VALUES (
generate_series(1, 1000000),
trunc(random() * (2*32768) - 32768),
trunc(random() * (2*2147483648) - 2147483648),
trunc(random() * (2*9223372036854775808) - 9223372036854775808),
trunc(random()),
trunc(random())
);
maturin develop --release --manifest-path ohmyfpg/Cargo.toml
python python/examples/simple_query.py
RUST_BACKTRACE=1 cargo run -r -p ohmyfpg_core --example simple_query
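For a quick timing run on the Python side, a small harness along the following lines can collect the same statistics as the detailed summary (avg, min, p25, median, p75, max). This is a sketch under assumptions: `fake_fetch` is a stand-in for a real call such as `await conn.fetch(QUERY)`, `time_runs` and `summarize` are names made up here, and the inclusive quantile method is a guess at how the published percentiles were computed.

```python
import asyncio
import statistics
import time


async def fake_fetch():
    # Stand-in for a real query; replace with the call you want to measure.
    await asyncio.sleep(0.001)
    return [1, 2, 3]


async def time_runs(make_call, runs=10):
    """Await make_call() `runs` times and return each duration in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        await make_call()
        timings.append((time.perf_counter() - start) * 1000)
    return timings


def summarize(timings_ms):
    """Return the statistics shown in the detailed summary."""
    p25, median, p75 = statistics.quantiles(timings_ms, n=4, method="inclusive")
    return {
        "avg": statistics.mean(timings_ms),
        "min": min(timings_ms),
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": max(timings_ms),
    }


if __name__ == "__main__":
    timings = asyncio.run(time_runs(fake_fetch, runs=10))
    for name, value in summarize(timings).items():
        print(f"{name}: {value:.2f}ms")
```

Timing each run separately, rather than timing the whole loop, is what makes the min, max and percentile figures available.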
How to do basic profiling
sudo CARGO_PROFILE_BENCH_DEBUG=true RUST_BACKTRACE=1 cargo flamegraph -p ohmyfpg_core --example simple_query
CARGO_PROFILE_BENCH_DEBUG=true RUST_BACKTRACE=1 cargo instruments --release -p ohmyfpg_core --example simple_query -t time