
Project description

NanoCube

Lightning fast OLAP-style point queries on Pandas DataFrames.



NanoCube is a super minimalistic, in-memory OLAP cube implementation for lightning-fast point queries on Pandas DataFrames. It consists of only 27 lines of magical code that turn any DataFrame into a multi-dimensional OLAP cube. NanoCube shines when multiple point queries are run against the same DataFrame, e.g. for financial data analysis, business intelligence, or fast web services.

pip install nanocube
import pandas as pd
from nanocube import Cube

# create a DataFrame and run a classic Pandas point query
df = pd.read_csv('sale_data.csv')
value = df.loc[(df['make'].isin(['Audi', 'BMW'])) & (df['engine'] == 'hybrid')]['revenue'].sum().item()

# create a NanoCube and run sum aggregated point queries
cube = Cube(df)
for i in range(1000):
    value = cube.get('revenue', make=['Audi', 'BMW'], engine='hybrid')

Lightning fast - really?

For aggregated point queries, NanoCube is 100x to 1,000x faster than Pandas. For this special purpose, NanoCube is also much faster than other libraries like Spark, Polars, Modin, Dask, or Vaex. If such a library is a drop-in replacement for Pandas DataFrames, you should be able to use it with NanoCube too.
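You can get a feel for where the speedup comes from with a small benchmark of your own. The sketch below compares a classic Pandas filter-then-aggregate query against a query over precomputed per-value row indices, which is the core idea behind NanoCube (it does not use NanoCube itself; the synthetic column names follow the example above, and the measured speedup will vary with data size and cardinality):

```python
import timeit
import numpy as np
import pandas as pd

# Synthetic sales data (column names follow the example above).
rng = np.random.default_rng(42)
n = 100_000
df = pd.DataFrame({
    'make':    rng.choice(['Audi', 'BMW', 'Ford', 'Tesla'], n),
    'engine':  rng.choice(['petrol', 'diesel', 'hybrid'], n),
    'revenue': rng.uniform(10_000, 100_000, n),
})

def pandas_query():
    # Classic Pandas point query: scan, filter, then aggregate.
    return df.loc[(df['make'].isin(['Audi', 'BMW']))
                  & (df['engine'] == 'hybrid')]['revenue'].sum()

# Precompute the row numbers for each unique value, once per column.
rows_by_make = {v: np.flatnonzero(df['make'].to_numpy() == v)
                for v in df['make'].unique()}
rows_by_engine = {v: np.flatnonzero(df['engine'].to_numpy() == v)
                  for v in df['engine'].unique()}
revenue = df['revenue'].to_numpy()

def indexed_query():
    # Combine precomputed indices instead of scanning the DataFrame.
    rows = np.union1d(rows_by_make['Audi'], rows_by_make['BMW'])
    rows = np.intersect1d(rows, rows_by_engine['hybrid'], assume_unique=True)
    return revenue[rows].sum()

assert np.isclose(pandas_query(), indexed_query())
t_pandas = timeit.timeit(pandas_query, number=100)
t_indexed = timeit.timeit(indexed_query, number=100)
print(f'pandas: {t_pandas:.3f}s, indexed: {t_indexed:.3f}s')
```

NanoCube applies the same principle, but with compressed Roaring Bitmaps instead of plain NumPy index arrays.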

How is this possible?

NanoCube takes a different approach. Roaring Bitmaps (https://roaringbitmap.org) are used to build a multi-dimensional in-memory representation of a DataFrame. For each unique value in a column, a bitmap is created that marks the rows in the DataFrame where this value occurs. To answer a point query, the relevant bitmaps are combined to identify the matching rows, and NumPy is then used to aggregate the result. NanoCube is a by-product of the CubedPandas project (https://github.com/Zeutschler/cubedpandas) and the result of an attempt to make OLAP-style queries on Pandas DataFrames as fast as possible in a minimalistic way.
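The mechanism can be sketched in a few lines. The toy example below uses plain Python sets of row numbers in place of Roaring Bitmaps (a simplification for illustration; NanoCube's actual implementation uses compressed bitmaps, which are far more memory-efficient on large data):

```python
import pandas as pd

df = pd.DataFrame({
    'make':    ['Audi', 'BMW', 'Audi', 'Ford'],
    'engine':  ['hybrid', 'hybrid', 'petrol', 'hybrid'],
    'revenue': [100, 200, 300, 400],
})

# One "bitmap" (here: a plain set of row numbers) per unique value per column.
index = {
    col: {value: set(df.index[df[col] == value]) for value in df[col].unique()}
    for col in ['make', 'engine']
}

# Point query: revenue for make in (Audi, BMW) and engine == hybrid.
# Union within a dimension, intersection across dimensions.
rows = (index['make']['Audi'] | index['make']['BMW']) & index['engine']['hybrid']
total = df['revenue'].iloc[sorted(rows)].sum()  # NumPy-backed aggregation
print(total)  # 100 + 200 = 300
```

Only the small set of matching rows is ever touched during aggregation, which is why repeated point queries on the same DataFrame pay off so well.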

What price do I need to pay?

First of all, NanoCube is free and MIT-licensed. The first price you pay is memory consumption, typically up to 25% on top of the original DataFrame size. The second price is the time needed to initialize the cube, which is roughly proportional to the total number of unique values across all dimension columns of the DataFrame. Try the included sample.py or the notebook to get a feeling for the performance of NanoCube.
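Since initialization cost grows with the number of unique values, you can estimate it up front before building a cube. A minimal sketch (the toy DataFrame is made up; it counts unique values in the non-numeric columns, which NanoCube uses as dimensions by default):

```python
import pandas as pd

df = pd.DataFrame({
    'make':    ['Audi', 'BMW', 'Audi', 'Ford'],
    'engine':  ['hybrid', 'hybrid', 'petrol', 'hybrid'],
    'revenue': [100, 200, 300, 400],
})

# Dimension columns are the non-numeric ones by default.
dims = df.select_dtypes(exclude='number').columns
total_unique = int(df[dims].nunique().sum())  # one bitmap per unique value
print(f'{total_unique} bitmaps to build')     # 3 makes + 2 engines = 5
```

A DataFrame with many high-cardinality columns (e.g. free-text IDs) will therefore be slow to index; consider excluding such columns from the dimensions.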

By default, all non-numeric columns are used as dimensions and all numeric columns as measures. Roaring Bitmaps (https://roaringbitmap.org) are used to construct and query the multi-dimensional cube; NumPy is used for aggregations.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nanocube-0.1.0.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

nanocube-0.1.0-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file nanocube-0.1.0.tar.gz.

File metadata

  • Download URL: nanocube-0.1.0.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for nanocube-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8e36cc9b728570454d2a18d3ffe640c390506337d986e997a439cbcff9f5b695
MD5 a10239ae64dbe2a7cb8d89db48a50374
BLAKE2b-256 bc120e19714f186fd4ccee8ce5270094cfaafa1ef07801053f9b3531cff5bbf0

See more details on using hashes here.

File details

Details for the file nanocube-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nanocube-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for nanocube-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 477bc705d9ddb11f0fe0446bd655650485804a67fd350cd8fe23d3158ef3ba26
MD5 83425bd09e370e01a87a1759cbbac0c1
BLAKE2b-256 d90465e54c219a8c681fc2a88d79cdf03800b119dec9698383d96f4da9af750b

See more details on using hashes here.
