
Filter faster, analyze smarter – your DataFrames deserve it!

CubedPandas


CubedPandas offers a new easy, fast & fun approach to filter, navigate and analyze Pandas dataframes. CubedPandas is inspired by the concept of OLAP databases and aims to bring added comfort and power to Pandas dataframe handling.

For novice users, CubedPandas can be a great help to get started with Pandas, as it hides the complexity and verbosity of Pandas dataframes. For experienced users, CubedPandas can be a productivity booster, as it allows you to write more compact, explicit, readable and maintainable code, e.g. this Pandas code:

# Pandas: calculate the total revenue of all hybrid Audi cars in September 2024
value = df.loc[
    (df['make'] == 'Audi') &
    (df['engine'] == 'hybrid') &
    (df['date'] >= '2024-09-01') & (df['date'] <= '2024-09-30'),
    'revenue'
].sum()

can turn into this equivalent CubedPandas code:

# with CubedPandas:
value = df.cubed.make.Audi.engine.hybrid.date.september_2024.revenue

# or maybe even shorter:
value = df.cubed.Audi.hybrid.sep_2024

# filtering dataframes is as easy as this: just add '.df' at the end
df = df.cubed.make.Audi.engine.hybrid.df

CubedPandas offers a fluent interface based on the data available in the underlying DataFrame. So, filtering, navigation and analysis of Pandas dataframes become more intuitive, more readable and more fun.

CubedPandas neither duplicates data nor modifies the underlying DataFrame, and it introduces no performance penalty. In fact, it can sometimes significantly speed up your data processing.

Jupyter notebooks are the perfect habitat for CubedPandas. For further information, please visit the CubedPandas Documentation or try some of the included samples.

Getting Started

CubedPandas is available on pypi.org (https://pypi.org/project/cubedpandas/) and can be installed with

pip install cubedpandas

Using CubedPandas is as simple as wrapping any Pandas dataframe into a cube like this:

import pandas as pd
from cubedpandas import cubed

# Create a dataframe with some sales data
df = pd.DataFrame({"product":  ["Apple",  "Pear",   "Banana", "Apple",  "Pear",   "Banana"],
                   "channel":  ["Online", "Online", "Online", "Retail", "Retail", "Retail"],
                   "customer": ["Peter",  "Peter",  "Paul",   "Paul",   "Mary",   "Mary"  ],
                   "mailing":  [True,     False,    True,     False,    True,     False   ],
                   "revenue":  [100,      150,      300,      200,      250,      350     ],
                   "cost":     [50,       90,       150,      100,      150,      175     ]})

cdf = cubed(df)  # Wrap your dataframe into a cube and start using it!

CubedPandas automatically infers a multi-dimensional schema from your Pandas dataframe which defines a virtual Cube over the dataframe. By default, numeric columns of the dataframe are considered as Measures - the numeric values to analyse & aggregate - all other columns are considered as Dimensions - to filter, navigate and view the data. The individual values in a dimension column are called the Members of the dimension. In the example above, column channel becomes a dimension with the two members Online and Retail, revenue and cost are our measures.
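
The default rule described above can be approximated with plain Pandas dtype checks. The following is a minimal sketch for illustration only, not CubedPandas's actual implementation: the helper name `infer_schema` is hypothetical, and treating boolean columns such as mailing as dimensions is an assumption made here.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def infer_schema(df: pd.DataFrame) -> dict:
    """Hypothetical helper: split columns into measures and dimensions by dtype."""
    measures, dimensions = [], []
    for name in df.columns:
        column = df[name]
        # Assumption: booleans count as dimensions, although Pandas reports them as numeric.
        if is_numeric_dtype(column) and not is_bool_dtype(column):
            measures.append(name)
        else:
            dimensions.append(name)
    return {"dimensions": dimensions, "measures": measures}


df = pd.DataFrame({"product": ["Apple", "Pear"],
                   "channel": ["Online", "Retail"],
                   "mailing": [True, False],
                   "revenue": [100, 150],
                   "cost": [50, 90]})

schema = infer_schema(df)
# The distinct values of a dimension column are its members.
members = sorted(df["channel"].unique())
```

Here `schema` lists product, channel and mailing as dimensions and revenue and cost as measures, while `members` holds the two members of the channel dimension.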

Although rarely required, you can also define your own schema. Schemas are quite powerful and flexible, as they allow you to define dimensions and measures, aliases and (planned for upcoming releases) also custom aggregations, business logic, number formatting, linked cubes (star-schemas) and much more.

Context please, so I will give you data!

One key feature of CubedPandas is easy & intuitive access to individual Data Cells in multi-dimensional data space. To do so, you'll need to define a multi-dimensional Context so CubedPandas will evaluate, aggregate (sum by default) and return the requested value from the underlying dataframe.

Context objects behave like normal numbers (float, int), so you can use them directly in arithmetic operations. In the following examples, all addresses refer to exactly the same rows of the dataframe and therefore all return the same value of 100.

# Let Pandas set the scene...
a = df.loc[(df["product"] == "Apple") & (df["channel"] == "Online") & (df["customer"] == "Peter"), "revenue"].sum()

# Can we do better with CubedPandas? 
b = cdf["product:Apple", "channel:Online", "customer:Peter"].revenue  # explicit, readable, flexible and fast  
c = cdf.product["Apple"].channel["Online"].customer["Peter"].revenue  # ...better, if column names are Python-compliant
d = cdf.product.Apple.channel.Online.customer.Peter.revenue  # ...even better, if member names are Python-compliant

# If there are no ambiguities in your dataframe - which can be easily checked - then you can use these shorthand forms:
e = cdf["Online", "Apple", "Peter", "revenue"]
f = cdf.Online.Apple.Peter.revenue
g = cdf.Online.Apple.Peter  # as 'revenue' is the default (first) measure of the cube, it can be omitted

assert a == b == c == d == e == f == g == 100
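
To make the idea of a number-like context more concrete, here is a toy model in pure Python. It is a sketch under stated assumptions and says nothing about how CubedPandas is implemented internally: the class `ToyContext`, the helper `_num` and the constants `ROWS` and `MEASURES` are hypothetical names invented for this illustration.

```python
# Toy data: a few rows of the sales example as plain dictionaries.
ROWS = [
    {"product": "Apple",  "channel": "Online", "customer": "Peter", "revenue": 100, "cost": 50},
    {"product": "Pear",   "channel": "Online", "customer": "Peter", "revenue": 150, "cost": 90},
    {"product": "Banana", "channel": "Online", "customer": "Paul",  "revenue": 300, "cost": 150},
    {"product": "Apple",  "channel": "Retail", "customer": "Paul",  "revenue": 200, "cost": 100},
]
MEASURES = ("revenue", "cost")  # the first measure is the default


def _num(x):
    """Unwrap a ToyContext to its aggregated value, pass plain numbers through."""
    return x.value if isinstance(x, ToyContext) else x


class ToyContext:
    """Hypothetical stand-in for a context: a row filter plus a measure."""

    def __init__(self, rows, measure=MEASURES[0]):
        self.rows = rows
        self.measure = measure

    def __getattr__(self, name):
        # Only called for names that are not regular attributes.
        if name in MEASURES:  # switch the measure, keep the filter
            return ToyContext(self.rows, name)
        matches = [r for r in self.rows if name in r.values()]
        if not matches:
            raise AttributeError(name)
        return ToyContext(matches, self.measure)  # narrow the filter to one member

    @property
    def value(self):
        # Aggregate the current measure over the filtered rows (sum by default).
        return sum(r[self.measure] for r in self.rows)

    def __float__(self):
        return float(self.value)

    def __eq__(self, other):
        return self.value == _num(other)

    def __add__(self, other):
        return self.value + _num(other)

    __radd__ = __add__

    def __sub__(self, other):
        return self.value - _num(other)

    def __rsub__(self, other):
        return _num(other) - self.value


cube = ToyContext(ROWS)
a = cube.Online.Apple.Peter                        # default measure 'revenue'
margin = cube.Online.revenue - cube.Online.cost    # arithmetic on two contexts
```

Each attribute access returns a new, narrower context, and the numeric dunder methods let the result take part in comparisons and arithmetic like an ordinary number.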

Context objects also act as filters on the underlying dataframe. So you can also use CubedPandas for fast and easy filtering only, e.g. like this:

df = df.cubed.product["Apple"].channel["Online"].df
df = df.cubed.Apple.Online.df  # short form, if column names are Python-compliant and there are no ambiguities

Pivot, Drill-Down, Slice & Dice

The Pandas pivot table is a very powerful tool. Unfortunately, it is quite verbose and hard to master. CubedPandas offers the slice method to create pivot tables in a more intuitive and easier way, for example:

# Let's create a simple pivot table with the revenue for dimensions products and channels
cdf.slice(rows="product", columns="channel", measures="revenue")
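
For comparison, a roughly equivalent pivot table in plain Pandas might look like the sketch below. It assumes that slice aggregates by sum, as contexts do by default; the exact layout that slice returns (totals, ordering, formatting) is not covered here and may differ.

```python
import pandas as pd

# Same sales data as in the Getting Started example, reduced to the columns needed.
df = pd.DataFrame({"product": ["Apple", "Pear", "Banana", "Apple", "Pear", "Banana"],
                   "channel": ["Online", "Online", "Online", "Retail", "Retail", "Retail"],
                   "revenue": [100, 150, 300, 200, 250, 350]})

# Rows are products, columns are channels, cells hold the summed revenue.
pivot = df.pivot_table(index="product", columns="channel", values="revenue", aggfunc="sum")
```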

For further information, samples and a complete feature list as well as valuable tips and tricks, please visit the CubedPandas Documentation.

Your feedback, ideas and support are very welcome!

Please help improve and extend CubedPandas with your feedback & ideas and use the CubedPandas GitHub Issues to request new features and report bugs. For general questions, discussions and feedback, please use the CubedPandas GitHub Discussions.

If you have fallen in love with CubedPandas or find it otherwise valuable, please consider becoming a sponsor of the CubedPandas project so we can push the project forward faster and make CubedPandas even more awesome.

...happy cubing!
