
Cube Alchemy

A lightweight, queryable hypercube engine for multidimensional analytics on top of pandas.

Why It Matters

Reduce glue code and speed up your analysis so you can focus on insights.

  • Speed: Automatic relationship discovery and traversal.

  • Simplicity: Declarative queries handle slicing and dicing in pure Python, with fewer ad-hoc joins.

  • Consistency: Define your queries once and use them everywhere with the same logic and filtering behavior.

  • Maintainability: Centralized business logic in reusable components.

  • Integration: Power fully interactive analytics apps using frameworks like Streamlit or Panel, or expose it to a web client.

Installation

Requires Python 3.8+.

The simplest way to install it is with pip:

cd 'your_new_project_path'
python -m venv venv
venv\Scripts\activate   # Windows; on macOS/Linux use: source venv/bin/activate
pip install cube-alchemy

Basic usage

Transform your collection of pandas DataFrames into a cohesive analytical model in three simple steps:

  • Connect your data - Add your pandas DataFrames to a Hypercube (relationships will be created automatically).

  • Define your metrics and queries.

  • Query with ease - Extract insights.

flowchart LR
  A["Load DataFrames"] --> B["Build Hypercube"]
  B --> C["Define Metrics"]
  C --> D["Define Queries"]
  D --> E["Execute Queries"]
  E --> F["Update Context State (Apply or Remove Filters)"]
  F --> E

Cube Alchemy connects your data by identifying common column names between DataFrames. These shared columns (edges) form the relationships, automatically building bridges between tables (nodes) and handling composite keys when needed. The result is a unified schema you can slice, dice, and query in a simple and intuitive way.
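The discovery step can be pictured with a small sketch (an illustration of the idea, not the library's actual implementation): every pair of tables that shares one or more column names gets an edge.

```python
import pandas as pd

def discover_edges(tables):
    """Illustrative sketch: shared column names between each pair of tables become edges."""
    names = list(tables)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = tuple(sorted(set(tables[a].columns) & set(tables[b].columns)))
            if shared:  # a shared column (or composite of several) links the two nodes
                edges.append((a, b, shared))
    return edges

tables = {
    'Customer': pd.DataFrame(columns=['customer_id', 'region_id']),
    'Region': pd.DataFrame(columns=['region_id', 'region']),
    'Sales': pd.DataFrame(columns=['sale_id', 'customer_id', 'product_id']),
}
print(discover_edges(tables))
# [('Customer', 'Region', ('region_id',)), ('Customer', 'Sales', ('customer_id',))]
```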

import pandas as pd
from cube_alchemy import Hypercube

# 1) Define DataFrames (nodes)
products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'category': ['Electronics', 'Home', 'Other'],
    'cost': [300.0, 15.0, 500.0],
})

customers = pd.DataFrame({
    'customer_id': [100, 101, 102, 103],
    'customer_name': ['Acme Co', 'Globex', 'Initech', 'Umbrella'],  
    'segment': ['SMB', 'Enterprise', 'SMB', 'Consumer'],
    'region_id': [7, 8, 7, 9],  
})

regions = pd.DataFrame({
    'region_id': [7, 8, 9],
    'region': ['North', 'West', 'South'],
})

calendar = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],  
    'month': ['2024-01', '2024-01', '2024-01', '2024-01', '2024-01'],
})

sales = pd.DataFrame({
    'sale_id': [10, 11, 12, 13, 14, 15],
    'product_id': [1, 1, 2, 3, 2, 1],                        
    'customer_id': [100, 101, 102, 103, 100, 102],           
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05', '2024-01-03'],  
    'promo_code': ['NEW10', 'NONE', 'DISC5', 'NONE', 'DISC5', 'NEW10'],  
    'qty': [2, 1, 4, 3, 5, 2],
    'price': [500.0, 500.0, 25.0, 800.0, 25.0, 500.0],
})

promos = pd.DataFrame({
    'promo_code': ['NEW10', 'DISC5', 'NONE'],
    'promo_type': ['Launch', 'Discount', 'No Promo'],
})

# 2) Build the hypercube
cube = Hypercube({
    'Product': products,
    'Customer': customers,
    'Region': regions,
    'Calendar': calendar,
    'Sales': sales,
    'Promos': promos,
})

# 3) Inspect your new hypercube model (shared columns are now the edges)

cube.visualize_graph()

Hypercube Graph Visualization

# 4) Define metrics
cube.define_metric(
    name='Revenue',
    expression='[qty] * [price]',
    aggregation='sum'
)

cube.define_metric(
    name='Units',
    expression='[qty]',
    aggregation='sum'
)

cube.define_metric(
    name='Margin',
    expression='([price] - [cost]) * [qty]',
    aggregation='sum'
)

cube.define_metric(
    name='Number of Sales',
    expression='[sale_id]',
    aggregation='count'
)

# 5) Define queries
cube.define_query(
    query_name="sales_analysis",
    dimensions={'region', 'category', 'promo_type'},
    metrics=['Revenue', 'Units', 'Margin', 'Number of Sales']
)

# Execute the query
cube.query("sales_analysis")

Output:

   region     category promo_type  Revenue  Units  Margin  Number of Sales
0  North  Electronics     Launch   2000.0      4   800.0                2
1  North         Home   Discount    225.0      9    90.0                2
2  South        Other   No Promo   2400.0      3   900.0                1
3   West  Electronics   No Promo    500.0      1   200.0                1
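For comparison, the same result can be reproduced with plain pandas merges and a groupby; this is the glue code the hypercube abstracts away (a sketch of the equivalent computation, not how the library works internally):

```python
import pandas as pd

products = pd.DataFrame({'product_id': [1, 2, 3],
                         'category': ['Electronics', 'Home', 'Other'],
                         'cost': [300.0, 15.0, 500.0]})
customers = pd.DataFrame({'customer_id': [100, 101, 102, 103],
                          'region_id': [7, 8, 7, 9]})
regions = pd.DataFrame({'region_id': [7, 8, 9],
                        'region': ['North', 'West', 'South']})
promos = pd.DataFrame({'promo_code': ['NEW10', 'DISC5', 'NONE'],
                       'promo_type': ['Launch', 'Discount', 'No Promo']})
sales = pd.DataFrame({'sale_id': [10, 11, 12, 13, 14, 15],
                      'product_id': [1, 1, 2, 3, 2, 1],
                      'customer_id': [100, 101, 102, 103, 100, 102],
                      'promo_code': ['NEW10', 'NONE', 'DISC5', 'NONE', 'DISC5', 'NEW10'],
                      'qty': [2, 1, 4, 3, 5, 2],
                      'price': [500.0, 500.0, 25.0, 800.0, 25.0, 500.0]})

# Manually traverse the relationships: Sales -> Product, Sales -> Customer -> Region, Sales -> Promos
flat = (sales.merge(products, on='product_id')
             .merge(customers, on='customer_id')
             .merge(regions, on='region_id')
             .merge(promos, on='promo_code'))
flat['Revenue'] = flat['qty'] * flat['price']
flat['Margin'] = (flat['price'] - flat['cost']) * flat['qty']

result = (flat.groupby(['region', 'category', 'promo_type'], as_index=False)
              .agg(Revenue=('Revenue', 'sum'), Units=('qty', 'sum'),
                   Margin=('Margin', 'sum'), **{'Number of Sales': ('sale_id', 'count')}))
print(result)  # same four rows as the sales_analysis output above
```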
# 6) Apply a filter and query again
cube.filter({'customer_name': ['Initech']})     
cube.query("sales_analysis")

Output:

   region     category promo_type  Revenue  Units  Margin  Number of Sales
0  North  Electronics     Launch   1000.0      2   400.0                1
1  North         Home   Discount    100.0      4    40.0                1
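The filter behaves like a semi-join: restricting `customer_name` to Initech keeps only that customer's sales before any aggregation. A plain-pandas sketch of the same propagation (again an illustration, not the library's internal mechanism):

```python
import pandas as pd

customers = pd.DataFrame({'customer_id': [100, 101, 102, 103],
                          'customer_name': ['Acme Co', 'Globex', 'Initech', 'Umbrella']})
sales = pd.DataFrame({'sale_id': [10, 11, 12, 13, 14, 15],
                      'customer_id': [100, 101, 102, 103, 100, 102],
                      'qty': [2, 1, 4, 3, 5, 2],
                      'price': [500.0, 500.0, 25.0, 800.0, 25.0, 500.0]})

# Filter on a dimension table, then propagate through the shared key (semi-join)
keep = customers.loc[customers['customer_name'].isin(['Initech']), 'customer_id']
filtered = sales[sales['customer_id'].isin(keep)]
revenue = (filtered['qty'] * filtered['price']).sum()
print(revenue)  # 1100.0 -- only Initech's sales (ids 12 and 15) survive: 4*25 + 2*500
```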

Notes

  • Multi-hop works out of the box through implicit relationships (shared column names connect DataFrames; just make sure to rename your columns accordingly).

  • If input tables are connected by more than one column, composite keys and tables will be created automatically. The original tables' columns will be replaced with the composite keys.

  • The relationship graph cannot contain cycles; the hypercube will not load if any exist.

  • Cardinality is neither assumed nor checked (any-to-any). Links are treated as many-to-many, which can duplicate rows and skew aggregations if not modeled carefully.

  • Columns in metric expressions use square brackets: [qty], [price], [cost], etc.

  • Metric aggregation accepts pandas predefined groupby aggregations (e.g., 'sum', 'mean') or a custom callable.

  • Additional features such as context filter control, new context states, metric filters, and row conditions are available but omitted here for brevity. See the docs for details.
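The cardinality caveat above is easy to reproduce in plain pandas: a many-to-many link fans out rows during the merge, inflating sums. A small hypothetical example (the `orders`/`shipments` tables are made up for illustration):

```python
import pandas as pd

# One order, but two shipment rows share its order_id (a many-to-many link)
orders = pd.DataFrame({'order_id': [1], 'amount': [100.0]})
shipments = pd.DataFrame({'order_id': [1, 1], 'carrier': ['UPS', 'DHL']})

joined = orders.merge(shipments, on='order_id')  # the amount row is duplicated
print(joined['amount'].sum())  # 200.0, not 100.0 -- the fan-out skewed the sum
```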

Full documentation

For concepts, API specs, advanced filtering, context states, full examples, and Streamlit integration, see the project documentation.

Creator

Created with 🧠 and ☕ by Juan C. Del Monte

