# Cube Alchemy

A lightweight, queryable hypercube engine for multidimensional analytics on top of pandas.

## Why It Matters

Reduce glue code and speed up your analysis so you can focus on insights.
- **Speed**: Automatic relationship discovery and traversal.
- **Simplicity**: Declarative queries achieve slicing and dicing in pure Python with fewer ad-hoc joins.
- **Consistency**: Define your queries once and use them everywhere with the same logic and filtering behavior.
- **Maintainability**: Centralize business logic in reusable components.
- **Integration**: Power fully interactive analytics apps with frameworks like Streamlit or Panel, or expose the cube to a web client.
## Installation

Requires Python 3.8+.

```shell
cd your_new_project_path
python -m venv venv
venv\Scripts\activate   # Windows; on macOS/Linux use: source venv/bin/activate
pip install cube-alchemy
```
## Basic usage

Transform your collection of pandas DataFrames into a cohesive analytical model in three simple steps:

1. **Connect your data**: add your pandas DataFrames to a Hypercube (relationships are created automatically).
2. **Define your metrics and queries.**
3. **Query with ease**: extract insights.
```mermaid
flowchart LR
    A["Load DataFrames"] --> B["Build Hypercube"]
    B --> C["Define Metrics"]
    C --> D["Define Queries"]
    D --> E["Execute Queries"]
    E --> F["Update Context State (Apply or Remove Filters)"]
    F --> E
```
Cube Alchemy connects your data by identifying common column names between DataFrames. These shared columns form the relationships, automatically building bridges between tables. The result is a unified schema you can slice, dice, and query in a declarative, simple, and intuitive way.
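To illustrate the idea, here is a minimal sketch of how shared column names can imply relationships between tables. The `shared_columns` helper is hypothetical, written for this example only; cube_alchemy performs this discovery internally.

```python
import pandas as pd

# Hypothetical helper: find candidate relationships between named DataFrames
# by intersecting their column names (illustrative, not the library's API).
def shared_columns(tables: dict) -> list:
    names = list(tables)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = set(tables[a].columns) & set(tables[b].columns)
            if common:
                edges.append((a, b, sorted(common)))
    return edges

sales = pd.DataFrame({'sale_id': [1], 'product_id': [10], 'qty': [2]})
products = pd.DataFrame({'product_id': [10], 'category': ['Electronics']})
print(shared_columns({'Sales': sales, 'Product': products}))
# [('Sales', 'Product', ['product_id'])]
```

Each discovered edge becomes a bridge the engine can traverse when resolving a query.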
```python
import pandas as pd
from cube_alchemy import Hypercube

# 1) Define DataFrames (nodes)
products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'category': ['Electronics', 'Home', 'Other'],
    'cost': [300.0, 15.0, 500.0],
})
customers = pd.DataFrame({
    'customer_id': [100, 101, 102, 103],
    'customer_name': ['Acme Co', 'Globex', 'Initech', 'Umbrella'],
    'segment': ['SMB', 'Enterprise', 'SMB', 'Consumer'],
    'region_id': [7, 8, 7, 9],
})
regions = pd.DataFrame({
    'region_id': [7, 8, 9],
    'region': ['North', 'West', 'South'],
})
calendar = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
    'month': ['2024-01', '2024-01', '2024-01', '2024-01', '2024-01'],
})
sales = pd.DataFrame({
    'sale_id': [10, 11, 12, 13, 14, 15],
    'product_id': [1, 1, 2, 3, 2, 1],
    'customer_id': [100, 101, 102, 103, 100, 102],
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05', '2024-01-03'],
    'promo_code': ['NEW10', 'NONE', 'DISC5', 'NONE', 'DISC5', 'NEW10'],
    'qty': [2, 1, 4, 3, 5, 2],
    'price': [500.0, 500.0, 25.0, 800.0, 25.0, 500.0],
})
promos = pd.DataFrame({
    'promo_code': ['NEW10', 'DISC5', 'NONE'],
    'promo_type': ['Launch', 'Discount', 'No Promo'],
})
```
```python
# 2) Build the hypercube
cube = Hypercube({
    'Product': products,
    'Customer': customers,
    'Region': regions,
    'Calendar': calendar,
    'Sales': sales,
    'Promos': promos,
})

# Inspect your new hypercube model (shared columns are now the edges)
cube.visualize_graph(w=12, h=10, full_column_names=False)
```
```python
# 3) Define metrics
cube.define_metric(
    name='Revenue',
    expression='[qty] * [price]',
    aggregation='sum'
)
cube.define_metric(
    name='Units',
    expression='[qty]',
    aggregation='sum'
)
cube.define_metric(
    name='Margin',
    expression='([price] - [cost]) * [qty]',
    aggregation='sum'
)
cube.define_metric(
    name='Number of Sales',
    expression='[sale_id]',
    aggregation='count'
)
```
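Conceptually, each metric pairs a column expression with an aggregation. A hedged pandas equivalent of the definitions above, using toy data rather than the library's internals, looks like this:

```python
import pandas as pd

# Illustrative only: a metric is an expression over columns plus an aggregation.
sales = pd.DataFrame({
    'qty': [2, 1, 4],
    'price': [500.0, 500.0, 25.0],
    'cost': [300.0, 300.0, 15.0],
})

# Revenue: expression '[qty] * [price]', aggregation 'sum'
revenue = (sales['qty'] * sales['price']).sum()
# Margin: expression '([price] - [cost]) * [qty]', aggregation 'sum'
margin = ((sales['price'] - sales['cost']) * sales['qty']).sum()
print(revenue, margin)  # 1600.0 640.0
```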
```python
# 4) Define queries
cube.define_query(
    query_name="sales_analysis",
    dimensions={'region', 'category', 'promo_type'},
    metrics=['Revenue', 'Units', 'Margin', 'Number of Sales']
)

# 5) Execute the query (or queries)
cube.query("sales_analysis")
```
Output:

```
  region     category promo_type  Revenue  Units  Margin  Number of Sales
0  North  Electronics     Launch   2000.0      4   800.0                2
1  North         Home   Discount    225.0      9    90.0                2
2  South        Other   No Promo   2400.0      3   900.0                1
3   West  Electronics   No Promo    500.0      1   200.0                1
```
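To see what the query computes, here is a hedged sketch of the same result in plain pandas: merge the example tables along their shared columns, evaluate the metric expressions, and group by the dimensions. This mirrors the example data only; it is not the library's internal implementation.

```python
import pandas as pd

# Example tables, trimmed to the columns this query touches.
products = pd.DataFrame({'product_id': [1, 2, 3],
                         'category': ['Electronics', 'Home', 'Other'],
                         'cost': [300.0, 15.0, 500.0]})
customers = pd.DataFrame({'customer_id': [100, 101, 102, 103],
                          'region_id': [7, 8, 7, 9]})
regions = pd.DataFrame({'region_id': [7, 8, 9],
                        'region': ['North', 'West', 'South']})
promos = pd.DataFrame({'promo_code': ['NEW10', 'DISC5', 'NONE'],
                       'promo_type': ['Launch', 'Discount', 'No Promo']})
sales = pd.DataFrame({'sale_id': [10, 11, 12, 13, 14, 15],
                      'product_id': [1, 1, 2, 3, 2, 1],
                      'customer_id': [100, 101, 102, 103, 100, 102],
                      'promo_code': ['NEW10', 'NONE', 'DISC5', 'NONE', 'DISC5', 'NEW10'],
                      'qty': [2, 1, 4, 3, 5, 2],
                      'price': [500.0, 500.0, 25.0, 800.0, 25.0, 500.0]})

# Walk the shared-column relationships with explicit merges.
df = (sales.merge(products, on='product_id')
           .merge(customers, on='customer_id')
           .merge(regions, on='region_id')
           .merge(promos, on='promo_code'))

# Evaluate metric expressions, then aggregate by the dimensions.
df['Revenue'] = df['qty'] * df['price']
df['Margin'] = (df['price'] - df['cost']) * df['qty']
result = (df.groupby(['region', 'category', 'promo_type'])
            .agg(Revenue=('Revenue', 'sum'),
                 Units=('qty', 'sum'),
                 Margin=('Margin', 'sum'),
                 **{'Number of Sales': ('sale_id', 'count')})
            .reset_index())
print(result)
```

The declarative query replaces this merge-and-groupby boilerplate, and reuses the same metric definitions across every query.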
```python
# 6) Apply a filter and query again
cube.filter({'customer_name': ['Initech']})
cube.query("sales_analysis")
```

Output:

```
  region     category promo_type  Revenue  Units  Margin  Number of Sales
0  North  Electronics     Launch   1000.0      2   400.0                1
1  North         Home   Discount    100.0      4    40.0                1
```
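The filter is applied on the Customer table but narrows the Sales rows too, because the tables share `customer_id`. A hedged pandas sketch of that propagation (illustrative only; cube_alchemy handles this internally):

```python
import pandas as pd

# Filter on one table, then propagate through the shared key.
customers = pd.DataFrame({'customer_id': [100, 102],
                          'customer_name': ['Acme Co', 'Initech']})
sales = pd.DataFrame({'sale_id': [10, 12, 15],
                      'customer_id': [100, 102, 102],
                      'qty': [2, 4, 2]})

keep = customers.loc[customers['customer_name'].isin(['Initech']), 'customer_id']
filtered = sales[sales['customer_id'].isin(keep)]
print(filtered['sale_id'].tolist())  # [12, 15]
```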
## Some Notes

- Multi-hop traversal works out of the box through implicit relationships (shared column names connect DataFrames).
- If input tables are connected by more than one column, composite keys and tables are created automatically.
- Cardinality is neither assumed nor checked (any-to-any). Links are treated as many-to-many, which can duplicate rows and skew aggregations if not modeled carefully.
- Additional features, such as filters and new context states, and more parameters are available but omitted here for brevity. See the docs for details.
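The cardinality caveat is worth seeing concretely: when a key is duplicated on both sides of a relationship, a merge expands rows multiplicatively, inflating sums. A minimal pandas demonstration:

```python
import pandas as pd

# Many-to-many: key 'a' appears twice on each side, so the merge
# produces a 2 x 2 expansion and the sum of x is inflated.
left = pd.DataFrame({'key': ['a', 'a'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['a', 'a'], 'y': [10, 20]})
merged = left.merge(right, on='key')

print(len(merged))        # 4 rows instead of 2
print(merged['x'].sum())  # 6, not the original 3
```

Modeling keys so at least one side is unique avoids this kind of double counting.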
## Full documentation

For concepts, API specs, advanced features, full examples, and Streamlit integration, see the project documentation.

## Creator

Created with 🧠 and ☕ by Juan C. Del Monte