# easydw

Utility for simplifying the management of data warehouse dimensions and facts.

easydw is a Python library for building data warehouse loading pipelines with less boilerplate and more consistency.
It provides reusable abstractions for:
- Connecting to warehouse engines (PostgreSQL and Oracle)
- Loading dimensions with Slowly Changing Dimension strategies (Type 0, Type 1, Type 2)
- Loading facts with key validation and upsert workflows
- Working with Polars DataFrames while persisting through SQLAlchemy
The goal is simple: model dimensions and facts as Python classes, then run predictable and testable load logic.
## Why easydw
Data warehouse projects often repeat the same patterns:
- Select data from source or staging
- Detect new records and changed records
- Apply SCD behavior on dimensions
- Upsert facts safely
easydw centralizes these patterns so project-specific code can focus on data rules and transformations instead of database plumbing.
## Core concepts

### Database abstraction
`Database` is the core abstraction used by dimensions and facts.
Main operations:
- `connect()`: initialize the SQLAlchemy engine
- `select(table_name=..., query=..., params=...)`: return a Polars DataFrame
- `insert(df, table_name)`: bulk insert rows from a DataFrame
- `update(df, table_name, keys)`: batch update by key columns
- `upsert(df, table_name, keys)`: backend-specific upsert strategy
- `dispose()`: close the engine and clear the metadata cache
Included adapters:
- `PostgresDatabase`: native `ON CONFLICT DO UPDATE` upsert
- `OracleDatabase`: update-then-insert upsert emulation
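As a rough illustration of the PostgreSQL strategy, here is how an `ON CONFLICT DO UPDATE` statement can be built with plain SQLAlchemy. The table and columns are hypothetical, and this is a sketch of the technique, not easydw's internal code:

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

metadata = sa.MetaData()
dim_customer = sa.Table(
    "dim_customer",
    metadata,
    sa.Column("customer_code", sa.String, primary_key=True),
    sa.Column("customer_name", sa.String),
)

# Insert one row; on a key collision, overwrite the non-key column
# with the incoming ("excluded") value instead of failing.
stmt = postgresql.insert(dim_customer).values(
    customer_code="C001", customer_name="Alice"
)
stmt = stmt.on_conflict_do_update(
    index_elements=["customer_code"],
    set_={"customer_name": stmt.excluded.customer_name},
)

# Render the statement without executing it.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

Oracle has no equivalent clause, which is why the Oracle adapter falls back to an update-then-insert emulation.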
### Dimension strategies

easydw includes three SCD-oriented base classes:

- `DimensionType0`: insert-only, no updates for existing keys
- `DimensionType1`: overwrite changed records, no historical versions
- `DimensionType2`: close old records and insert new active versions
You implement a concrete dimension by subclassing one of these and providing `bind(df)`, used to attach dimension keys to fact datasets.
### Fact loading

`Fact` defines a standard workflow:
1. Build rows in `_prepare_records(from_timestamp, to_timestamp)`
2. Optionally bind dimension keys through `_bind_dimensions(df)`
3. Validate uniqueness on the declared `keys`
4. Upsert into the warehouse table
## Quickstart
This example shows a minimal pipeline using PostgreSQL, one Type 1 dimension, and one fact table.
### 1) Install

```shell
pip install .
```
### 2) Create a database connection

```python
from easydw.database import PostgresDatabase

dwh = PostgresDatabase(
    name="warehouse",
    params={
        "user": "dw_user",
        "password": "dw_password",
        "host": "localhost",
        "port": 5432,
        "database": "analytics",
        "schema": "public",
    },
)
dwh.connect()
```
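A connection like this ultimately maps onto a SQLAlchemy URL. The following is a sketch of the equivalent plain-SQLAlchemy setup, assuming the psycopg2 driver; the `schema` entry is not part of the URL and would typically be applied via `search_path` or a MetaData schema:

```python
from sqlalchemy.engine import URL

params = {
    "user": "dw_user",
    "password": "dw_password",
    "host": "localhost",
    "port": 5432,
    "database": "analytics",
}

url = URL.create(
    "postgresql+psycopg2",
    username=params["user"],
    password=params["password"],
    host=params["host"],
    port=params["port"],
    database=params["database"],
)
# sqlalchemy.create_engine(url) would produce an engine of the kind
# that connect() initializes and dispose() tears down.
```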
### 3) Implement a dimension

```python
import polars as pl

from easydw.dimension import DimensionType1


class CustomerDimension(DimensionType1):
    def __init__(self, dwh):
        super().__init__(name="dim_customer", dwh=dwh)

    def bind(self, df: pl.DataFrame) -> pl.DataFrame:
        # Join the business key with the dimension surrogate key.
        dim_df = self.extract().select(["customer_code", "id"])
        return df.join(dim_df, on="customer_code", how="left")
```
Load dimension records:
```python
incoming_customers = pl.DataFrame(
    {
        "customer_code": ["C001", "C002"],
        "customer_name": ["Alice", "Bob"],
        "country": ["IT", "FR"],
    }
)

customer_dim = CustomerDimension(dwh)
customer_dim.insert(incoming_customers, keys=["customer_code"])
```
### 4) Implement a fact

```python
from datetime import datetime

import polars as pl

from easydw.fact import Fact


class SalesFact(Fact):
    keys = ["order_id"]
    dimensions = [CustomerDimension]

    def __init__(self, dwh):
        super().__init__(name="fct_sales", dwh=dwh)

    def _prepare_records(
        self, from_timestamp: datetime, to_timestamp: datetime
    ) -> pl.DataFrame:
        # Replace this with your source extraction logic.
        raw = pl.DataFrame(
            {
                "order_id": [1001, 1002],
                "customer_code": ["C001", "C002"],
                "amount": [120.0, 75.5],
            }
        )
        return self._bind_dimensions(raw)
```
Load fact rows:
```python
sales_fact = SalesFact(dwh)
sales_fact.insert(
    from_timestamp=datetime(2026, 3, 1),
    to_timestamp=datetime(2026, 3, 20),
)

dwh.dispose()
```
## Development

### Build package
```shell
python -m build
cp dist/*.whl /your/desired/location
pip install /your/desired/location/easydw-*.whl
```
### Run tests

```shell
pytest
```
## Requirements
- Python 3.10+
- SQLAlchemy 2.x
- Polars 1.x
- psycopg2 (for PostgreSQL usage)
- oracledb (for Oracle usage)