easydw

Utility for simplifying the management of data warehouse dimensions and facts.

easydw is a Python library for building data warehouse loading pipelines with less boilerplate and more consistency.

It provides reusable abstractions for:

  • Connecting to warehouse engines (PostgreSQL and Oracle)
  • Loading dimensions with Slowly Changing Dimension strategies (Type 0, Type 1, Type 2)
  • Loading facts with key validation and upsert workflows
  • Working with Polars DataFrames while persisting through SQLAlchemy

The goal is simple: model dimensions and facts as Python classes, then run predictable and testable load logic.

Why easydw

Data warehouse projects often repeat the same patterns:

  • Select data from source or staging
  • Detect new records and changed records
  • Apply SCD behavior on dimensions
  • Upsert facts safely

easydw centralizes these patterns so project-specific code can focus on data rules and transformations instead of database plumbing.
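The new-versus-changed split at the heart of these patterns can be sketched library-agnostically (plain dicts stand in here for the Polars DataFrames easydw actually works on; `split_records` is an illustrative name, not an easydw function):

```python
def split_records(source_rows, warehouse_rows, key):
    """Split source rows into new records and changed records."""
    existing = {row[key]: row for row in warehouse_rows}
    new_rows, changed_rows = [], []
    for row in source_rows:
        current = existing.get(row[key])
        if current is None:
            new_rows.append(row)        # unknown key: insert
        elif current != row:
            changed_rows.append(row)    # known key, different attributes: update
    return new_rows, changed_rows

source = [
    {"customer_code": "C001", "country": "IT"},   # unchanged
    {"customer_code": "C002", "country": "FR"},   # changed (was ES)
    {"customer_code": "C003", "country": "DE"},   # new
]
warehouse = [
    {"customer_code": "C001", "country": "IT"},
    {"customer_code": "C002", "country": "ES"},
]

new_rows, changed_rows = split_records(source, warehouse, "customer_code")
```

What happens to each bucket then depends on the table type: dimensions apply an SCD strategy, facts are upserted.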

Core concepts

Database abstraction

Database is the core abstraction used by dimensions and facts.

Main operations:

  • connect(): initialize SQLAlchemy engine
  • select(table_name=..., query=..., params=...): return Polars DataFrame
  • insert(df, table_name): bulk insert rows from DataFrame
  • update(df, table_name, keys): batch update by key columns
  • upsert(df, table_name, keys): backend-specific upsert strategy
  • dispose(): close engine and clear metadata cache

Included adapters:

  • PostgresDatabase: native ON CONFLICT DO UPDATE upsert
  • OracleDatabase: update-then-insert upsert emulation
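To make the difference between the two strategies concrete, here is a minimal sketch of the statement shape behind a native PostgreSQL upsert. The builder function and placeholder style are illustrative, not the adapter's actual internals; Oracle has no `ON CONFLICT` clause, which is why its adapter falls back to an update-then-insert emulation:

```python
def on_conflict_sql(table, columns, keys):
    """Illustrative ON CONFLICT DO UPDATE statement builder."""
    cols = ", ".join(columns)
    placeholders = ", ".join(f":{c}" for c in columns)
    # Non-key columns are overwritten from the incoming row on conflict.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c not in keys)
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(keys)}) DO UPDATE SET {updates}"
    )

sql = on_conflict_sql(
    "dim_customer",
    ["customer_code", "customer_name"],
    ["customer_code"],
)
```

The emulated variant instead issues an UPDATE filtered on the key columns first, then inserts only the rows that matched nothing.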

Dimension strategies

easydw includes three SCD-oriented base classes:

  • DimensionType0: insert-only, no updates for existing keys
  • DimensionType1: overwrite changed records, no historical versions
  • DimensionType2: close old records and insert new active versions

You implement a concrete dimension by subclassing one of these and providing bind(df), used to attach dimension keys to fact datasets.
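The Type 2 behavior, which is the least obvious of the three, can be sketched library-agnostically. The column names `valid_from`/`valid_to` and the function `scd2_apply` are illustrative, and plain dicts stand in for DataFrame rows:

```python
from datetime import date

def scd2_apply(versions, incoming, key, today):
    """Close the active version when attributes change, insert a new one."""
    out = list(versions)
    active = {v[key]: v for v in out if v["valid_to"] is None}
    for row in incoming:
        current = active.get(row[key])
        if current is None:
            # Never-seen key: first active version.
            out.append({**row, "valid_from": today, "valid_to": None})
        elif any(current[k] != v for k, v in row.items()):
            current["valid_to"] = today  # close the old version
            out.append({**row, "valid_from": today, "valid_to": None})
    return out

history = [{"customer_code": "C001", "country": "IT",
            "valid_from": date(2026, 1, 1), "valid_to": None}]
history = scd2_apply(history, [{"customer_code": "C001", "country": "FR"}],
                     "customer_code", date(2026, 3, 1))
```

Type 1 would overwrite the single existing row instead; Type 0 would do nothing for a known key.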

Fact loading

Fact defines a standard workflow:

  1. Build rows in _prepare_records(from_timestamp, to_timestamp)
  2. Optionally bind dimension keys through _bind_dimensions(df)
  3. Validate uniqueness on declared keys
  4. Upsert into warehouse table
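The four steps above can be sketched as a skeleton class. The method names mirror the workflow as described, but this concrete class is hypothetical, uses plain dict rows, and stubs out the warehouse write:

```python
class FactSketch:
    """Hypothetical skeleton mirroring the four-step fact workflow."""
    keys = ["order_id"]

    def insert(self, from_timestamp, to_timestamp):
        rows = self._prepare_records(from_timestamp, to_timestamp)  # step 1
        rows = self._bind_dimensions(rows)                          # step 2
        self._validate_keys(rows)                                   # step 3
        return self._upsert(rows)                                   # step 4

    def _prepare_records(self, from_timestamp, to_timestamp):
        return [{"order_id": 1001, "amount": 120.0},
                {"order_id": 1002, "amount": 75.5}]

    def _bind_dimensions(self, rows):
        return rows  # no dimensions in this sketch

    def _validate_keys(self, rows):
        seen = set()
        for row in rows:
            key = tuple(row[c] for c in self.keys)
            if key in seen:
                raise ValueError(f"duplicate fact key: {key}")
            seen.add(key)

    def _upsert(self, rows):
        return len(rows)  # stand-in for the warehouse write

loaded = FactSketch().insert(None, None)
```

Failing fast on duplicate keys in step 3 keeps a bad extraction from silently corrupting the fact table in step 4.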

Quickstart

This example shows a minimal pipeline using PostgreSQL, one Type 1 dimension, and one fact table.

1) Install

pip install easydw

2) Create a database connection

from easydw.database import PostgresDatabase

dwh = PostgresDatabase(
	name="warehouse",
	params={
		"user": "dw_user",
		"password": "dw_password",
		"host": "localhost",
		"port": 5432,
		"database": "analytics",
		"schema": "public",
	},
)

dwh.connect()

3) Implement a dimension

import polars as pl

from easydw.dimension import DimensionType1


class CustomerDimension(DimensionType1):
	def __init__(self, dwh):
		super().__init__(name="dim_customer", dwh=dwh)

	def bind(self, df: pl.DataFrame) -> pl.DataFrame:
		# Join business key with dimension surrogate key.
		dim_df = self.extract().select(["customer_code", "id"])
		return df.join(dim_df, on="customer_code", how="left")

Load dimension records:

incoming_customers = pl.DataFrame(
	{
		"customer_code": ["C001", "C002"],
		"customer_name": ["Alice", "Bob"],
		"country": ["IT", "FR"],
	}
)

customer_dim = CustomerDimension(dwh)
customer_dim.insert(incoming_customers, keys=["customer_code"])

4) Implement a fact

from datetime import datetime

import polars as pl

from easydw.fact import Fact


class SalesFact(Fact):
	keys = ["order_id"]
	dimensions = [CustomerDimension]

	def __init__(self, dwh):
		super().__init__(name="fct_sales", dwh=dwh)

	def _prepare_records(
		self, from_timestamp: datetime, to_timestamp: datetime
	) -> pl.DataFrame:
		# Replace this with your source extraction logic.
		raw = pl.DataFrame(
			{
				"order_id": [1001, 1002],
				"customer_code": ["C001", "C002"],
				"amount": [120.0, 75.5],
			}
		)

		return self._bind_dimensions(raw)

Load fact rows:

sales_fact = SalesFact(dwh)
sales_fact.insert(
	from_timestamp=datetime(2026, 3, 1),
	to_timestamp=datetime(2026, 3, 20),
)

dwh.dispose()

Development

Build package

python -m build
cp dist/*.whl /your/desired/location
pip install /your/desired/location/easydw-*.whl

Run tests

pytest

Requirements

  • Python 3.10+
  • SQLAlchemy 2.x
  • Polars 1.x
  • psycopg2 (for PostgreSQL usage)
  • oracledb (for Oracle usage)

