
easydw

easydw is a Python library for building data warehouse loading pipelines with less boilerplate and more consistency.

It provides reusable abstractions for:

  • Connecting to warehouse engines (PostgreSQL and Oracle)
  • Loading dimensions with Slowly Changing Dimension strategies (Type 0, Type 1, Type 2)
  • Loading facts with key validation and upsert workflows
  • Working with Polars DataFrames while persisting through SQLAlchemy

The goal is simple: model dimensions and facts as Python classes, then run predictable and testable load logic.

Why easydw

Data warehouse projects often repeat the same patterns:

  • Select data from source or staging
  • Detect new records and changed records
  • Apply SCD behavior on dimensions
  • Upsert facts safely

easydw centralizes these patterns so project-specific code can focus on data rules and transformations instead of database plumbing.

Core concepts

Database abstraction

Database is the core abstraction used by dimensions and facts.

Main operations:

  • connect(): initialize SQLAlchemy engine
  • select(table_name=..., query=..., params=...): return a Polars DataFrame
  • insert(df, table_name): bulk insert rows from DataFrame
  • update(df, table_name, keys): batch update by key columns
  • upsert(df, table_name, keys): backend-specific upsert strategy
  • dispose(): close engine and clear metadata cache

Included adapters:

  • PostgresDatabase: native ON CONFLICT DO UPDATE upsert
  • OracleDatabase: update-then-insert upsert emulation
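
As a rough sketch of how these operations compose (stg_orders and orders here are hypothetical table names, and the connection parameters are placeholders):

from easydw.database import PostgresDatabase

dwh = PostgresDatabase(
	name="warehouse",
	params={
		"user": "dw_user",
		"password": "dw_password",
		"host": "localhost",
		"port": 5432,
		"database": "analytics",
		"schema": "public",
	},
)
dwh.connect()

# Pull a staging table into a Polars DataFrame.
staged = dwh.select(table_name="stg_orders")

# Backend-specific upsert keyed on the business key; on PostgreSQL this
# runs a native ON CONFLICT DO UPDATE, on Oracle an update-then-insert.
dwh.upsert(staged, table_name="orders", keys=["order_id"])

dwh.dispose()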

Dimension strategies

easydw includes three SCD-oriented base classes:

  • DimensionType0: insert-only, no updates for existing keys
  • DimensionType1: overwrite changed records, no historical versions
  • DimensionType2: close old records and insert new active versions

You implement a concrete dimension by subclassing one of these and providing bind(df), which attaches dimension surrogate keys to fact datasets.
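
For example, a Type 2 dimension can be declared much like the Type 1 dimension in the Quickstart below. This is a hedged sketch that assumes DimensionType2 takes the same constructor arguments as DimensionType1, with dim_product and product_code as hypothetical names:

import polars as pl

from easydw.dimension import DimensionType2


class ProductDimension(DimensionType2):
	def __init__(self, dwh):
		# Assumes the same (name, dwh) constructor as DimensionType1.
		super().__init__(name="dim_product", dwh=dwh)

	def bind(self, df: pl.DataFrame) -> pl.DataFrame:
		# Join the business key with the dimension surrogate key; with
		# Type 2 you may additionally need to filter to active versions,
		# depending on how your schema marks them.
		dim_df = self.extract().select(["product_code", "id"])
		return df.join(dim_df, on="product_code", how="left")

When a product's attributes change, a Type 2 load closes the previous record and inserts a new active version rather than overwriting in place.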

Fact loading

Fact defines a standard workflow:

  1. Build rows in _prepare_records(from_timestamp, to_timestamp)
  2. Optionally bind dimension keys through _bind_dimensions(df)
  3. Validate uniqueness on declared keys
  4. Upsert into warehouse table

Quickstart

This example shows a minimal pipeline using PostgreSQL, one Type 1 dimension, and one fact table.

1) Install

pip install easydw

2) Create a database connection

from easydw.database import PostgresDatabase

dwh = PostgresDatabase(
	name="warehouse",
	params={
		"user": "dw_user",
		"password": "dw_password",
		"host": "localhost",
		"port": 5432,
		"database": "analytics",
		"schema": "public",
	},
)

dwh.connect()

3) Implement a dimension

import polars as pl

from easydw.dimension import DimensionType1


class CustomerDimension(DimensionType1):
	def __init__(self, dwh):
		super().__init__(name="dim_customer", dwh=dwh)

	def bind(self, df: pl.DataFrame) -> pl.DataFrame:
		# Join business key with dimension surrogate key.
		dim_df = self.extract().select(["customer_code", "id"])
		return df.join(dim_df, on="customer_code", how="left")

Load dimension records:

incoming_customers = pl.DataFrame(
	{
		"customer_code": ["C001", "C002"],
		"customer_name": ["Alice", "Bob"],
		"country": ["IT", "FR"],
	}
)

customer_dim = CustomerDimension(dwh)
customer_dim.insert(incoming_customers, keys=["customer_code"])
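
Because CustomerDimension is Type 1, re-running this insert with changed attribute values (say, a new country for C001) overwrites the existing rows rather than creating historical versions.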

4) Implement a fact

from datetime import datetime

import polars as pl

from easydw.fact import Fact


class SalesFact(Fact):
	keys = ["order_id"]
	dimensions = [CustomerDimension]

	def __init__(self, dwh):
		super().__init__(name="fct_sales", dwh=dwh)

	def _prepare_records(
		self, from_timestamp: datetime, to_timestamp: datetime
	) -> pl.DataFrame:
		# Replace this with your source extraction logic.
		raw = pl.DataFrame(
			{
				"order_id": [1001, 1002],
				"customer_code": ["C001", "C002"],
				"amount": [120.0, 75.5],
			}
		)

		return self._bind_dimensions(raw)

Load fact rows:

sales_fact = SalesFact(dwh)
sales_fact.insert(
	from_timestamp=datetime(2026, 3, 1),
	to_timestamp=datetime(2026, 3, 20),
)

dwh.dispose()
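
Since dispose() closes the engine, a production load script typically wraps the load steps so cleanup runs even if a step fails. A minimal sketch using only the calls shown above:

dwh.connect()
try:
	customer_dim.insert(incoming_customers, keys=["customer_code"])
	sales_fact.insert(
		from_timestamp=datetime(2026, 3, 1),
		to_timestamp=datetime(2026, 3, 20),
	)
finally:
	# Always close the engine and clear the metadata cache.
	dwh.dispose()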

Development

Build package

python -m build
cp dist/*.whl /your/desired/location
pip install /your/desired/location/easydw-*.whl

Run tests

pytest

Requirements

  • Python 3.10+
  • SQLAlchemy 2.x
  • Polars 1.x
  • psycopg2 (for PostgreSQL usage)
  • oracledb (for Oracle usage)
