easydw

Utility for simplifying the management of data warehouse dimensions and facts.

easydw is a Python library for building data warehouse loading pipelines with less boilerplate and more consistency.

It provides reusable abstractions for:

  • Connecting to warehouse engines (PostgreSQL and Oracle)
  • Loading dimensions with Slowly Changing Dimension strategies (Type 0, Type 1, Type 2)
  • Loading facts with key validation and upsert workflows
  • Working with Polars DataFrames while persisting through SQLAlchemy

The goal is simple: model dimensions and facts as Python classes, then run predictable and testable load logic.

Why easydw

Data warehouse projects often repeat the same patterns:

  • Select data from source or staging
  • Detect new records and changed records
  • Apply SCD behavior on dimensions
  • Upsert facts safely

easydw centralizes these patterns so project-specific code can focus on data rules and transformations instead of database plumbing.
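The "detect new and changed records" pattern can be sketched in plain Python (a toy illustration with dicts standing in for warehouse rows; easydw performs this comparison against actual tables through Polars and SQLAlchemy):

```python
# Toy sketch: split incoming rows into new vs. changed by business key.
existing = {  # current warehouse rows, keyed by business key
    "C001": {"customer_code": "C001", "customer_name": "Alice", "country": "IT"},
    "C002": {"customer_code": "C002", "customer_name": "Bob", "country": "FR"},
}

incoming = [
    {"customer_code": "C001", "customer_name": "Alice", "country": "IT"},  # unchanged
    {"customer_code": "C002", "customer_name": "Bob", "country": "DE"},    # changed
    {"customer_code": "C003", "customer_name": "Carol", "country": "ES"},  # new
]

new_rows = [r for r in incoming if r["customer_code"] not in existing]
changed_rows = [
    r for r in incoming
    if r["customer_code"] in existing and r != existing[r["customer_code"]]
]

print([r["customer_code"] for r in new_rows])      # ['C003']
print([r["customer_code"] for r in changed_rows])  # ['C002']
```

Unchanged rows fall through both filters, so only genuinely new or modified records reach the SCD or upsert step.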

Core concepts

Database abstraction

Database is the core abstraction used by dimensions and facts.

Main operations:

  • connect(): initialize SQLAlchemy engine
  • select(table_name=..., query=..., params=...): return Polars DataFrame
  • insert(df, table_name): bulk insert rows from DataFrame
  • update(df, table_name, keys): batch update by key columns
  • upsert(df, table_name, keys): backend-specific upsert strategy
  • dispose(): close engine and clear metadata cache

Included adapters:

  • PostgresDatabase: native ON CONFLICT DO UPDATE upsert
  • OracleDatabase: update-then-insert upsert emulation
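The update-then-insert emulation used for Oracle can be sketched with an in-memory table (a simplified stand-in; the real adapter issues SQLAlchemy UPDATE and INSERT statements against the database):

```python
# Toy in-memory sketch of update-then-insert upsert emulation.
def upsert(table: dict, rows: list[dict], key: str) -> None:
    for row in rows:
        k = row[key]
        if k in table:        # update pass: key exists, overwrite in place
            table[k].update(row)
        else:                 # insert pass: key missing, append a new row
            table[k] = dict(row)

table = {1001: {"order_id": 1001, "amount": 120.0}}
upsert(
    table,
    [{"order_id": 1001, "amount": 130.0},  # updates existing row
     {"order_id": 1002, "amount": 75.5}],  # inserts new row
    key="order_id",
)

print(table[1001]["amount"])  # 130.0
print(sorted(table))          # [1001, 1002]
```

PostgreSQL collapses both passes into a single statement via ON CONFLICT DO UPDATE; Oracle lacks that clause, hence the two-step emulation.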

Dimension strategies

easydw includes three SCD-oriented base classes:

  • DimensionType0: insert-only, no updates for existing keys
  • DimensionType1: overwrite changed records, no historical versions
  • DimensionType2: close old records and insert new active versions

You implement a concrete dimension by subclassing one of these and providing bind(df), used to attach dimension keys to fact datasets.
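The Type 2 "close and insert" behavior can be illustrated with a toy version (column names such as valid_from, valid_to, and is_active are illustrative, not necessarily easydw's):

```python
from datetime import date

# Toy SCD Type 2 sketch: close the current version, append a new active one.
dim = [
    {"customer_code": "C001", "country": "IT",
     "valid_from": date(2025, 1, 1), "valid_to": None, "is_active": True},
]

def scd2_apply(dim, incoming, key, change_date):
    current = {r[key]: r for r in dim if r["is_active"]}
    for row in incoming:
        old = current.get(row[key])
        if old is None or old["country"] != row["country"]:
            if old is not None:
                old["valid_to"] = change_date  # close the old version
                old["is_active"] = False
            dim.append({**row, "valid_from": change_date,
                        "valid_to": None, "is_active": True})

scd2_apply(dim, [{"customer_code": "C001", "country": "FR"}],
           key="customer_code", change_date=date(2026, 3, 1))

print(len(dim))                       # 2
print([r["is_active"] for r in dim])  # [False, True]
```

Type 1 would instead overwrite the existing row in place, and Type 0 would skip the key entirely.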

Fact loading

Fact defines a standard workflow:

  1. Build rows in _prepare_records(from_timestamp, to_timestamp)
  2. Optionally bind dimension keys through _bind_dimensions(df)
  3. Validate uniqueness on declared keys
  4. Upsert into warehouse table
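Step 3 in isolation might look like the following (a sketch only; easydw's actual check and error type may differ):

```python
from collections import Counter

# Sketch of validating uniqueness on declared fact keys before the upsert.
def validate_unique(rows: list[dict], keys: list[str]) -> None:
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    dupes = [k for k, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate fact keys: {dupes}")

rows = [{"order_id": 1001, "amount": 120.0},
        {"order_id": 1002, "amount": 75.5}]
validate_unique(rows, keys=["order_id"])  # passes silently

rows.append({"order_id": 1001, "amount": 10.0})
try:
    validate_unique(rows, keys=["order_id"])
except ValueError as e:
    print(e)  # duplicate fact keys: [(1001,)]
```

Failing fast here prevents an upsert from silently collapsing duplicate source rows into one warehouse row.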

Quickstart

This example shows a minimal pipeline using PostgreSQL, one Type 1 dimension, and one fact table.

1) Install

pip install easydw

2) Create a database connection

from easydw.database import PostgresDatabase

dwh = PostgresDatabase(
	name="warehouse",
	params={
		"user": "dw_user",
		"password": "dw_password",
		"host": "localhost",
		"port": 5432,
		"database": "analytics",
		"schema": "public",
	},
)

dwh.connect()

3) Implement a dimension

import polars as pl

from easydw.dimension import DimensionType1


class CustomerDimension(DimensionType1):
	def __init__(self, dwh):
		super().__init__(name="dim_customer", dwh=dwh)

	def bind(self, df: pl.DataFrame) -> pl.DataFrame:
		# Join business key with dimension surrogate key.
		dim_df = self.extract().select(["customer_code", "id"])
		return df.join(dim_df, on="customer_code", how="left")

Load dimension records:

incoming_customers = pl.DataFrame(
	{
		"customer_code": ["C001", "C002"],
		"customer_name": ["Alice", "Bob"],
		"country": ["IT", "FR"],
	}
)

customer_dim = CustomerDimension(dwh)
customer_dim.insert(incoming_customers, keys=["customer_code"])

4) Implement a fact

from datetime import datetime

import polars as pl

from easydw.fact import Fact


class SalesFact(Fact):
	keys = ["order_id"]
	dimensions = [CustomerDimension]

	def __init__(self, dwh):
		super().__init__(name="fct_sales", dwh=dwh)

	def _prepare_records(
		self, from_timestamp: datetime, to_timestamp: datetime
	) -> pl.DataFrame:
		# Replace this with your source extraction logic.
		raw = pl.DataFrame(
			{
				"order_id": [1001, 1002],
				"customer_code": ["C001", "C002"],
				"amount": [120.0, 75.5],
			}
		)

		return self._bind_dimensions(raw)

Load fact rows:

sales_fact = SalesFact(dwh)
sales_fact.insert(
	from_timestamp=datetime(2026, 3, 1),
	to_timestamp=datetime(2026, 3, 20),
)

dwh.dispose()

Development

Build package

python -m build
cp dist/*.whl /your/desired/location
pip install /your/desired/location/easydw-*.whl

Run tests

pytest

Requirements

  • Python 3.10+
  • SQLAlchemy 2.x
  • Polars 1.x
  • psycopg2 (for PostgreSQL usage)
  • oracledb (for Oracle usage)
