Skip to main content

Utility for simplifying the management of data warehouse dimensions and facts

Project description

easydw

easydw is a Python library for building data warehouse loading pipelines with less boilerplate and more consistency.

It provides reusable abstractions for:

  • Connecting to warehouse engines (PostgreSQL and Oracle)
  • Loading dimensions with Slowly Changing Dimension strategies (Type 0, Type 1, Type 2)
  • Loading facts with key validation and upsert workflows
  • Working with Polars DataFrames while persisting through SQLAlchemy

The goal is simple: model dimensions and facts as Python classes, then run predictable and testable load logic.

Why easydw

Data warehouse projects often repeat the same patterns:

  • Select data from source or staging
  • Detect new records and changed records
  • Apply SCD behavior on dimensions
  • Upsert facts safely

easydw centralizes these patterns so project-specific code can focus on data rules and transformations instead of database plumbing.

Core concepts

Database abstraction

Database is the core abstraction used by dimensions and facts.

Main operations:

  • connect(): initialize SQLAlchemy engine
  • select(table_name=..., query=..., params=...): return Polars DataFrame
  • insert(df, table_name): bulk insert rows from DataFrame
  • update(df, table_name, keys): batch update by key columns
  • upsert(df, table_name, keys): backend-specific upsert strategy
  • dispose(): close engine and clear metadata cache

Included adapters:

  • PostgresDatabase: native ON CONFLICT DO UPDATE upsert
  • OracleDatabase: update-then-insert upsert emulation

Dimension strategies

easydw includes 3 SCD-oriented base classes:

  • DimensionType0: insert-only, no updates for existing keys
  • DimensionType1: overwrite changed records, no historical versions
  • DimensionType2: close old records and insert new active versions

You implement a concrete dimension by subclassing one of these and providing bind(df), used to attach dimension keys to fact datasets.

Fact loading

Fact defines a standard workflow:

  1. Build rows in _prepare_records(from_timestamp, to_timestamp)
  2. Optionally bind dimension keys through _bind_dimensions(df)
  3. Validate uniqueness on declared keys
  4. Upsert into warehouse table

Quickstart

This example shows a minimal pipeline using PostgreSQL, one Type 1 dimension, and one fact table.

1) Install

pip install .

2) Create a database connection

from easydw.database import PostgresDatabase

dwh = PostgresDatabase(
	name="warehouse",
	params={
		"user": "dw_user",
		"password": "dw_password",
		"host": "localhost",
		"port": 5432,
		"database": "analytics",
		"schema": "public",
	},
)

dwh.connect()

3) Implement a dimension

import polars as pl

from easydw.dimension import DimensionType1


class CustomerDimension(DimensionType1):
	def __init__(self, dwh):
		super().__init__(name="dim_customer", dwh=dwh)

	def bind(self, df: pl.DataFrame) -> pl.DataFrame:
		# Join business key with dimension surrogate key.
		dim_df = self.extract().select(["customer_code", "id"])
		return df.join(dim_df, on="customer_code", how="left")

Load dimension records:

incoming_customers = pl.DataFrame(
	{
		"customer_code": ["C001", "C002"],
		"customer_name": ["Alice", "Bob"],
		"country": ["IT", "FR"],
	}
)

customer_dim = CustomerDimension(dwh)
customer_dim.insert(incoming_customers, keys=["customer_code"])

4) Implement a fact

from datetime import datetime

import polars as pl

from easydw.fact import Fact


class SalesFact(Fact):
	keys = ["order_id"]
	dimensions = [CustomerDimension]

	def __init__(self, dwh):
		super().__init__(name="fct_sales", dwh=dwh)

	def _prepare_records(
		self, from_timestamp: datetime, to_timestamp: datetime
	) -> pl.DataFrame:
		# Replace this with your source extraction logic.
		raw = pl.DataFrame(
			{
				"order_id": [1001, 1002],
				"customer_code": ["C001", "C002"],
				"amount": [120.0, 75.5],
			}
		)

		return self._bind_dimensions(raw)

Load fact rows:

sales_fact = SalesFact(dwh)
sales_fact.insert(
	from_timestamp=datetime(2026, 3, 1),
	to_timestamp=datetime(2026, 3, 20),
)

dwh.dispose()

Development

Build package

python -m build
cp dist/*.whl /your/desired/location
pip install /your/desired/location/easydw-*.whl

Run tests

pytest

Requirements

  • Python 3.10+
  • SQLAlchemy 2.x
  • Polars 1.x
  • psycopg2 (for PostgreSQL usage)
  • oracledb (for Oracle usage)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

easydw-1.0.3.tar.gz (14.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

easydw-1.0.3-py3-none-any.whl (17.3 kB view details)

Uploaded Python 3

File details

Details for the file easydw-1.0.3.tar.gz.

File metadata

  • Download URL: easydw-1.0.3.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for easydw-1.0.3.tar.gz
Algorithm Hash digest
SHA256 c36c3cc70bd09daa7a528390dd697f9aca7f87ef49b0fb5c3567b07dc9f889d8
MD5 1fbbf007cd1b18fb3408912c5ed9a9d6
BLAKE2b-256 26b91db925f2d0769fa377c585bfc2cc6504edd7e6c79800d9d653a336c2844d

See more details on using hashes here.

File details

Details for the file easydw-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: easydw-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for easydw-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 72d72b26fd0777bb116a9ee43c6a15d64bf198a76277fc88c751fad12d878b05
MD5 7fd6af99913161f737cd4d58392bcd35
BLAKE2b-256 71924885db516c4043d7094610a87d4d1a1873d845e8db859701b36c6356cceb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page