hNhM – highly Normalized hybrid Model.


hNhM (highly Normalized hybrid Model) is a data modeling methodology that enables iterative, Agile-driven modifications to your Data Warehouse (DWH). The methodology is based on Anchor Modeling and Data Vault.

The idea is to represent each Entity using 3 types of tables:

  • Hub stores Business and Surrogate keys
  • Attribute stores the history of changes (e.g. Name, Age, Gender...)
  • Group is used to reduce the number of underlying tables by grouping multiple Attributes into one Group. A common best practice is to define a Group per data source.
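
Following the table-naming convention shown later in the tutorial (hub__user, attr__user__age, group__user__name), the decomposition of one Entity into physical tables can be sketched as:

```python
def physical_tables(entity: str, attributes: list[str], groups: list[str]) -> list[str]:
    """Map one logical Entity to its physical table names
    (naming follows the tables shown in the tutorial)."""
    tables = [f"hub__{entity}"]                             # business + surrogate keys
    tables += [f"attr__{entity}__{a}" for a in attributes]  # one table per standalone Attribute
    tables += [f"group__{entity}__{g}" for g in groups]     # one table per Group of Attributes
    return tables

print(physical_tables("user", ["age"], ["name"]))
# ['hub__user', 'attr__user__age', 'group__user__name']
```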

The purpose of hnhm is to:

  • Define Entities, Links and Flows declaratively using Python
    • Describe them "logically"
    • The "physical" layer is managed by hnhm
  • Apply the changes to a DWH iteratively
    • hnhm detects changes in the entities and creates new Attributes, Groups, or Links
  • Load data from the Staging layer using Flows in the correct order
    • Ignore, Update, or keep the history using SCD2

The implementation of this package is based on the talk "How we implemented our data storage model — highly Normalized hybrid Model" by Evgeny Ermakov and Nikolai Grebenshchikov: 1) Yandex report, habr.com; 2) SmartData conference, youtube.com.

Tutorial

You can find the full code in the dwh folder.

Prerequisites

Install the library

pip install hnhm

Create the dwh directory

mkdir dwh

Spin up a Postgres database

# dwh/docker-compose.yml
version: "3.9"

volumes:
  postgres_data: { }

services:
  postgres:
    image: postgres:15
    volumes:
      - "postgres_data:/var/lib/postgresql/data"
    ports:
      - "5433:5432"
    environment:
      POSTGRES_DB: hnhm
      POSTGRES_USER: hnhm
      POSTGRES_PASSWORD: 123
      
# $ docker compose up -d

Initialize DWH

Create the __hnhm__.py file with the following contents:

# dwh/__hnhm__.py
from hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState

sql = PostgresPsycopgSql(
    database="hnhm",
    user="hnhm",
    password="123",
    port=5433
)

registry = HnhmRegistry(
    hnhm=HnHm(
        sql=sql,
        state=FileState("state.json"),
    ),
)
  • PostgresPsycopgSql generates and executes SQL
  • HnhmRegistry stores the hnhm object, entities and links
  • HnHm implements core logic and manages the state
  • FileState stores the state of your DWH in a file

Apply the changes to your DWH:

$ hnhm apply dwh

Importing 'registry' object from the module: 'dwh.__hnhm__'.

Your DWH is up to date.

Our DWH is up to date because we haven't added any entities or links yet.

Start modeling

Let's add a new Entity. Add the dwh/user.py file with the following contents:

# dwh/user.py
from hnhm import (
    Layout,
    LayoutType,
    String,
    ChangeType,
    HnhmEntity,
)


class User(HnhmEntity):
    """User data."""

    __layout__ = Layout(name="user", type=LayoutType.HNHM)

    user_id = String("User ID.", change_type=ChangeType.IGNORE)

    __keys__ = [user_id]

Add the User entity to the registry:

# dwh/__hnhm__.py
from hnhm import PostgresPsycopgSql, HnHm, HnhmRegistry, FileState
+ from dwh.user import User

sql = PostgresPsycopgSql(
    database="hnhm",
    user="hnhm",
    password="123",
    port=5433
)

registry = HnhmRegistry(
+   entities=[User()],
    hnhm=HnHm(
        sql=sql,
        state=FileState("state.json"),
    ),
)

Apply the changes to your DWH:

$ hnhm apply dwh

Importing 'registry' object from the module: 'dwh.__hnhm__'.

Plan:

+ entity 'HNHM.user'
  + view 'user'
  + hub 'user'

Apply migrations? [y/N]: y
Applied!

We added the new User entity to our DWH:

-- View on top of the DDS tables
select * from entity__user;

-- Hub
select * from hub__user;

Add Attribute and Group

Let's add an Attribute and a Group to our Entity. Edit the dwh/user.py file:

# dwh/user.py
from hnhm import (
    Layout,
    LayoutType,
    String,
    ChangeType,
    HnhmEntity,
+   Integer
)


class User(HnhmEntity):
    """User data."""

    __layout__ = Layout(name="user", type=LayoutType.HNHM)

    user_id = String("User ID.", change_type=ChangeType.IGNORE)
+   age = Integer("Age.", change_type=ChangeType.UPDATE)
+   first_name = String("First name.", change_type=ChangeType.NEW, group="name")
+   last_name = String("Last name.", change_type=ChangeType.NEW, group="name")

    __keys__ = [user_id]

Apply the changes to your DWH:

$ hnhm apply dwh

Importing 'registry' object from the module: 'dwh.__hnhm__'.

Plan:

[u] entity 'HNHM.user'
  + attribute 'age'
  + group 'name'
    |attribute 'first_name'
    |attribute 'last_name'
  [u] view 'user'

Apply migrations? [y/N]: y
Applied!

Take a look at the newly created tables:

-- View on top of the DDS tables was updated
select * from entity__user;

-- Attribute 'age'
select * from attr__user__age;

-- Group 'name'
select * from group__user__name;

The physical result:

view: entity__user
┌───────────────────┐   ┌────────────────┐   ┌─────────────────┐
│ group__user__name │   │ hub__user      │   │ attr__user__age │
│                   │   │                │   │                 │
│ + user_sk (FK)    ├──►│ + user_sk (PK) │◄──┤ + user_sk (FK)  │
│ + first_name      │   │ + user_id_bk   │   │ + age           │
│ + last_name       │   │ + valid_from   │   │ + valid_from    │
│ + valid_from      │   │ + _hash        │   │ + _hash         │
│ + valid_to        │   │ + _source      │   │ + _source       │
│ + _hash           │   │ + _loaded_at   │   │ + _loaded_at    │
│ + _source         │   └────────────────┘   └─────────────────┘
│ + _loaded_at      │
└───────────────────┘

Concepts

Logical level (Python classes)

  • Entity: business entity (User, Review, Order, Booking)
  • Link: the relationship between Entities (UserOrder, UserBooking)
  • Flow: loads data from the Staging layer into Entities and Links

Physical level (Tables)

  • Hub: the hub table contains the Entity's Business Keys and a Surrogate Key (an MD5 hash of the concatenated business keys)
  • Attribute: the attribute table contains an FK to the Entity's surrogate key, the history of attribute changes, and the valid_from column
  • Group: the group table contains an FK to the Entity's surrogate key, the history of changes to the group's attributes, and the valid_from column
  • Link: the link table contains FKs to the Entities' surrogate keys; historicity is tracked via SCD2
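
The surrogate-key construction described above can be illustrated in a few lines; the delimiter and encoding here are assumptions for the sketch, not necessarily what hnhm uses internally:

```python
import hashlib

def surrogate_key(*business_keys: str) -> str:
    """MD5 hash of the concatenated business keys.
    The '|' delimiter and UTF-8 encoding are illustrative assumptions."""
    concatenated = "|".join(business_keys)
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()

print(surrogate_key("user-42"))  # 32-character hex digest
```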

Change types of Attributes and Groups

  • IGNORE: insert the latest new data; ignore subsequent updates
  • UPDATE: insert the latest new data; overwrite existing values with updates
  • NEW: keep the full history using SCD2; adds the valid_to column
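
The three change types can be illustrated with a simplified sketch (this is not hnhm's actual loader; the row shape is an assumption):

```python
def load_attribute(history: list[dict], incoming: dict, change_type: str) -> list[dict]:
    """Simplified semantics of IGNORE / UPDATE / NEW for one attribute.
    Each dict carries 'value' and 'valid_from' ('valid_to' appears under NEW/SCD2)."""
    if not history:
        return [incoming]
    if change_type == "IGNORE":
        return history                       # keep the first value, ignore updates
    if change_type == "UPDATE":
        return history[:-1] + [incoming]     # overwrite the latest value
    if change_type == "NEW":                 # SCD2: close the current row, append the new one
        closed = {**history[-1], "valid_to": incoming["valid_from"]}
        return history[:-1] + [closed, incoming]
    raise ValueError(f"Unknown change type: {change_type}")

h = load_attribute([], {"value": 30, "valid_from": "2023-01-01"}, "NEW")
h = load_attribute(h, {"value": 31, "valid_from": "2024-01-01"}, "NEW")
print(h)
# [{'value': 30, 'valid_from': '2023-01-01', 'valid_to': '2024-01-01'},
#  {'value': 31, 'valid_from': '2024-01-01'}]
```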

Supported Databases

  • Postgres
