
hNhM – highly Normalized hybrid Model.


hNhM (highly Normalized hybrid Model) is a data modeling methodology based on Anchor modeling and Data Vault. The implementation of this package is based on the report "How we implemented our data storage model — highly Normalized hybrid Model" by Evgeny Ermakov and Nikolai Grebenshchikov:

1) Yandex Report, habr.com
2) SmartData Conference, youtube.com

Documentation

https://marchinho11.github.io/hnhm

Basic hNhM concepts

Logical level

  • Entity: business entity (User, Review, Order, Booking)
  • Link: relationship between Entities (UserOrder, UserBooking)
  • Flow: helps to load data from the stage layer to Entities and Links

Physical level - tables

  • Hub: hub table contains the Entity's Business Keys and a Surrogate Key (MD5 hash of the concatenated business keys)
  • Attribute: attribute table contains FK to Entity's surrogate key, history of attribute changes, and the valid_from column
  • Group: group table contains FK to Entity's surrogate key, history of changes to group attributes, and the valid_from column
  • Link: link table contains FKs to the Entities' surrogate keys; history is tracked using SCD2
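The hub's surrogate key, described above as the MD5 hash of the concatenated business keys, can be sketched in a few lines of Python. The `make_sk` helper, the `|` separator, and the `str()` normalization are assumptions made for illustration. The SQL that hnhm generates may concatenate and normalize keys differently.

```python
import hashlib


def make_sk(*business_keys: object, sep: str = "|") -> str:
    """Hypothetical helper: MD5 hex digest of the concatenated business keys.

    The separator and the str() casting are illustrative assumptions,
    not hnhm's documented behavior.
    """
    concatenated = sep.join(str(key) for key in business_keys)
    return hashlib.md5(concatenated.encode("utf-8")).hexdigest()


# Single-key entity (User is keyed by user_id) and a composite-key example.
user_sk = make_sk("u-42")
order_sk = make_sk("shop-1", 1001)

# The same business keys always yield the same surrogate key, so tables
# loaded independently (hub, attributes, groups, links) can join on it.
assert make_sk("u-42") == user_sk
print(user_sk, order_sk)
```

A separator matters for composite keys. Without one, the key pairs ("ab", "c") and ("a", "bc") would concatenate to the same string and produce the same surrogate key.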

Change types of Attributes and Groups

  • IGNORE: insert data for new keys; ignore updates to existing values
  • UPDATE: insert data for new keys; overwrite existing values with the latest update
  • NEW: keep the full history using SCD2; adds the valid_to column
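The three change types can be illustrated with a small in-memory model of an attribute table. This is a hypothetical sketch, not hnhm's loading code. The `apply_change` function and the local `ChangeType` enum are illustrative stand-ins, and refreshing `valid_from` on UPDATE is an assumption.

```python
from datetime import datetime
from enum import Enum
from typing import Any, Optional


class ChangeType(Enum):
    """Local stand-in for the three change types (not hnhm.ChangeType)."""

    IGNORE = "ignore"
    UPDATE = "update"
    NEW = "new"


def apply_change(
    rows: list[dict[str, Any]],
    sk: str,
    value: Any,
    loaded_at: datetime,
    change_type: ChangeType,
) -> None:
    """Merge one incoming value for `sk` into an in-memory attribute table."""
    # The current row is the one that is still open (no valid_to set).
    current: Optional[dict[str, Any]] = next(
        (r for r in rows if r["sk"] == sk and r.get("valid_to") is None), None
    )
    new_row: dict[str, Any] = {"sk": sk, "value": value, "valid_from": loaded_at}

    if current is None:
        # First time this key is seen: every change type inserts it.
        if change_type is ChangeType.NEW:
            new_row["valid_to"] = None  # SCD2 rows carry an explicit valid_to
        rows.append(new_row)
    elif change_type is ChangeType.IGNORE:
        return  # keep the value that was loaded first
    elif change_type is ChangeType.UPDATE:
        current.update(value=value, valid_from=loaded_at)  # overwrite in place
    elif current["value"] != value:
        # ChangeType.NEW with a changed value: close the old version (SCD2).
        current["valid_to"] = loaded_at
        new_row["valid_to"] = None
        rows.append(new_row)


t1, t2 = datetime(2024, 1, 1), datetime(2024, 2, 1)
for ct in ChangeType:
    table: list[dict[str, Any]] = []
    apply_change(table, "sk1", "Alice", t1, ct)
    apply_change(table, "sk1", "Alicia", t2, ct)
    print(ct.name, [(r["value"], r.get("valid_to")) for r in table])
```

After two loads, IGNORE still holds only "Alice" and UPDATE holds a single "Alicia" row. NEW keeps both versions, and the "Alice" row is closed at the second load time.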

Quick Start

Install the hnhm library:

```shell
pip install hnhm
```

Create a directory named dwh and put a __hnhm__.py file in it with the following contents:

```python
# dwh/__hnhm__.py
from hnhm import (
    Layout,
    LayoutType,
    String,
    Integer,
    ChangeType,
    HnHm,
    HnhmEntity,
    HnhmRegistry,
    FileState,
    PostgresPsycopgSql,
)


class User(HnhmEntity):
    """User data."""

    __layout__ = Layout(name="user", type=LayoutType.HNHM)

    # Business key: stored in the hub (hub__user.user_id_bk).
    user_id = String(comment="User ID.", change_type=ChangeType.IGNORE)
    # Standalone attribute: stored in its own table (attr__user__age).
    age = Integer(comment="Age.", change_type=ChangeType.UPDATE)
    # Attributes sharing group="name" are stored together (group__user__name).
    first_name = String(comment="First name.", change_type=ChangeType.NEW, group="name")
    last_name = String(comment="Last name.", change_type=ChangeType.NEW, group="name")

    __keys__ = [user_id]


# Connection to the target Postgres database.
sql = PostgresPsycopgSql(database="hnhm", user="postgres")

# Registry of the entities managed by hnhm; its state is kept in state.json.
registry = HnhmRegistry(
    entities=[User()],
    hnhm=HnHm(
        state=FileState("state.json"),
        sql=sql,
    ),
)
```

Apply the changes to your DWH:

```
$ hnhm apply dwh

Plan:

+ entity 'HNHM.user'
  + view 'user'
  + hub 'user'
  + attribute 'age'
  + group 'name'
    |attribute 'first_name'
    |attribute 'last_name'

Apply migrations? [y/N]: y
Applied!
```

The physical result of the applied changes:

```
view: entity__user
┌────────────────────────────────────────────────────────────────┐
│┌───────────────────┐   ┌────────────────┐   ┌─────────────────┐│
││ group__user__name │   │ hub__user      │   │ attr__user__age ││
││                   │   │                │   │                 ││
││ + user_sk (FK)    ├──►│ + user_sk (PK) │◄──┤ + user_sk (FK)  ││
││ + first_name      │   │ + user_id_bk   │   │ + age           ││
││ + last_name       │   │ + valid_from   │   │ + valid_from    ││
││ + valid_from      │   │ + _source      │   │ + _source       ││
││ + valid_to        │   │ + _loaded_at   │   │ + _loaded_at    ││
││ + _source         │   └────────────────┘   └─────────────────┘│
││ + _loaded_at      │                                           │
│└───────────────────┘                                           │
└────────────────────────────────────────────────────────────────┘
```
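The entity__user view presents the split tables as one wide row per user. hnhm generates the real view SQL for Postgres, and that SQL is not shown here. What follows is a hypothetical approximation using Python's built-in sqlite3 module. Column types are simplified, and the assumption is that only the current version of the name group (valid_to IS NULL) is selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified versions of the three tables from the diagram.
    CREATE TABLE hub__user (
        user_sk TEXT PRIMARY KEY, user_id_bk TEXT,
        valid_from TEXT, _source TEXT, _loaded_at TEXT
    );
    CREATE TABLE attr__user__age (
        user_sk TEXT REFERENCES hub__user (user_sk), age INTEGER,
        valid_from TEXT, _source TEXT, _loaded_at TEXT
    );
    CREATE TABLE group__user__name (
        user_sk TEXT REFERENCES hub__user (user_sk),
        first_name TEXT, last_name TEXT,
        valid_from TEXT, valid_to TEXT, _source TEXT, _loaded_at TEXT
    );

    -- Hypothetical view: one row per hub key with its current attribute values.
    CREATE VIEW entity__user AS
    SELECT h.user_sk, h.user_id_bk AS user_id, a.age, g.first_name, g.last_name
    FROM hub__user AS h
    LEFT JOIN attr__user__age AS a ON a.user_sk = h.user_sk
    LEFT JOIN group__user__name AS g
        ON g.user_sk = h.user_sk AND g.valid_to IS NULL;
    """
)

conn.executescript(
    """
    INSERT INTO hub__user VALUES ('sk1', 'u-42', '2024-01-01', 'crm', '2024-01-01');
    INSERT INTO attr__user__age VALUES ('sk1', 30, '2024-01-01', 'crm', '2024-01-01');
    -- Two SCD2 versions of the name group: the first is closed, the second is current.
    INSERT INTO group__user__name VALUES
        ('sk1', 'Al', 'Smith', '2024-01-01', '2024-02-01', 'crm', '2024-01-01'),
        ('sk1', 'Alice', 'Smith', '2024-02-01', NULL, 'crm', '2024-02-01');
    """
)

rows = conn.execute(
    "SELECT user_id, age, first_name, last_name FROM entity__user"
).fetchall()
print(rows)
```

The closed "Al" version is filtered out by the valid_to condition in the join, so the view returns a single current row for the user.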
