Skip to main content

DataVault 2.0 code gen

Project description

Data Vault 2.0 scaffolding tool

This tool is designed to streamline the process of creating Data Vault 2.0 entities, such as hubs, links, and satellites. As well as building information layer objects such as dim and fact tables from a multidimensional paradigm.

How it works:

User: provides a staging view stg.[entity_name] (or a table if the staging layer persisted) with all requirements for the [entity_name] defined in the schema (how to define see below).
Tool:

  1. Validates metadata of the provided staging view or table.
  2. Generates the necessary DDL statements to create the Data Vault 2.0 entities.
  3. Generates ELT procedures to load data to the generated entities.
  4. Generates support procedures such as meta.Drop_all_related_to_[entity_name] and elt.Run_all_related_to_[entity_name]

App design (layers):

DV2Modeler (service)

  1. gets user input (stg) and analyzes it, producing stg_info
  2. chooses strategy (scd2dim, link2fact)

Strategy (algorithm)

  1. validates staging using stg_info
  2. generates schema using dialects handler

Dialect handler (repository)

  1. creates DB objects for postgres or MSSQL database
                                     +----------------------+
                                     | hub.[entity_name]    |
                                     +----------------------+
                                                 ^
   o   1.define   +-------------------+          | 3.create
  /|\  ------->   | stg.[entity_name] |          #              +----------------------+
  / \             +-------------------+         /|\  ---------> | sat.[entity_name]    |
  User ---------------------------------------> / \  3.create   +----------------------+
                           2.use                Tool
                                                 | 3.create
                                                 v
                                     +----------------------+
                                     | dim.[entity_name]    |
                                     +----------------------+

How to define a staging view or table:

  • bk_ (BusinessKey) - at least one bk_ column
  • hk_[entity_name] (HashKey) - exactly one hk_[entity_name] column if you want a hub table created
  • LoadDate - required by dv2 standard for an auditability
  • RecordSource - required by dv2 standard for an auditability
  • HashDiff - optional, required if you want to have a scd2 type dim table created
  • IsAvailable - optional, required if you want to track missing/deleted records
  • all other columns will be considered as business columns and will be included to the sat table definition
staging fields scd2dim profile link2fact profile
bk_
hk_[entity_name]
LoadDate
RecordSource
HashDiff
IsAvailable
-- staging view example for the scd2dim profile (mssql)
create view [stg].[UR_officers] as
select cast(31 as bigint) [bk_id]
, core.StringToHash1(cast(31 as bigint)) [hk_UR_officers]
, sysdatetime() [LoadDate]
, cast('LobSystem.dbo.officers_daily' as varchar(200)) [RecordSource]
, core.StringToHash8(
    cast('uri' as nvarchar(100))
    , cast('00000000000000' as varchar(20))
    , cast('NATURAL_PERSON' as varchar(50))
    , cast(null as varchar(20))
    , cast('INDIVIDUALLY' as varchar(50))
    , cast(0 as int)
    , cast('2008-04-07' as date)
    , cast('2008-04-07 18:00:54.000' as datetime)
) [HashDiff]
, cast('uri' as nvarchar(100)) [uri]
, cast('00000000000000' as varchar(20)) [at_legal_entity_registration_number]
, cast('NATURAL_PERSON' as varchar(50)) [entity_type]
, cast(null as varchar(20)) [legal_entity_registration_number]
, cast('INDIVIDUALLY' as varchar(50)) [rights_of_representation_type]
, cast(0 as int) [representation_with_at_least]
, cast('2008-04-07' as date) [registered_on]
, cast('2008-04-07 18:00:54.000' as datetime) [last_modified_at]
, cast(1 as bit) [IsAvailable]

scd2dim profile columns mapping:

stg hub sat dim
hk_[entity_name]
BKs... (uk)BKs... BKs... (pk)BKs...
hk_[entity_name] (pk)hk_[entity_name] (pk)(fk)hk_[entity_name]
LoadDate LoadDate (pk)LoadDate
RecordSource RecordSource RecordSource
HashDiff HashDiff
FLDs... FLDs... FLDs...
IsAvailable IsAvailable IsAvailable
IsCurrent
(pk)DateFrom
DateTo

link2fact profile columns mapping:

stg link sat fact
HKs... (uk)(fk)hk_other_entity_name
hk_[entity_name] (pk)hk_[entity_name] (pk)(fk)hk_[entity_name]
<degenerate_field> (uk)<degenerate_field> <degenerate_field>
LoadDate LoadDate LoadDate
RecordSource RecordSource RecordSource
FLDs... FLDs...

Schemas:

  • core - framework-related code
  • stg - staging layer for both virtual (views) and materialized (tables)
  • hub - hub tables
  • sat - satellite tables
  • dim - dimension tables (information vault)
  • fact - fact tables (information vault)
  • elt - ELT procedures
  • job - top level ELT procedures
  • meta - metadata vault
  • proxy - source data for a materialized staging area (meant for wrapping external data sources as SQL views)

DV2-related schemas layering

data -> ELT -> report
LoB* data staging (E) raw vault (L) business vault (T) information vault
stg hub sal dim
proxy sat fact
pool link
* Line of Business applications

Usage diagram

          +          +-----------+    automation
    +---- + -------> | Dv2Utils  | -------+------+
    |     +   uses   +-----------+               |
    |     +                | uses                | creates
    |     +                v                     |    
    |     +   uses   +-----------+       uses    |
    +---- + -------> | Dv2Helper | --------------+
    |     +          +-----------+               |
    o     +                |                     |       
   /|\    +                | DDL                 |       python
   / \  ==========================================================
 DWH Dev  +        creates |                     |       database
    |     +                v                     V      
    |     +   uses    +--------+   uses   +---------------+
    +---- + ------->  | entity |  ----->  | core objects  |
          +           +--------+          +---------------+
          +                           

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sandwich-0.3.0.tar.gz (16.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sandwich-0.3.0-py3-none-any.whl (24.9 kB view details)

Uploaded Python 3

File details

Details for the file sandwich-0.3.0.tar.gz.

File metadata

  • Download URL: sandwich-0.3.0.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for sandwich-0.3.0.tar.gz
Algorithm Hash digest
SHA256 02963f728eddd8904be3bc3cbe9bb50922996f2b727ddc33fb6c2e2d0d7e55f0
MD5 c1e5c620471a46bb3e06ac3571c97211
BLAKE2b-256 ee73fc9ee1ff943aa5d0225e0fd6b17954869274b59c10b4d4a6cecdbbb87054

See more details on using hashes here.

File details

Details for the file sandwich-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: sandwich-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for sandwich-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f6d0f2177f78545b46389dd8d2e71013f386928302452b30fc766148c96753a9
MD5 646cc6f748678d5964a31334286c5e46
BLAKE2b-256 6d5d3bdf8cfe2c8361e1272186b141495a114c4c086b47b6480dc65f87797788

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page