DataVault 2.0 code gen
Project description
Data Vault 2.0 scaffolding tool
This tool is designed to streamline the process of creating Data Vault 2.0 entities, such as hubs, links, and satellites. As well as building information layer objects such as dim and fact tables from a multidimensional paradigm.
How it works:
User: provides a staging view stg.[entity_name] (or a table if the staging layer persisted)
with all requirements for the [entity_name] defined in the schema (how to define see below).
Tool:
- Validates metadata of the provided staging view or table.
- Generates the necessary DDL statements to create the Data Vault 2.0 entities.
- Generates ELT procedures to load data to the generated entities.
- Generates support procedures such as
meta.Drop_all_related_to_[entity_name]andelt.Run_all_related_to_[entity_name]
App design (layers):
DV2Modeler (service)
- gets user input (stg) and analyzes it, producing
stg_info - chooses strategy (
scd2dim,link2fact)
Strategy (algorithm)
- validates staging using
stg_info - generates schema using dialects handler
Dialect handler (repository)
- creates DB objects for postgres or MSSQL database
+----------------------+
| hub.[entity_name] |
+----------------------+
^
o 1.define +-------------------+ | 3.create
/|\ -------> | stg.[entity_name] | # +----------------------+
/ \ +-------------------+ /|\ ---------> | sat.[entity_name] |
User ---------------------------------------> / \ 3.create +----------------------+
2.use Tool
| 3.create
v
+----------------------+
| dim.[entity_name] |
+----------------------+
How to define a staging view or table:
bk_(BusinessKey) - at least onebk_columnhk_[entity_name](HashKey) - exactly onehk_[entity_name]column if you want ahubtable createdLoadDate- required by dv2 standard for an auditabilityRecordSource- required by dv2 standard for an auditabilityHashDiff- optional, required if you want to have a scd2 typedimtable createdIsAvailable- optional, required if you want to track missing/deleted records- all other columns will be considered as business columns and will be included to the
sattable definition
| staging fields | scd2dim profile | link2fact profile |
|---|---|---|
| bk_ | ✅ | |
hk_[entity_name] |
✅ | |
| LoadDate | ✅ | |
| RecordSource | ✅ | |
| HashDiff | ✅ | |
| IsAvailable | ✅ |
-- staging view example for the scd2dim profile (mssql)
create view [stg].[UR_officers] as
select cast(31 as bigint) [bk_id]
, core.StringToHash1(cast(31 as bigint)) [hk_UR_officers]
, sysdatetime() [LoadDate]
, cast('LobSystem.dbo.officers_daily' as varchar(200)) [RecordSource]
, core.StringToHash8(
cast('uri' as nvarchar(100))
, cast('00000000000000' as varchar(20))
, cast('NATURAL_PERSON' as varchar(50))
, cast(null as varchar(20))
, cast('INDIVIDUALLY' as varchar(50))
, cast(0 as int)
, cast('2008-04-07' as date)
, cast('2008-04-07 18:00:54.000' as datetime)
) [HashDiff]
, cast('uri' as nvarchar(100)) [uri]
, cast('00000000000000' as varchar(20)) [at_legal_entity_registration_number]
, cast('NATURAL_PERSON' as varchar(50)) [entity_type]
, cast(null as varchar(20)) [legal_entity_registration_number]
, cast('INDIVIDUALLY' as varchar(50)) [rights_of_representation_type]
, cast(0 as int) [representation_with_at_least]
, cast('2008-04-07' as date) [registered_on]
, cast('2008-04-07 18:00:54.000' as datetime) [last_modified_at]
, cast(1 as bit) [IsAvailable]
scd2dim profile columns mapping:
| stg | hub | sat | dim |
|---|---|---|---|
hk_[entity_name] |
|||
| BKs... | (uk)BKs... | BKs... | (pk)BKs... |
hk_[entity_name] |
(pk)hk_[entity_name] |
(pk)(fk)hk_[entity_name] |
|
| LoadDate | LoadDate | (pk)LoadDate | |
| RecordSource | RecordSource | RecordSource | |
| HashDiff | HashDiff | ||
| FLDs... | FLDs... | FLDs... | |
| IsAvailable | IsAvailable | IsAvailable | |
| IsCurrent | |||
| (pk)DateFrom | |||
| DateTo |
link2fact profile columns mapping:
| stg | link | sat | fact |
|---|---|---|---|
| HKs... | (uk)(fk)hk_other_entity_name |
||
hk_[entity_name] |
(pk)hk_[entity_name] |
(pk)(fk)hk_[entity_name] |
|
| <degenerate_field> | (uk)<degenerate_field> | <degenerate_field> | |
| LoadDate | LoadDate | LoadDate | |
| RecordSource | RecordSource | RecordSource | |
| FLDs... | FLDs... |
Schemas:
core- framework-related codestg- staging layer for both virtual (views) and materialized (tables)hub- hub tablessat- satellite tablesdim- dimension tables (information vault)fact- fact tables (information vault)elt- ELT proceduresjob- top level ELT proceduresmeta- metadata vaultproxy- source data for a materialized staging area (meant for wrapping external data sources as SQL views)
DV2-related schemas layering
data -> ELT -> report
| LoB* data | staging (E) | raw vault (L) | business vault (T) | information vault |
|---|---|---|---|---|
| stg | hub | sal | dim | |
| proxy | sat | fact | ||
| pool | link | |||
| * Line of Business applications |
Usage diagram
+ +-----------+ automation
+---- + -------> | Dv2Utils | -------+------+
| + uses +-----------+ |
| + | uses | creates
| + v |
| + uses +-----------+ uses |
+---- + -------> | Dv2Helper | --------------+
| + +-----------+ |
o + | |
/|\ + | DDL | python
/ \ ==========================================================
DWH Dev + creates | | database
| + v V
| + uses +--------+ uses +---------------+
+---- + -------> | entity | -----> | core objects |
+ +--------+ +---------------+
+
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sandwich-0.3.0.tar.gz.
File metadata
- Download URL: sandwich-0.3.0.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
02963f728eddd8904be3bc3cbe9bb50922996f2b727ddc33fb6c2e2d0d7e55f0
|
|
| MD5 |
c1e5c620471a46bb3e06ac3571c97211
|
|
| BLAKE2b-256 |
ee73fc9ee1ff943aa5d0225e0fd6b17954869274b59c10b4d4a6cecdbbb87054
|
File details
Details for the file sandwich-0.3.0-py3-none-any.whl.
File metadata
- Download URL: sandwich-0.3.0-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.11 {"installer":{"name":"uv","version":"0.9.11"},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f6d0f2177f78545b46389dd8d2e71013f386928302452b30fc766148c96753a9
|
|
| MD5 |
646cc6f748678d5964a31334286c5e46
|
|
| BLAKE2b-256 |
6d5d3bdf8cfe2c8361e1272186b141495a114c4c086b47b6480dc65f87797788
|