A package to extract data from Collibra
Project description
DataExtraction
Overview
This program extracts data from a Collibra implementation into a SQL environment. In our case, we extract from Gore's instance of Collibra into EXL_MDSDev. The program extracts the following objects:
- Assets
- Attributes
- Attribute Types
- Domains
- Communities
- Relations
- Relation Types
- Responsibilities
Each of these object types is stored in its own SQL table.
Instructions for Use
Setup
The source executable file is kept in -- Artifact URL --. This folder contains a main.exe file and all of its dependencies. You will need to add a 'prod_config.yml' file to the root of this folder. This config file can be found within the source code of the project and should be structured like so:
API_CONFIG:
  limit: 1000000
AUTH:
  username: <Valid admin username in environment>
  password: <Valid admin password in environment>
  auth-header: <Auto-generated basic auth token, generated using postman>
ENVIRONMENT:
  gore: wlgore-<Environment Instance (dev,test,prod)>.collibra.com
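The auth-header value above is described as a basic auth token generated with Postman. As a rough alternative, the sketch below builds a token the way standard HTTP Basic authentication does, by base64-encoding `username:password`. It assumes the program expects that standard form; the source does not say whether the stored value includes the `Basic ` prefix, so both variants are shown. The function names are hypothetical and are not part of this package.

```python
import base64


def basic_auth_token(username: str, password: str) -> str:
    """Return the base64 token used by HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(credentials).decode("ascii")


def basic_auth_header(username: str, password: str) -> str:
    """Return the full Authorization header value, 'Basic <token>'."""
    return "Basic " + basic_auth_token(username, password)


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(basic_auth_header("example_user", "example_password"))
```

Compare the result with the value Postman produces for the same credentials before relying on it.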
Running
Open a command prompt in the root of the project folder, type main.exe, and press Enter. The program runs and logs its progress. During the run, it extracts all data and overwrites the raw SQL tables.
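If the run needs to be scripted rather than typed by hand, the steps above could be wrapped as in the sketch below. It only checks that main.exe and prod_config.yml are present in the folder and then launches the executable from that folder. The helper names are hypothetical, and the meaning of the exit code is not documented here, so treat the return value as informational.

```python
import subprocess
from pathlib import Path
from typing import List


def missing_files(folder: Path) -> List[str]:
    """List required files that are absent from the extraction folder."""
    required = ["main.exe", "prod_config.yml"]
    return [name for name in required if not (folder / name).is_file()]


def run_extraction(folder: Path) -> int:
    """Run main.exe from the given folder and return its exit code."""
    missing = missing_files(folder)
    if missing:
        raise FileNotFoundError(f"Missing in {folder}: {', '.join(missing)}")
    # cwd is set so the program runs from the folder root, as described above.
    completed = subprocess.run([str(folder / "main.exe")], cwd=folder)
    return completed.returncode
```

Because each run overwrites the raw SQL tables, a wrapper like this should only be pointed at a folder whose config targets the intended environment.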
SQL Tables
The following SQL tables are created or overwritten each time the program runs:
- collibra_assets_raw
- collibra_attributes_raw
- collibra_attribute_types_raw
- collibra_communities_raw
- collibra_domains_raw
- collibra_relations_raw
- collibra_relation_types_raw
- collibra_responsibilities_raw
- collibra_users_raw
SQL Stored Procedures
The following stored procedures are run on the raw tables to manipulate the data and add batch IDs:
- collibra.load_collibra_assets
- collibra.load_collibra_attributes
- collibra.load_collibra_attribute_types
- collibra.load_collibra_communities
- collibra.load_collibra_domains
- collibra.load_collibra_relations
- collibra.load_collibra_relation_types
- collibra.load_collibra_responsibilities
- collibra.load_collibra_users
All of these procedures can be run together by calling the collibra._load_entire_batch procedure.
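The table and procedure names listed above appear to follow one pattern per object type. The sketch below generates the statements that would run either the batch procedure or each load procedure in turn. It assumes a SQL Server style `EXEC` syntax and that every procedure name follows the `collibra.load_collibra_<object>` pattern; neither is confirmed by this document, and the helper names are hypothetical. It only builds strings and does not connect to a database.

```python
from typing import List

# Object types taken from the raw table list in this README.
OBJECT_TYPES = [
    "assets",
    "attributes",
    "attribute_types",
    "communities",
    "domains",
    "relations",
    "relation_types",
    "responsibilities",
    "users",
]


def raw_table(object_type: str) -> str:
    """Name of the raw table written by the extraction run."""
    return f"collibra_{object_type}_raw"


def load_procedure(object_type: str) -> str:
    """Name of the stored procedure that processes one raw table."""
    return f"collibra.load_collibra_{object_type}"


def exec_statements(use_batch: bool = True) -> List[str]:
    """Build EXEC statements for the batch procedure or each procedure."""
    if use_batch:
        return ["EXEC collibra._load_entire_batch;"]
    return [f"EXEC {load_procedure(name)};" for name in OBJECT_TYPES]


if __name__ == "__main__":
    for statement in exec_statements(use_batch=False):
        print(statement)
```

Check the generated names against the procedures that actually exist in the database before running them.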
Migrating from dev/test to prod
To change the environment in which the data extraction runs, edit the prod_config.yml file within the src folder. Change the username and password to those of a user in the new environment, and change the gore environment variable to the prod instance's URL. Example of a correctly configured prod_config.yml file for the prod environment:
API_CONFIG:
  limit: 1000000
AUTH:
  username: <Valid admin username in PROD environment>
  password: <Valid admin password in PROD environment>
  auth-header: <Auto-generated basic auth token, generated using postman>
ENVIRONMENT:
  gore: wlgore.collibra.com
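Since a run overwrites the raw tables, it may be worth checking the config before switching environments. The sketch below takes the config as an already-parsed dictionary (for example, the result of loading the YAML file with a YAML library, which is not shown here) and reports missing sections, unreplaced `<...>` placeholders, and a host that differs from the expected one. The function and constant names are hypothetical and are not part of this package.

```python
from typing import Any, Dict, List

# Sections and keys shown in the example config in this README.
REQUIRED_KEYS = {
    "API_CONFIG": ["limit"],
    "AUTH": ["username", "password", "auth-header"],
    "ENVIRONMENT": ["gore"],
}


def config_problems(config: Dict[str, Any], expected_host: str) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            value = values.get(key)
            if value is None or str(value).strip() == "":
                problems.append(f"missing value: {section}.{key}")
            elif str(value).startswith("<"):
                # The template uses <...> for values that must be filled in.
                problems.append(f"placeholder not replaced: {section}.{key}")
    env = config.get("ENVIRONMENT")
    host = env.get("gore") if isinstance(env, dict) else None
    if host is not None and host != expected_host:
        problems.append(f"ENVIRONMENT.gore is {host}, expected {expected_host}")
    return problems


if __name__ == "__main__":
    example = {
        "API_CONFIG": {"limit": 1000000},
        "AUTH": {"username": "u", "password": "p", "auth-header": "t"},
        "ENVIRONMENT": {"gore": "wlgore.collibra.com"},
    }
    print(config_problems(example, "wlgore.collibra.com"))
```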
Download files
Source Distribution
Built Distribution
File details
Details for the file DataExtractionPackage-1.1.0.tar.gz.
File metadata
- Download URL: DataExtractionPackage-1.1.0.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.6 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 552607ff8d0cbe95bdd7ed6e5ad88d1ac9eb95add685081c620700861f803338
MD5 | 7a02e77c134afbda0a586c97e16944e0
BLAKE2b-256 | 7a9f02635576fbc131f09d575800b8c686c037c42e8d5665a6993847a60e2fa8
File details
Details for the file DataExtractionPackage-1.1.0-py3-none-any.whl.
File metadata
- Download URL: DataExtractionPackage-1.1.0-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.26.0 requests-toolbelt/0.9.1 urllib3/1.26.6 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | ce2816bbb53ab55cb6ef3ab681e0778b71fec271b16d70ffa8df16979acc349b
MD5 | b1bb0bdd356351081da88bbb36cb2890
BLAKE2b-256 | e7592590b1ae6a4e58987f77ce1b754ce0d1d3aae5f73378971993e7d0f7fe57