Commons library for ingesting RDBMS metadata into Google Cloud Data Catalog
google-datacatalog-rdbms-connector
Common resources for Data Catalog RDBMS connectors.
Disclaimer: This is not an officially supported Google product.
Table of Contents
- 1. Installation
- 2. Install from source
- 3. Developer environment
- 4. Setting up the RDBMS on a new connector
1. Installation
Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.
With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies. Make sure you use Python 3.6+.
1.1. Mac/Linux
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-datacatalog-rdbms-connector
1.2. Windows
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-datacatalog-rdbms-connector
2. Install from source
2.1. Get the code
git clone https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/
cd datacatalog-connectors-rdbms/google-datacatalog-rdbms-connector
2.2. Virtualenv
Using virtualenv is optional, but strongly recommended.
2.2.1. Install Python 3.6
2.2.2. Create and activate a virtualenv
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
2.2.3. Install
pip install .
3. Developer environment
3.1. Install and run YAPF formatter
pip install --upgrade yapf
# Auto update files
yapf --in-place --recursive src tests
# Show diff
yapf --diff --recursive src tests
# Set up pre-commit hook
# From the root of your git project.
curl -o pre-commit.sh https://raw.githubusercontent.com/google/yapf/master/plugins/pre-commit.sh
chmod a+x pre-commit.sh
mv pre-commit.sh .git/hooks/pre-commit
3.2. Install and run Flake8 linter
pip install --upgrade flake8
flake8 src tests
3.3. Install the package in editable mode (i.e. setuptools “develop mode”)
pip install --editable .
3.4. Run the unit tests
python setup.py test
4. Setting up the RDBMS on a new connector
To set up the RDBMS connector to work with a relational database, three files are needed:
- metadata_definition.json
- metadata_query.sql
- A file that extends the metadata_scraper class and implements your RDBMS connection method, _create_rdbms_connection
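The third item can be sketched as follows. This is a minimal, self-contained illustration, not the library's actual code: the MetadataScraper base class below is a stand-in written for this example (its scrape method and the dictionary shape of connection_args are assumptions), and sqlite3 is used only so the sketch runs without a database server. In a real connector you would subclass the library's metadata_scraper class and return a connection created by your database driver.

```python
import sqlite3


class MetadataScraper:
    """Stand-in for the library's base scraper class (hypothetical shape)."""

    def _create_rdbms_connection(self, connection_args):
        # Each connector is expected to override this method.
        raise NotImplementedError("Implement the RDBMS connection method")

    def scrape(self, query, connection_args):
        # Hypothetical helper: open a connection, run the metadata query,
        # and return all rows. The real library may work differently.
        con = self._create_rdbms_connection(connection_args)
        try:
            cur = con.cursor()
            cur.execute(query)
            return cur.fetchall()
        finally:
            con.close()


class SQLiteMetadataScraper(MetadataScraper):
    """Example connector: only the connection method is database-specific."""

    def _create_rdbms_connection(self, connection_args):
        # connection_args is assumed to be a dict for this sketch.
        return sqlite3.connect(connection_args["database"])


if __name__ == "__main__":
    scraper = SQLiteMetadataScraper()
    rows = scraper.scrape("SELECT 1", {"database": ":memory:"})
    print(rows)
```

Keeping the driver-specific code inside a single overridden method means the query and definition files can change without touching the connection logic.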
In the metadata_definition file, fields are available at three levels:
- table_container_def
- table_def
- column_def
For working examples, see the already implemented connectors for Oracle, Teradata, MySQL, PostgreSQL, Greenplum, Redshift, and SQLServer.
For the metadata_definition target fields, the following options are available:
Level | Target | Description | Mandatory |
---|---|---|---|
table_container_def | creator | Creator of the Table Container. | N |
table_container_def | owner | Owner of the Table Container. | N |
table_container_def | update_user | Last user that updated the Table Container. | N |
table_container_def | desc | Table Container Description. | N |
table_def | num_rows | Number of rows contained in the Table. | N |
table_def | creator | Creator of the Table. | N |
table_def | owner | Owner of the Table. | N |
table_def | update_user | Last user that updated the Table. | N |
table_def | desc | Table Description. | N |
If these fields are configured, they are used to create Tags.
For columns, the target fields are used to create the Data Catalog Entry schema. Two target fields are available:
Level | Target | Description | Mandatory |
---|---|---|---|
column_def | type | Column type. | Y |
column_def | desc | Column description. | N |
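The two tables above can be turned into a small consistency check for a metadata definition. The sketch below is illustrative only: the allowed and mandatory targets come from the tables, but the layout of the definition (a "fields" list of "source"/"target" pairs under each level) and the source column names are assumptions made for this example and should be checked against an existing connector's metadata_definition.json.

```python
# Target fields per level, taken from the tables above.
ALLOWED_TARGETS = {
    "table_container_def": {"creator", "owner", "update_user", "desc"},
    "table_def": {"num_rows", "creator", "owner", "update_user", "desc"},
    "column_def": {"type", "desc"},
}

# Only the column type is marked as mandatory.
MANDATORY_TARGETS = {"column_def": {"type"}}


def validate_definition(definition):
    """Return a list of problems found in a metadata definition dict.

    The "fields"/"source"/"target" layout is an assumed structure.
    """
    problems = []
    for level, allowed in ALLOWED_TARGETS.items():
        level_def = definition.get(level, {})
        targets = {field["target"] for field in level_def.get("fields", [])}
        for unknown in sorted(targets - allowed):
            problems.append(f"{level}: unknown target '{unknown}'")
        for missing in sorted(MANDATORY_TARGETS.get(level, set()) - targets):
            problems.append(f"{level}: mandatory target '{missing}' not mapped")
    return problems


# Hypothetical definition; the source names are placeholders.
example = {
    "table_container_def": {
        "fields": [{"source": "schema_owner", "target": "owner"}]
    },
    "table_def": {
        "fields": [{"source": "table_rows", "target": "num_rows"}]
    },
    "column_def": {
        "fields": [{"source": "data_type", "target": "type"}]
    },
}

if __name__ == "__main__":
    print(validate_definition(example))
```

An empty list means every mapped target is one of the documented options and the mandatory column type is present.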