
Common library for ingesting RDBMS metadata into Google Cloud Data Catalog


google-datacatalog-rdbms-connector

Common resources for Data Catalog RDBMS connectors.


Disclaimer: This is not an officially supported Google product.


1. Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies. Make sure you use Python 3.6+.
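
As a quick sanity check of the version requirement above, the active interpreter can be inspected from Python itself. This is a minimal sketch; the helper name `meets_minimum` is hypothetical and is not part of this library.

```python
import sys

# Minimum version stated in this README.
MINIMUM_PYTHON = (3, 6)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`.

    `meets_minimum` is an illustrative helper, not part of the connector.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor, e.g. (3, 7) >= (3, 6).
    return tuple(version_info[:2]) >= tuple(minimum)


print("Python version OK:", meets_minimum())
```

Running it inside the activated virtualenv confirms that the environment was created with a suitable interpreter.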

1.1. Mac/Linux

pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-datacatalog-rdbms-connector

1.2. Windows

pip3 install virtualenv
virtualenv --python python3.6 <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-datacatalog-rdbms-connector

2. Install from source

2.1. Get the code

git clone https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/
cd datacatalog-connectors-rdbms/google-datacatalog-rdbms-connector

2.2. Virtualenv

Using virtualenv is optional, but strongly recommended.

2.2.1. Install Python 3.6

2.2.2. Create and activate a virtualenv

pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate

2.2.3. Install

pip install .

3. Developer environment

3.1. Install and run YAPF formatter

pip install --upgrade yapf

# Auto update files
yapf --in-place --recursive src tests

# Show diff
yapf --diff --recursive src tests

# Set up pre-commit hook
# From the root of your git project.
curl -o pre-commit.sh https://raw.githubusercontent.com/google/yapf/master/plugins/pre-commit.sh
chmod a+x pre-commit.sh
mv pre-commit.sh .git/hooks/pre-commit

3.2. Install and run Flake8 linter

pip install --upgrade flake8
flake8 src tests

3.3. Install the package in editable mode (i.e. setuptools “develop mode”)

pip install --editable .

3.4. Run the unit tests

python setup.py test

4. Setting up the RDBMS on a new connector

To set up the RDBMS connector to work with a relational database, three things are needed:

  • metadata_definition.json
  • metadata_query.sql
  • Extending the metadata_scraper class and implementing your rdbms connection method: _create_rdbms_connection
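
The third item above can be sketched roughly as follows. To keep the example runnable without the package installed, it defines a stand-in base class: `BaseMetadataScraper` and its `get_rows` method are hypothetical, and the shape of `connection_args` (a dict of driver settings) is an assumption. Only the hook name `_create_rdbms_connection` comes from this README. SQLite is used purely so the sketch runs anywhere; a real connector would subclass the package's own scraper class and return a connection from its database driver.

```python
import sqlite3


class BaseMetadataScraper:
    """Hypothetical stand-in for the package's metadata_scraper base class."""

    def get_rows(self, connection_args, query):
        # Open a connection through the subclass hook, run the metadata
        # query, and always close the connection afterwards.
        connection = self._create_rdbms_connection(connection_args)
        try:
            cursor = connection.cursor()
            cursor.execute(query)
            return cursor.fetchall()
        finally:
            connection.close()

    def _create_rdbms_connection(self, connection_args):
        raise NotImplementedError("Implement this in your connector.")


class SQLiteMetadataScraper(BaseMetadataScraper):
    """Example connector; only the connection hook is overridden."""

    def _create_rdbms_connection(self, connection_args):
        # Assumption: connection_args is a dict of driver settings.
        return sqlite3.connect(connection_args["database"])


scraper = SQLiteMetadataScraper()
rows = scraper.get_rows(
    {"database": ":memory:"},
    "SELECT 'public' AS schema_name, 'orders' AS table_name",
)
print(rows)
```

Keeping the driver-specific code inside one overridable method means the rest of the scraping logic can stay database-agnostic.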

In the metadata_definition file, fields are available at three levels:

  • table_container_def
  • table_def
  • column_def

For working examples, take a look at the already implemented connectors for Oracle, Teradata, MySQL, PostgreSQL, Greenplum, Redshift and SQLServer.

The metadata_definition file accepts the following target fields:

Level                Target       Description                                  Mandatory
table_container_def  creator      Creator of the Table Container.              N
table_container_def  owner        Owner of the Table Container.                N
table_container_def  update_user  Last user that updated the Table Container.  N
table_container_def  desc         Table Container Description.                 N
table_def            num_rows     Number of rows contained in the Table.       N
table_def            creator      Creator of the Table.                        N
table_def            owner        Owner of the Table.                          N
table_def            update_user  Last user that updated the Table.            N
table_def            desc         Table Description.                           N

If those fields are configured, they are used to create Tags.

Column fields are used to create the Data Catalog Entry schema. Two target fields are available:

Level       Target  Description          Mandatory
column_def  type    Column type.         Y
column_def  desc    Column description.  N
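
The target rules in the tables above can be checked before running a connector. The sketch below is illustrative only: the layout of the definition (a `fields` list of `source`/`target` pairs under each level) and the source column names are assumptions, so compare them with an existing connector's metadata_definition.json. `validate_definition` is a hypothetical helper, not part of this library. Only the level names, target names and the mandatory flag are taken from this README.

```python
# Targets listed in the tables above, grouped by level.
ALLOWED_TARGETS = {
    "table_container_def": {"creator", "owner", "update_user", "desc"},
    "table_def": {"num_rows", "creator", "owner", "update_user", "desc"},
    "column_def": {"type", "desc"},
}
# Only the column type is marked mandatory (Y) in the tables.
MANDATORY_TARGETS = {"column_def": {"type"}}


def validate_definition(definition):
    """Return a list of problems; an empty list means the targets are valid."""
    problems = []
    for level, allowed in ALLOWED_TARGETS.items():
        # Assumed layout: {"<level>": {"fields": [{"source": ..., "target": ...}]}}
        fields = definition.get(level, {}).get("fields", [])
        targets = {field["target"] for field in fields}
        for target in sorted(targets - allowed):
            problems.append("{}: unknown target '{}'".format(level, target))
        for target in sorted(MANDATORY_TARGETS.get(level, set()) - targets):
            problems.append(
                "{}: missing mandatory target '{}'".format(level, target))
    return problems


# Hypothetical definition; the source column names are made up.
example = {
    "table_container_def": {
        "fields": [{"source": "schema_owner", "target": "owner"}]
    },
    "table_def": {"fields": [{"source": "row_count", "target": "num_rows"}]},
    "column_def": {"fields": [{"source": "data_type", "target": "type"}]},
}
print(validate_definition(example))
```

A check like this catches a misspelled target early, before the connector silently skips the field.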
