Commons library for ingesting RDBMS metadata into Google Cloud Data Catalog
google-datacatalog-rdbms-connector
Common resources for Data Catalog RDBMS connectors.
Disclaimer: This is not an officially supported Google product.
Table of Contents
- 1. Installation
- 2. Install from source
- 3. Developer environment
- 4. Setting up the RDBMS on a new connector
1. Installation
Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.
With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies. Make sure you use Python 3.6+.
1.1. Mac/Linux
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-datacatalog-rdbms-connector
1.2. Windows
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-datacatalog-rdbms-connector
2. Install from source
2.1. Get the code
git clone https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/
cd datacatalog-connectors-rdbms/google-datacatalog-rdbms-connector
2.2. Virtualenv
Using virtualenv is optional, but strongly recommended.
2.2.1. Install Python 3.6+
2.2.2. Create and activate a virtualenv
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
2.2.3. Install
pip install .
3. Developer environment
3.1. Install and run YAPF formatter
pip install --upgrade yapf
# Auto update files
yapf --in-place --recursive src tests
# Show diff
yapf --diff --recursive src tests
# Set up pre-commit hook
# From the root of your git project.
curl -o pre-commit.sh https://raw.githubusercontent.com/google/yapf/master/plugins/pre-commit.sh
chmod a+x pre-commit.sh
mv pre-commit.sh .git/hooks/pre-commit
3.2. Install and run Flake8 linter
pip install --upgrade flake8
flake8 src tests
3.3. Install the package in editable mode (i.e. setuptools “develop mode”)
pip install --editable .
3.4. Run the unit tests
python setup.py test
4. Setting up the RDBMS on a new connector
To set up the RDBMS connector to work with a relational database, three files are needed:
- metadata_definition.json
- metadata_query.sql
- A class that extends metadata_scraper and implements your RDBMS connection method, _create_rdbms_connection
In the metadata_definition file, fields are available at three levels:
- table_container_def
- table_def
- column_def
If you want working examples please take a look at the already implemented connectors for: Oracle, Teradata, MySQL, PostgreSQL, Greenplum, Redshift and SQLServer.
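As a rough illustration of the third item, the sketch below extends a scraper base class and implements _create_rdbms_connection. It makes several assumptions: the MetadataScraper class here is a hypothetical stand-in defined locally (it is not the library's real base class), the scrape_rows helper and the connection_args dictionary are invented for the example, and sqlite3 is used only so the code runs without an external database. The existing connectors remain the reference for the real class and its signatures.

```python
import sqlite3


class MetadataScraper:
    """Hypothetical stand-in for the library's metadata_scraper base class."""

    def _create_rdbms_connection(self, connection_args):
        # Each connector is expected to provide its own connection logic.
        raise NotImplementedError('Implement in the connector subclass')

    def scrape_rows(self, query, connection_args):
        # Hypothetical helper: open a connection, run the metadata query,
        # and always close the connection afterwards.
        con = self._create_rdbms_connection(connection_args)
        try:
            cur = con.cursor()
            cur.execute(query)
            return cur.fetchall()
        finally:
            con.close()


class SQLiteMetadataScraper(MetadataScraper):
    """Example subclass; sqlite3 is used only to keep the sketch runnable."""

    def _create_rdbms_connection(self, connection_args):
        # connection_args is assumed to be a dict of connection settings.
        return sqlite3.connect(connection_args['database'])


rows = SQLiteMetadataScraper().scrape_rows(
    "SELECT 'main' AS schema_name, 'orders' AS table_name",
    {'database': ':memory:'})
print(rows)
```

A real connector would return a connection from its own database driver instead of sqlite3 and would run the statement stored in metadata_query.sql.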
For the metadata_definition target fields, the following targets are available:
Level | Target | Description | Mandatory |
---|---|---|---|
table_container_def | creator | Creator of the Table Container. | N |
table_container_def | owner | Owner of the Table Container. | N |
table_container_def | update_user | Last user that updated the Table Container. | N |
table_container_def | desc | Table Container Description. | N |
table_def | num_rows | Number of rows contained in the Table. | N |
table_def | creator | Creator of the Table. | N |
table_def | owner | Owner of the Table. | N |
table_def | update_user | Last user that updated the Table. | N |
table_def | desc | Table Description. | N |
table_def | table_size_MB | Table size, in MB. | N |
If these fields are configured, they are used to create Tags.
For columns, two target fields are used to create the Data Catalog Entry schema:
Level | Target | Description | Mandatory |
---|---|---|---|
column_def | type | Column type. | Y |
column_def | desc | Column description. | N |
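The two tables above can be turned into a small pre-flight check. The sketch below is not part of the library: check_targets and its input shape (a plain mapping from level to a list of target names) are hypothetical, and it does not parse metadata_definition.json itself, whose exact layout is best taken from the existing connectors. It only encodes the allowed and mandatory targets listed in the tables.

```python
# Allowed targets per level, copied from the tables above.
ALLOWED_TARGETS = {
    'table_container_def': {'creator', 'owner', 'update_user', 'desc'},
    'table_def': {'num_rows', 'creator', 'owner', 'update_user', 'desc',
                  'table_size_MB'},
    'column_def': {'type', 'desc'},
}
# Only the column type is marked as mandatory (Y) in the tables.
MANDATORY_TARGETS = {'column_def': {'type'}}


def check_targets(targets_by_level):
    """Return a list of problems found in a {level: [target, ...]} mapping."""
    problems = []
    for level, targets in targets_by_level.items():
        allowed = ALLOWED_TARGETS.get(level)
        if allowed is None:
            problems.append('unknown level: {}'.format(level))
            continue
        for target in targets:
            if target not in allowed:
                problems.append('{}: unknown target {}'.format(level, target))
    for level, required in MANDATORY_TARGETS.items():
        missing = required - set(targets_by_level.get(level, []))
        for target in sorted(missing):
            problems.append(
                '{}: missing mandatory target {}'.format(level, target))
    return problems


ok = check_targets({'table_def': ['owner', 'num_rows'],
                    'column_def': ['type', 'desc']})
bad = check_targets({'table_def': ['size'], 'column_def': ['desc']})
print(ok)
print(bad)
```

Running a check like this before ingestion can surface a misspelled target name early, rather than leaving a Tag field silently unpopulated.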
4.1. Add support for optional queries
You can use a user configuration file to execute optional queries that scrape additional metadata from a database. To do so:
- Create the additional SQL queries. At the moment, this common RDBMS package supports two optional queries:
  - Statement for refreshing metadata (e.g. ANALYZE)
  - Query to scrape the number of rows in each table
- Extend the query_assembler class and implement the methods for getting optional queries: _get_refresh_statement and _get_path_to_num_rows_query.
- Implement _get_query_assembler and _execute_refresh_query on the extension of the metadata_scraper class.
You can see working examples of implementing optional queries in the connectors code for PostgreSQL and MySQL.
Please also refer to these samples to see what the user configuration file, ingest_cfg.yaml, should look like.
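The query_assembler step above might look roughly like the following sketch. Everything except the two method names _get_refresh_statement and _get_path_to_num_rows_query is an assumption: the QueryAssembler base class is a local stand-in, the get_optional_queries helper, the configuration keys refresh_metadata and scrape_num_rows, the argument-free method signatures, and the SQL file path are all hypothetical. The PostgreSQL and MySQL connectors show the real signatures and configuration options.

```python
class QueryAssembler:
    """Hypothetical stand-in for the library's query_assembler base class."""

    def get_optional_queries(self, user_config):
        # user_config is assumed to be a dict loaded from ingest_cfg.yaml;
        # the key names used here are illustrative only.
        queries = {}
        if user_config.get('refresh_metadata'):
            queries['refresh'] = self._get_refresh_statement()
        if user_config.get('scrape_num_rows'):
            queries['num_rows_path'] = self._get_path_to_num_rows_query()
        return queries

    def _get_refresh_statement(self):
        raise NotImplementedError('Implement in the connector subclass')

    def _get_path_to_num_rows_query(self):
        raise NotImplementedError('Implement in the connector subclass')


class ExampleQueryAssembler(QueryAssembler):
    """Example subclass supplying both optional queries."""

    def _get_refresh_statement(self):
        # Statement that refreshes the database's metadata statistics.
        return 'ANALYZE'

    def _get_path_to_num_rows_query(self):
        # Hypothetical location of the SQL file with the row-count query.
        return 'config/num_rows_query.sql'


assembler = ExampleQueryAssembler()
print(assembler.get_optional_queries({'refresh_metadata': True}))
print(assembler.get_optional_queries(
    {'refresh_metadata': True, 'scrape_num_rows': True}))
```

Keeping each optional query behind its own configuration switch lets users skip a potentially expensive refresh statement on large databases.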