A package to manage Google Cloud Data Catalog helper commands and scripts

A Python package to manage Google Cloud Data Catalog helper commands and scripts.

Disclaimer: This is not an officially supported Google product.

1. Environment setup

1.1. Python + virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.1.1. Install Python 3.6+

1.1.2. Create a folder

This is recommended so that all related files reside in the same place, which makes the instructions below easier to follow.

mkdir ./datacatalog-util
cd ./datacatalog-util

All paths starting with ``./`` in the next steps are relative to the ``datacatalog-util`` folder.

1.1.3. Create and activate an isolated Python environment

pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate

1.1.4. Install the package

pip install --upgrade .
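
A quick way to confirm the setup is to check the interpreter from inside the activated environment. The sketch below is not part of datacatalog-util; the function names are made up for illustration. It only tests the Python 3.6+ requirement and whether an isolated environment appears to be active.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version from step 1.1.1


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter meets the 3.6+ requirement."""
    return tuple(version_info[:2]) >= MIN_VERSION


def in_isolated_env():
    """Return True when running inside a virtualenv or venv environment."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("Python 3.6+:", python_is_supported())
    print("Isolated environment:", in_isolated_env())
```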

1.2. Docker

Docker may be used as an alternative to run the script. In this case, please disregard the Virtualenv setup instructions.

1.2.1. Get the source code

git clone https://github.com/mesmacosta/datacatalog-util
cd ./datacatalog-util

1.3. Auth credentials

1.3.1. Create a service account and grant it the roles below

  • BigQuery Metadata Viewer

  • Data Catalog Admin

  • A custom role with bigquery.datasets.updateTag and bigquery.tables.updateTag permissions

1.3.2. Download a JSON key and save it as

  • ./credentials/datacatalog-util.json

1.3.3. Set the environment variables

This step may be skipped if you’re using Docker.

export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-util.json
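
Before running any command, it may help to verify that the variable points to a readable service account key. This is a minimal sketch, not part of datacatalog-util: ``check_credentials`` is a hypothetical helper, and it assumes the key file carries the ``type`` and ``client_email`` fields usually found in Google service account JSON keys.

```python
import json
import os


def check_credentials(path=None):
    """Return the service account e-mail from the key file, or raise ValueError."""
    # Fall back to the environment variable when no path is given.
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        raise ValueError("Key file not found: " + path)
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)  # invalid JSON also raises a ValueError subclass
    if key.get("type") != "service_account":
        raise ValueError("Not a service account key: " + path)
    return key.get("client_email")
```

Calling ``check_credentials()`` with no argument reads the path from the environment variable set above.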

2. Load Tags from CSV file

2.1. Create a CSV file representing the Tags to be created

Tags are composed of as many lines as required to represent all of their fields. The columns are described as follows:

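===============  ======================================================  =========
Column           Description                                             Mandatory
===============  ======================================================  =========
linked_resource  Full name of the asset the Entry refers to.             Y
template_name    Resource name of the Tag Template for the Tag.          Y
column           Attach Tags to a column belonging to the Entry schema.  N
field_id         Id of the Tag field.                                    Y
field_value      Value of the Tag field.                                 Y
===============  ======================================================  =========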

TIPS

  • sample-input/create-tags for reference;

  • Data Catalog Sample Tags (Google Sheets) may help to create/export the CSV.
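
The one-line-per-field layout described above can be sketched with Python's ``csv`` module. Everything in the rows is a placeholder: the project, dataset, table, location and template IDs, the field ids and the values are hypothetical. The sketch also assumes that lines sharing the same ``linked_resource``, ``template_name`` and ``column`` describe fields of the same Tag, which is a reading of the description above rather than something confirmed against the tool.

```python
import csv

COLUMNS = ["linked_resource", "template_name", "column", "field_id", "field_value"]

# Placeholder names: replace the project, dataset, table, location and
# template IDs with your own.
LINKED_RESOURCE = (
    "//bigquery.googleapis.com/projects/my-project"
    "/datasets/my_dataset/tables/my_table"
)
TEMPLATE_NAME = "projects/my-project/locations/us-central1/tagTemplates/my_template"

# One line per Tag field. The field ids and values are hypothetical.
ROWS = [
    # Two fields of a Tag attached to the table itself (empty "column").
    [LINKED_RESOURCE, TEMPLATE_NAME, "", "data_owner", "finance-team"],
    [LINKED_RESOURCE, TEMPLATE_NAME, "", "source_system", "billing"],
    # One field of a Tag attached to the hypothetical "email" column.
    [LINKED_RESOURCE, TEMPLATE_NAME, "email", "data_owner", "crm-team"],
]


def write_tags_csv(path, rows=ROWS):
    """Write the header and the given rows to a CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```

For example, ``write_tags_csv("tags.csv")`` produces a file that could then be passed as ``CSV_FILE_PATH``.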

2.2. Run the datacatalog-util script

  • Python + virtualenv

datacatalog-util create-tags --csv-file CSV_FILE_PATH

  • Docker

docker build --rm --tag datacatalog-util .
docker run --rm --tty \
  --volume CREDENTIALS_FILE_FOLDER:/credentials --volume CSV_FILE_FOLDER:/data \
  datacatalog-util create-tags --csv-file /data/CSV_FILE_NAME
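
Before running either command, a small pre-flight check on the CSV header can catch a missing mandatory column early. This is only a sketch: ``missing_mandatory_columns`` is a hypothetical helper that looks at the header line alone, and it does not reproduce whatever validation datacatalog-util performs itself.

```python
import csv
import io

# Columns marked "Y" in the Mandatory column of section 2.1.
MANDATORY_COLUMNS = {"linked_resource", "template_name", "field_id", "field_value"}


def missing_mandatory_columns(csv_file):
    """Return the mandatory column names absent from the CSV header, sorted."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(MANDATORY_COLUMNS - {name.strip() for name in header})


if __name__ == "__main__":
    sample = io.StringIO("linked_resource,template_name,field_id\n")
    print(missing_mandatory_columns(sample))  # → ['field_value']
```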

3. Export Tags to CSV file

3.1. A list of CSV files, each representing one Template, will be created

A summary file with statistics about each Template will also be created in the same directory.

The columns for the summary file are described as follows:

==========================  ========================================================
Column                      Description
==========================  ========================================================
template_name               Resource name of the Tag Template for the Tag.
tags_count                  Number of Tags created from the template.
tagged_entries_count        Number of Entries tagged with the template.
tagged_columns_count        Number of columns tagged with the template.
tag_string_fields_count     Number of String fields used in Tags of the template.
tag_bool_fields_count       Number of Bool fields used in Tags of the template.
tag_double_fields_count     Number of Double fields used in Tags of the template.
tag_timestamp_fields_count  Number of Timestamp fields used in Tags of the template.
tag_enum_fields_count       Number of Enum fields used in Tags of the template.
==========================  ========================================================
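
As an illustration of how the summary columns might be consumed, the sketch below totals three of the counters across all templates. It assumes the counts are written as plain integers. The name of the summary file is not stated here, so the function takes an already opened file instead of guessing a path, and the sample data is made up.

```python
import csv
import io

COUNT_COLUMNS = ["tags_count", "tagged_entries_count", "tagged_columns_count"]


def total_counts(csv_file):
    """Sum the selected count columns over every template in the summary."""
    totals = dict.fromkeys(COUNT_COLUMNS, 0)
    for row in csv.DictReader(csv_file):
        for name in COUNT_COLUMNS:
            totals[name] += int(row[name] or 0)  # treat an empty cell as zero
    return totals


# Made-up summary content, used only to exercise the function.
SAMPLE_SUMMARY = (
    "template_name,tags_count,tagged_entries_count,tagged_columns_count\n"
    "projects/my-project/locations/us-central1/tagTemplates/a,3,2,1\n"
    "projects/my-project/locations/us-central1/tagTemplates/b,5,4,0\n"
)

if __name__ == "__main__":
    print(total_counts(io.StringIO(SAMPLE_SUMMARY)))
```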

The columns for each template file are described as follows:

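======================  ==========================================================
Column                  Description
======================  ==========================================================
relative_resource_name  Full resource name of the asset the Entry refers to.
linked_resource         Full name of the asset the Entry refers to.
template_name           Resource name of the Tag Template for the Tag.
tag_name                Resource name of the Tag.
column                  Column of the Entry schema the Tag is attached to, if any.
field_id                Id of the Tag field.
field_type              Type of the Tag field.
field_value             Value of the Tag field.
======================  ==========================================================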
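
A per-template file can be folded back into one record per Tag by grouping on ``tag_name``. The sketch below assumes that each exported Tag occupies one line per field, mirroring the load format in section 2.1; that layout is an assumption, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def group_fields_by_tag(csv_file):
    """Map each tag_name to a {field_id: field_value} dictionary."""
    tags = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        tags[row["tag_name"]][row["field_id"]] = row["field_value"]
    return dict(tags)


# Invented rows with shortened resource names, for illustration only.
SAMPLE_EXPORT = (
    "tag_name,column,field_id,field_type,field_value\n"
    "tags/t1,,data_owner,STRING,finance-team\n"
    "tags/t1,,source_system,STRING,billing\n"
    "tags/t2,email,data_owner,STRING,crm-team\n"
)

if __name__ == "__main__":
    for tag_name, fields in group_fields_by_tag(io.StringIO(SAMPLE_EXPORT)).items():
        print(tag_name, fields)
```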

3.2. Run the datacatalog-util script

  • Python + virtualenv

datacatalog-util export-tags --project-ids my-project --dir-path DIR_PATH

4. Load Templates from CSV file

4.1. Create a CSV file representing the Templates to be created

Templates are composed of as many lines as required to represent all of their fields. The columns are described as follows:

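==================  =======================================  =========
Column              Description                              Mandatory
==================  =======================================  =========
template_name       Resource name of the Tag Template.       Y
display_name        Display name of the Tag Template.        Y
field_id            Id of the Tag Template field.            Y
field_display_name  Display name of the Tag Template field.  Y
field_type          Type of the Tag Template field.          Y
enum_values         Values for the Enum field.               N
==================  =======================================  =========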
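
Since a Template spans one line per field, a small helper can expand a template definition into CSV rows. This is a hedged sketch: ``template_rows`` is a hypothetical function, the template and field names are placeholders, and both the ``|`` separator for ``enum_values`` and the upper-case type names (``STRING``, ``ENUM``) are assumptions that should be checked against sample-input/create-tag-templates.

```python
import csv
import io

COLUMNS = ["template_name", "display_name", "field_id",
           "field_display_name", "field_type", "enum_values"]


def template_rows(template_name, display_name, fields, enum_separator="|"):
    """Expand one template into one CSV row per field.

    `fields` is a list of (field_id, field_display_name, field_type,
    enum_values) tuples; enum_values is a list, empty for non-Enum fields.
    """
    rows = []
    for field_id, field_display_name, field_type, enum_values in fields:
        rows.append([template_name, display_name, field_id,
                     field_display_name, field_type,
                     enum_separator.join(enum_values)])  # assumed separator
    return rows


# Placeholder template with two hypothetical fields.
EXAMPLE_ROWS = template_rows(
    "projects/my-project/locations/us-central1/tagTemplates/my_template",
    "My Template",
    [
        ("data_owner", "Data owner", "STRING", []),
        ("sensitivity", "Sensitivity", "ENUM", ["LOW", "MEDIUM", "HIGH"]),
    ],
)

if __name__ == "__main__":
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    writer.writerows(EXAMPLE_ROWS)
    print(buffer.getvalue())
```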

4.2. Run the datacatalog-util script - Create the Tag Templates

  • Python + virtualenv

datacatalog-util create-tag-templates --csv-file CSV_FILE_PATH

4.3. Run the datacatalog-util script - Delete the Tag Templates

  • Python + virtualenv

datacatalog-util delete-tag-templates --csv-file CSV_FILE_PATH

TIPS

  • sample-input/create-tag-templates for reference.

5. Export Templates to CSV file

5.1. A CSV file representing the Templates will be created

Templates are composed of as many lines as required to represent all of their fields. The columns are described as follows:

==================  =======================================
Column              Description
==================  =======================================
template_name       Resource name of the Tag Template.
display_name        Display name of the Tag Template.
field_id            Id of the Tag Template field.
field_display_name  Display name of the Tag Template field.
field_type          Type of the Tag Template field.
enum_values         Values for the Enum field.
==================  =======================================
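
To inspect an exported file, the rows can be grouped by ``template_name`` to list the fields each Template defines. The sketch below is illustrative only: ``fields_by_template`` is a hypothetical helper, and the sample content, including the type names, is invented.

```python
import csv
import io
from collections import defaultdict


def fields_by_template(csv_file):
    """Map each template_name to a list of (field_id, field_type) pairs."""
    templates = defaultdict(list)
    for row in csv.DictReader(csv_file):
        templates[row["template_name"]].append((row["field_id"], row["field_type"]))
    return dict(templates)


# Invented export content with shortened template names.
SAMPLE_TEMPLATES = (
    "template_name,display_name,field_id,field_display_name,field_type,enum_values\n"
    "tagTemplates/a,Template A,data_owner,Data owner,STRING,\n"
    "tagTemplates/a,Template A,reviewed,Reviewed,BOOL,\n"
    "tagTemplates/b,Template B,score,Score,DOUBLE,\n"
)

if __name__ == "__main__":
    for name, fields in fields_by_template(io.StringIO(SAMPLE_TEMPLATES)).items():
        print(name, fields)
```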

5.2. Run the datacatalog-util script

  • Python + virtualenv

datacatalog-util export-tag-templates --project-ids my-project --file-path CSV_FILE_PATH
