A package to manage Google Cloud Data Catalog Tag export scripts

Datacatalog Tag Exporter

A Python package to manage Google Cloud Data Catalog Tag export scripts.

Disclaimer: This is not an officially supported Google product.

Executing in Cloud Shell

# Point to your service account key; for instructions, see 1.3. Auth credentials
# This file name is just a suggestion; feel free to follow your own naming conventions
export GOOGLE_APPLICATION_CREDENTIALS=~/datacatalog-tag-exporter-sa.json

# Install datacatalog-tag-exporter 
pip3 install datacatalog-tag-exporter --user

# Add to your PATH
export PATH=~/.local/bin:$PATH

# Look for available commands
datacatalog-tag-exporter --help

1. Environment setup

1.1. Python + virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.1.1. Install Python 3.6+

1.1.2. Get the source code

git clone https://github.com/mesmacosta/datacatalog-tag-exporter
cd ./datacatalog-tag-exporter

All paths starting with ./ in the next steps are relative to the datacatalog-tag-exporter folder.

1.1.3. Create and activate an isolated Python environment

pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate

1.1.4. Install the package

pip install --upgrade .

1.2. Docker

Docker may be used as an alternative way to run the script. In that case, disregard the virtualenv setup instructions in 1.1.

1.3. Auth credentials

1.3.1. Create a service account and grant it the role below

  • Data Catalog Admin

1.3.2. Download a JSON key and save it as

  • ./credentials/datacatalog-tag-exporter-sa.json

This name is just a suggestion; feel free to name the file following your own naming conventions.

1.3.3. Set the environment variable

This step may be skipped if you're using Docker.

export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-tag-exporter-sa.json
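
Before running an export, it can help to confirm that the variable points to a readable service account key. The following is a minimal sketch, not part of this package: the function name check_credentials is hypothetical, and it only relies on the "type" and "client_email" fields that Google Cloud service account key files contain.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the service account email if the key file looks valid.

    Hypothetical helper for illustration; it is not provided by
    datacatalog-tag-exporter.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")

    # Expand a leading ~ in case the variable was set with a quoted value.
    path = os.path.expanduser(path)
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)

    # Service account key files carry "type": "service_account"
    # and the account email under "client_email".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return key["client_email"]
```

Calling check_credentials() with no arguments reads the current environment and either returns the account email or raises an error that names the problem.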

2. Export Tags to CSV file

2.1. A set of CSV files will be created, one per Tag Template.

A summary file with statistics about each template will also be created in the same directory.

The columns for the summary file are described as follows:

Column                      Description
template_name               Resource name of the Tag Template for the Tag.
tags_count                  Number of Tags created from the template.
tagged_entries_count        Number of Entries tagged with the template.
tagged_columns_count        Number of columns tagged with the template.
tag_string_fields_count     Number of String fields used in Tags of the template.
tag_bool_fields_count       Number of Bool fields used in Tags of the template.
tag_double_fields_count     Number of Double fields used in Tags of the template.
tag_timestamp_fields_count  Number of Timestamp fields used in Tags of the template.
tag_enum_fields_count       Number of Enum fields used in Tags of the template.
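
As an illustration of how the summary file might be consumed, here is a small sketch that parses it with the standard csv module. It assumes a comma-delimited file with a header row using the column names listed above; the function names load_summary and total_tags are hypothetical, and the summary file name is not stated here, so the caller passes an open file.

```python
import csv


def load_summary(csv_file):
    """Parse summary rows, converting every *_count column to an integer.

    Assumes a header row with the column names documented above.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        parsed = {"template_name": row["template_name"]}
        for name, value in row.items():
            if name and name.endswith("_count"):
                # Treat an empty cell as zero.
                parsed[name] = int(value or 0)
        rows.append(parsed)
    return rows


def total_tags(rows):
    """Sum tags_count across all templates in the summary."""
    return sum(row["tags_count"] for row in rows)
```

Open the summary file with newline="" (as the csv module documentation recommends) and pass the file object to load_summary.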

The columns for each template file are described as follows:

Column                  Description
relative_resource_name  Full resource name of the asset the Entry refers to.
linked_resource         Full name of the asset the Entry refers to.
template_name           Resource name of the Tag Template for the Tag.
tag_name                Resource name of the Tag.
column                  Column of the Entry schema the Tag is attached to, if any.
field_id                Id of the Tag field.
field_type              Type of the Tag field.
field_value             Value of the Tag field.
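
The field_id, field_type and field_value columns suggest that each row holds one field of one Tag. Under that assumption (it is an inference from the column list, not something confirmed here), the rows of a template file could be regrouped into one record per Tag roughly as follows. The function name group_tags is hypothetical.

```python
import csv
from collections import defaultdict


def group_tags(csv_file):
    """Group template-file rows into one dict of fields per Tag.

    Keys are (tag_name, column) pairs; column is None when the Tag is
    attached to the Entry itself. Values map field_id to a
    (field_type, field_value) tuple, both kept as raw strings.
    Assumes one CSV row per Tag field.
    """
    tags = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        key = (row["tag_name"], row.get("column") or None)
        tags[key][row["field_id"]] = (row["field_type"], row["field_value"])
    return dict(tags)
```

Values are left as strings on purpose: how the exporter serializes Bool, Double, Timestamp and Enum values is not described here, so any type conversion would be a guess.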

2.2. Run the datacatalog-tag-exporter script

  • Python + virtualenv
datacatalog-tag-exporter tags export --project-ids my-project --dir-path DIR_PATH

2.2.1. Run the datacatalog-tag-exporter script filtering by Tag Templates

  • Python + virtualenv
datacatalog-tag-exporter tags export --project-ids my-project \
--dir-path DIR_PATH \
--tag-templates-names projects/my-project/locations/us-central1/tagTemplates/my-template,\
projects/my-project/locations/us-central1/tagTemplates/my-template-2 
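
When the export is launched from another Python script, the same command line can be assembled programmatically. This is only a sketch: build_export_command is a hypothetical helper, and it uses just the flags shown above (--project-ids, --dir-path, --tag-templates-names), joining the template names with commas and no spaces, as in the example.

```python
def build_export_command(project_id, dir_path, template_names=()):
    """Return the argument list for `datacatalog-tag-exporter tags export`.

    Hypothetical helper; it only mirrors the command shown above.
    """
    command = [
        "datacatalog-tag-exporter", "tags", "export",
        "--project-ids", project_id,
        "--dir-path", dir_path,
    ]
    if template_names:
        # The filter is passed as one comma-separated argument.
        command += ["--tag-templates-names", ",".join(template_names)]
    return command
```

The resulting list can be handed to subprocess.run(command, check=True), which avoids shell quoting issues with the long template resource names.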

History

0.1.0 (2020-04-15)

  • First release on PyPI.

0.2.0 (2020-05-08)

  • Added option to export tags after creation date.
