A package to manage Google Cloud Data Catalog tags, loading metadata from external sources

datacatalog-tag-manager

A Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources. Currently supports the CSV file format.

1. Environment setup

1.1. Python + virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.1.1. Install Python 3.6+

1.1.2. Create a folder

This is recommended so that all related files reside in the same place, which makes the instructions below easier to follow.

mkdir ./datacatalog-tag-manager
cd ./datacatalog-tag-manager

All paths starting with ./ in the next steps are relative to the datacatalog-tag-manager folder.

1.1.3. Create and activate an isolated Python environment

pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
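
To confirm that the activation above took effect before installing anything, a minimal check such as the following may help. It is only a sketch: the function names are hypothetical and are not part of datacatalog-tag-manager.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Legacy virtualenv releases set sys.real_prefix; newer ones (and the
    # standard venv module) make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


def meets_minimum_python(version_info=sys.version_info) -> bool:
    """Return True for Python 3.6 or newer, as required in step 1.1.1."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("Python 3.6+:", meets_minimum_python())
```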

1.1.4. Install the package

pip install --upgrade datacatalog-tag-manager

1.2. Docker

Docker may be used as an alternative to run datacatalog-tag-manager. In this case, please disregard the above virtualenv setup instructions.

1.2.1. Get the source code

git clone https://github.com/ricardolsmendes/datacatalog-tag-manager
cd ./datacatalog-tag-manager

1.3. Auth credentials

1.3.1. Create a service account and grant it the roles below

  • BigQuery Metadata Viewer
  • Data Catalog TagTemplate User
  • A custom role with bigquery.datasets.updateTag and bigquery.tables.updateTag permissions

1.3.2. Download a JSON key and save it as

  • ./credentials/datacatalog-tag-manager.json

1.3.3. Set the environment variables

This step may be skipped if you're using Docker.

export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-tag-manager.json
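
A quick way to verify that the variable is set and points to a usable key file is sketched below. The helper name `check_credentials` is hypothetical; the check relies only on the `type` and `client_email` fields that service account JSON keys contain.

```python
import json
import os


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account e-mail found in the key file, or raise."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    # Service account keys carry a "type" marker and the account e-mail.
    if key.get("type") != "service_account" or "client_email" not in key:
        raise RuntimeError(f"{path} does not look like a service account key")
    return key["client_email"]
```

Calling `check_credentials()` from the datacatalog-tag-manager folder returns the e-mail of the service account whose roles were granted in step 1.3.1, which makes it easy to spot a key that belongs to the wrong account.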

2. Manage Tags

2.1. Create or Update

The metadata schema to create or update Tags is described below. Use as many lines as needed to describe all the Tags and Fields you need.

| Column | Description | Mandatory |
| --- | --- | --- |
| linked_resource OR entry_name | Full name of the BigQuery or PubSub asset the Entry refers to, or an Entry name if you are working with Custom Entries | yes |
| template_name | Resource name of the Tag Template for the Tag | yes |
| column | Attach Tags to a column belonging to the Entry schema | no |
| field_id | Id of the Tag field | yes |
| field_value | Value of the Tag field | yes |
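
The schema above can be illustrated with a short script that writes a CSV file with one line per Tag field. This is a sketch under stated assumptions: it assumes the file starts with a header row using the column names from the table, and every project, dataset, table, template, and field name in it is a placeholder.

```python
import csv

# Column names taken from the schema table above.
UPSERT_COLUMNS = ["linked_resource", "template_name", "column",
                  "field_id", "field_value"]

# Placeholder resource names (hypothetical project, dataset, and template).
TABLE = ("//bigquery.googleapis.com/projects/my-project"
         "/datasets/my_dataset/tables/my_table")
TEMPLATE = "projects/my-project/locations/us-central1/tagTemplates/my_template"

ROWS = [
    # Tag attached to the table itself: "column" is left empty.
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "",
     "field_id": "data_owner", "field_value": "analytics-team"},
    # Tag attached to one column of the table schema.
    {"linked_resource": TABLE, "template_name": TEMPLATE, "column": "email",
     "field_id": "data_owner", "field_value": "privacy-team"},
]


def write_upsert_csv(path, rows):
    """Write a header row followed by one CSV line per Tag field."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=UPSERT_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_upsert_csv("upsert-tags.csv", ROWS)
```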

2.1.1. From a CSV file

  • Python + virtualenv
datacatalog-tags upsert --csv-file CSV_FILE_PATH
  • Docker
docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
  --volume CREDENTIALS_FILE_FOLDER:/credentials --volume CSV_FILE_FOLDER:/data \
  datacatalog-tag-manager upsert --csv-file /data/CSV_FILE_NAME
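
When the command needs to run from a Python script, for example as a step in a scheduled job, it can be wrapped with `subprocess`. The sketch below assumes the package is installed in the active environment so that `datacatalog-tags` is on the `PATH`; the helper names are hypothetical, and only the `upsert` and `delete` subcommands and the `--csv-file` flag documented in this README are used.

```python
import subprocess


def build_command(action, csv_file_path):
    """Build the datacatalog-tags command line for 'upsert' or 'delete'."""
    if action not in ("upsert", "delete"):
        raise ValueError(f"unsupported action: {action}")
    return ["datacatalog-tags", action, "--csv-file", csv_file_path]


def run_tag_manager(action, csv_file_path):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(action, csv_file_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the CSV path contains spaces.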

2.2. Delete

The metadata schema to delete Tags is described below. Use as many lines as needed to delete all the Tags you want.

| Column | Description | Mandatory |
| --- | --- | --- |
| linked_resource OR entry_name | Full name of the BigQuery or PubSub asset the Entry refers to, or an Entry name if you are working with Custom Entries | yes |
| template_name | Resource name of the Tag Template of the Tag | yes |
| column | Delete Tags from a column belonging to the Entry schema | no |
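
Because deletions are hard to undo, it can be worth checking the file for missing mandatory values before running the command. The sketch below is not part of datacatalog-tag-manager: `find_incomplete_rows` is a hypothetical helper, and it assumes a header row with the column names from the table above and no multi-line fields.

```python
import csv


def find_incomplete_rows(path):
    """Return the line numbers of rows missing a mandatory value."""
    incomplete = []
    with open(path, newline="", encoding="utf-8") as csv_file:
        # start=2 because line 1 holds the header row.
        for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
            # Either linked_resource or entry_name identifies the Entry.
            entry = (row.get("linked_resource")
                     or row.get("entry_name") or "").strip()
            template = (row.get("template_name") or "").strip()
            if not (entry and template):
                incomplete.append(line_number)
    return incomplete
```

An empty list means every row names both an Entry and a Tag Template; the optional `column` value is not checked.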

2.2.1. From a CSV file

  • Python + virtualenv
datacatalog-tags delete --csv-file CSV_FILE_PATH
  • Docker
docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
  --volume CREDENTIALS_FILE_FOLDER:/credentials --volume CSV_FILE_FOLDER:/data \
  datacatalog-tag-manager delete --csv-file /data/CSV_FILE_NAME
