datacatalog-tag-manager
A Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources. Currently supports the CSV file format.
1. Environment setup
1.1. Python + virtualenv
Using virtualenv is optional, but strongly recommended unless you use Docker.
1.1.1. Install Python 3.6+
1.1.2. Create a folder
This is recommended so that all related files reside in the same place, which makes the instructions below easier to follow.
mkdir ./datacatalog-tag-manager
cd ./datacatalog-tag-manager
All paths starting with ./ in the next steps are relative to the datacatalog-tag-manager folder.
1.1.3. Create and activate an isolated Python environment
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
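To confirm that the isolated environment is active before installing the package, a small check like the one below may help. It is a minimal sketch, not part of datacatalog-tag-manager; the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and venv make sys.prefix differ from
    # sys.base_prefix while the environment is active.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Isolated environment active:", in_virtual_environment())
```

If it prints `False` right after the `source` command, the environment was most likely activated in a different shell session.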
1.1.4. Install the package
pip install --upgrade datacatalog-tag-manager
1.2. Docker
Docker may be used as an alternative way to run datacatalog-tag-manager. In this case, please disregard the virtualenv setup instructions above.
1.2.1. Get the source code
git clone https://github.com/ricardolsmendes/datacatalog-tag-manager
cd ./datacatalog-tag-manager
1.3. Auth credentials
1.3.1. Create a service account and grant it the roles below
- BigQuery Metadata Viewer
- Data Catalog TagTemplate User
- A custom role with the bigquery.datasets.updateTag and bigquery.tables.updateTag permissions
1.3.2. Download a JSON key and save it as ./credentials/datacatalog-tag-manager.json
1.3.3. Set the environment variables
This step may be skipped if you're using Docker.
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-tag-manager.json
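Before running any command, it can be useful to verify that the variable points to a readable service account key. The sketch below is an illustration only and is not shipped with the package; `check_credentials` is a hypothetical helper, and it relies on the `type` and `client_email` fields present in JSON keys downloaded from Google Cloud.

```python
import json
import os


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account email found in the key file, or raise."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    # Service account keys carry a "type" marker and the account email.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key["client_email"]
```

Calling `check_credentials()` from the datacatalog-tag-manager folder returns the service account email, which should match the account that was granted the roles in step 1.3.1.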
2. Manage Tags
2.1. Create or Update
2.1.1. From a CSV file
- SCHEMA
The metadata schema to create or update Tags is presented below. Use as many lines as needed to describe all the Tags and Fields you need.
Column | Description | Mandatory |
---|---|---|
linked_resource OR entry_name | Full name of the BigQuery or PubSub asset the Entry refers to, or an Entry name if you are working with Custom Entries | yes |
template_name | Resource name of the Tag Template for the Tag | yes |
column | Attach Tags to a column belonging to the Entry schema | no |
field_id | Id of the Tag field | yes |
field_value | Value of the Tag field | yes |
- SAMPLE INPUT
- See sample-input/upsert-tags for reference;
- Data Catalog Sample Tags (Google Sheets) might help to create/export a CSV file.
- COMMANDS
Python + virtualenv
datacatalog-tags upsert --csv-file <CSV-FILE-PATH>
Docker
docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
--volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <CSV-FILE-FOLDER>:/data \
datacatalog-tag-manager upsert --csv-file /data/<CSV-FILE-PATH>
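The upsert schema described above can also be produced programmatically. The following is a minimal sketch using Python's standard `csv` module. It assumes the CSV header uses the column names from the schema table verbatim, with `linked_resource` chosen over `entry_name`. The helper `write_upsert_csv` and every project, dataset, table, template, and field id in `SAMPLE_ROWS` are hypothetical placeholders.

```python
import csv

# Assumption: header names match the schema table; use "entry_name" instead
# of "linked_resource" when working with Custom Entries.
UPSERT_COLUMNS = ["linked_resource", "template_name", "column", "field_id", "field_value"]


def write_upsert_csv(path, rows):
    """Write one CSV line per Tag field described in rows (a list of dicts)."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=UPSERT_COLUMNS)
        writer.writeheader()
        for row in rows:
            # Optional values that are missing (e.g. "column") become empty cells.
            writer.writerow({name: row.get(name, "") for name in UPSERT_COLUMNS})


# Hypothetical placeholder values; replace them with your own resources.
SAMPLE_ROWS = [
    {
        "linked_resource": "//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/my_table",
        "template_name": "projects/my-project/locations/us-central1/tagTemplates/my_template",
        "field_id": "owner",
        "field_value": "data-team",
    },
    {
        "linked_resource": "//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/my_table",
        "template_name": "projects/my-project/locations/us-central1/tagTemplates/my_template",
        "column": "email",
        "field_id": "has_pii",
        "field_value": "true",
    },
]
```

Calling `write_upsert_csv("tags.csv", SAMPLE_ROWS)` produces a file that can then be passed to `datacatalog-tags upsert --csv-file tags.csv`.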
2.2. Delete
2.2.1. From a CSV file
- SCHEMA
The metadata schema to delete Tags is presented below. Use as many lines as needed to delete all the Tags you want.
Column | Description | Mandatory |
---|---|---|
linked_resource OR entry_name | Full name of the BigQuery or PubSub asset the Entry refers to, or an Entry name if you are working with Custom Entries | yes |
template_name | Resource name of the Tag Template of the Tag | yes |
column | Delete Tags from a column belonging to the Entry schema | no |
- SAMPLE INPUT
- See sample-input/delete-tags for reference;
- Data Catalog Sample Tags (Google Sheets) might help to create/export a CSV file.
- COMMANDS
Python + virtualenv
datacatalog-tags delete --csv-file <CSV-FILE-PATH>
Docker
docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
--volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <CSV-FILE-FOLDER>:/data \
datacatalog-tag-manager delete --csv-file /data/<CSV-FILE-PATH>
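Since deleting Tags is destructive, it may be worth checking a CSV file against the delete schema above before running the command. The sketch below is an assumption-laden illustration and not the package's own validation logic. `validate_delete_csv` is a hypothetical helper; it assumes the header uses the column names from the schema table and that no cell spans multiple lines.

```python
import csv


def validate_delete_csv(path):
    """Return a list of problems found in a delete-tags CSV file."""
    problems = []
    with open(path, newline="", encoding="utf-8") as csv_file:
        reader = csv.DictReader(csv_file)
        header = set(reader.fieldnames or [])
        # Either linked_resource or entry_name must identify the Entry.
        if not header & {"linked_resource", "entry_name"}:
            problems.append("missing column: linked_resource or entry_name")
        if "template_name" not in header:
            problems.append("missing column: template_name")
        # The header is line 1, so data rows start at line 2.
        for line_number, row in enumerate(reader, start=2):
            entry = row.get("linked_resource") or row.get("entry_name")
            if not entry:
                problems.append(f"line {line_number}: empty entry reference")
            if not row.get("template_name"):
                problems.append(f"line {line_number}: empty template_name")
    return problems
```

An empty list means every row carries the mandatory values; the optional `column` value is not checked.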