
A package to manage Google Cloud Data Catalog custom entries


datacatalog-custom-entries-manager

A Python package intended to manage Google Cloud Data Catalog custom entries, loading metadata from external sources. Currently supports the CSV and JSON file formats.

It is built on top of GoogleCloudPlatform/datacatalog-connectors and, unlike the existing connectors, ingests metadata without connecting to any system other than Data Catalog. Known use cases include validating Custom Entries ingestion workloads before coding their specific features, and loading metadata into development / PoC environments.

If you need Tags as well as Entries to validate your model/workload, consider giving datacatalog-custom-model-manager a try.


1. Environment setup

1.1. Python + virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.1.1. Install Python 3.6+

1.1.2. Create a folder

This is recommended so that all related files reside in the same place, which makes the instructions below easier to follow.

mkdir ./datacatalog-custom-entries-manager
cd ./datacatalog-custom-entries-manager

All paths starting with ./ in the next steps are relative to the datacatalog-custom-entries-manager folder.

1.1.3. Create and activate an isolated Python environment

pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
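
After activating the environment, a quick check can confirm that the interpreter meets the Python 3.6+ requirement and really runs in isolation. The snippet below is a minimal sketch; the function names are illustrative and not part of this package.

```python
import sys


def meets_minimum_python(minimum=(3, 6)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def in_virtual_environment():
    """Return True when running inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    print("Python 3.6+:", meets_minimum_python())
    print("Isolated environment:", in_virtual_environment())
```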

1.1.4. Install the package

pip install --upgrade datacatalog-custom-entries-manager

1.2. Docker

Docker may be used as an alternative to run datacatalog-custom-entries-manager. In this case, please disregard the above virtualenv setup instructions.

1.2.1. Get the source code

git clone https://github.com/ricardolsmendes/datacatalog-custom-entries-manager
cd ./datacatalog-custom-entries-manager

1.3. Auth credentials

1.3.1. Create a service account and grant it the roles below

  • DataCatalog entryGroup Owner
  • DataCatalog entry Owner
  • Data Catalog Viewer

1.3.2. Download a JSON key and save it as

  • ./credentials/datacatalog-custom-entries-manager.json

1.3.3. Set the environment variables

This step can be skipped if you're using Docker.

export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-custom-entries-manager.json
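
A wrong path or a truncated key file usually surfaces only when the first API call fails, so it can help to check the variable up front. The sketch below is an illustrative helper, not part of this package. It assumes the downloaded file is a standard service account key, which carries the `type`, `project_id`, `private_key` and `client_email` fields.

```python
import json
import os

# Fields expected in a service account JSON key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_credentials(env=None):
    """Return a list of problems found with the configured key file."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return ["key file not found: " + path]
    try:
        with open(path, encoding="utf-8") as key_file:
            key = json.load(key_file)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ["key file is not valid JSON: " + path]
    if not isinstance(key, dict):
        return ["key file does not contain a JSON object: " + path]
    problems = ["missing field: " + name
                for name in REQUIRED_KEY_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append("unexpected key type: " + str(key["type"]))
    return problems


if __name__ == "__main__":
    for problem in check_credentials():
        print(problem)
```

An empty list means the variable points at a file that looks like a service account key; it does not prove that the roles from step 1.3.1 were granted.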

2. Manage Custom Entries

2.1. Synchronize

2.1.1. To a CSV file

  • SCHEMA

The metadata schema to synchronize Custom Entries is presented below. Use as many lines as needed to describe all Data Catalog Entries you need.

Column | Description | Mandatory
user_specified_system | Indicates the Entry source system | yes
group_id | Id of the Entry Group the Entry belongs to | yes
linked_resource | The resource a metadata Entry refers to | yes
display_name | Display information such as title and description; a short name to identify the Entry (the entry_id field will be generated as a normalized version of the display name) | yes
description | Can consist of several sentences that describe the Entry contents | no
user_specified_type | A custom value indicating the Entry type | yes
created_at | The creation time of the underlying resource, not of the Data Catalog Entry (format: YYYY-MM-DDTHH:MM:SSZ) | no
updated_at | The last-modified time of the underlying resource, not of the Data Catalog Entry (format: YYYY-MM-DDTHH:MM:SSZ) | no
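
The schema above can be turned into a file with Python's standard csv module. The following is a minimal sketch under two assumptions that the text does not confirm: that the file starts with a header row holding the column names, and that the columns appear in the order listed. Check sample-input/csv for the authoritative layout. The helper names are illustrative.

```python
import csv
from datetime import datetime, timezone

# Column names taken from the schema; order and header row are assumptions.
COLUMNS = [
    "user_specified_system", "group_id", "linked_resource", "display_name",
    "description", "user_specified_type", "created_at", "updated_at",
]
MANDATORY = [
    "user_specified_system", "group_id", "linked_resource", "display_name",
    "user_specified_type",
]


def format_timestamp(moment):
    """Render a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSZ (UTC)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def write_entries_csv(path, entries):
    """Write one line per Entry, rejecting rows that lack mandatory values."""
    for number, entry in enumerate(entries, start=1):
        missing = [column for column in MANDATORY if not entry.get(column)]
        if missing:
            raise ValueError(
                "entry {} lacks: {}".format(number, ", ".join(missing)))
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(entries)  # optional columns default to ""
```

Optional columns can simply be left out of an entry dictionary; they are written as empty cells.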
  • SAMPLE INPUT
  1. sample-input/csv for reference;
  2. Data Catalog Sample Custom Entries (Google Sheets) might help to create/export a CSV file.
  • COMMANDS

Python + virtualenv

datacatalog-custom-entries sync \
  --csv-file <CSV-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
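
When the synchronization is part of a larger script, the same command can be launched from Python. This is a minimal sketch that uses only the subcommand and flags shown above; the wrapper functions are illustrative, and it assumes the datacatalog-custom-entries executable is on the PATH of the active environment.

```python
import shlex
import subprocess


def build_sync_command(csv_file, project_id, location_id):
    """Return the argument list for the CSV sync command shown above."""
    return [
        "datacatalog-custom-entries", "sync",
        "--csv-file", csv_file,
        "--project-id", project_id,
        "--location-id", location_id,
    ]


def run_sync(csv_file, project_id, location_id):
    """Run the sync and raise CalledProcessError on a non-zero exit code."""
    command = build_sync_command(csv_file, project_id, location_id)
    print("Running:", " ".join(shlex.quote(part) for part in command))
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues when a file path contains spaces.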

Docker

docker build --rm --tag datacatalog-custom-entries-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <CSV-FILE-FOLDER>:/data \
  datacatalog-custom-entries-manager sync \
  --csv-file /data/<CSV-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>

2.1.2. To a JSON file

  • STRUCTURE

The metadata structure to synchronize Custom Entries is presented below. Use as many objects as needed to describe all Data Catalog Entries you need.

{
  "userSpecifiedSystems": [
    {
      "name": "STRING",
      "entryGroups": [
        {
          "id": "STRING",
          "entries": [
            {
              "linkedResource": "STRING",
              "displayName": "STRING",
              "description": "STRING (optional)",
              "type": "STRING",
              "createdAt": "STRING (optional, format: YYYY-MM-DDTHH:MM:SSZ)",
              "updatedAt": "STRING (optional, format: YYYY-MM-DDTHH:MM:SSZ)"
            }
          ]
        }
      ]
    }
  ]
}
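
Metadata often starts out as flat records rather than as a nested document. The sketch below shows one way to group such records into the structure above. The input keys (`system`, `group_id`, `linked_resource`, `display_name`, `type`, and so on) are hypothetical names chosen for illustration; only the output keys come from the structure shown.

```python
import json

# Hypothetical input key -> JSON key, for the optional fields.
OPTIONAL_FIELDS = (
    ("description", "description"),
    ("created_at", "createdAt"),
    ("updated_at", "updatedAt"),
)


def build_document(rows):
    """Group flat rows by source system and Entry Group."""
    systems = {}  # system name -> {group id -> [entries]}
    for row in rows:
        groups = systems.setdefault(row["system"], {})
        entries = groups.setdefault(row["group_id"], [])
        entry = {
            "linkedResource": row["linked_resource"],
            "displayName": row["display_name"],
            "type": row["type"],
        }
        for source_key, json_key in OPTIONAL_FIELDS:
            if row.get(source_key):
                entry[json_key] = row[source_key]
        entries.append(entry)
    return {
        "userSpecifiedSystems": [
            {
                "name": name,
                "entryGroups": [
                    {"id": group_id, "entries": group_entries}
                    for group_id, group_entries in groups.items()
                ],
            }
            for name, groups in systems.items()
        ]
    }


if __name__ == "__main__":
    sample = [{"system": "my_system", "group_id": "my_group",
               "linked_resource": "//my-system/table-1",
               "display_name": "Table 1", "type": "table"}]
    print(json.dumps(build_document(sample), indent=2))
```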
  • SAMPLE INPUT
  1. sample-input/json for reference.
  • COMMANDS

Python + virtualenv

datacatalog-custom-entries sync \
  --json-file <JSON-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
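
Before running the command, a local check of the file can catch missing fields or malformed timestamps without calling Data Catalog. The sketch below is an illustrative pre-flight validator, not something the package provides. It treats every field not marked optional in the structure as required, and that reading of the structure is an assumption.

```python
import json
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDTHH:MM:SSZ


def find_problems(document):
    """Return human-readable problems found in a parsed JSON document."""
    if "userSpecifiedSystems" not in document:
        return ["missing userSpecifiedSystems"]
    problems = []
    for s_idx, system in enumerate(document["userSpecifiedSystems"]):
        if not system.get("name"):
            problems.append("system {}: missing name".format(s_idx))
        for g_idx, group in enumerate(system.get("entryGroups", [])):
            where = "system {}, group {}".format(s_idx, g_idx)
            if not group.get("id"):
                problems.append(where + ": missing id")
            for e_idx, entry in enumerate(group.get("entries", [])):
                label = "{}, entry {}".format(where, e_idx)
                for key in ("linkedResource", "displayName", "type"):
                    if not entry.get(key):
                        problems.append("{}: missing {}".format(label, key))
                for key in ("createdAt", "updatedAt"):
                    value = entry.get(key)
                    if value is None:
                        continue  # optional field
                    try:
                        datetime.strptime(value, TIMESTAMP_FORMAT)
                    except (TypeError, ValueError):
                        problems.append("{}: bad {}".format(label, key))
    return problems


def check_file(path):
    """Load a JSON file and return the problems found in it."""
    with open(path, encoding="utf-8") as json_file:
        return find_problems(json.load(json_file))
```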

Docker

docker build --rm --tag datacatalog-custom-entries-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-FOLDER>:/credentials --volume <JSON-FILE-FOLDER>:/data \
  datacatalog-custom-entries-manager sync \
  --json-file /data/<JSON-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
