datacatalog-object-storage-processor

A package for performing Data Catalog operations on object storage solutions.

1. Environment setup

1.1. Get the code

git clone https://github.com/mesmacosta/datacatalog-object-storage-processor
cd datacatalog-object-storage-processor

1.2. Auth credentials

1.2.1. Create a service account and grant it the roles below
  • Data Catalog Admin
  • Storage Admin, or a custom role with the storage.buckets.list permission
1.2.2. Download a JSON key and save it as
  • ./credentials/datacatalog-object-storage-processor-sa.json

1.3. Virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.3.1. Install Python 3.6+
1.3.2. Create and activate an isolated Python environment
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
1.3.3. Install the dependencies
pip install --upgrade --editable .
1.3.4. Set environment variables
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-object-storage-processor-sa.json
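
Before running the tool, it can help to confirm that the key file saved in step 1.2.2 is readable and looks like a service account key. The following is a minimal sketch using only the Python standard library. The function name `check_credentials` and the list of required fields are illustrative assumptions, not part of this package. The sketch only inspects the file and does not contact Google Cloud.

```python
import json
from pathlib import Path

# Fields a Google Cloud service account key file normally contains
# (assumed subset, chosen for this illustration).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_credentials(path):
    """Return a list of problems found in the key file (empty if none)."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"key file not found: {key_file}"]
    try:
        data = json.loads(key_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"key file is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["key file does not contain a JSON object"]
    # Report every expected field that is absent.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A key of another type (for example a user credential) is flagged too.
    if data.get("type") not in (None, "service_account"):
        problems.append(f"unexpected key type: {data['type']}")
    return problems
```

For example, calling `check_credentials(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])` after the export above should return an empty list when the key file is in place.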

1.4. Docker

Docker may be used as an alternative to run all the scripts. In this case, please disregard the Virtualenv install instructions.

2. Create Data Catalog entries based on object storage files

2.1. python main.py

  • python
datacatalog-object-storage-processor \
  object-storage create-entries --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name \
  --bucket-prefix my_bucket
  • docker
docker build --rm --tag datacatalog-object-storage-processor .
docker run --rm --tty -v your_credentials_folder:/data datacatalog-object-storage-processor \
  --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name \
  --bucket-prefix my_bucket

3. Delete object storage entries from an entry group

Delete the entries for a given entry group:

datacatalog-object-storage-processor \
  object-storage delete-entries --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name
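
The create and delete commands shown in sections 2 and 3 can also be driven from a script. The sketch below builds the argument list using only the subcommands and flags that appear in this README. The helper names `build_command` and `run` are hypothetical, and passing `--bucket-prefix` only to `create-entries` is an assumption based on the examples above. By default the command is printed rather than executed.

```python
import shlex
import subprocess

CLI = "datacatalog-object-storage-processor"


def build_command(action, project_id, entry_group_name, bucket_prefix=None):
    """Build the argument list for create-entries or delete-entries."""
    if action not in ("create-entries", "delete-entries"):
        raise ValueError(f"unsupported action: {action}")
    command = [
        CLI, "object-storage", action,
        "--type", "cloud-storage",
        "--project-id", project_id,
        "--entry-group-name", entry_group_name,
    ]
    # The README examples show --bucket-prefix only with create-entries.
    if action == "create-entries" and bucket_prefix:
        command += ["--bucket-prefix", bucket_prefix]
    return command


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(shlex.quote(part) for part in command))
    if not dry_run:
        # Requires the CLI to be installed and credentials to be configured.
        subprocess.run(command, check=True)
```

Calling `run(build_command("delete-entries", "my_project", "my_entry_group_name"))` prints the delete command from this section without running it, which is a cheap way to review what will be executed before passing `dry_run=False`.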

Disclaimers

This is not an officially supported Google product.

History

0.1.0 (2020-05-01)

  • First release on PyPI.
