CUREd+ metadata generator

The CUREd+ metadata generator tool lists every column of every table in the database, which it reads from Parquet files stored in an AWS S3 bucket.

The data in the target bucket must be arranged in the following directory structure: <data_set_id>/<table_id>/data/*.parquet
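To make the expected layout concrete, here is a minimal sketch of how an object key could be split into its data set and table identifiers. The `parse_key` helper and the `TableKey` type are hypothetical names used for illustration only and are not part of curedcolumns; the sample keys are invented.

```python
from typing import NamedTuple, Optional


class TableKey(NamedTuple):
    """Identifiers recovered from an object key (hypothetical type)."""
    data_set_id: str
    table_id: str


def parse_key(key: str) -> Optional[TableKey]:
    """Return the IDs for a key shaped like
    <data_set_id>/<table_id>/data/<name>.parquet, or None if the key
    does not follow that layout."""
    parts = key.split("/")
    if len(parts) != 4:
        return None
    data_set_id, table_id, folder, filename = parts
    # The third path component must be the literal folder name "data".
    if folder != "data" or not filename.endswith(".parquet"):
        return None
    if not data_set_id or not table_id:
        return None
    return TableKey(data_set_id, table_id)


# Invented example keys
print(parse_key("ds01/patients/data/part-0000.parquet"))
print(parse_key("ds01/patients/README.md"))
```

Keys that do not match the layout return `None`, which is one plausible way to skip unrelated objects in the bucket.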

This script will generate a CSV file with the following columns:

  • data_set_id
  • table_id
  • column_name
  • data_type
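Once the CSV has been generated, it can be loaded with the Python standard library. The sketch below groups columns by table. It assumes the output starts with a header row containing the four field names above, which is not confirmed here; the `columns_by_table` function, the sample rows, and the data type strings are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def columns_by_table(csv_text, delimiter=","):
    """Group (column_name, data_type) pairs by (data_set_id, table_id).

    Assumes the first row is a header naming the four fields.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    tables = defaultdict(list)
    for row in reader:
        table = (row["data_set_id"], row["table_id"])
        tables[table].append((row["column_name"], row["data_type"]))
    return dict(tables)


# Invented sample; real output will differ.
SAMPLE = """data_set_id,table_id,column_name,data_type
ds01,patients,patient_id,int64
ds01,patients,date_of_birth,date32
ds01,visits,visit_id,int64
"""

for table, columns in columns_by_table(SAMPLE).items():
    print(table, columns)
```

If the file was written with a different separator (see the `--delimiter` option), pass the same character as `delimiter`.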

Installation

Ensure Python is installed. (See this tutorial.)

Install the AWS command-line interface (CLI), then configure your access key using the aws configure command.

Install this package using the Python package manager:

pip install curedcolumns

Usage

At minimum, specify the AWS CLI profile and the bucket you want to inspect. The --output option writes the CSV to a file instead of the screen.

curedcolumns --profile $AWS_PROFILE $AWS_BUCKET --output $OUTPUT_FILE

If the profile does not exist yet, create it first using the aws configure command:

aws configure --profile $AWS_PROFILE
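When the tool is called from a Python script rather than a shell, the same command line can be assembled as an argument list. This is a sketch only: `build_command` is a hypothetical helper, the output file name is invented, and it uses just the `--profile`, `--output`, and `--force` options listed in the help text.

```python
import shlex


def build_command(bucket, profile=None, output=None, force=False):
    """Assemble a curedcolumns argument list (hypothetical helper)."""
    args = ["curedcolumns"]
    if profile:
        args += ["--profile", profile]
    if output:
        args += ["--output", output]
    if force:
        args.append("--force")
    # The bucket is the single positional argument.
    args.append(bucket)
    return args


command = build_command(
    "s3://my_bucket.aws.com", profile="clean", output="columns.csv"
)
# Show the equivalent shell command line.
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems if a profile or file name contains spaces.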

To view the command line options:

$ curedcolumns --help 
usage: curedcolumns [-h] [-v] [--version] [-l LOGLEVEL] [--prefix PREFIX] [--no-sign-request] [--profile PROFILE] [-d DELIMITER] [-o OUTPUT] [-f] bucket

List all the field names for all the data sets in a bucket on AWS S3 object storage and display the metadata in CSV format. This assumes a folder structure in this layout: <data_set_id>/<table_id>/data/*.parquet

positional arguments:
  bucket                S3 bucket location URI

options:
  -h, --help            show this help message and exit
  -v, --verbose
  --version             Show the version number of this tool
  -l LOGLEVEL, --loglevel LOGLEVEL
  --prefix PREFIX       Limits the response to keys that begin with the specified prefix.
  --no-sign-request
  --profile PROFILE     AWS profile to use
  -d DELIMITER, --delimiter DELIMITER
                        Column separator character
  -o OUTPUT, --output OUTPUT
                        Output file path. Default: screen
  -f, --force           Overwrite output file if it already exists
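The --prefix description matches the wording of the Prefix parameter in the S3 list-objects API, so it presumably restricts the scan to keys under one part of the bucket; how the tool applies it internally is an assumption. The sketch below shows the matching rule itself on invented keys, using a hypothetical `filter_by_prefix` function.

```python
def filter_by_prefix(keys, prefix=""):
    """Keep only the keys that begin with the given prefix.

    An empty prefix matches every key.
    """
    return [key for key in keys if key.startswith(prefix)]


# Invented example keys following the expected layout
keys = [
    "ds01/patients/data/part-0000.parquet",
    "ds01/visits/data/part-0000.parquet",
    "ds02/episodes/data/part-0000.parquet",
]

# Restrict to a single data set
print(filter_by_prefix(keys, "ds01/"))
```

Note that the match is on the leading characters of the key, so a prefix of `ds0` would select both data sets in this example.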

Example

Use the AWS CLI profile named "clean":

curedcolumns --profile clean s3://my_bucket.aws.com

Development

See CONTRIBUTING.md.
