
CLI tool for datalake operations


Datalake CLI

This project provides a Command Line Interface (CLI) tool designed to facilitate the migration of data from Sage ERP systems into a structured datalake and data-warehouse architecture on Google Cloud. Aimed at enhancing data management and analytics capabilities, the tool supports project-specific datalake environments identified by unique tags.

Getting Started

  1. Configuration Creation:

Install the tool:

pip3 install shopcloud-datalake

Set up your configuration directory:

mkdir config-dir

Create a new Datalake configuration:

datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config create

  2. Configuration Synchronization:

Sync your configuration files to the project bucket:

datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config sync

  3. Data Migration Execution:

Run the data migration process for all tables or for a single table (a sketch for scheduling daily runs follows these steps):

datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date=YYYY-MM-DD
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run <table> --partition-date=YYYY-MM-DD
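
To run the migration on a schedule, the partition date can be derived from the current date. The following is a minimal sketch, assuming GNU date and that yesterday's partition should be loaded; the project ID and config directory are placeholders, as above:

#!/usr/bin/env bash
# migrate yesterday's partition for all configured tables (placeholder project and config dir)
set -euo pipefail
PARTITION_DATE=$(date -d "yesterday" +%Y-%m-%d)
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date="$PARTITION_DATE"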

Architecture

flowchart LR
    subgraph Data-Lake
    Sage[(Sage)] --> datalake-cli
    GCS_SCHEMA[(Storage)] --> |gs://shopcloud-datalake-sage-schema| datalake-cli
    datalake-cli --> |gs://shopcloud-datalake-sage-data| GCS_DATA[(Storage)]
    end
    subgraph Data-Warehouse
    GCS_DATA[(Storage)] --> SCDS[(BigQuery)]
    end
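
As the diagram shows, the CLI reads table schemas from the schema bucket and writes data to the data bucket, which feeds BigQuery. A quick way to inspect those buckets is gsutil; note that the object layout shown below (per-table prefixes with partition-date paths) is an assumption for illustration, not the documented structure:

# list the schema and data buckets referenced in the diagram
$ gsutil ls gs://shopcloud-datalake-sage-schema
# hypothetical layout: one prefix per table, one path per partition date
$ gsutil ls gs://shopcloud-datalake-sage-data/<table>/partition-date=YYYY-MM-DD/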

FAQs

  • Where are the configurations stored? Configurations are stored in a Google Cloud Storage bucket associated with each project.
  • What is the structure of the Datalake? Each project has a dedicated Google Cloud Project for data storage.
  • What file format is used? Data is stored in Parquet format for efficiency and performance.
  • How is data partitioned? Data is partitioned using BigQuery's TimePartitioning feature.
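
As a rough illustration of the warehouse side, the sketch below loads a Parquet file from the data bucket into a day-partitioned BigQuery table using the bq CLI; the dataset, table, and partition column names are assumptions and not part of this project's documented interface:

# load one Parquet partition into a day-partitioned BigQuery table (hypothetical names)
$ bq load \
    --source_format=PARQUET \
    --time_partitioning_type=DAY \
    --time_partitioning_field=partition_date \
    your_dataset.your_table \
    gs://shopcloud-datalake-sage-data/<table>/partition-date=YYYY-MM-DD/*.parquet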

Development

# run unit tests
$ python3 -m unittest
# run unit tests with coverage
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage html -d coverage_report
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage xml
