
CLI tool for datalake operations

Project description

Datalake CLI

This project provides a Command Line Interface (CLI) tool that migrates data from Sage ERP systems into a structured datalake and data-warehouse architecture on Google Cloud. It enhances data management and analytics capabilities and supports project-specific datalake environments identified by unique tags.

Getting Started

  1. Configuration Creation:

Install the tool:

pip3 install shopcloud-datalake

Set up your configuration directory:

mkdir config-dir

Create a new Datalake configuration:

datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config create

  2. Configuration Synchronization:

Sync your configuration files to the project bucket:

datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config sync

  3. Data Migration Execution:

Run the data migration for all configured tables, or for a single table (a small scheduling sketch follows the commands):

datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date=YYYY-MM-DD
datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run <table> --partition-date=YYYY-MM-DD
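
The --partition-date flag takes a concrete calendar date. Below is a minimal scheduling sketch, assuming only that the datalake CLI from this package is installed and on the PATH; it loads yesterday's partition and can be called from cron, Cloud Scheduler, or a CI job.

# daily_run.py -- sketch for a scheduled daily load; assumes the datalake CLI
# from this package is installed and on the PATH.
import datetime
import subprocess

PROJECT = "your-google-cloud-project-id"
CONFIG_DIR = "config-dir"

# load yesterday's partition so the source day is complete
partition_date = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()

subprocess.run(
    [
        "datalake",
        f"--project={PROJECT}",
        f"--base-dir={CONFIG_DIR}",
        "run",
        f"--partition-date={partition_date}",
    ],
    check=True,  # raise if the migration exits non-zero
)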

Architecture

flowchart LR
    subgraph Data-Lake
    Sage[(Sage)] --> datalake-cli
    GCS_SCHEMA[(Storage)] --> |gs://shopcloud-datalake-sage-schema| datalake-cli
    datalake-cli --> |gs://shopcloud-datalake-sage-data| GCS_DATA[(Storage)]
    end
    subgraph Data-Warehouse
    GCS_DATA[(Storage)] --> SCDS[(BigQuery)]
    end
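
Once a partition has landed in BigQuery, it can be queried like any other time-partitioned table. The sketch below uses the google-cloud-bigquery client; the dataset and table names are placeholders, not names created by this tool, and the filter assumes ingestion-time partitioning.

# query_warehouse.py -- sketch; dataset and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="your-google-cloud-project-id")

# read one day from a time-partitioned warehouse table; if the table is
# partitioned on a column instead, filter on that column rather than
# the _PARTITIONTIME pseudo-column.
sql = """
    SELECT *
    FROM `your-google-cloud-project-id.sage.orders`
    WHERE DATE(_PARTITIONTIME) = @partition_date
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("partition_date", "DATE", "2024-01-31")
        ]
    ),
)
for row in job.result():
    print(dict(row))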

FAQs

  • Where are the configurations stored? Configurations are stored in a Google Cloud Storage bucket associated with each project.
  • What is the structure of the Datalake? Each project has a dedicated Google Cloud Project for data storage.
  • What file format is used? Data is stored in Parquet format for efficiency and performance.
  • How is data partitioned? Data is partitioned using BigQuery's TimePartitioning feature (see the load sketch below).
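
To illustrate the last two answers, the sketch below loads a Parquet file from Cloud Storage into a day-partitioned BigQuery table with the google-cloud-bigquery client. The object path, dataset, table, and partition column are placeholders chosen for the example, not values produced by this tool.

# load_parquet_partitioned.py -- illustration of Parquet plus TimePartitioning;
# all names below are placeholders, not values created by the datalake CLI.
from google.cloud import bigquery

client = bigquery.Client(project="your-google-cloud-project-id")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="partition_date",  # placeholder partition column
    ),
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://shopcloud-datalake-sage-data/orders/2024-01-31/*.parquet",  # placeholder path
    "your-google-cloud-project-id.sage.orders",  # placeholder table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish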

Development

# run unit tests
$ python3 -m unittest
# run unit tests with coverage
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage html -d coverage_report
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage xml

