Datalake CLI

CLI tool for datalake operations on Google Cloud.
This project provides a Command Line Interface (CLI) tool designed to facilitate the migration of data from Sage ERP systems into a structured datalake and data-warehouse architecture on Google Cloud. Aimed at enhancing data management and analytics capabilities, the tool supports project-specific datalake environments identified by unique tags.
Getting Started
- Configuration Creation:

  Install the tool:

  ```bash
  pip3 install shopcloud-datalake
  ```

  Set up your configuration directory:

  ```bash
  mkdir config-dir
  ```

  Create a new Datalake configuration:

  ```bash
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config create
  ```

- Configuration Synchronization:

  Sync your configuration files to the project bucket (a verification sketch follows this list):

  ```bash
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config sync
  ```

- Data Migration Execution:

  Run the data migration process for all configured tables or for a single table (a scheduling sketch follows this list):

  ```bash
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date=YYYY-MM-DD
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run <table> --partition-date=YYYY-MM-DD
  ```
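After syncing, you can confirm that the configuration files actually landed in the bucket. A minimal check with gsutil; the bucket name here is a placeholder, since the configuration bucket is project-specific (see the FAQs below):

```bash
# List the synced configuration files (bucket name is a placeholder)
gsutil ls gs://your-config-bucket/
```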
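For recurring migrations, the run command lends itself to a small wrapper that computes the previous day's partition date, e.g. for a cron job. A minimal sketch, assuming GNU date and the placeholder project id used above:

```bash
#!/usr/bin/env bash
# Daily migration sketch: run all configured tables for yesterday's partition.
set -euo pipefail

PROJECT="your-google-cloud-project-id"
CONFIG_DIR="config-dir"
# GNU date syntax; on macOS use: date -v-1d +%F
PARTITION_DATE="$(date -d "yesterday" +%F)"

datalake --project="$PROJECT" --base-dir="$CONFIG_DIR" run --partition-date="$PARTITION_DATE"
```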
Architecture
```mermaid
flowchart LR
    subgraph Data-Lake
        Sage[(Sage)] --> datalake-cli
        GCS_SCHEMA[(Storage)] --> |gs://shopcloud-datalake-sage-schema| datalake-cli
        datalake-cli --> |gs://shopcloud-datalake-sage-data| GCS_DATA[(Storage)]
    end
    subgraph Data-Warehouse
        GCS_DATA[(Storage)] --> SCDS[(BigQuery)]
    end
```
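Once a partition has been loaded into the warehouse side of this diagram, it can be inspected directly in BigQuery. A sketch using the bq CLI; the dataset and table names are invented for illustration, and the partition filter assumes ingestion-time partitioning (adjust it if the table is partitioned on a column instead):

```bash
# Count the rows of one partition (dataset/table names are illustrative)
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS rows_loaded
   FROM `your-google-cloud-project-id.sage.orders`
   WHERE DATE(_PARTITIONTIME) = "2024-01-01"'
```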
FAQs
- Where are the configurations stored? Configurations are stored in a Google Cloud Storage bucket associated with each project.
- What is the structure of the Datalake? Each project has a dedicated Google Cloud Project for data storage.
- What file format is used? Data is stored in Parquet format for efficiency and performance.
- How is data partitioned? Data is partitioned using BigQuery's TimePartitioning feature.
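To check how a given table is partitioned, the bq CLI can dump the table metadata; the timePartitioning block shows the partition type and, if set, the partition column. The table name below is illustrative:

```bash
# Show a table's partitioning metadata (table name is illustrative)
bq show --format=prettyjson your-google-cloud-project-id:sage.orders | grep -A 3 timePartitioning
```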
Development
```console
# run unit tests
$ python3 -m unittest

# run unit tests with coverage
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage html -d coverage_report
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage xml
```
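For working on the package itself, an editable install inside a virtual environment is a common setup before running the tests. A sketch, assuming a standard Python package layout:

```bash
# create an isolated environment and install the package in editable mode
python3 -m venv .venv && source .venv/bin/activate
pip3 install -e .
```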