CLI tool for datalake operations
Project description
Datalake CLI
This project provides a Command Line Interface (CLI) tool designed to facilitate the migration of data from Sage ERP systems into a structured datalake and data-warehouse architecture on Google Cloud. Aimed at enhancing data management and analytics capabilities, the tool supports project-specific datalake environments identified by unique tags.
Getting Started
- Installation:

  Install the tool:

  ```
  pip3 install shopcloud-datalake
  ```

- Configuration Creation:

  Set up your configuration directory and create a new datalake configuration:

  ```
  mkdir config-dir
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config create
  ```

- Configuration Synchronization:

  Sync your configuration files to the project bucket:

  ```
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config sync
  ```

- Data Migration Execution:

  Run the data migration process for all configured tables, or for a single table (a complete session is sketched after this list):

  ```
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date=YYYY-MM-DD
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run <table> --partition-date=YYYY-MM-DD
  ```
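Putting the steps together, a minimal end-to-end session might look like the following sketch; the project id `my-gcp-project` is a placeholder, and only the commands and flags documented above are used:

```
# one-time setup: install the tool, then create and sync the configuration
pip3 install shopcloud-datalake
mkdir config-dir
datalake --project="my-gcp-project" --base-dir="config-dir" config create
datalake --project="my-gcp-project" --base-dir="config-dir" config sync

# recurring migration run, e.g. from a daily scheduler;
# $(date +%F) expands to today's date in YYYY-MM-DD format
datalake --project="my-gcp-project" --base-dir="config-dir" run --partition-date=$(date +%F)
```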
Architecture
```mermaid
flowchart LR
    subgraph Data-Lake
        Sage[(Sage)] --> datalake-cli
        GCS_SCHEMA[(Storage)] -->|gs://shopcloud-datalake-sage-schema| datalake-cli
        datalake-cli -->|gs://shopcloud-datalake-sage-data| GCS_DATA[(Storage)]
    end
    subgraph Data-Warehouse
        GCS_DATA[(Storage)] --> SCDS[(BigQuery)]
    end
```
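Assuming you have read access to the buckets named in the diagram, the standard `gsutil` CLI can be used to inspect both sides of the flow:

```
# configuration/schema objects read by the CLI
gsutil ls gs://shopcloud-datalake-sage-schema

# Parquet data written by a migration run
gsutil ls gs://shopcloud-datalake-sage-data
```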
FAQs
- Where are the configurations stored? Configurations are stored in a Google Cloud Storage bucket associated with each project.
- What is the structure of the Datalake? Each project has a dedicated Google Cloud Project for data storage.
- What file format is used? Data is stored in Parquet format for efficiency and performance.
- How is data partitioned? Data is partitioned using BigQuery's TimePartitioning feature; see the example below.
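To illustrate what TimePartitioning buys you: queries that filter on the partition pseudo-column only scan the matching day rather than the full table. The dataset and table names below are hypothetical, and `_PARTITIONDATE` assumes an ingestion-time partitioned table:

```
# scan only the 2024-01-01 partition instead of the whole table
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) FROM `my-gcp-project.sage.my_table`
   WHERE _PARTITIONDATE = DATE "2024-01-01"'
```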
Development
```
# run unit tests
$ python3 -m unittest

# run unit tests with coverage
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage html -d coverage_report
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage xml
```
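During development it can be handy to run the suite verbosely; `discover -s` and `-v` are standard `unittest` flags:

```
# discover and run all tests under ./tests with verbose output
$ python3 -m unittest discover -s tests -v
```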
Project details
Download files
File details
Details for the file shopcloud-datalake-1.5.0.tar.gz (Source Distribution).
File metadata
- Download URL: shopcloud-datalake-1.5.0.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | cb05f915dcd87faa3e1cf81bd18754fa72fd3e82ac26d04abd84ee9996a7879d
MD5 | 8383f10e807e5b7ade950986fad3d350
BLAKE2b-256 | 0c7d8eae72d491aae3e35173f4fc3574c337e9c39402e1d721a566beaf3e7c26
File details
Details for the file shopcloud_datalake-1.5.0-py3-none-any.whl (Built Distribution).
File metadata
- Download URL: shopcloud_datalake-1.5.0-py3-none-any.whl
- Upload date:
- Size: 24.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 60d76b0952bd9f5013640d3f206617fc011fc6121c68d85a2912e5ed2bf97d3c
MD5 | 5d99d8b6d7cf46b754173be72242bed0
BLAKE2b-256 | 3d97a53825a526d73a81b373890e05c15148a5f357a6295db6dec64ee3cde018