# Datalake CLI

A Command Line Interface (CLI) tool for datalake operations: it migrates data from Sage ERP systems into a structured datalake and data-warehouse architecture on Google Cloud. Aimed at enhancing data management and analytics capabilities, the tool supports project-specific datalake environments identified by unique tags.
## Getting Started

Install the tool:

```shell
pip3 install shopcloud-datalake
```

- **Configuration Creation:**

  Set up your configuration directory and create a new Datalake configuration:

  ```shell
  mkdir config-dir
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config create
  ```

- **Configuration Synchronization:**

  Sync your configuration files to the project bucket:

  ```shell
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" config sync
  ```

- **Data Migration Execution:**

  Run the data migration process with or without specifying a table:

  ```shell
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run --partition-date=YYYY-MM-DD
  datalake --project="your-google-cloud-project-id" --base-dir="config-dir" run <table> --partition-date=YYYY-MM-DD
  ```
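The same commands can be driven from a script, for example in a scheduler job. The sketch below only assembles the argv list for the `datalake` CLI; the helper functions (`build_datalake_cmd`, `run_datalake`) and the example project/table names are illustrative, not part of the package:

```python
import subprocess  # used only by run_datalake; the builder itself is pure


def build_datalake_cmd(project: str, base_dir: str, *args: str) -> list[str]:
    """Assemble an argv list for the datalake CLI (hypothetical helper)."""
    return ["datalake", f"--project={project}", f"--base-dir={base_dir}", *args]


def run_datalake(project: str, base_dir: str, *args: str) -> int:
    """Invoke the CLI and return its exit code."""
    return subprocess.run(build_datalake_cmd(project, base_dir, *args)).returncode


# Example: argv for a single-table run on a given partition date
cmd = build_datalake_cmd("my-project", "config-dir",
                         "run", "sales", "--partition-date=2024-01-01")
print(cmd)
```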
## Architecture

```mermaid
flowchart LR
    subgraph Data-Lake
        Sage[(Sage)] --> datalake-cli
        GCS_SCHEMA[(Storage)] --> |gs://shopcloud-datalake-sage-schema| datalake-cli
        datalake-cli --> |gs://shopcloud-datalake-sage-data| GCS_DATA[(Storage)]
    end
    subgraph Data-Warehouse
        GCS_DATA[(Storage)] --> SCDS[(BigQuery)]
    end
```
## FAQs

- **Where are the configurations stored?** Configurations are stored in a Google Cloud Storage bucket associated with each project.
- **What is the structure of the Datalake?** Each project has a dedicated Google Cloud Project for data storage.
- **What file format is used?** Data is stored in Parquet format for efficiency and performance.
- **How is data partitioned?** Data is partitioned using BigQuery's TimePartitioning feature.
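With daily TimePartitioning, BigQuery lets you address a single day's partition via a partition decorator of the form `table$YYYYMMDD`. A small sketch of mapping the CLI's `--partition-date` value to such a decorator (the `sage_sales` table name is made up for illustration):

```python
from datetime import date


def partition_decorator(table: str, partition_date: str) -> str:
    """Map a YYYY-MM-DD partition date to a BigQuery partition decorator.

    BigQuery addresses daily partitions of a time-partitioned table
    as `table$YYYYMMDD`.
    """
    d = date.fromisoformat(partition_date)  # also validates the input format
    return f"{table}${d.strftime('%Y%m%d')}"


print(partition_decorator("sage_sales", "2024-01-01"))  # sage_sales$20240101
```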
## Development

```shell
# run unit tests
$ python3 -m unittest

# run unit tests with coverage
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage html -d coverage_report
$ python3 -m coverage run --source=tests,shopcloud_datalake -m unittest discover && python3 -m coverage xml
```