This project is a data engineering framework that implements all data engineering ingestion patterns.
Project description
The ProjectOneflow package is a metadata-driven framework that implements all data engineering patterns as a workload with in-place deployment.
Quick Start
To test locally, run the command below.
Install
The data engineering package is published on PyPI. To install it, run the following command: `pip install projectoneflow`
To Get Started
Use the command below: `projectoneflow blueprint create -o <TARGET_FOLDER_PATH>`
The command asks a few questions and then generates a pipeline folder containing a pipeline JSON template based on your answers. <TARGET_FOLDER_PATH> is the folder the generated template files are written to; if it is not specified, the files are saved to the current directory.
## Let’s discuss project structure
The whole package lives under the projectoneflow namespace, and each sub-module in this namespace is a folder in the source project folder.
### Below are the modules:
cli: contains code related to the CLI command implementation, and references to the sub-command implementations
exception: contains the custom exceptions raised in this project
execution: contains code related to the execution operators and the task context implementation
observability: contains code related to the logging, instrumentation, and event-listener implementations
pipeline: contains code related to the different deployment targets, such as Terraform
schemas: contains all schema definitions used in this package
secrets: contains the implementation of the task-specific secret scope manager
state: contains code related to the task-specific state manager
task: contains code related to the task-specific implementations
utils: contains the utilities used in this package
All of the above modules are placed under the src/projectoneflow folder.
ProjectOneflow Design
Every pipeline or transformation in data engineering can be expressed as three stages: input -> execution -> output.
The input stage corresponds to the source/producer from which data is extracted for transformation.
The execution stage is where the core transformation logic is defined: it takes the input/producer data, applies transformations, and returns the transformed data.
The output stage is where the transformed data is written to the consumer/sink.
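The three stages can be pictured with a few plain Python functions. This is only an illustrative sketch: `run_stages`, `read_input`, `transform`, `write_output`, and the in-memory `sink` are hypothetical names chosen for the example and are not part of the projectoneflow API.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]  # hypothetical record type used only in this sketch


def run_stages(
    read_input: Callable[[], List[Record]],
    transform: Callable[[List[Record]], List[Record]],
    write_output: Callable[[List[Record]], None],
) -> None:
    """Run the input -> execution -> output flow once."""
    data = read_input()            # input stage: pull data from the source/producer
    transformed = transform(data)  # execution stage: apply the transformation logic
    write_output(transformed)      # output stage: write to the consumer/sink


def read_input() -> List[Record]:
    # Stand-in for a real source such as a table or a file.
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]


def transform(rows: List[Record]) -> List[Record]:
    # Add a derived column to every record.
    return [{**row, "amount_doubled": row["amount"] * 2} for row in rows]


sink: List[Record] = []  # stand-in for a real consumer/sink


def write_output(rows: List[Record]) -> None:
    sink.extend(rows)


run_stages(read_input, transform, write_output)
```

Keeping each stage behind its own callable is what lets a source, a transformation, or a sink be swapped independently of the other two.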
With this flow as the foundational design, each stage moves through different states; to capture that, projectoneflow follows the operator model.
Each stage is an operator that follows the flow pre-step execution -> stage -> post-step execution, where the pre-steps and post-steps are configured on each operator as features.
These operators run in sequence using the task model, where each task has its own implementation with support for state management, logging, and event listeners.
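A rough way to picture the operator and task models is sketched below. The `Operator` and `Task` classes, their fields, and the `record` helper are assumptions made for illustration; they do not reproduce the actual projectoneflow classes, and the `state` dictionary is only a stand-in for a real state manager.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Step = Callable[[Any], None]  # hypothetical signature for pre-step/post-step features


@dataclass
class Operator:
    """One stage wrapped with optional pre-steps and post-steps."""
    name: str
    stage: Callable[[Any], Any]
    pre_steps: List[Step] = field(default_factory=list)
    post_steps: List[Step] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        for step in self.pre_steps:   # pre-step execution
            step(data)
        result = self.stage(data)     # the stage itself
        for step in self.post_steps:  # post-step execution
            step(result)
        return result


@dataclass
class Task:
    """Runs operators in sequence and tracks a minimal per-operator state."""
    name: str
    operators: List[Operator]
    state: Dict[str, str] = field(default_factory=dict)

    def run(self) -> Any:
        data = None
        for op in self.operators:
            data = op.run(data)           # each operator receives the previous result
            self.state[op.name] = "done"  # stand-in for a real state manager
        return data


events: List[str] = []


def record(label: str) -> Step:
    # Build a step that appends a label to the event list (a stand-in for logging).
    return lambda _data: events.append(label)


task = Task(
    name="demo",
    operators=[
        Operator("input", lambda _: [1, 2, 3],
                 pre_steps=[record("input:pre")], post_steps=[record("input:post")]),
        Operator("execution", lambda rows: [r * 10 for r in rows]),
        Operator("output", lambda rows: rows, post_steps=[record("output:post")]),
    ],
)
result = task.run()
```

In this sketch the pre-steps see the operator's incoming data and the post-steps see its result, which is one plausible reading of the flow described above.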
On top of these, tasks are executed by the pipeline, which is a wrapper that executes the tasks as a DAG. Pipelines are deployed to Databricks and some other environments using a Terraform provider, and the design is extendable to other providers in the future.
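Executing tasks as a DAG amounts to running each task only after the tasks it depends on have finished. The sketch below shows that idea with Python's standard `graphlib` module; the `run_pipeline` function and the task names are hypothetical and say nothing about how projectoneflow schedules or deploys pipelines.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, List[str]],
) -> List[str]:
    """Run every task after its dependencies and return the execution order."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        tasks[name]()
    return order


executed: List[str] = []

# Hypothetical tasks; each one only records that it ran.
tasks = {
    "load_orders": lambda: executed.append("load_orders"),
    "clean_orders": lambda: executed.append("clean_orders"),
    "publish_report": lambda: executed.append("publish_report"),
}

# Each task lists the tasks it depends on.
depends_on = {
    "load_orders": [],
    "clean_orders": ["load_orders"],
    "publish_report": ["clean_orders"],
}

order = run_pipeline(tasks, depends_on)
```

A dependency cycle would make `static_order()` raise `graphlib.CycleError`, so an invalid pipeline definition fails before any task runs.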
For more about the commands or the API documentation, please refer to the [docs 🔗](https://github.com/narramukhesh/projectone/tree/main/projectoneflow).
Download files
Source Distribution
File details
Details for the file projectoneflow-1.0.0.tar.gz.
File metadata
- Download URL: projectoneflow-1.0.0.tar.gz
- Upload date:
- Size: 89.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ee12624ca744e8da452f92abf42804ef57286913c135a716b9e2bde546c88a8 |
| MD5 | b3044307963bbcc5c0465098d3a2fd04 |
| BLAKE2b-256 | f0b56b2006ce698b82e5ae420a87de6dd63b793fabb0f6412561ee7f7047309b |