Lightweight Declarative Data Framework

DeFlow

A lightweight declarative data framework that allows you to run data pipelines from YAML config templates.

[!NOTE] I use this project as a real-world use case for my Workflow package, to show that it can handle production data pipelines with a DataOps strategy.

[!WARNING] This framework does not allow you to customize your pipeline yet. If you want to create your own workflow, you can implement it with a custom template that uses this package as a reference.

In my opinion, workflow code should not be duplicated when a single template workflow with dynamic input parameters can cover each use case by changing only those parameters. This way I can handle many logical workflows in our organization with metadata configuration alone. This approach is called Metadata Driven Data Workflow.

📦 Installation

pip install -U deflow

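Supported data framework versions:

| Version | Supported   | Description                                                  |
|---------|-------------|--------------------------------------------------------------|
| 1       | In progress | Large scale, based on stream, group, process, and routing.   |
| 2       | In progress | Medium scale, based on pipeline and node.                    |
| 3       | In progress | Lightweight, based on dag and task.                          |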

[!NOTE] I think three data framework versions are enough, so I do not plan to add more.

🎯 Framework

⭕ Version 1

[!NOTE] This project will implement data framework Version 1 first.

After you initialize your data framework project with Version 1, your data pipeline config files are stored in this file structure:

conf/
 ├─ routes/
 │   ╰─ routing.yml
 ├─ shared/
 │   ├─ { c_conn_01 }.yml
 │   ╰─ { c_conn_02 }.yml
 ├─ stream/
 │   ╰─ { s_stream_01 }/
 │       ├─ { g_group_01 }.tier.priority/
 │       │   ├─ { p_proces_01 }.yml
 │       │   ╰─ { p_proces_02 }.yml
 │       ├─ { g_group_02 }.tier.priority/
 │       │   ├─ { p_proces_01 }.yml
 │       │   ╰─ { p_proces_02 }.yml
 │       ╰─ config.yml
 ╰─ .confignore
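
To make the Version 1 layout concrete, here is a minimal sketch that walks a `conf/` directory shaped like the tree above and lists the process files under each group of one stream. The function `list_processes` is hypothetical and is not part of deflow; it only assumes the directory names shown above (`stream/`, one directory per group, one `.yml` file per process) and uses the standard library.

```python
from pathlib import Path


def list_processes(conf_path: Path, stream: str) -> dict[str, list[str]]:
    """Map each group directory of a stream to its process config names.

    Hypothetical helper for illustration only. It assumes the Version 1
    layout: conf/stream/<stream>/<group dir>/<process>.yml
    """
    stream_dir = conf_path / "stream" / stream
    groups: dict[str, list[str]] = {}
    # Only directories are groups; config.yml at the stream level is skipped.
    for group_dir in sorted(p for p in stream_dir.iterdir() if p.is_dir()):
        # Each .yml file inside a group directory is one process config.
        groups[group_dir.name] = sorted(f.stem for f in group_dir.glob("*.yml"))
    return groups
```

Calling `list_processes(Path("./conf"), "s_stream_01")` on a tree like the one above would return one entry per group directory, keyed by the full directory name including its `.tier.priority` suffix.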

⭕ Version 2

After you initialize your data framework project with Version 2, your data pipeline config files are stored in this file structure:

conf/
 ├─ pipeline/
 │   ╰─ { p_pipe_01 }/
 │       ├─ config.yml
 │       ├─ { n_node_01 }.yml
 │       ╰─ { n_node_02 }.yml
 ╰─ .confignore

⭕ Version 3

[!NOTE] This version uses the same DAG and Task strategy as Airflow.

conf/
 ├─ dag/
 │   ╰─ { dag_cm_d }/
 │       ├─ assets/
 │       │   ├─ { some-asset }.sql
 │       │   ╰─ { some-asset }.json
 │       ├─ config.yml
 │       ╰─ variables.yml
 ╰─ .confignore

Getting Started

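You can run a data flow with:

from deflow.flow import Flow
from ddeutil.workflow import Result

# Select the stream by name and the data framework version,
# point the flow at the config directory, then run it.
result: Result = (
    Flow(name="s_stream_01", version="v1")
    .option("conf_paths", ["./data/conf"])
    .run(mode="N")
)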

🍪 Configuration

This package's configuration:

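| Name                        | Component | Default | Description                                         |
|-----------------------------|-----------|---------|-----------------------------------------------------|
| DEFLOW_CORE_CONF_PATH       | CORE      | ./conf  | A config path to get data framework configuration.  |
| DEFLOW_CORE_VERSION         | CORE      | v1      | A specific data framework version.                  |
| DEFLOW_CORE_REGISTRY_CALLER | CORE      | .       | A registry of caller function.                      |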
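
As an illustration of how the names and defaults above fit together, the sketch below resolves them from environment variables. This is an assumption: the table does not state how deflow reads these values, and `CoreSettings` and `load_core_settings` are hypothetical names, not the package's own loader.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoreSettings:
    """Hypothetical container for the three CORE settings in the table."""

    conf_path: str
    version: str
    registry_caller: str


def load_core_settings(env: Optional[Mapping[str, str]] = None) -> CoreSettings:
    """Resolve each setting, falling back to the default listed in the table.

    Assumes the names are environment variables; deflow itself may resolve
    them differently.
    """
    source = os.environ if env is None else env
    return CoreSettings(
        conf_path=source.get("DEFLOW_CORE_CONF_PATH", "./conf"),
        version=source.get("DEFLOW_CORE_VERSION", "v1"),
        registry_caller=source.get("DEFLOW_CORE_REGISTRY_CALLER", "."),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without changing the real environment, for example `load_core_settings({"DEFLOW_CORE_VERSION": "v2"})`.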

Related workflow configuration that affects this package:

💬 Contribute

I do not expect this project to be widely adopted because it has a specific purpose, and for a long-term solution you can write your own code without depending on it. For now, you can open a GitHub issue on this project 🙌 to report a bug or request a new feature.
