Lightweight Python workflow management framework with data exploration features

Sinagot

Source Code: https://gitlab.com/YannBeauxis/sinagot


Sinagot is a lightweight Python workflow management framework that uses Ray as its distributed computing engine.

The key features are:

  • Easy to use: Design workflows with simple Python classes and functions, without external configuration files.
  • Data exploration: Access computed data directly through object attributes, including complex types such as pandas DataFrames.
  • Scalable: The Ray engine enables seamless scaling of workflows to external clusters.
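
The data exploration feature can be pictured as "load from storage if present, otherwise compute and store". The block below is a minimal standard-library sketch of that pattern, not Sinagot's implementation: the `CachedItem` class, its `value` property, and the JSON file format are hypothetical names and choices made for illustration only.

```python
import json
import tempfile
from pathlib import Path


class CachedItem:
    """Hypothetical helper: read a value from disk, or compute and store it."""

    def __init__(self, path, compute):
        self.path = Path(path)
        self.compute = compute
        self.calls = 0  # number of times compute() actually ran

    @property
    def value(self):
        # Reuse the stored result when the file already exists.
        if self.path.exists():
            return json.loads(self.path.read_text())
        # Otherwise compute the result, store it, and return it.
        self.calls += 1
        result = self.compute()
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(result))
        return result


with tempfile.TemporaryDirectory() as root:
    item = CachedItem(Path(root) / "computed" / "multiplied.json", lambda: [2 * 3, 4 * 3])
    first = item.value   # computed, then written to disk
    second = item.value  # read back from disk, compute() is not called again
    calls = item.calls
```

Reading the attribute twice triggers a single computation, which is the behaviour an item with a storage policy is described as having in the Getting started example.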

Installation

pip install sinagot
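
To confirm that the installation succeeded, the installed version can be queried with the standard library. This is a small sketch; the `installed_version` helper is a hypothetical name and is not part of Sinagot.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string when sinagot is installed, None otherwise.
print(installed_version("sinagot"))
```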

Getting started

import pandas as pd
import sinagot as sg

# Decorate functions to use them as workflow steps
@sg.step
def multiply(df: pd.DataFrame, factor: int) -> pd.DataFrame:
    return df * factor


@sg.step
def get_single_data(df: pd.DataFrame) -> int:
    return int(df.iloc[0, 0])


# Design a workflow
class TestWorkflow(sg.Workflow):
    raw_data: pd.DataFrame = sg.seed()  # a seed is input data
    factor: int = sg.seed()
    multiplied_data: pd.DataFrame = multiply.step(raw_data, factor=factor)
    final_data: int = get_single_data.step(multiplied_data)


# Define a workspace on top of the workflow to set the storage policy for the data it produces
class TestWorkspace(sg.Workspace[TestWorkflow]):
    raw_data = sg.LocalStorage("raw_data/data-{workflow_id}.csv")
    factor = sg.LocalStorage("params/factor")
    multiplied_data = sg.LocalStorage(
        "computed/multiplied_data-{workflow_id}.csv", write_kwargs={"index": False}
    )
    # In this example, final_data is not stored; it is computed on demand


# Instantiate the workspace with the root path of the local storage folder
ws = TestWorkspace("/path/to/local_storage")

# Access a single workflow by its ID
wf = ws["001"]

# Access item data; it is computed automatically if it does not exist in storage
display(wf.multiplied_data)  # display() is provided by Jupyter/IPython; use print() in a plain script
print(wf.final_data)

In this example, the storage folder is structured as follows:

├── params/
│   └── factor
├── raw_data/
│   ├── data-{workflow_id}.csv
│   └── ...
└── computed/
    ├── multiplied_data-{workflow_id}.csv
    └── ...
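
The input side of this layout can be prepared with a few lines of standard-library Python. The sketch below follows the path templates declared in `TestWorkspace`. The `prepare_storage` helper, the column names `a` and `b`, and the choice to store `factor` as a plain-text integer are assumptions made for illustration; the format Sinagot expects for `params/factor` is not stated here.

```python
import csv
import tempfile
from pathlib import Path


def prepare_storage(root, workflow_id, rows, factor):
    """Write the seed files for one workflow ID under the given root folder."""
    root = Path(root)
    raw_dir = root / "raw_data"
    params_dir = root / "params"
    raw_dir.mkdir(parents=True, exist_ok=True)
    params_dir.mkdir(parents=True, exist_ok=True)

    # One CSV file per workflow ID, matching "raw_data/data-{workflow_id}.csv".
    raw_path = raw_dir / f"data-{workflow_id}.csv"
    with raw_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["a", "b"])  # hypothetical column names
        writer.writerows(rows)

    # A single parameter file matching "params/factor".
    # Assumption: the factor is stored as a plain-text integer.
    factor_path = params_dir / "factor"
    factor_path.write_text(str(factor))
    return raw_path, factor_path


storage_root = Path(tempfile.mkdtemp())
raw_path, factor_path = prepare_storage(storage_root, "001", [[1, 2], [3, 4]], factor=3)
print(raw_path.relative_to(storage_root))
print(factor_path.relative_to(storage_root))
```

The `computed/` folder is left out on purpose, since those files are described as being produced when the corresponding items are accessed.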

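The workflow runs as follows: raw_data and factor are seeds, multiply combines them into multiplied_data, and get_single_data reduces multiplied_data to final_data.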
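
The dependencies between the items of `TestWorkflow` can be sketched as a small graph and evaluated in dependency order. This is an illustration built on the standard-library `graphlib` module (Python 3.9+), with nested lists standing in for DataFrames. It does not show how Sinagot or Ray schedule steps, and the `DEPENDENCIES`, `evaluation_order`, and `run_example` names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each item maps to the items it depends on; seeds have no dependencies.
DEPENDENCIES = {
    "raw_data": set(),
    "factor": set(),
    "multiplied_data": {"raw_data", "factor"},
    "final_data": {"multiplied_data"},
}


def evaluation_order(dependencies):
    """Return item names so that every item comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_example(raw_data, factor):
    """Compute every item of the example, using nested lists instead of DataFrames."""
    values = {"raw_data": raw_data, "factor": factor}
    for name in evaluation_order(DEPENDENCIES):
        if name == "multiplied_data":
            # Same role as the multiply step: scale every cell by the factor.
            values[name] = [
                [cell * values["factor"] for cell in row]
                for row in values["raw_data"]
            ]
        elif name == "final_data":
            # Same role as get_single_data: take the top-left cell.
            values[name] = int(values["multiplied_data"][0][0])
    return values


result = run_example([[1, 2], [3, 4]], factor=3)
print(result["final_data"])  # 3
```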

Development Roadmap

Sinagot is at an early stage of development but is ready to be tested on real datasets for workflow prototyping.

Feature development will be prioritized according to user feedback, so feel free to open an issue if you have any requirements.
