Yet Another Python (data) Pipeline
yapp - Yet Another Python (data) Pipeline
yapp is a simple Python data pipeline framework inspired by ploomber.
Development is still in its early stages, and many things may change in the future.
yapp strives to be as simple as possible so that you can focus on the correctness of your algorithms. It is developed around specific requirements and built to meet them: once complete it may or may not be the best choice for you, but it certainly isn't right now.
Usage
Pipelines are described using YAML files:
- pipelines.yml defines the pipelines [required]
- config.yml defines a global configuration (e.g. inputs and outputs)
A Pipeline is made up of Jobs. A Job represents a step of the pipeline: it takes inputs as parameters and returns a dict of outputs. The pipelines.yml file defines the dependencies of every Job in the Pipeline. The dependencies are resolved, and the Jobs are then run one at a time (even where they could run in parallel; this is a deliberate design choice).
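The Job model described above can be sketched in plain Python. This is only an illustration under stated assumptions, not yapp's actual API: the job functions (`load`, `clean`, `report`), the `JOBS` and `DEPENDENCIES` mappings, and `run_pipeline` are hypothetical names, and the ordering is resolved with the standard library's `graphlib.TopologicalSorter` (Python 3.9+) rather than whatever yapp uses internally.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical jobs: each takes its inputs as parameters
# and returns a dict of outputs.
def load():
    return {"raw": [3, 1, 2]}

def clean(raw):
    return {"cleaned": sorted(raw)}

def report(cleaned):
    return {"summary": f"{len(cleaned)} values, max {cleaned[-1]}"}

JOBS = {"load": load, "clean": clean, "report": report}

# Hypothetical dependency map: job name -> names of the jobs it depends on.
DEPENDENCIES = {"load": set(), "clean": {"load"}, "report": {"clean"}}

def run_pipeline(jobs, dependencies):
    """Resolve the dependencies, then run the jobs one at a time."""
    data = {}  # outputs produced so far, keyed by output name
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        job = jobs[name]
        # Pass each job the previously produced outputs it asks for by name.
        wanted = inspect.signature(job).parameters
        data.update(job(**{key: data[key] for key in wanted}))
    return order, data

order, data = run_pipeline(JOBS, DEPENDENCIES)
print(order)
print(data["summary"])
```

Matching outputs to inputs by parameter name keeps the jobs as ordinary functions, which is one simple way to keep the focus on the algorithm inside each job.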
Pipelines can have hooks that perform specific actions before or after each task (such as updating some kind of status monitor).
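One way to picture a hook is as a callable invoked around each task. The sketch below is an assumption about how such a mechanism could look, not yapp's implementation; `run_with_hooks`, `before`, and `after` are hypothetical names, and the "status monitor" is just a list of messages.

```python
def run_with_hooks(jobs, before=(), after=()):
    """Run (name, callable) pairs in order, calling hooks around each one."""
    results = {}
    for name, job in jobs:
        for hook in before:
            hook(name)
        results[name] = job()
        for hook in after:
            hook(name)
    return results

# A toy status monitor that records what happened, in order.
events = []
results = run_with_hooks(
    [("extract", lambda: 1), ("load", lambda: 2)],
    before=[lambda name: events.append(f"start {name}")],
    after=[lambda name: events.append(f"done {name}")],
)
print(events)
```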
You can run a pipeline using:
yapp PIPELINE_NAME [PATH]
yapp automatically searches for the classes and functions you use in your YAML files. It searches the following locations, in order:
1. The pipeline directory (if it exists)
2. The top-level directory of your code
3. yapp built-in modules
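The lookup order above amounts to a first-match search over an ordered list of locations. The following sketch illustrates that idea only; it does not reproduce yapp's resolver. The `resolve` function, the stand-in module objects, and the names `Loader` and `CsvInput` are all hypothetical.

```python
import types

def resolve(name, search_order):
    """Return the first object called `name` found in `search_order`."""
    for module in search_order:
        if hasattr(module, name):
            return getattr(module, name)
    raise LookupError(f"{name!r} not found in any search location")

# Stand-ins for the three locations (hypothetical, empty modules).
pipeline_dir = types.ModuleType("pipeline_dir")
top_level = types.ModuleType("top_level")
yapp_builtins = types.ModuleType("yapp_builtins")

top_level.Loader = "top-level Loader"
yapp_builtins.Loader = "built-in Loader"
yapp_builtins.CsvInput = "built-in CsvInput"

search_order = [pipeline_dir, top_level, yapp_builtins]
found_loader = resolve("Loader", search_order)   # top-level shadows built-in
found_csv = resolve("CsvInput", search_order)    # falls through to built-ins
```

With this ordering, a definition closer to the pipeline shadows one with the same name further down the list.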
TODOs
Basic features still missing:
- yapp cli command
- Finalize yaml files specification
- Proper code organization
- Package
- A good and working example
- hooks
- Working global config for multiple pipelines
Possibly lower priority:
- TESTS.
- Better logging
- docstrings
- Add sample pipeline status monitor class
- Consider permitting repeated tasks in a single pipeline (can this be useful?)
- For each step, keep track of the inputs required in future steps, so that unneeded ones can be removed from memory
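The last item in the list above (tracking which inputs later steps still need) could be approached by recording, for every input, the index of the last step that reads it. This is a rough sketch of that idea rather than a planned yapp design; `last_use`, `release_plan`, and the example step names are hypothetical.

```python
def last_use(steps):
    """Map each input name to the index of the last step that reads it."""
    last = {}
    for index, (_, inputs) in enumerate(steps):
        for name in inputs:
            last[name] = index
    return last

def release_plan(steps):
    """For each step, list the inputs that no later step needs."""
    last = last_use(steps)
    return [
        (step, sorted(name for name in inputs if last[name] == index))
        for index, (step, inputs) in enumerate(steps)
    ]

# Hypothetical pipeline: (step name, names of the inputs it reads).
steps = [
    ("clean", ["raw"]),
    ("features", ["cleaned", "raw"]),
    ("report", ["cleaned", "features"]),
]
plan = release_plan(steps)
for step, releasable in plan:
    print(step, releasable)
```

After a step finishes, the names listed next to it are the ones that could be dropped from memory.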
Hashes for yapp_pipelines-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5db20a2ac259362686a7e8de1d90576e03f0c6547d7ab6f2c0e0b513028c92e7
MD5 | 931c7fccc985fcbd2be256a77a840c01
BLAKE2b-256 | c176b482f1a108f17b0f3990f2eb188e7f9e8a8e72823fa8e3b1f42c8b8d4a60