Flypipe
Flypipe is a Python framework to simplify development, management and maintenance of transformation pipelines, which are commonly used in the data, feature and ML model space.
Each transformation is implemented as a small, composable function. A decorator then defines it as a Flypipe node, the primary model Flypipe uses. Metadata on the node decorator allows multiple nodes to be linked together into a Directed Acyclic Graph (DAG).
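from flypipe.node import node

# t0 is an upstream node defined elsewhere; its "fruit" column is passed in as `df`.
@node(
    type="pandas",
    dependencies=[t0.select("fruit").alias("df")]
)
def t1(df):
    # Map each fruit to its flavour; fruits missing from the mapping keep their name.
    categories = {'mango': 'sweet', 'lemon': 'sour'}
    df['flavour'] = df['fruit']
    df = df.replace({'flavour': categories})
    return df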
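To make the decorator idea concrete, here is a minimal, self-contained sketch of how a decorator can attach dependency metadata to a plain function so that several functions form a DAG. It is a toy illustration only, not Flypipe's implementation: `toy_node`, `ancestors`, and the list-of-dicts rows are hypothetical stand-ins for Flypipe's `node` decorator and pandas dataframes.

```python
# Toy illustration only: NOT Flypipe's implementation.
# `toy_node` and `ancestors` are hypothetical names introduced for this sketch.

def toy_node(dependencies=()):
    """Attach dependency metadata to a plain transformation function."""
    def decorator(func):
        func.dependencies = list(dependencies)
        return func
    return decorator


@toy_node()
def t0():
    # Source node: plain dicts stand in for dataframe rows.
    return [{"fruit": "mango"}, {"fruit": "lemon"}]


@toy_node(dependencies=[t0])
def t1(rows):
    # Same fruit-to-flavour mapping as the example above, applied row by row.
    categories = {"mango": "sweet", "lemon": "sour"}
    return [
        {**row, "flavour": categories.get(row["fruit"], row["fruit"])}
        for row in rows
    ]


def ancestors(func):
    """Return the names of every upstream node, nearest first."""
    names = []
    for dep in func.dependencies:
        names.append(dep.__name__)
        names.extend(ancestors(dep))
    return names


print(ancestors(t1))  # ['t0']
print(t1(t0()))
```

Because the metadata lives on the function itself, the lineage of any node can be recovered by walking `dependencies`, which is the property that lets a framework draw or execute the whole graph from a single node.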
Flypipe Pipelines
As each node (transformation) is connected to its ancestors, we can easily view the pipeline graphically in an HTML page (my_graph.html()) or execute it by invoking my_graph.run().
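Executing a graph means running every node only after its ancestors have produced their outputs. The sketch below illustrates that ordering with Python's standard `graphlib` module. It is an assumption-laden toy, not how Flypipe implements `run()`: the `graph` dictionary, the `run` helper, and the three node functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Toy illustration only: NOT Flypipe's implementation of run().
# All names below are hypothetical and exist only for this sketch.

def load_fruit():
    return ["mango", "lemon", "kiwi"]


def add_flavour(fruits):
    categories = {"mango": "sweet", "lemon": "sour"}
    # Fruits missing from the mapping keep their own name as the flavour.
    return [(fruit, categories.get(fruit, fruit)) for fruit in fruits]


def count_rows(pairs):
    return len(pairs)


# Each entry maps a node name to (function, names of its upstream nodes).
graph = {
    "load_fruit": (load_fruit, []),
    "add_flavour": (add_flavour, ["load_fruit"]),
    "count_rows": (count_rows, ["add_flavour"]),
}


def run(graph):
    """Execute every node after its ancestors; return all outputs by name."""
    predecessors = {name: deps for name, (_, deps) in graph.items()}
    results = {}
    for name in TopologicalSorter(predecessors).static_order():
        func, deps = graph[name]
        # Feed each node the outputs of its upstream nodes, in declared order.
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run(graph)
print(results["count_rows"])
```

A topological order guarantees that a node's inputs exist before it is called, and `TopologicalSorter` raises `CycleError` if the dependencies are not acyclic.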
What does Flypipe aim to facilitate?
- Free open-source tool for data transformations
- Facilitate streaming pipeline development (improved use of caches)
- Increase pipeline stability (better use of unit tests)
- End-to-end transformation lineage
- Create development standards for Data Engineers, Machine Learning Engineers and Data Scientists
- Improve re-usability of transformations in different pipelines & contexts via composable nodes
- Faster integration and portability of pipelines to different contexts with different available technology stacks:
  - Flexibility to mix pyspark, pandas on spark and pandas seamlessly within transformations
  - As a simple wheel package, it is very lightweight and unopinionated about the runtime environment. This allows it to be easily integrated into Databricks or used independently of Databricks.
- Low latency for on-demand feature generation and predictions
  - Framework-level optimisations and dynamic transformations help keep even complex transformation pipelines low latency. This in turn allows for on-demand feature generation and predictions.
Source Code
API code is available at https://github.com/flypipe/flypipe.
Documentation
Full documentation is available at https://flypipe.github.io/flypipe/.