Flypipe
Flypipe is a Python framework to simplify development, management and maintenance of transformation pipelines, which are commonly used in the data, feature and ML model space.
Each transformation is implemented as a small, composable function. A special decorator then defines it as a Flypipe node, the primary model Flypipe uses. Metadata on the node decorator allows multiple nodes to be linked together into a Directed Acyclic Graph (DAG).
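from flypipe.node import node

# t0 is assumed to be an upstream node defined elsewhere; only its
# "fruit" column is requested, and it is passed to t1 as the argument df.
@node(
    type="pandas",
    dependencies=[t0.select("fruit").alias("df")]
)
def t1(df):
    categories = {'mango': 'sweet', 'lemon': 'sour'}
    # Copy the fruit column, then map each fruit name to its flavour.
    df['flavour'] = df['fruit']
    df = df.replace({'flavour': categories})
    return df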
Flypipe Pipelines
As each node (transformation) is connected to its ancestors, we can easily view the pipeline graphically in an HTML page (my_graph.html()) or execute it by invoking my_graph.run().
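To make the idea concrete, here is a minimal, standard-library-only sketch of how decorator metadata can link functions into a DAG and run them in dependency order. It is a toy illustration, not Flypipe's implementation or API: the names ToyNode, toy_node, fruits and flavours are hypothetical, and plain lists stand in for dataframes.

```python
# Toy illustration only: this is NOT Flypipe's implementation or API.
from graphlib import TopologicalSorter  # Python 3.9+


class ToyNode:
    """Wraps a function together with the nodes it depends on."""

    def __init__(self, func, dependencies):
        self.func = func
        self.dependencies = list(dependencies)
        self.name = func.__name__

    def _collect(self, graph):
        # Record this node and, recursively, all of its ancestors.
        if self.name not in graph:
            graph[self.name] = self
            for dep in self.dependencies:
                dep._collect(graph)
        return graph

    def run(self):
        nodes = self._collect({})
        # Map each node name to the names of its predecessors.
        predecessors = {
            name: [dep.name for dep in n.dependencies]
            for name, n in nodes.items()
        }
        results = {}
        # static_order() yields predecessors before their dependants.
        for name in TopologicalSorter(predecessors).static_order():
            n = nodes[name]
            inputs = [results[dep.name] for dep in n.dependencies]
            results[name] = n.func(*inputs)
        return results[self.name]


def toy_node(dependencies=()):
    """Hypothetical decorator that turns a function into a ToyNode."""
    def decorator(func):
        return ToyNode(func, dependencies)
    return decorator


@toy_node()
def fruits():
    return ["mango", "lemon"]


@toy_node(dependencies=[fruits])
def flavours(rows):
    categories = {"mango": "sweet", "lemon": "sour"}
    # Fruits without a mapping keep their original name.
    return [(fruit, categories.get(fruit, fruit)) for fruit in rows]


result = flavours.run()
```

Calling run() on the final node is enough here because each node carries references to its own dependencies, so the whole upstream graph can be discovered from the node being executed.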
What does Flypipe aim to facilitate?
- End-to-end transformation lineage
- Create development standards for Data Engineers, Machine Learning Engineers and Data Scientists
- Improve re-usability of transformations in different pipelines & contexts via composable nodes
- Faster integration and portability of pipelines to different contexts with different available technology stacks:
  - Flexibility to mix PySpark, pandas on Spark and pandas in transformations seamlessly
  - As a simple wheel package, it's very lightweight and unopinionated about the runtime environment. This allows it to be integrated easily into Databricks or used independently of Databricks.
- Low latency for on-demand feature generation and predictions
  - Framework-level optimisations and dynamic transformations help keep even complex transformation pipelines low latency, which in turn allows for on-demand feature generation and predictions.