
Flypipe

Flypipe is a Python framework to simplify development, management and maintenance of transformation pipelines, which are commonly used in the data, feature and ML model space.

Each transformation is implemented as a small, composable function. A special decorator then defines it as a Flypipe node, which is the primary model Flypipe uses. Metadata on the node decorator allows multiple nodes to be linked together into a Directed Acyclic Graph (DAG).

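from flypipe.node import node


# t0 is an upstream Flypipe node, assumed to be defined elsewhere; only its
# "fruit" column is requested, and it is passed to the function as `df`.
@node(
  type="pandas",
  dependencies=[t0.select("fruit").alias("df")]
)
def t1(df):
  # Map each fruit to a flavour category; unmapped fruits keep their name.
  categories = {'mango': 'sweet', 'lemon': 'sour'}
  df['flavour'] = df['fruit']
  df = df.replace({'flavour': categories})
  return df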

Flypipe Pipelines

As each node (transformation) is connected to its ancestors, we can easily view the pipeline graphically in an HTML page (my_graph.html()) or execute it by invoking my_graph.run().
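
To make the idea of nodes linked to their ancestors more concrete, here is a minimal pure-Python sketch of how decorator metadata could tie functions into a DAG that can then be executed or inspected. It is not Flypipe's implementation or API: `ToyNode`, `toy_node`, `run` and `edges` are hypothetical names invented for illustration, and plain lists of dicts stand in for dataframes.

```python
# Toy illustration only: this is NOT Flypipe's implementation or API.
class ToyNode:
    """Wraps a transformation function together with its upstream nodes."""

    def __init__(self, func, dependencies):
        self.func = func
        self.name = func.__name__
        self.dependencies = list(dependencies)

    def run(self, _cache=None):
        """Run ancestors first (depth-first), then this node; each node runs once."""
        cache = {} if _cache is None else _cache
        if self.name not in cache:
            inputs = [dep.run(cache) for dep in self.dependencies]
            cache[self.name] = self.func(*inputs)
        return cache[self.name]

    def edges(self):
        """Return sorted (upstream, downstream) name pairs for the whole graph."""
        found = set()
        for dep in self.dependencies:
            found.add((dep.name, self.name))
            found.update(dep.edges())
        return sorted(found)


def toy_node(dependencies=()):
    """Hypothetical decorator that turns a plain function into a ToyNode."""
    def wrap(func):
        return ToyNode(func, dependencies)
    return wrap


@toy_node()
def fruits():
    return [{"fruit": "mango"}, {"fruit": "lemon"}]


@toy_node(dependencies=[fruits])
def flavours(rows):
    categories = {"mango": "sweet", "lemon": "sour"}
    # Unmapped fruits keep their own name as the flavour.
    return [{**row, "flavour": categories.get(row["fruit"], row["fruit"])} for row in rows]


result = flavours.run()          # executes fruits, then flavours
graph_edges = flavours.edges()   # lineage that a graph view could draw
```

Calling `run()` on the last node is enough to execute everything upstream of it, and the edge list is the kind of information a graphical view of the pipeline would be built from.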

Flypipe Graph Pipeline

What does Flypipe aim to facilitate?

  • Free open-source tool for data transformations
  • Facilitate streaming pipeline development (improved use of caches)
  • Increase pipeline stability (better use of unit tests)
  • End-to-end transformation lineage
  • Create development standards for Data Engineers, Machine Learning Engineers and Data Scientists
  • Improve re-usability of transformations in different pipelines & contexts via composable nodes
  • Faster integration and portability of pipelines to different contexts with different available technology stacks:
    • Flexibility to use and mix PySpark, pandas on Spark and pandas seamlessly within transformations
    • As a simple wheel package, it is very lightweight and unopinionated about the runtime environment, so it can be integrated easily into Databricks or used independently of it.
  • Low latency for on-demand feature generation and predictions
    • Framework-level optimisations and dynamic transformations help to make even complex transformation pipelines low latency. This in turn allows for on-demand feature generation/predictions.
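
As a rough illustration of why small composable nodes can make unit testing easier, the sketch below runs one transformation in isolation by supplying fixed data in place of its upstream step. It is a pure-Python toy under stated assumptions, not Flypipe's API: `DEPENDENCIES`, `run`, `load_fruits` and `add_flavour` are hypothetical names, and lists of dicts stand in for dataframes.

```python
# Toy illustration only: not Flypipe's API. All names here are hypothetical.
calls = []  # records which functions actually executed


def load_fruits():
    calls.append("load_fruits")  # stands in for an expensive or external read
    return [{"fruit": "mango"}, {"fruit": "lemon"}]


def add_flavour(rows):
    calls.append("add_flavour")
    categories = {"mango": "sweet", "lemon": "sour"}
    return [{**row, "flavour": categories.get(row["fruit"], row["fruit"])} for row in rows]


# Each function maps to the upstream functions whose outputs it consumes.
DEPENDENCIES = {load_fruits: [], add_flavour: [load_fruits]}


def run(func, inputs=None):
    """Run func; any function found in `inputs` is replaced by the supplied data."""
    results = dict(inputs or {})

    def resolve(f):
        if f not in results:
            args = [resolve(dep) for dep in DEPENDENCIES[f]]
            results[f] = f(*args)
        return results[f]

    return resolve(func)


# Unit test of add_flavour in isolation: load_fruits is never executed.
test_output = run(
    add_flavour,
    inputs={load_fruits: [{"fruit": "kiwi"}, {"fruit": "lemon"}]},
)
```

Because the upstream result is supplied directly, the test exercises only the transformation under test and needs no external data source.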

Commonly used

Databricks Python

Source Code

API code is available at https://github.com/flypipe/flypipe.

Documentation

Full documentation is available at https://flypipe.github.io/flypipe/.
