
Flypipe


Flypipe is a Python framework to simplify development, management and maintenance of transformation pipelines, which are commonly used in the data, feature and ML model space.

Each transformation is implemented as a small, composable function. A special decorator then defines it as a Flypipe node, which is the primary model Flypipe uses. Metadata on the node decorator allows multiple nodes to be linked together into a Directed Acyclic Graph (DAG).

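from flypipe.node import node


# t0 is assumed to be an upstream Flypipe node, defined elsewhere,
# whose output includes a "fruit" column.
@node(
  type="pandas",
  dependencies=[t0.select("fruit").alias("df")]
)
def t1(df):
  # Map each fruit to its flavour; fruits not in the mapping keep their name.
  categories = {'mango': 'sweet', 'lemon': 'sour'}
  df['flavour'] = df['fruit']
  df = df.replace({'flavour': categories})
  return df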

Flypipe Pipelines

As each node (transformation) is connected to its ancestors, we can easily view the pipeline graphically in an HTML page (my_graph.html()) or execute it by invoking my_graph.run().
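To illustrate the idea of executing a node by first resolving its ancestors, here is a minimal plain-Python sketch. It does not use Flypipe and is not how Flypipe is implemented: the `SketchNode` class, its `run` method and the `sketch_node` decorator are hypothetical names chosen for this illustration, and rows are plain dicts rather than DataFrames.

```python
class SketchNode:
    """Hypothetical stand-in for a pipeline node: a function plus its dependencies."""

    def __init__(self, func, dependencies):
        self.func = func
        self.dependencies = dependencies

    def run(self, _cache=None):
        # Run each ancestor at most once, then pass their outputs to this node.
        cache = {} if _cache is None else _cache
        if self not in cache:
            inputs = [dep.run(cache) for dep in self.dependencies]
            cache[self] = self.func(*inputs)
        return cache[self]


def sketch_node(dependencies=()):
    """Hypothetical decorator that turns a function into a SketchNode."""
    def wrap(func):
        return SketchNode(func, list(dependencies))
    return wrap


@sketch_node()
def t0():
    # Source node with no ancestors.
    return [{"fruit": "mango"}, {"fruit": "lemon"}]


@sketch_node(dependencies=[t0])
def t1(rows):
    # Same mapping as the example above; unknown fruits keep their name.
    categories = {"mango": "sweet", "lemon": "sour"}
    return [
        {**row, "flavour": categories.get(row["fruit"], row["fruit"])}
        for row in rows
    ]


result = t1.run()
```

Calling `t1.run()` here first runs `t0`, then applies `t1` to its output. The cache keeps a shared ancestor from being executed more than once within a single run.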

[Figure: Flypipe graph pipeline]

What does Flypipe aim to facilitate?

  • Free open-source tool for data transformations
  • Facilitate streaming pipeline development (improved use of caches)
  • Increase pipeline stability (better use of unit tests)
  • End-to-end transformation lineage
  • Create development standards for Data Engineers, Machine Learning Engineers and Data Scientists
  • Improve re-usability of transformations in different pipelines & contexts via composable nodes
  • Faster integration and portability of pipelines to different contexts with different available technology stacks:
    • Flexibility to mix PySpark, pandas-on-Spark and pandas seamlessly in transformations
    • As a simple wheel package, it is very lightweight and unopinionated about the runtime environment. This allows it to be integrated easily into Databricks or used independently of Databricks.
  • Low latency for on-demand feature generation and predictions:
    • Framework-level optimisations and dynamic transformations help to keep even complex transformation pipelines low latency. This in turn allows for on-demand feature generation and predictions.

Commonly used

Databricks Python

Source Code

API code is available at https://github.com/flypipe/flypipe.

Documentation

Full documentation is available at https://flypipe.github.io/flypipe/.

Download files


Source Distribution

flypipe-4.0.1.tar.gz (46.1 MB)

Uploaded Source

Built Distribution

flypipe-4.0.1-py3-none-any.whl (1.9 MB)

Uploaded Python 3

File details

Details for the file flypipe-4.0.1.tar.gz.

File metadata

  • Download URL: flypipe-4.0.1.tar.gz
  • Upload date:
  • Size: 46.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.3

File hashes

Hashes for flypipe-4.0.1.tar.gz
Algorithm Hash digest
SHA256 c58e412708bd4ee3f847faccc20da72a280e59b236994975dfbf7a84763499b9
MD5 322c55ef92b45289b85c22923d2a8edc
BLAKE2b-256 1fd67a8f976e835c6e078696038bd65632e6f0d3c5004087108ed499ca8f1f7f
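The SHA256 digest listed above can be used to check that a downloaded archive is intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Published SHA256 for flypipe-4.0.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "c58e412708bd4ee3f847faccc20da72a280e59b236994975dfbf7a84763499b9"

# Assumed download location; adjust to wherever the archive was saved.
archive = Path("flypipe-4.0.1.tar.gz")
if archive.exists():
    print("hash matches:", matches_published_hash(archive, EXPECTED_SHA256))
```

Reading in chunks avoids loading the whole 46.1 MB archive into memory at once.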


File details

Details for the file flypipe-4.0.1-py3-none-any.whl.

File metadata

  • Download URL: flypipe-4.0.1-py3-none-any.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.3

File hashes

Hashes for flypipe-4.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4b8d62fe264ba94a36aa3cad4f2eea2fa665e91b686ba7be5823aafdb6f00f66
MD5 199a665d004c5ef8718b93f0a219c48f
BLAKE2b-256 117d49d35d27d5d1ec69ddffbcfd225db3241ac5220fc5b18f046a37ee84b9a3

