
Project combining flowfile_core (backend), flowfile_worker (compute offloader), and flowfile_frame (API)

Project description

Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is an open-source data platform that combines a visual pipeline builder, a data catalog with Delta Lake storage, scheduling, Kafka ingestion, sandboxed Python execution, and a Polars-compatible Python API — all in a single pip install.

Quick Start

pip install Flowfile
flowfile run ui

This starts the backend services and opens the visual ETL interface in your browser.

What You Get

  • Visual pipeline builder with 30+ nodes for joins, filters, aggregations, fuzzy matching, pivots, and more
  • Data catalog with Delta Lake storage, version history, and lineage tracking
  • Scheduling — interval-based or triggered by catalog table updates
  • Kafka/Redpanda ingestion as a canvas node with automatic schema inference
  • Sandboxed Python execution in isolated Docker containers
  • Code generation — export visual flows as standalone Python/Polars scripts
  • Flow parameters with ${variable} substitution, configurable via UI or CLI
  • Cloud storage — S3, Azure Data Lake Storage, Google Cloud Storage
  • Database connectivity — PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more
  • Python API with Polars-like syntax and visual flow graph generation
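Flowfile's own substitution code is not shown on this page. As a rough illustration of what `${variable}` substitution does, the sketch below uses Python's standard `string.Template`, which happens to use the same `${name}` placeholder syntax. The parameter names (`input_dir`, `year`) and the `resolve_parameters` helper are hypothetical, and Flowfile's actual behavior (for example, how it treats undefined parameters) may differ.

```python
from string import Template
from typing import Dict


def resolve_parameters(value: str, params: Dict[str, str]) -> str:
    """Replace ${name} placeholders in a setting with parameter values.

    string.Template is a stand-in for Flowfile's own logic; substitute()
    raises KeyError when a referenced parameter is not defined.
    """
    return Template(value).substitute(params)


# Hypothetical parameter set, e.g. supplied on the CLI as --param year=2024
params = {"input_dir": "/data/raw", "year": "2024"}

print(resolve_parameters("${input_dir}/sales_${year}.csv", params))
# /data/raw/sales_2024.csv
```

Failing loudly on a missing parameter (rather than leaving the placeholder in place, as `safe_substitute` would) surfaces misconfigured flows before any data is read.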

Python API

import flowfile as ff
from flowfile import col, open_graph_in_editor

df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the pipeline on the visual canvas
open_graph_in_editor(result.flow_graph)
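To make the example's semantics concrete, here is the same filter and derived column computed in plain Python on the sample data, without Flowfile. It assumes Flowfile follows Polars semantics here (a strict `>` comparison, and `alias` naming the new column), which its Polars-compatible API suggests but which this sketch does not verify.

```python
# The sample data from the example above, as plain Python lists.
data = {
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250],
}

# Turn the column-oriented dict into a list of row dicts.
rows = [dict(zip(data, values)) for values in zip(*data.values())]

# filter(col("value") > 150): strict comparison, so the row with 150 is dropped.
kept = [row for row in rows if row["value"] > 150]

# with_columns((col("value") * 2).alias("double_value")): add a derived column.
result = [{**row, "double_value": row["value"] * 2} for row in kept]

for row in result:
    print(row)
# {'id': 2, 'category': 'B', 'value': 200, 'double_value': 400}
# {'id': 4, 'category': 'C', 'value': 300, 'double_value': 600}
# {'id': 5, 'category': 'B', 'value': 250, 'double_value': 500}
```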

Common Operations

import flowfile as ff
from flowfile import col, when, lit

# Read from various sources
df = ff.read_csv("data.csv")
df_pq = ff.read_parquet("data.parquet")

# Transform
filtered = df.filter(col("value") > 150)
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

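# Join (other_df is a hypothetical second table with a product_id column)
other_df = ff.read_csv("products.csv")
joined = df.join(other_df, left_on="id", right_on="product_id")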

# Visualize any pipeline
ff.open_graph_in_editor(joined.flow_graph)
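The conditional column and the aggregation above can be traced by hand. The plain-Python sketch below applies them to the five sample rows from the Python API example, on the assumption that data.csv has the same `id`, `category`, and `value` columns; it only mirrors the expected results and does not call Flowfile. The order of groups is not meaningful, since a Polars-style group_by does not promise a particular group order by default.

```python
from collections import defaultdict

# Sample rows, assumed to match the columns of data.csv.
rows = [
    {"id": 1, "category": "A", "value": 100},
    {"id": 2, "category": "B", "value": 200},
    {"id": 3, "category": "A", "value": 150},
    {"id": 4, "category": "C", "value": 300},
    {"id": 5, "category": "B", "value": 250},
]

# when(col("value") > 200).then(lit("High")).otherwise(lit("Low"))
with_status = [{**r, "status": "High" if r["value"] > 200 else "Low"} for r in rows]

# group_by("category"): collect the values belonging to each category.
groups = defaultdict(list)
for r in rows:
    groups[r["category"]].append(r["value"])

# agg([sum().alias("total"), mean().alias("average")])
by_category = {
    cat: {"total": sum(vals), "average": sum(vals) / len(vals)}
    for cat, vals in groups.items()
}

print(by_category)
# {'A': {'total': 250, 'average': 125.0}, 'B': {'total': 450, 'average': 225.0},
#  'C': {'total': 300, 'average': 300.0}}
```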

Code Generation

Export visual flows as standalone Python/Polars scripts.

Package Components

  • Core Service (flowfile_core) — ETL engine, catalog, scheduler, auth
  • Worker Service (flowfile_worker) — CPU-intensive data processing
  • Web UI — Browser-based visual pipeline builder
  • FlowFrame API (flowfile_frame) — Polars-compatible Python library
  • Scheduler (flowfile_scheduler) — Interval and table-trigger scheduling

CLI

flowfile run ui                              # Start web UI
flowfile run core --host 0.0.0.0             # Start core service
flowfile run worker --host 0.0.0.0           # Start worker service
flowfile run flow pipeline.json              # Run a flow
flowfile run flow pipeline.json --param key=value  # Run with parameters
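When flows are launched from another Python program, building the argument list explicitly avoids shell-quoting problems with parameter values. The helper below is a hypothetical sketch that only assembles the command shown above; it assumes `--param` can be repeated once per parameter, which the documented example (a single `--param`) does not confirm.

```python
import shlex
from typing import Dict, List, Optional


def build_run_command(flow_path: str, params: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble argv for `flowfile run flow`, one --param flag per entry.

    Assumption: the CLI accepts --param more than once.
    """
    cmd = ["flowfile", "run", "flow", flow_path]
    for key, value in (params or {}).items():
        cmd += ["--param", f"{key}={value}"]
    return cmd


cmd = build_run_command("pipeline.json", {"key": "value"})
print(shlex.join(cmd))  # flowfile run flow pipeline.json --param key=value
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if the flow exits with a non-zero status.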


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flowfile-0.10.0.tar.gz (6.3 MB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flowfile-0.10.0-py3-none-any.whl (6.7 MB)

Uploaded Python 3
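Wheel filenames follow the fixed pattern `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl` from the wheel specification. Splitting `flowfile-0.10.0-py3-none-any.whl` that way shows a pure-Python wheel (`py3`, `none`, `any`) that installs on any platform with Python 3, which is why a single built distribution is enough. The sketch below is a simplified parser for illustration only; the `packaging` library's `packaging.utils.parse_wheel_filename` handles name normalization and validation properly.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl into parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(name, version, build, py, abi, plat)


print(parse_wheel_filename("flowfile-0.10.0-py3-none-any.whl"))
# WheelName(name='flowfile', version='0.10.0', build=None, python_tag='py3', abi_tag='none', platform_tag='any')
```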

File details

Details for the file flowfile-0.10.0.tar.gz.

File metadata

  • Download URL: flowfile-0.10.0.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.10.0.tar.gz
Algorithm     Hash digest
SHA256        bb1fdbe628be2d77b382bbc16a2168eac3b4a7fa30eca882b11730920925a3b7
MD5           2ce2b47570790402e9c60ba40f4d4f28
BLAKE2b-256   4affed838be0b16a466d73f4fbcb73b2620234e9ed78e26ada45e7c4c29ee014

See more details on using hashes here.
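A downloaded file can be checked against the SHA256 digest listed above before installing it. The sketch below uses only the standard `hashlib` module; the function names are illustrative. pip can also enforce hashes itself in hash-checking mode (`--hash` entries in a requirements file).

```python
import hashlib

# SHA256 digest published above for flowfile-0.10.0.tar.gz
EXPECTED_SHA256 = "bb1fdbe628be2d77b382bbc16a2168eac3b4a7fa30eca882b11730920925a3b7"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the published one, ignoring hex case."""
    return sha256_of_file(path) == expected.lower()


# Usage, after downloading the sdist into the current directory:
# verify_download("flowfile-0.10.0.tar.gz")
```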

Provenance

The following attestation bundles were made for flowfile-0.10.0.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.10.0-py3-none-any.whl
  • Upload date:
  • Size: 6.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.10.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        3a0ecb7d19d38f7be34e6a434c83440c95424fa610d69ad622bc5b91ac0a85d1
MD5           43ff18649a4e25fd5be1af39e59b1bc4
BLAKE2b-256   b043548d39176a369ec4b399d9e9beb4228edb36ac1d46ea72c42c1f34eb2ca2


Provenance

The following attestation bundles were made for flowfile-0.10.0-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

