Skip to main content

Project combining flowfile core (backend) and flowfile_worker (compute offloader) and flowfile_frame (api)

Project description

Flowfile Logo
Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is an open-source data platform that combines a visual pipeline builder, a data catalog with Delta Lake storage, scheduling, Kafka ingestion, sandboxed Python execution, and a Polars-compatible Python API — all in a single pip install.

Quick Start

pip install Flowfile
flowfile run ui

This starts the backend services and opens the visual ETL interface in your browser.

What You Get

  • Visual pipeline builder with 30+ nodes for joins, filters, aggregations, fuzzy matching, pivots, and more
  • Data catalog with Delta Lake storage, version history, and lineage tracking
  • Scheduling — interval-based or triggered by catalog table updates
  • Kafka/Redpanda ingestion as a canvas node with automatic schema inference
  • Sandboxed Python execution in isolated Docker containers
  • Code generation — export visual flows as standalone Python/Polars scripts
  • Flow parameters${variable} substitution, configurable via UI or CLI
  • Cloud storage — S3, Azure Data Lake Storage, Google Cloud Storage
  • Database connectivity — PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more
  • Python API with Polars-like syntax and visual flow graph generation

Python API

import flowfile as ff
from flowfile import col, open_graph_in_editor

df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the pipeline on the visual canvas
open_graph_in_editor(result.flow_graph)

Common Operations

import flowfile as ff
from flowfile import col, when, lit

# Read from various sources
df = ff.read_csv("data.csv")
df_pq = ff.read_parquet("data.parquet")

# Transform
filtered = df.filter(col("value") > 150)
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

# Join
joined = df.join(other_df, left_on="id", right_on="product_id")

# Visualize any pipeline
ff.open_graph_in_editor(joined.flow_graph)

Code Generation

Export visual flows as standalone Python/Polars scripts:

Code Generation

Package Components

  • Core Service (flowfile_core) — ETL engine, catalog, scheduler, auth
  • Worker Service (flowfile_worker) — CPU-intensive data processing
  • Web UI — Browser-based visual pipeline builder
  • FlowFrame API (flowfile_frame) — Polars-compatible Python library
  • Scheduler (flowfile_scheduler) — Interval and table-trigger scheduling

CLI

flowfile run ui                              # Start web UI
flowfile run core --host 0.0.0.0             # Start core service
flowfile run worker --host 0.0.0.0           # Start worker service
flowfile run flow pipeline.json              # Run a flow
flowfile run flow pipeline.json --param key=value  # Run with parameters

More Options

Resources

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flowfile-0.9.2.tar.gz (5.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flowfile-0.9.2-py3-none-any.whl (6.0 MB view details)

Uploaded Python 3

File details

Details for the file flowfile-0.9.2.tar.gz.

File metadata

  • Download URL: flowfile-0.9.2.tar.gz
  • Upload date:
  • Size: 5.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.9.2.tar.gz
Algorithm Hash digest
SHA256 6020deb26b664d1f37ee64653a796c04802904550aecfa65277af114eec32823
MD5 32ba2c0162173980b8e5dd3f96007652
BLAKE2b-256 9b0062c174118dda2e9bf01df7d555a411874f8e1c91afd344b1e0ebe1ac7d14

See more details on using hashes here.

Provenance

The following attestation bundles were made for flowfile-0.9.2.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.9.2-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.9.2-py3-none-any.whl
  • Upload date:
  • Size: 6.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0703b4f7b6e1f50834caa6059269335b84ef0f5420c6e2b0ad7165eef33d255c
MD5 2d3ecb619bc9b89db0aac0eb98f6ece9
BLAKE2b-256 f159530e6b2313e9f950a547d4969d68285c96ad3579d62dffecf9f467794bc0

See more details on using hashes here.

Provenance

The following attestation bundles were made for flowfile-0.9.2-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page