Project combining flowfile_core (backend), flowfile_worker (compute offloader), and flowfile_frame (API)

Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is an open-source data platform that combines a visual pipeline builder, a data catalog with Delta Lake storage, scheduling, Kafka ingestion, sandboxed Python execution, and a Polars-compatible Python API — all in a single pip install.

Quick Start

pip install Flowfile
flowfile run ui

This starts the backend services and opens the visual ETL interface in your browser.

What You Get

  • Visual pipeline builder with 30+ nodes for joins, filters, aggregations, fuzzy matching, pivots, and more
  • Data catalog with Delta Lake storage, version history, and lineage tracking
  • Scheduling — interval-based or triggered by catalog table updates
  • Kafka/Redpanda ingestion as a canvas node with automatic schema inference
  • Sandboxed Python execution in isolated Docker containers
  • Code generation — export visual flows as standalone Python/Polars scripts
  • Flow parameters with ${variable} substitution, configurable via UI or CLI
  • Cloud storage — S3, Azure Data Lake Storage, Google Cloud Storage
  • Database connectivity — PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more
  • Python API with Polars-like syntax and visual flow graph generation
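The ${variable} substitution mentioned in the list above can be pictured with Python's standard library. This is a minimal sketch of the general idea only, not Flowfile's implementation: the helper name `substitute_parameters`, the example path, and the parameter names are all hypothetical.

```python
from string import Template


def substitute_parameters(text: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in text with values from params.

    Raises KeyError when a placeholder has no matching parameter.
    """
    return Template(text).substitute(params)


# Hypothetical node setting that contains two flow parameters
path_template = "s3://${bucket}/exports/${run_date}.parquet"
resolved = substitute_parameters(
    path_template, {"bucket": "analytics", "run_date": "2024-01-31"}
)
print(resolved)
```

Failing loudly on a missing parameter is a deliberate choice in this sketch: a pipeline that silently keeps a literal `${run_date}` in a path is harder to debug than one that stops early.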

Python API

import flowfile as ff
from flowfile import col, open_graph_in_editor

df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the pipeline on the visual canvas
open_graph_in_editor(result.flow_graph)

Common Operations

import flowfile as ff
from flowfile import col, when, lit

# Read from various sources
df = ff.read_csv("data.csv")
df_pq = ff.read_parquet("data.parquet")

# Transform
filtered = df.filter(col("value") > 150)
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

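# Join (other_df is a second frame with a "product_id" column; the file name is illustrative)
other_df = ff.read_csv("products.csv")
joined = df.join(other_df, left_on="id", right_on="product_id")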

# Visualize any pipeline
ff.open_graph_in_editor(joined.flow_graph)

Code Generation

Export visual flows as standalone Python/Polars scripts.

Package Components

  • Core Service (flowfile_core) — ETL engine, catalog, scheduler, auth
  • Worker Service (flowfile_worker) — CPU-intensive data processing
  • Web UI — Browser-based visual pipeline builder
  • FlowFrame API (flowfile_frame) — Polars-compatible Python library
  • Scheduler (flowfile_scheduler) — Interval and table-trigger scheduling

CLI

flowfile run ui                              # Start web UI
flowfile run core --host 0.0.0.0             # Start core service
flowfile run worker --host 0.0.0.0           # Start worker service
flowfile run flow pipeline.json              # Run a flow
flowfile run flow pipeline.json --param key=value  # Run with parameters
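When flows are launched from another Python program, the command line can be assembled as a list of arguments. The sketch below uses only the `flowfile run flow` and `--param key=value` forms shown above. The helper name `build_flow_command` is hypothetical, and repeating `--param` once per parameter is an assumption that this README does not confirm.

```python
import shlex
from typing import Optional


def build_flow_command(
    flow_path: str, params: Optional[dict[str, str]] = None
) -> list[str]:
    """Build the argument list for `flowfile run flow`."""
    command = ["flowfile", "run", "flow", flow_path]
    # Assumption: one --param flag per key=value pair
    for key, value in (params or {}).items():
        command += ["--param", f"{key}={value}"]
    return command


command = build_flow_command("pipeline.json", {"key": "value"})
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would execute it. Keeping the arguments as a list avoids shell quoting problems when parameter values contain spaces.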

Download files

Source Distribution

flowfile-0.10.1.tar.gz (6.3 MB)

Built Distribution

flowfile-0.10.1-py3-none-any.whl (6.7 MB)

File details

Details for the file flowfile-0.10.1.tar.gz.

File metadata

  • Download URL: flowfile-0.10.1.tar.gz
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.10.1.tar.gz:

  • SHA256: aed56feeb6a1a97122315e424a1c114e93ab56bc18445f965c4e7b49a480a6dc
  • MD5: ac3f92c44695ca9b17cddefbeadc5acf
  • BLAKE2b-256: 5ae1b467148a17a069c536a74948222cdc2d5aaa2fda191b436925df7715f6a9

Provenance

The following attestation bundles were made for flowfile-0.10.1.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.10.1-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.10.1-py3-none-any.whl
  • Size: 6.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.10.1-py3-none-any.whl:

  • SHA256: 9375ddba103b3c0d956a866ed3a42f911bebf2f1ad053030508832799543bbab
  • MD5: 831a16d0ebac1495c6d5b8b768655417
  • BLAKE2b-256: ccfc469c7ddee06eace425c782c473a93b3b0a645d597f328f6857b84f736738

Provenance

The following attestation bundles were made for flowfile-0.10.1-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile