Project combining flowfile_core (backend), flowfile_worker (compute offloader), and flowfile_frame (API)

Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is an open-source data platform that combines a visual pipeline builder, a data catalog with Delta Lake storage, scheduling, Kafka ingestion, sandboxed Python execution, and a Polars-compatible Python API — all in a single pip install.

Quick Start

pip install Flowfile
flowfile run ui

This starts the backend services and opens the visual ETL interface in your browser.

What You Get

  • Visual pipeline builder with 30+ nodes for joins, filters, aggregations, fuzzy matching, pivots, and more
  • Data catalog with Delta Lake storage, version history, and lineage tracking
  • Scheduling — interval-based or triggered by catalog table updates
  • Kafka/Redpanda ingestion as a canvas node with automatic schema inference
  • Sandboxed Python execution in isolated Docker containers
  • Code generation — export visual flows as standalone Python/Polars scripts
  • Flow parameters with ${variable} substitution, configurable via UI or CLI
  • Cloud storage — S3, Azure Data Lake Storage, Google Cloud Storage
  • Database connectivity — PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more
  • Python API with Polars-like syntax and visual flow graph generation
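
To make the `${variable}` placeholder syntax from the list above concrete, here is a minimal sketch using Python's standard `string.Template`, which happens to use the same `${name}` notation. It does not reproduce Flowfile's own substitution logic; the function name, the example path, and the parameter names are all hypothetical.

```python
from string import Template


def substitute_parameters(text, params):
    """Replace ${name} placeholders in text with values from params."""
    # Template.substitute raises KeyError when a placeholder has no value,
    # so a missing parameter fails loudly instead of staying in the output.
    return Template(text).substitute(params)


# Hypothetical setting value containing two placeholders.
path_template = "s3://${bucket}/exports/${run_date}.parquet"
resolved = substitute_parameters(
    path_template,
    {"bucket": "analytics", "run_date": "2024-01-31"},
)
print(resolved)
```

Failing on a missing value is a deliberate choice in this sketch; whether Flowfile itself errors or leaves unresolved placeholders in place is not stated here.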

Python API

import flowfile as ff
from flowfile import col, open_graph_in_editor

df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the pipeline on the visual canvas
open_graph_in_editor(result.flow_graph)
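
For readers who want to check what the pipeline above computes, the following sketch redoes the same two steps on plain Python rows. It is not Flowfile or Polars code, only an illustration of the filter and derived-column semantics under the assumption that they behave like their Polars counterparts.

```python
# Same input data as the example above, held as plain Python rows.
rows = [
    {"id": 1, "category": "A", "value": 100},
    {"id": 2, "category": "B", "value": 200},
    {"id": 3, "category": "A", "value": 150},
    {"id": 4, "category": "C", "value": 300},
    {"id": 5, "category": "B", "value": 250},
]

# filter(col("value") > 150): keep rows whose value is strictly above 150.
filtered = [row for row in rows if row["value"] > 150]

# with_columns((col("value") * 2).alias("double_value")): add a derived column.
result_rows = [{**row, "double_value": row["value"] * 2} for row in filtered]

for row in result_rows:
    print(row)
```

Rows 2, 4, and 5 pass the filter; the row with value 150 is dropped because the comparison is strict.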

Common Operations

import flowfile as ff
from flowfile import col, when, lit

# Read from various sources
df = ff.read_csv("data.csv")
df_pq = ff.read_parquet("data.parquet")

# Transform
filtered = df.filter(col("value") > 150)
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

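# Join (other_df is a second frame with a product_id column; illustrative data)
other_df = ff.from_dict({"product_id": [1, 2, 3], "product_name": ["Widget", "Gadget", "Gizmo"]})
joined = df.join(other_df, left_on="id", right_on="product_id")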

# Visualize any pipeline
ff.open_graph_in_editor(joined.flow_graph)
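
The conditional column and the aggregation above can be read as follows in plain Python. This is a sketch of the intended semantics only, assuming they match Polars; the rows are illustrative stand-ins for the contents of data.csv, which is not shown in this document.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows standing in for data.csv.
rows = [
    {"id": 1, "category": "A", "value": 100},
    {"id": 2, "category": "B", "value": 200},
    {"id": 3, "category": "A", "value": 150},
    {"id": 4, "category": "C", "value": 300},
    {"id": 5, "category": "B", "value": 250},
]

# when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
with_status = [
    {**row, "status": "High" if row["value"] > 200 else "Low"}
    for row in rows
]

# group_by("category").agg(sum as "total", mean as "average")
values_by_category = defaultdict(list)
for row in rows:
    values_by_category[row["category"]].append(row["value"])

by_category = {
    category: {"total": sum(values), "average": mean(values)}
    for category, values in values_by_category.items()
}

print(with_status)
print(by_category)
```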

Code Generation

Export visual flows as standalone Python/Polars scripts.

Package Components

  • Core Service (flowfile_core) — ETL engine, catalog, scheduler, auth
  • Worker Service (flowfile_worker) — CPU-intensive data processing
  • Web UI — Browser-based visual pipeline builder
  • FlowFrame API (flowfile_frame) — Polars-compatible Python library
  • Scheduler (flowfile_scheduler) — Interval and table-trigger scheduling

CLI

flowfile run ui                              # Start web UI
flowfile run core --host 0.0.0.0             # Start core service
flowfile run worker --host 0.0.0.0           # Start worker service
flowfile run flow pipeline.json              # Run a flow
flowfile run flow pipeline.json --param key=value  # Run with parameters
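
When flows are launched from another Python program, the `flowfile run flow` command above can be wrapped with `subprocess`. The sketch below uses only the arguments shown in this section. Two things are assumptions: that `--param` may be repeated once per parameter, and that the CLI exits with a non-zero status on failure. The helper names are hypothetical.

```python
import subprocess


def build_run_flow_command(flow_path, params=None):
    """Build the argument list for `flowfile run flow`."""
    command = ["flowfile", "run", "flow", flow_path]
    for key, value in (params or {}).items():
        # Assumption: --param can be given once per key=value pair.
        command += ["--param", f"{key}={value}"]
    return command


def run_flow(flow_path, params=None):
    """Run a flow and raise CalledProcessError on a non-zero exit status."""
    # Passing a list (not a shell string) avoids quoting problems in values.
    return subprocess.run(build_run_flow_command(flow_path, params), check=True)


print(build_run_flow_command("pipeline.json", {"key": "value"}))
```

Building the command separately from running it keeps the argument handling easy to inspect without Flowfile installed.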

Download files

Source Distribution

flowfile-0.8.2.tar.gz (5.5 MB)

Uploaded Source

Built Distribution

flowfile-0.8.2-py3-none-any.whl (5.7 MB)

Uploaded Python 3

File details

Details for the file flowfile-0.8.2.tar.gz.

File metadata

  • Download URL: flowfile-0.8.2.tar.gz
  • Upload date:
  • Size: 5.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.8.2.tar.gz

  • SHA256: 93a677d1ed3c6a72277c26bd055eb628a6676c38830df128c6da5640e03555d6
  • MD5: f056c088f614ca198faac408288727cc
  • BLAKE2b-256: 244c134c5adf2799fd6cd8ff4e7725f7871d83f05810b4179b4bcacbd0f57e3c
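
A published digest is only useful if it is compared against the downloaded file. The sketch below shows one way to do that with Python's standard `hashlib`, using the SHA256 value listed above for the source distribution. The function names are hypothetical, and the file is assumed to have been downloaded already.

```python
import hashlib

# SHA256 digest listed above for flowfile-0.8.2.tar.gz.
EXPECTED_SHA256 = "93a677d1ed3c6a72277c26bd055eb628a6676c38830df128c6da5640e03555d6"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True when the file's SHA256 digest matches expected_hex."""
    return sha256_of_file(path) == expected_hex.lower()
```

Calling `verify_download("flowfile-0.8.2.tar.gz", EXPECTED_SHA256)` in the download directory should return True for an intact file. Reading in chunks keeps memory use flat regardless of archive size.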

Provenance

The following attestation bundles were made for flowfile-0.8.2.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.8.2-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.8.2-py3-none-any.whl
  • Upload date:
  • Size: 5.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.8.2-py3-none-any.whl

  • SHA256: 815f5614cc203c7c6c32ff8696831301be55f6858febc7b9e4db0293d137d849
  • MD5: be2ce646545b8b786e7f3e7884447276
  • BLAKE2b-256: e52faedda51ed7e2a20d9b9073ce434b37772a608e414fb7bd97654a6f77093e

Provenance

The following attestation bundles were made for flowfile-0.8.2-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile
