Project combining flowfile_core (backend), flowfile_worker (compute offloader), and flowfile_frame (API)

Project description

Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is an open-source data platform that combines a visual pipeline builder, a data catalog with Delta Lake storage, scheduling, Kafka ingestion, sandboxed Python execution, and a Polars-compatible Python API — all in a single pip install.

Quick Start

pip install Flowfile
flowfile run ui

This starts the backend services and opens the visual ETL interface in your browser.

What You Get

  • Visual pipeline builder with 30+ nodes for joins, filters, aggregations, fuzzy matching, pivots, and more
  • Data catalog with Delta Lake storage, version history, and lineage tracking
  • Scheduling — interval-based or triggered by catalog table updates
  • Kafka/Redpanda ingestion as a canvas node with automatic schema inference
  • Sandboxed Python execution in isolated Docker containers
  • Code generation — export visual flows as standalone Python/Polars scripts
  • Flow parameters with ${variable} substitution, configurable via UI or CLI
  • Cloud storage — S3, Azure Data Lake Storage, Google Cloud Storage
  • Database connectivity — PostgreSQL, MySQL, SQL Server, Oracle, DuckDB, and more
  • Python API with Polars-like syntax and visual flow graph generation
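
To make the `${variable}` substitution idea concrete, here is a minimal sketch using Python's standard `string.Template`, which happens to use the same placeholder syntax. This only illustrates the general technique; it is not Flowfile's implementation, and the helper name `substitute_parameters`, the bucket path, and the parameter names are hypothetical.

```python
from string import Template


def substitute_parameters(text: str, params: dict) -> str:
    """Replace ${name} placeholders in text with values from params.

    Raises KeyError if a placeholder has no matching parameter.
    """
    return Template(text).substitute(params)


# Hypothetical node setting that references two flow parameters
path_setting = "s3://my-bucket/${env}/sales_${year}.parquet"
resolved = substitute_parameters(path_setting, {"env": "prod", "year": "2024"})
print(resolved)  # s3://my-bucket/prod/sales_2024.parquet
```

Failing loudly on a missing parameter (rather than leaving the placeholder in place) is one reasonable design choice, since a half-resolved path is usually harder to debug than an early error.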

Python API

import flowfile as ff
from flowfile import col, open_graph_in_editor

df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the pipeline on the visual canvas
open_graph_in_editor(result.flow_graph)

Common Operations

import flowfile as ff
from flowfile import col, when, lit

# Read from various sources
df = ff.read_csv("data.csv")
df_pq = ff.read_parquet("data.parquet")

# Transform
filtered = df.filter(col("value") > 150)
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

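# Join (other_df stands in for any second frame; the file name is illustrative)
other_df = ff.read_csv("products.csv")
joined = df.join(other_df, left_on="id", right_on="product_id")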

# Visualize any pipeline
ff.open_graph_in_editor(joined.flow_graph)

Code Generation

Visual flows can be exported as standalone Python/Polars scripts.

Package Components

  • Core Service (flowfile_core) — ETL engine, catalog, scheduler, auth
  • Worker Service (flowfile_worker) — CPU-intensive data processing
  • Web UI — Browser-based visual pipeline builder
  • FlowFrame API (flowfile_frame) — Polars-compatible Python library
  • Scheduler (flowfile_scheduler) — Interval and table-trigger scheduling

CLI

flowfile run ui                              # Start web UI
flowfile run core --host 0.0.0.0             # Start core service
flowfile run worker --host 0.0.0.0           # Start worker service
flowfile run flow pipeline.json              # Run a flow
flowfile run flow pipeline.json --param key=value  # Run with parameters
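
When flows are launched from another Python program, the command line above can be assembled programmatically. The sketch below only builds the argument list from the `flowfile run flow ... --param key=value` form shown here. The helper `build_flow_command` is hypothetical, and repeating `--param` once per parameter is an assumption, not something this page confirms.

```python
import shlex
from typing import Optional


def build_flow_command(flow_path: str, params: Optional[dict] = None) -> list:
    """Build the argument list for `flowfile run flow <path> [--param k=v ...]`."""
    cmd = ["flowfile", "run", "flow", flow_path]
    for key, value in (params or {}).items():
        # Assumption: one --param flag per key=value pair
        cmd += ["--param", f"{key}={value}"]
    return cmd


cmd = build_flow_command("pipeline.json", {"key": "value"})
print(shlex.join(cmd))  # flowfile run flow pipeline.json --param key=value
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues because no shell is involved.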

Download files

Source Distribution

flowfile-0.9.1.tar.gz (5.5 MB)

Uploaded Source

Built Distribution

flowfile-0.9.1-py3-none-any.whl (5.8 MB)

Uploaded Python 3

File details

Details for the file flowfile-0.9.1.tar.gz.

File metadata

  • Download URL: flowfile-0.9.1.tar.gz
  • Size: 5.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.9.1.tar.gz
Algorithm    Hash digest
SHA256       38ac6f26c77ed892f927011626353c80c0c9c413a6fae54551a120abff949f07
MD5          5f529e2d9cad3921630cd4987db2c768
BLAKE2b-256  2686d1a111b6cf4d2b4cfa3a7501fceaa1aabdabd96d7212e5099d7bd6838783
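
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib`. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the local file path is assumed to be wherever the archive was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for flowfile-0.9.1.tar.gz
EXPECTED_SHA256 = "38ac6f26c77ed892f927011626353c80c0c9c413a6fae54551a120abff949f07"
# Example call, assuming the archive is in the current directory:
# matches_published_digest("flowfile-0.9.1.tar.gz", EXPECTED_SHA256)
```

pip can also enforce digests at install time through its hash-checking mode (`--require-hashes` with a requirements file), which is usually preferable to a manual check in automated builds.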

Provenance

The following attestation bundles were made for flowfile-0.9.1.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.9.1-py3-none-any.whl
  • Size: 5.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for flowfile-0.9.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       9ba0814ec0956f7e07f7c4834f77d10fac08485a1e453f43799eb5798f5d3c00
MD5          3014bf75619f8c7cb8d23d27d09df02b
BLAKE2b-256  eb2cb1d9e1526d6ff193ea19e72f90be32545c51f6e145dce30da2f0f73bf21d

Provenance

The following attestation bundles were made for flowfile-0.9.1-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
