Project combining flowfile_core (backend), flowfile_worker (compute offloader), and flowfile_frame (API)

Project description

Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is a visual ETL tool and Python library suite that combines drag-and-drop workflow building with the speed of Polars dataframes. You can build data pipelines visually, transform data with configurable nodes, or define data flows programmatically in Python and analyze the results. Visual flows can be exported as standalone Python/Polars code for production deployment.

🚀 Getting Started

Installation

Install Flowfile directly from PyPI:

pip install Flowfile
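To confirm that the install succeeded, one option is to look up the installed distribution's version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of Flowfile.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "Flowfile" is the distribution name used in the pip command above.
    version = installed_version("Flowfile")
    print(f"Flowfile {version}" if version else "Flowfile is not installed")
```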

Quick Start: Web UI

The easiest way to get started is by launching the web-based UI:

# Start the Flowfile web UI with integrated services
flowfile run ui

This will:

  • Start the combined core and worker services
  • Launch a web interface in your browser
  • Provide access to the full visual ETL capabilities

Options:

# Customize host
flowfile run ui --host 0.0.0.0

# Start without opening a browser
flowfile run ui --no-browser
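If you want to launch the UI from a script (for example in a test fixture), the command line can be assembled and started with `subprocess`. The sketch below uses only the two options shown above; the helper names `build_ui_command` and `launch_ui` are hypothetical, and passing both options together is an assumption rather than documented behavior.

```python
import subprocess
from typing import List, Optional


def build_ui_command(host: Optional[str] = None, open_browser: bool = True) -> List[str]:
    """Assemble the argv for `flowfile run ui` from the two documented options."""
    command = ["flowfile", "run", "ui"]
    if host is not None:
        command += ["--host", host]
    if not open_browser:
        command.append("--no-browser")
    return command


def launch_ui(host: Optional[str] = None, open_browser: bool = True) -> subprocess.Popen:
    """Start the UI as a child process; the caller should terminate it when done."""
    return subprocess.Popen(build_ui_command(host, open_browser))


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_ui_command(host="0.0.0.0", open_browser=False)))
```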

You can also start the web UI programmatically:

import flowfile

# Start with default settings
flowfile.start_web_ui()

# Or customize
flowfile.start_web_ui(open_browser=False)

Using the FlowFrame API

Flowfile provides a Polars-like API for defining data pipelines programmatically:

import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a data pipeline
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Process the data
result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the graph in the web UI (starts the server if needed)
open_graph_in_editor(result.flow_graph)
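To make the expected result of this pipeline concrete without running Flowfile, here is the same logic in plain Python. It assumes that `filter` and `with_columns` follow Polars semantics, as the "Polars-like" description suggests; it is an illustration of the data transformation, not of how Flowfile executes it.

```python
# Same input as the FlowFrame example, written as a list of row dictionaries.
rows = [
    {"id": 1, "category": "A", "value": 100},
    {"id": 2, "category": "B", "value": 200},
    {"id": 3, "category": "A", "value": 150},
    {"id": 4, "category": "C", "value": 300},
    {"id": 5, "category": "B", "value": 250},
]

# Equivalent of: df.filter(col("value") > 150)
filtered = [row for row in rows if row["value"] > 150]

# Equivalent of: .with_columns([(col("value") * 2).alias("double_value")])
result = [{**row, "double_value": row["value"] * 2} for row in filtered]

for row in result:
    print(row)
```

Rows 2, 4, and 5 pass the filter, so `double_value` comes out as 400, 600, and 500.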

📦 Package Components

The Flowfile PyPI package includes:

  • Core Service (flowfile_core): The main ETL engine using Polars
  • Worker Service (flowfile_worker): Handles computation-intensive tasks
  • Web UI: Browser-based visual ETL interface
  • FlowFrame API (flowfile_frame): Polars-like API for Python coding

✨ Key Features

Visual ETL with Web UI

  • No Separate Installation Required: The web UI ships with the pip package
  • Drag-and-Drop Interface: Build data pipelines visually
  • Integrated Services: Combined core and worker services
  • Browser-Based: Access from other devices on your network when the UI is bound to a reachable host (for example, --host 0.0.0.0)
  • Code Generation: Export visual flows as Python/Polars scripts

FlowFrame API

  • Familiar Syntax: Polars-like API makes it easy to learn
  • ETL Graph Generation: Automatically builds visual workflows
  • Lazy Evaluation: Operations are not executed until needed
  • Interoperability: Move between code and visual interfaces
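The idea behind lazy evaluation and graph generation can be shown with a small, self-contained sketch: each method call records a step instead of running it, and the recorded steps are what a visual editor could render as nodes. The `LazyPipeline` class is purely conceptual and hypothetical; it is not Flowfile's implementation or API.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


class LazyPipeline:
    """Records named steps and only runs them when collect() is called."""

    def __init__(self, rows: List[Row], steps: Tuple[Step, ...] = ()) -> None:
        self._rows = rows
        self._steps = steps

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyPipeline":
        step = ("filter", lambda rows: [r for r in rows if predicate(r)])
        return LazyPipeline(self._rows, self._steps + (step,))

    def with_column(self, name: str, fn: Callable[[Row], Any]) -> "LazyPipeline":
        step = (f"with_column:{name}", lambda rows: [{**r, name: fn(r)} for r in rows])
        return LazyPipeline(self._rows, self._steps + (step,))

    def describe(self) -> List[str]:
        """List the recorded steps in order (the 'graph' of this toy pipeline)."""
        return [name for name, _ in self._steps]

    def collect(self) -> List[Row]:
        """Run every recorded step in order and return the resulting rows."""
        rows = self._rows
        for _, apply in self._steps:
            rows = apply(rows)
        return rows


calls: List[int] = []


def is_large(row: Row) -> bool:
    calls.append(row["id"])  # Track when the predicate actually runs.
    return row["value"] > 150


data = [{"id": 1, "value": 100}, {"id": 2, "value": 200}, {"id": 3, "value": 300}]
pipeline = LazyPipeline(data).filter(is_large).with_column(
    "double_value", lambda r: r["value"] * 2
)

ran_before_collect = list(calls)  # Still empty: nothing has executed yet.
result = pipeline.collect()       # The predicate runs only now.
print(pipeline.describe(), result)
```

Because the steps are stored as data, they can be inspected, reordered, or displayed before any computation happens, which is what makes moving between code and a visual graph possible in principle.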

Data Operations

  • Data Cleaning & Transformation: Joins, filtering, conditional columns, and aggregations
  • High Performance: Built on Polars for efficient processing
  • Data Integration: Read file formats such as CSV and Parquet
  • ETL Pipeline Building: Create reusable workflows

🔄 Common FlowFrame Operations

import flowfile as ff
from flowfile import col, when, lit

# Create data in memory (or read it from a file, see the commented lines below)
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})
# df_parquet = ff.read_parquet("data.parquet")
# df_csv = ff.read_csv("data.csv")

# A second dataset, registered on the same graph as df
other_df = ff.from_dict(
    {
        "product_id": [1, 2, 3, 4, 6],
        "product_name": ["WidgetA", "WidgetB", "WidgetC", "WidgetD", "WidgetE"],
        "supplier": ["SupplierX", "SupplierY", "SupplierX", "SupplierZ", "SupplierY"],
    },
    flow_graph=df.flow_graph,  # Assign the data to the same graph
)

# Filter
filtered = df.filter(col("value") > 150)

# Transform
result = df.select(
    col("id"),
    (col("value") * 2).alias("double_value")
)

# Conditional logic
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Group and aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

# Join data
joined = df.join(other_df, left_on="id", right_on="product_id")

# Configure execution settings, then open the graph in the UI
joined.flow_graph.flow_settings.execution_location = "auto"
joined.flow_graph.flow_settings.execution_mode = "Development"
ff.open_graph_in_editor(joined.flow_graph)
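For reference, the aggregation and join above can be reproduced in plain Python to see which values to expect. This sketch assumes the FlowFrame `group_by` and `join` follow Polars semantics, including an inner join by default; that default is an assumption, so check the Flowfile documentation if the join type matters.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"id": 1, "category": "A", "value": 100},
    {"id": 2, "category": "B", "value": 200},
    {"id": 3, "category": "A", "value": 150},
    {"id": 4, "category": "C", "value": 300},
    {"id": 5, "category": "B", "value": 250},
]
products = [
    {"product_id": 1, "product_name": "WidgetA", "supplier": "SupplierX"},
    {"product_id": 2, "product_name": "WidgetB", "supplier": "SupplierY"},
    {"product_id": 3, "product_name": "WidgetC", "supplier": "SupplierX"},
    {"product_id": 4, "product_name": "WidgetD", "supplier": "SupplierZ"},
    {"product_id": 6, "product_name": "WidgetE", "supplier": "SupplierY"},
]

# Equivalent of: df.group_by("category").agg([sum as "total", mean as "average"])
values_by_category = defaultdict(list)
for row in rows:
    values_by_category[row["category"]].append(row["value"])
by_category = {
    category: {"total": sum(values), "average": mean(values)}
    for category, values in values_by_category.items()
}

# Equivalent of an inner join with left_on="id", right_on="product_id"
products_by_id = {product["product_id"]: product for product in products}
joined = [
    {
        **row,
        "product_name": products_by_id[row["id"]]["product_name"],
        "supplier": products_by_id[row["id"]]["supplier"],
    }
    for row in rows
    if row["id"] in products_by_id
]

print(by_category)
print([row["id"] for row in joined])
```

Under the inner-join assumption, id 5 (no matching product) and product 6 (no matching id) are both dropped, leaving ids 1 through 4.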

📝 Code Generation

Export your visual flows as standalone Python/Polars code for production use. Click the "Generate code" button in the visual editor to:

  • Generate clean, readable Python/Polars code
  • Export flows without Flowfile dependencies
  • Deploy workflows in any Python environment
  • Share ETL logic with team members

🧰 Command-Line Interface

# Show help and version info
flowfile

# Start the web UI
flowfile run ui [options]

# Run individual services
flowfile run core --host 0.0.0.0 --port 63578
flowfile run worker --host 0.0.0.0 --port 63579
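When the core and worker run as separate services, a script that depends on them may need to wait until both ports accept connections. The following is a minimal standard-library sketch; `wait_for_port` is a hypothetical helper, the ports are the ones from the commands above, and connecting via `127.0.0.1` assumes the services run on the same machine.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Ports taken from the `flowfile run core` / `flowfile run worker` commands.
    for name, port in (("core", 63578), ("worker", 63579)):
        status = "up" if wait_for_port("127.0.0.1", port, timeout=5.0) else "not reachable"
        print(f"{name} service on port {port}: {status}")
```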

📚 Resources

🖥️ Full Application Options

For the complete visual ETL experience, you have additional options:

  • Desktop Application: Download from the main repository
  • Docker Setup: Run with Docker Compose
  • Manual Setup: For development environments

📋 Development Roadmap

See the main repository for the latest development roadmap and TODO list.

Download files

Source Distribution

flowfile-0.5.1.tar.gz (4.9 MB)

Uploaded Source

Built Distribution

flowfile-0.5.1-py3-none-any.whl (5.0 MB)

Uploaded Python 3

File details

Details for the file flowfile-0.5.1.tar.gz.

File metadata

  • Download URL: flowfile-0.5.1.tar.gz
  • Upload date:
  • Size: 4.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flowfile-0.5.1.tar.gz
Algorithm Hash digest
SHA256 7e8435047cf9a96b9003cad5515231263d2538e19d1c55666b1b3411354cfd2d
MD5 eeab98eea02d653734a39cec62225648
BLAKE2b-256 07461fbf91c7777c6dc7270505081627b969c306592f84f5e6a35f4bdaffafff

Provenance

The following attestation bundles were made for flowfile-0.5.1.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 5.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flowfile-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5eec1445ddeb22033a53ff5c589198ba8267b5120f75aef5d7b1e8bbadb872db
MD5 f5c6a1654413f665072a9185013547ff
BLAKE2b-256 8ca9c162a297bea10f452df0fa2b4cea2ec8638b7a721f51a55a2b6d72a3d8a4

Provenance

The following attestation bundles were made for flowfile-0.5.1-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
