Project combining flowfile_core (backend), flowfile_worker (compute offloader), and flowfile_frame (API)

Flowfile

Main Repository: Edwardvaneechoud/Flowfile
Documentation: Website - Core - Worker - Frontend - Technical Architecture

Flowfile is a visual ETL tool and Python library suite that combines drag-and-drop workflow building with the speed of Polars dataframes. Build data pipelines visually, transform data with the built-in nodes, or define data flows programmatically in Python and analyze the results. Visual flows can be exported as standalone Python/Polars code for production deployment.

🚀 Getting Started

Installation

Install Flowfile directly from PyPI:

pip install Flowfile

Quick Start: Web UI

The easiest way to get started is by launching the web-based UI:

# Start the Flowfile web UI with integrated services
flowfile run ui

This will:

  • Start the combined core and worker services
  • Launch a web interface in your browser
  • Provide access to the full visual ETL capabilities

Options:

# Customize host
flowfile run ui --host 0.0.0.0

# Start without opening a browser
flowfile run ui --no-browser

You can also start the web UI programmatically:

import flowfile

# Start with default settings
flowfile.start_web_ui()

# Or customize
flowfile.start_web_ui(open_browser=False)
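
When the UI is launched from a wrapper script rather than a shell, the command line can be assembled as an argument list. The following is a minimal sketch: `build_ui_command` and `launch_ui` are hypothetical helpers (not part of Flowfile), and they use only the `--host` and `--no-browser` flags shown above.

```python
import subprocess
from typing import List, Optional


def build_ui_command(host: Optional[str] = None, open_browser: bool = True) -> List[str]:
    """Assemble the argv list for `flowfile run ui` (hypothetical helper)."""
    cmd = ["flowfile", "run", "ui"]
    if host is not None:
        cmd += ["--host", host]
    if not open_browser:
        cmd.append("--no-browser")
    return cmd


def launch_ui(host: Optional[str] = None, open_browser: bool = True) -> subprocess.Popen:
    """Start the UI as a child process; requires the `flowfile` CLI on PATH."""
    return subprocess.Popen(build_ui_command(host, open_browser))


# Only builds and prints the command; call launch_ui() to actually start it.
print(build_ui_command(host="0.0.0.0", open_browser=False))
```

Keeping the argument list separate from the process launch makes the command easy to log or inspect before anything is started.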

Using the FlowFrame API

Flowfile provides a Polars-like API for defining data pipelines programmatically:

import flowfile as ff
from flowfile import col, open_graph_in_editor

# Create a data pipeline
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})

# Process the data
result = df.filter(col("value") > 150).with_columns([
    (col("value") * 2).alias("double_value")
])

# Open the graph in the web UI (starts the server if needed)
open_graph_in_editor(result.flow_graph)

📦 Package Components

The Flowfile PyPI package includes:

  • Core Service (flowfile_core): The main ETL engine using Polars
  • Worker Service (flowfile_worker): Handles computation-intensive tasks
  • Web UI: Browser-based visual ETL interface
  • FlowFrame API (flowfile_frame): Polars-like API for Python coding

✨ Key Features

Visual ETL with Web UI

  • No Separate Installation: Launch directly from the pip package, with no additional setup
  • Drag-and-Drop Interface: Build data pipelines visually
  • Integrated Services: Combined core and worker services
  • Browser-Based: Access from any device on your network
  • Code Generation: Export visual flows as Python/Polars scripts

FlowFrame API

  • Familiar Syntax: Polars-like API makes it easy to learn
  • ETL Graph Generation: Automatically builds visual workflows
  • Lazy Evaluation: Operations are not executed until needed
  • Interoperability: Move between code and visual interfaces
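
To illustrate the lazy-evaluation idea in the list above, here is a toy model in plain Python. It is not Flowfile's or Polars' implementation; `ToyLazyFrame` and its methods are invented for this sketch. Each call only records a step in a plan, and the work happens when `collect()` is called.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Step = Tuple[str, Callable[[Row], Any]]


class ToyLazyFrame:
    """Toy lazy frame: records steps and runs them only on collect()."""

    def __init__(self, rows: List[Row], steps: Tuple[Step, ...] = ()):
        self._rows = rows
        self._steps = steps

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyLazyFrame":
        # Nothing is computed here; the step is only appended to the plan.
        return ToyLazyFrame(self._rows, self._steps + (("filter", predicate),))

    def with_column(self, name: str, fn: Callable[[Row], Any]) -> "ToyLazyFrame":
        def add_column(row: Row) -> Row:
            return {**row, name: fn(row)}

        return ToyLazyFrame(self._rows, self._steps + (("with_column", add_column),))

    def plan(self) -> List[str]:
        """Names of the recorded steps, in execution order."""
        return [kind for kind, _ in self._steps]

    def collect(self) -> List[Row]:
        """Execute the recorded steps against the source rows."""
        rows = self._rows
        for kind, fn in self._steps:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows


data = [{"id": i, "value": v} for i, v in zip([1, 2, 3, 4, 5], [100, 200, 150, 300, 250])]
lazy = ToyLazyFrame(data).filter(lambda r: r["value"] > 150).with_column(
    "double_value", lambda r: r["value"] * 2
)
print(lazy.plan())     # the recorded steps; no data has been processed yet
print(lazy.collect())  # the steps run here
```

Because the plan is plain data until execution, a tool can inspect or render it, which is the same property that lets a pipeline be shown as a graph.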

Data Operations

  • Data Cleaning & Transformation: Joins, filtering, conditional columns, and aggregations
  • High Performance: Built on Polars for efficient processing
  • Data Integration: Read file formats such as CSV and Parquet
  • ETL Pipeline Building: Create reusable workflows

🔄 Common FlowFrame Operations

import flowfile as ff
from flowfile import col, when, lit

# Create data in memory (or read it from a file, see the commented lines below)
df = ff.from_dict({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "value": [100, 200, 150, 300, 250]
})
# df_parquet = ff.read_parquet("data.parquet")
# df_csv = ff.read_csv("data.csv")

other_df = ff.from_dict(
    {
        "product_id": [1, 2, 3, 4, 6],
        "product_name": ["WidgetA", "WidgetB", "WidgetC", "WidgetD", "WidgetE"],
        "supplier": ["SupplierX", "SupplierY", "SupplierX", "SupplierZ", "SupplierY"],
    },
    flow_graph=df.flow_graph,  # Assign the data to the same graph as df
)

# Filter
filtered = df.filter(col("value") > 150)

# Transform
result = df.select(
    col("id"),
    (col("value") * 2).alias("double_value")
)

# Conditional logic
with_status = df.with_columns([
    when(col("value") > 200).then(lit("High")).otherwise(lit("Low")).alias("status")
])

# Group and aggregate
by_category = df.group_by("category").agg([
    col("value").sum().alias("total"),
    col("value").mean().alias("average")
])

# Join data
joined = df.join(other_df, left_on="id", right_on="product_id")

joined.flow_graph.flow_settings.execution_location = "auto"
joined.flow_graph.flow_settings.execution_mode = "Development"
ff.open_graph_in_editor(joined.flow_graph)  # opens the graph in the UI!
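
As a sanity check on the operations above, the same sample data can be worked through in plain Python without Flowfile installed. This sketch only mirrors the logic of the filter, conditional, group-by, and join steps. It assumes the join behaves as an inner join (the Polars default); the source does not state Flowfile's default.

```python
from collections import defaultdict

ids = [1, 2, 3, 4, 5]
categories = ["A", "B", "A", "C", "B"]
values = [100, 200, 150, 300, 250]
product_ids = [1, 2, 3, 4, 6]

# Filter: keep rows with value > 150
filtered_ids = [i for i, v in zip(ids, values) if v > 150]

# Conditional logic: "High" when value > 200, otherwise "Low"
status = ["High" if v > 200 else "Low" for v in values]

# Group and aggregate: sum and mean of value per category
groups = defaultdict(list)
for c, v in zip(categories, values):
    groups[c].append(v)
by_category = {
    c: {"total": sum(vs), "average": sum(vs) / len(vs)} for c, vs in groups.items()
}

# Join: ids present on both sides (assumed inner-join semantics)
right_keys = set(product_ids)
joined_ids = [i for i in ids if i in right_keys]

print(filtered_ids)  # [2, 4, 5]
print(status)        # ['Low', 'Low', 'Low', 'High', 'High']
print(by_category)
print(joined_ids)    # [1, 2, 3, 4]
```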

📝 Code Generation

Export your visual flows as standalone Python/Polars code for production use:

Simply click the "Generate code" button in the visual editor to:

  • Generate clean, readable Python/Polars code
  • Export flows without Flowfile dependencies
  • Deploy workflows in any Python environment
  • Share ETL logic with team members

🧰 Command-Line Interface

# Show help and version info
flowfile

# Start the web UI
flowfile run ui [options]

# Run individual services
flowfile run core --host 0.0.0.0 --port 63578
flowfile run worker --host 0.0.0.0 --port 63579
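
After starting the services individually, it can be useful to confirm that they are accepting connections. The following sketch uses only the Python standard library; `is_port_open` is a hypothetical helper, and the port numbers are those from the example commands above (whether they are Flowfile's defaults is not stated here).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the example commands; adjust if you passed other values.
for name, port in {"core": 63578, "worker": 63579}.items():
    state = "reachable" if is_port_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

A successful TCP connection only shows that something is listening on the port, not that the service is fully initialized.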

📚 Resources

🖥️ Full Application Options

For the complete visual ETL experience, you have additional options:

  • Desktop Application: Download from the main repository
  • Docker Setup: Run with Docker Compose
  • Manual Setup: For development environments

📋 Development Roadmap

See the main repository for the latest development roadmap and TODO list.

File details

Details for the file flowfile-0.6.3.tar.gz.

File metadata

  • Download URL: flowfile-0.6.3.tar.gz
  • Upload date:
  • Size: 5.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flowfile-0.6.3.tar.gz
Algorithm Hash digest
SHA256 6806251200c602de9f6c5e2b5f4347d0e3990a92680d5c5c609ebbefeac538de
MD5 db8e02f901566782c8d27d6cb769b932
BLAKE2b-256 32ce03361c6039ff8ed79103d430dd34add290b06dc04426002bcb6d4239fde0
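
The published digest can be used to check a downloaded archive. Below is a minimal sketch using Python's standard `hashlib`; `sha256_of` and `verify_download` are hypothetical helper names, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the published value (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 value listed above for flowfile-0.6.3.tar.gz
EXPECTED = "6806251200c602de9f6c5e2b5f4347d0e3990a92680d5c5c609ebbefeac538de"
archive = Path("flowfile-0.6.3.tar.gz")  # assumed local download location
if archive.exists():
    print("hash OK" if verify_download(str(archive), EXPECTED) else "hash MISMATCH")
```

Reading in chunks keeps memory use constant regardless of archive size.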

Provenance

The following attestation bundles were made for flowfile-0.6.3.tar.gz:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flowfile-0.6.3-py3-none-any.whl.

File metadata

  • Download URL: flowfile-0.6.3-py3-none-any.whl
  • Upload date:
  • Size: 5.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flowfile-0.6.3-py3-none-any.whl
Algorithm Hash digest
SHA256 51900f3fc4d7690d9b5e7b4867b4ff2da19b05281d1c4e54ed878dbbf0f2a907
MD5 dbba4f46eb5a16a48389d38e5ab3287e
BLAKE2b-256 d7015ebafa0dc50af0b415b34433e7e1e5446b7a87cdde823d812ddd4114a763

Provenance

The following attestation bundles were made for flowfile-0.6.3-py3-none-any.whl:

Publisher: pypi-release.yml on Edwardvaneechoud/Flowfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
