Reflowfy
A horizontally scalable data movement and transformation framework
Reflowfy lets you build pipelines that fetch data from sources, apply custom transformations, and send results to destinations, scaling to millions of records.
🎯 Key Features
- Horizontally Scalable: Process millions of records in parallel
- Kafka-Based: Reliable message queue for job distribution
- Kubernetes-Native: KEDA autoscaling from 0 to N workers
- Order-Independent: Maximum parallelism without coordination overhead
- User-Extensible: Plugin architecture for sources, destinations, and transformations
- Two Execution Modes: Local testing and distributed production execution
🏗 Architecture
User Request
     ↓ HTTP
API (FastAPI) ────→ ReflowManager Service (port 8001)
     │                        ↓
     │              PostgreSQL (state + checkpoints)
     │                ↑       ↓
     │                │   Kafka Producer (rate limited) → Kafka Topic (reflow.jobs)
     │                │                                        ↓
     └─→ Execution Tracking                           Worker Pool (KEDA scaled)
                                                                ↓
                                                          Destinations
Components:
- ReflowManager: Orchestrates jobs, enforces rate limits, and tracks state.
- PostgreSQL: Central source of truth for execution state and checkpoints.
- Kafka: Reliable job queue for load balancing.
- Workers: Consumers that process jobs and report status directly to PostgreSQL (see the sketch after this list).
- KEDA: Autoscales workers based on Kafka lag.
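To make the data flow concrete, here is a sketch of the consume-process-report pattern the worker pool follows. This is not Reflowfy's internal worker code: the topic and consumer group match the worker environment variables documented below, but the connection string, table name, and job payload fields are assumptions made for illustration.
# Sketch of the worker pattern (not Reflowfy's internal code): pull a job from
# Kafka, process it, and report status straight to PostgreSQL.
# Assumed: the job payload fields, the "jobs" table, and the connection string.
import json

import psycopg2
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "reflowfy-workers",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["reflow.jobs"])
db = psycopg2.connect("postgresql://reflow:reflow@postgres:5432/reflow")

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    job = json.loads(msg.value())  # assumed payload, e.g. {"job_id": ..., "pipeline_name": ...}
    try:
        # ... load the pipeline from the registry and run source -> transformations -> destination ...
        status = "succeeded"
    except Exception:
        status = "failed"
    with db.cursor() as cur:  # workers report status directly to PostgreSQL
        cur.execute("UPDATE jobs SET status = %s WHERE id = %s", (status, job["job_id"]))
    db.commit()
    consumer.commit(msg)  # commit the Kafka offset only after the status is recorded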
🚀 Quick Start
Get up and running in minutes using the CLI.
1. Install
pip install reflowfy
2. Initialize Project
Create a new project directory with a sample pipeline and Docker configuration:
reflowfy init my_project
cd my_project
3. Run Locally
Start the full stack (API, Manager, Worker, Kafka, Postgres) locally using Docker Compose:
# Verify everything builds
reflowfy run --build
# Run in background
reflowfy run -d
4. Deploy
Deploy to OpenShift/Kubernetes with a single command:
reflowfy deploy
🧠 Core Concepts
Reflowfy uses a class-based architecture for pipelines, allowing for dynamic configuration and modular design.
1. Create Custom Transformations
Transformations are reusable units of logic that process batches of records. To create one, subclass BaseTransformation:
from reflowfy import BaseTransformation

class XmlToJson(BaseTransformation):
    name = "xml_to_json"  # Unique identifier

    def apply(self, records, context):
        """
        Process a batch of records.
        Records are passed as a list of dictionaries (or source-specific format).
        Return the modified list.
        """
        return [self._parse_xml(r) for r in records]

    def _parse_xml(self, record):
        # ... logic ...
        return record
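A transformation can also be exercised on its own before wiring it into a pipeline. The snippet below is a minimal local check; the import path is hypothetical, and it assumes that passing an empty context is acceptable for this transformation.
# Minimal local check of a transformation. The import path is hypothetical
# and an empty context is assumed to be acceptable here.
from my_project.transformations import XmlToJson

records = [{"id": 1, "payload": "<event><user>alice</user></event>"}]
print(XmlToJson().apply(records, context={}))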
2. Build a Pipeline
Pipelines connect sources, transformations, and destinations. Subclass AbstractPipeline to define your logic:
from reflowfy import AbstractPipeline, pipeline_registry
from reflowfy import elastic_source, kafka_destination
from .transformations import XmlToJson

class ElasticXmlPipeline(AbstractPipeline):
    name = "elastic_xml_pipeline"
    rate_limit = {"jobs_per_second": 50}

    def define_source(self, params):
        """
        Define source based on runtime parameters.
        Parameters allow you to change behavior at runtime (e.g., time ranges).
        """
        return elastic_source(
            url="http://elasticsearch:9200",
            index="logs-*",
            base_query={
                "query": {
                    "range": {
                        "@timestamp": {
                            "gte": "{{ start_time }}",  # Jinja template support
                            "lte": "{{ end_time }}"
                        }
                    }
                }
            }
        )

    def define_transformations(self, params):
        """List of transformations to apply in order."""
        return [XmlToJson()]

    def define_destination(self, params):
        """Define where data should go."""
        return kafka_destination(
            bootstrap_servers="kafka:9092",
            topic="processed-logs"
        )

# Register the pipeline so the worker and API can find it
pipeline_registry.register(ElasticXmlPipeline())
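The {{ start_time }} and {{ end_time }} placeholders are plain Jinja syntax and are filled from the runtime_params supplied when a run is triggered. Reflowfy resolves them internally; the standalone snippet below only illustrates how such a query renders and is not Reflowfy's own rendering code.
# Standalone illustration of Jinja placeholder rendering (not Reflowfy internals).
import json
from jinja2 import Template

template = '{"range": {"@timestamp": {"gte": "{{ start_time }}", "lte": "{{ end_time }}"}}}'
rendered = Template(template).render(start_time="2024-01-01", end_time="2024-01-02")
print(json.loads(rendered))
# {'range': {'@timestamp': {'gte': '2024-01-01', 'lte': '2024-01-02'}}}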
3. Run Pipeline
You can run your pipeline locally or in production via the API:
# Production Execution (Async via Kafka)
curl -X POST http://localhost:8001/run \
-H "Content-Type: application/json" \
-d '{
"pipeline_name": "elastic_xml_pipeline",
"runtime_params": {
"start_time": "2024-01-01",
"end_time": "2024-01-02"
}
}'
# Dry Run (Preview without side effects)
curl -X POST http://localhost:8001/run ... -d '{..., "dry_run": true}'
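The same request can also be made from Python with the requests library; the endpoint and payload below simply mirror the curl example above.
# Trigger a run from Python; mirrors the curl example above.
import requests

resp = requests.post(
    "http://localhost:8001/run",
    json={
        "pipeline_name": "elastic_xml_pipeline",
        "runtime_params": {"start_time": "2024-01-01", "end_time": "2024-01-02"},
        # "dry_run": True,  # uncomment to preview without side effects
    },
)
resp.raise_for_status()
print(resp.json())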
📦 Installation
# Using pip
pip install -e .
# Using Docker
docker build -f Dockerfile.api -t reflowfy-api .
docker build -f Dockerfile.worker -t reflowfy-worker .
🔌 Built-in Connectors
Sources
- Elasticsearch: Scroll-based pagination with runtime parameters
- SQL: ID range and offset-based splitting (Postgres, MySQL, etc.); a splitting sketch follows the Destinations list
- HTTP API: Offset/cursor pagination with authentication
Destinations
- Kafka: Batching, compression, health checks
- HTTP: Webhooks with retry logic
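To illustrate what ID-range splitting means for the SQL source: a large table is divided into independent ID ranges, and each range becomes a job that a worker can process in parallel. The helper below is a generic, standalone sketch of that idea, not Reflowfy's connector code.
# Generic sketch of ID-range splitting (not Reflowfy's SQL connector code).
def split_id_ranges(min_id, max_id, chunk_size):
    """Split [min_id, max_id] into inclusive chunks, one per parallel job."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

jobs = split_id_ranges(1, 1_000_000, 100_000)
print(len(jobs), jobs[0], jobs[-1])  # 10 (1, 100000) (900001, 1000000)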
⚙️ Configuration
Environment Variables
API:
API_HOST=0.0.0.0
API_PORT=8000
KAFKA_BOOTSTRAP_SERVERS=kafka:9092
KAFKA_TOPIC=reflow.jobs
Worker:
KAFKA_BOOTSTRAP_SERVERS=kafka:9092
KAFKA_TOPIC=reflow.jobs
KAFKA_GROUP_ID=reflowfy-workers
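As a generic illustration (not Reflowfy's configuration code), these variables map onto settings read by the containers at startup:
# Generic illustration of reading the settings above (not Reflowfy's config code).
import os

KAFKA_BOOTSTRAP_SERVERS = os.getenv("KAFKA_BOOTSTRAP_SERVERS", "kafka:9092")
KAFKA_TOPIC = os.getenv("KAFKA_TOPIC", "reflow.jobs")
KAFKA_GROUP_ID = os.getenv("KAFKA_GROUP_ID", "reflowfy-workers")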
Execution Modes
| Mode | Endpoint | Use Case | Kafka | Workers |
|---|---|---|---|---|
| Distributed | POST /run | Production execution | ✅ | ✅ |
| Dry Run | POST /run (dry_run=true) | Preview/Testing | ❌ | ❌ |
🐳 Kubernetes Deployment
Reflowfy streamlines deployment to Kubernetes/OpenShift using the CLI and Helm.
Deployment Concept
The deployment process uses your local configuration to drive what is installed in the cluster.
- Configuration: The .env file in your project root is the source of truth. It defines connection strings (Kafka, Registry, DB).
- CLI: The reflowfy deploy command reads this config and triggers a Helm upgrade.
Deployed Objects
When you run reflowfy deploy, the following objects are created in your namespace:
- ReflowAPI (Deployment + Service): The entry point for triggering pipeline runs.
- ReflowManager (Deployment + Service): Orchestrates job distribution and manages state.
- ReflowWorker (Deployment + KEDA ScaledObject): The worker pool that processes jobs. KEDA automatically scales this deployment based on Kafka lag (0 to N replicas).
- PostgreSQL (Optional): If DEPLOY_POSTGRES=True, a dedicated Postgres instance is deployed. Otherwise, the system connects to your external DB.
How to Deploy
1. Configure environment: Ensure your .env file has the correct registry and Kafka settings.
   REGISTRY=my.registry.com
   dataset=my-project
   KAFKA_BOOTSTRAP_SERVERS=my-kafka:9092
2. Run Deploy:
   reflowfy deploy
   This will build/push images (if requested), generate the Helm values from your .env, and apply the chart.
📝 License
MIT
🤝 Contributing
Contributions welcome! This is a production-grade framework designed for real-world data processing at scale.