Reflowfy

A horizontally scalable data movement and transformation framework

Reflowfy lets you build pipelines that fetch data from sources, apply custom transformations, and send results to destinations, with horizontal scaling designed for workloads of millions of records.

🎯 Key Features

Horizontally Scalable: Process millions of records in parallel
Kafka-Based: Reliable message queue for job distribution
Kubernetes-Native: KEDA autoscaling from 0 to N workers
Order-Independent: Maximum parallelism without coordination overhead
User-Extensible: Plugin architecture for sources, destinations, and transformations
Two Execution Modes: Local testing and distributed production execution

🏗 Architecture

User Request
    ↓ HTTP
API (FastAPI) ────→ ReflowManager Service (port 8001)
    │                    ↓
    │                PostgreSQL (state + checkpoints)
    │                    ↓
    │                Kafka Producer (rate limited) → Kafka Topic (reflow.jobs)
    │                    ↓
    │                Worker Pool (KEDA scaled)
    │                    ↓
    └─→ Execution Tracking  Destinations

Components:

API: Orchestration, job splitting, route generation
ReflowManager: Rate limiting, state management, checkpointing
PostgreSQL: Persistent state storage for executions and checkpoints
Kafka: Job queue and load balancing
Workers: Generic executors that process jobs
KEDA: Kafka lag-based autoscaling
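
To make the rate limiting role of ReflowManager more concrete, here is a minimal sketch of pacing job dispatch at a fixed jobs-per-second rate. It is not Reflowfy's implementation: `RateLimiter` and `dispatch_all` are hypothetical names, and the `send` callable stands in for a real Kafka producer.

```python
import time
from typing import Any, Callable, Iterable


class RateLimiter:
    """Hypothetical pacing helper: at most `rate` acquisitions per second."""

    def __init__(
        self,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()

    def acquire(self) -> None:
        # Wait until the next free slot, then reserve the one after it.
        now = self._clock()
        if now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self._interval


def dispatch_all(
    jobs: Iterable[Any], send: Callable[[Any], None], limiter: RateLimiter
) -> int:
    """Send every job through `send`, pausing so the rate is never exceeded."""
    sent = 0
    for job in jobs:
        limiter.acquire()
        send(job)
        sent += 1
    return sent
```

The clock and sleep functions are injectable so the pacing logic can be checked without real waiting; in production code the defaults would be used.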

🚀 Quick Start

1. Define a Custom Transformation

from reflowfy import BaseTransformation

class XmlToJson(BaseTransformation):
    name = "xml_to_json"
    
    def apply(self, records, context):
        # Your transformation logic goes here; parse_xml is a helper you supply
        return [parse_xml(r) for r in records]
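
Because a transformation is a class with an `apply(records, context)` method, it can be exercised on its own before it is wired into a pipeline. The sketch below is self-contained: it uses a stand-in base class so it runs without Reflowfy installed, and it assumes that each record arrives as an XML string and that `context` can be an empty dict. Both are assumptions for illustration, not documented behavior.

```python
import json
import xml.etree.ElementTree as ET


class BaseTransformation:
    """Stand-in for reflowfy.BaseTransformation so this sketch runs alone."""

    name = "base"

    def apply(self, records, context):
        raise NotImplementedError


def parse_xml(record: str) -> dict:
    # Illustrative helper: map each direct child of the root to tag -> text.
    root = ET.fromstring(record)
    return {child.tag: child.text for child in root}


class XmlToJson(BaseTransformation):
    name = "xml_to_json"

    def apply(self, records, context):
        # Convert every XML string in the batch into a JSON-serializable dict.
        return [parse_xml(r) for r in records]


sample = ["<log><level>INFO</level><msg>started</msg></log>"]
result = XmlToJson().apply(sample, context={})
print(json.dumps(result))
```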

2. Build a Pipeline

from reflowfy import build_pipeline, pipeline_registry
from reflowfy import elastic_source, kafka_destination

# Configure source with runtime parameters
source = elastic_source(
    url="http://elasticsearch:9200",
    index="logs-*",
    base_query={
        "query": {
            "range": {
                "@timestamp": {
                    "gte": "{{ start_time }}",  # Runtime parameter
                    "lte": "{{ end_time }}"
                }
            }
        }
    }
)

# Configure destination
destination = kafka_destination(
    bootstrap_servers="kafka:9092",
    topic="processed-logs"
)

# Build and register
pipeline = build_pipeline(
    name="elastic_xml_pipeline",
    source=source,
    transformations=[XmlToJson()],
    destination=destination,
    rate_limit={"jobs_per_second": 50}
)

pipeline_registry.register(pipeline)
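
The `{{ start_time }}` and `{{ end_time }}` placeholders in `base_query` are filled from the `runtime_params` of each run request. How Reflowfy performs that substitution internally is not described here; the following sketch shows one plausible way to render such placeholders in a nested query. `render_params` is a hypothetical helper, not part of the Reflowfy API.

```python
import re
from typing import Any, Dict

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_params(template: Any, params: Dict[str, Any]) -> Any:
    """Return a copy of `template` with {{ name }} placeholders replaced.

    Strings are substituted; dicts and lists are walked recursively;
    any other value is returned unchanged. A placeholder with no matching
    entry in `params` raises KeyError.
    """
    if isinstance(template, str):
        return _PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
    if isinstance(template, dict):
        return {key: render_params(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render_params(value, params) for value in template]
    return template


base_query = {
    "query": {
        "range": {
            "@timestamp": {"gte": "{{ start_time }}", "lte": "{{ end_time }}"}
        }
    }
}
rendered = render_params(
    base_query, {"start_time": "2024-01-01", "end_time": "2024-01-02"}
)
```

Raising on a missing parameter, rather than leaving the placeholder in place, keeps a malformed request from silently querying with a literal `{{ ... }}` string.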

3. Start the API

# In your main.py
from reflowfy.api.app import main
import examples.xml_to_json_pipeline  # Import to trigger registration

if __name__ == "__main__":
    main()

4. Execute Pipeline

Run Distributed (async via Kafka):

curl -X POST http://localhost:8001/run \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline_name": "elastic_xml_pipeline",
    "runtime_params": {
      "start_time": "2024-01-01",
      "end_time": "2024-01-02"
    }
  }'

Dry Run (Preview jobs without executing):

curl -X POST http://localhost:8001/run \
  -H "Content-Type: application/json" \
  -d '{
    "pipeline_name": "elastic_xml_pipeline",
    "runtime_params": {
      "start_time": "2024-01-01",
      "end_time": "2024-01-02"
    },
    "dry_run": true
  }'

A dry run returns a preview of the job execution plan, sample records, and configuration without dispatching any jobs.
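
The same requests can be issued from Python. The sketch below builds the `POST /run` request with the standard library only, using the URL and payload fields from the curl examples above. `build_run_request` is a hypothetical helper, and the shape of the response is not documented here, so the sending step is left commented out.

```python
import json
import urllib.request
from typing import Any, Dict


def build_run_request(
    base_url: str,
    pipeline_name: str,
    runtime_params: Dict[str, Any],
    dry_run: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST /run request."""
    payload: Dict[str, Any] = {
        "pipeline_name": pipeline_name,
        "runtime_params": runtime_params,
    }
    if dry_run:
        payload["dry_run"] = True
    return urllib.request.Request(
        url=f"{base_url}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "http://localhost:8001",
    "elastic_xml_pipeline",
    {"start_time": "2024-01-01", "end_time": "2024-01-02"},
    dry_run=True,
)

# Sending requires a running API:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```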

📦 Installation

# Using pip, from PyPI
pip install reflowfy

# Using pip, from a source checkout (editable install)
pip install -e .

# Using Docker
docker build -f Dockerfile.api -t reflowfy-api .
docker build -f Dockerfile.worker -t reflowfy-worker .

🔌 Built-in Connectors

Sources

Elasticsearch: Scroll-based pagination with runtime parameters
SQL: ID range and offset-based splitting (Postgres, MySQL, etc.)
HTTP API: Offset/cursor pagination with authentication
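
As an illustration of ID range splitting for SQL sources, the sketch below divides an inclusive ID range into fixed-size chunks, each of which could become one independent job. The function names, the chunking policy, and the `%s` parameter style are assumptions for this example and may differ from what the built-in SQL source does.

```python
from typing import List, Tuple


def split_id_range(min_id: int, max_id: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into inclusive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges: List[Tuple[int, int]] = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_query(table: str, id_column: str, bounds: Tuple[int, int]) -> Tuple[str, Tuple[int, int]]:
    """Return a parameterized query and its arguments for one chunk.

    The table and column names are interpolated, so they must come from
    trusted configuration; the ID bounds are passed as query parameters.
    """
    sql = f"SELECT * FROM {table} WHERE {id_column} BETWEEN %s AND %s"
    return sql, bounds


jobs = [range_query("events", "id", r) for r in split_id_range(1, 10, 4)]
```

Because the chunks do not overlap and carry no ordering dependency, workers can process them in any order, which fits the order-independent design described earlier.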

Destinations

Kafka: Batching, compression, health checks
HTTP: Webhooks with retry logic
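
The retry logic mentioned for the HTTP destination is not specified further. A common pattern is retrying with exponential backoff, sketched below. `send_with_retry`, its defaults, and the doubling delay are assumptions for illustration and not the connector's actual policy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def send_with_retry(
    send: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying on failure with delays of base_delay * 2**n."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In practice `retry_on` would be narrowed to transient errors such as timeouts or connection failures, so that permanent failures (for example a rejected payload) are not retried.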

⚙️ Configuration

Environment Variables

API:

API_HOST=0.0.0.0
API_PORT=8000
KAFKA_BOOTSTRAP_SERVERS=kafka:9092
KAFKA_TOPIC=reflow.jobs

Worker:

KAFKA_BOOTSTRAP_SERVERS=kafka:9092
KAFKA_TOPIC=reflow.jobs
KAFKA_GROUP_ID=reflowfy-workers
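
For reference, a process could read the API variables above as follows. This is a sketch only: `ApiSettings` and `load_api_settings` are hypothetical names, and treating the values listed above as fallback defaults is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ApiSettings:
    host: str
    port: int
    kafka_bootstrap_servers: str
    kafka_topic: str


def load_api_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Read API settings from the environment (defaults are assumed)."""
    return ApiSettings(
        host=env.get("API_HOST", "0.0.0.0"),
        port=int(env.get("API_PORT", "8000")),
        kafka_bootstrap_servers=env.get("KAFKA_BOOTSTRAP_SERVERS", "kafka:9092"),
        kafka_topic=env.get("KAFKA_TOPIC", "reflow.jobs"),
    )


settings = load_api_settings({"API_PORT": "9000"})
```

Accepting the environment as a mapping makes the loader easy to test with a plain dict instead of mutating `os.environ`.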

Execution Modes

Mode	Endpoint	Use Case	Uses Kafka	Uses Workers
Distributed	`POST /run`	Production execution	✅	✅
Dry Run	`POST /run` (dry_run=true)	Preview/Testing	❌	❌

📊 Monitoring

Reflowfy exposes Prometheus metrics:

reflowfy_jobs_processed_total - Total jobs processed
reflowfy_job_processing_duration_seconds - Job processing time
reflowfy_records_processed_total - Total records processed
reflowfy_active_workers - Active worker count
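
These metrics are served in the Prometheus text exposition format, so they can be inspected with a few lines of Python when a full Prometheus setup is not at hand. The sketch below sums each metric across its label sets. The sample text, including the `pipeline` label, is made up for illustration, and the parser assumes samples carry no trailing timestamp.

```python
from collections import defaultdict
from typing import Dict


def sum_metrics(exposition_text: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring comments and labels."""
    totals: Dict[str, float] = defaultdict(float)
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        series, value = line.rsplit(None, 1)
        name = series.split("{", 1)[0]  # Drop the {label="..."} part, if any.
        totals[name] += float(value)
    return dict(totals)


# Made-up sample output; real label names may differ.
SAMPLE = """\
# HELP reflowfy_jobs_processed_total Total jobs processed
# TYPE reflowfy_jobs_processed_total counter
reflowfy_jobs_processed_total{pipeline="elastic_xml_pipeline"} 120
reflowfy_jobs_processed_total{pipeline="other"} 30
reflowfy_active_workers 4
"""

totals = sum_metrics(SAMPLE)
```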

🐳 Kubernetes Deployment

# Deploy with Helm
helm install reflowfy-api ./helm/reflowfy-api
helm install reflowfy-worker ./helm/reflowfy-worker

KEDA will automatically scale workers based on Kafka lag.
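
As a rough mental model of lag-based scaling, the desired worker count grows with total consumer lag divided by a per-worker lag threshold, bounded by a configured maximum and, typically, by the number of topic partitions. The sketch below captures only that arithmetic. The parameter names are hypothetical, and real KEDA and Kubernetes HPA behavior includes activation thresholds, polling intervals, and stabilization windows that are not modeled here.

```python
import math


def desired_workers(
    total_lag: int, lag_threshold: int, max_replicas: int, partitions: int
) -> int:
    """Simplified estimate of the worker count for a given Kafka lag."""
    if lag_threshold <= 0:
        raise ValueError("lag_threshold must be positive")
    if total_lag <= 0:
        return 0  # No backlog: scale to zero.
    wanted = math.ceil(total_lag / lag_threshold)
    # Consumers beyond the partition count would sit idle, so cap there too.
    return max(1, min(wanted, max_replicas, partitions))
```

For example, with a threshold of 100 messages per worker, a backlog of 250 messages calls for 3 workers, while a backlog of 5000 on a 6-partition topic is capped at 6.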

📝 License

MIT

🤝 Contributing

Contributions welcome! This is a production-grade framework designed for real-world data processing at scale.

Project details

Development Status: 3 - Alpha
Intended Audience: Developers
License: OSI Approved :: MIT License

Release history

Version	Release date
0.38	Apr 27, 2026
0.37	Apr 20, 2026
0.36	Apr 10, 2026
0.35	Apr 6, 2026
0.34	Mar 11, 2026
0.33	Mar 11, 2026
0.32	Mar 10, 2026
0.31	Mar 10, 2026
0.30	Mar 10, 2026
0.29	Mar 10, 2026
0.28	Mar 10, 2026
0.26	Feb 12, 2026
0.25	Feb 11, 2026
0.24	Feb 11, 2026
0.23	Feb 11, 2026
0.22	Feb 11, 2026
0.21	Feb 11, 2026
0.2	Feb 10, 2026
0.1.34	Jan 28, 2026
0.1.33	Jan 28, 2026
0.1.29	Jan 27, 2026
0.1.28	Jan 27, 2026
0.1.27	Jan 27, 2026
0.1.26	Jan 27, 2026
0.1.25	Jan 27, 2026
0.1.24	Jan 27, 2026
0.1.23	Jan 27, 2026
0.1.22	Jan 27, 2026
0.1.21	Jan 26, 2026
0.1.20	Jan 26, 2026
0.1.19	Jan 26, 2026
0.1.18	Jan 26, 2026
0.1.17	Jan 26, 2026
0.1.16	Jan 26, 2026
0.1.15	Jan 26, 2026
0.1.14	Jan 26, 2026
0.1.13	Jan 26, 2026
0.1.12	Jan 26, 2026
0.1.11	Jan 25, 2026
0.1.10	Jan 25, 2026
0.1.9	Jan 25, 2026
0.1.7	Jan 25, 2026
0.1.6	Jan 25, 2026
0.1.5	Jan 25, 2026
0.1.4	Jan 25, 2026
0.1.3	Jan 25, 2026
0.1.2	Jan 25, 2026
0.1.1 (this version)	Jan 25, 2026

Download files

Source Distribution

reflowfy-0.1.1.tar.gz (280.1 kB)

Uploaded Jan 25, 2026 Source

Built Distribution

reflowfy-0.1.1-py3-none-any.whl (308.7 kB)

Uploaded Jan 25, 2026 Python 3

File details

Details for the file reflowfy-0.1.1.tar.gz.

File metadata

Download URL: reflowfy-0.1.1.tar.gz
Upload date: Jan 25, 2026
Size: 280.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for reflowfy-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`8fcdff144ff2db2bc186c00f15b05b58792ac5ae3caeca782f828f08bc60b73e`
MD5	`ddffa574c1ff437115e9c0ad6b8522db`
BLAKE2b-256	`b88dfbffb1aa717d695188ee6b505345dad1a5a386286a53a4a934507ec433a7`

File details

Details for the file reflowfy-0.1.1-py3-none-any.whl.

File metadata

Download URL: reflowfy-0.1.1-py3-none-any.whl
Upload date: Jan 25, 2026
Size: 308.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for reflowfy-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cd5e20c8fe8353299377f19732537ced23057750d886db4a70d1bc0e9a89466`
MD5	`16f25182329bc17986f99baa5eec9ae3`
BLAKE2b-256	`2659d651b46ff73699a1f1bbd882a1bfe0d7a0b8e517c87cec2dbab8237f959e`
