handuflow - Reliable data movement and evolution at scale

Development Status
- 3 - Alpha
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3.12

Project description

HanduFlow

HanduFlow is an architecture-agnostic data movement and transformation framework designed to manage evolving data reliably across modern data platforms.

It provides a standardized way to ingest, transform, and evolve data across layers (for example, bronze → silver → gold), while supporting change data capture (CDC), SCD Type 2, schema enforcement, and automated lineage generation.

HanduFlow focuses on consistency, reusability, and production readiness, without locking users into a specific architecture or vendor.

Why HanduFlow?

Modern data platforms commonly struggle with:

Inconsistent CDC implementations
Repeated and fragile SCD logic
Hard-to-maintain transformation pipelines
Missing or incomplete data lineage

HanduFlow centralizes these concerns into a single, reusable framework, allowing teams to focus on business logic instead of rebuilding data plumbing for every pipeline.

Key Capabilities

Data Movement & Load Patterns

HanduFlow supports multiple ingestion and evolution strategies:

Full Load
Append Load
Incremental CDC
SCD Type 2

All load patterns follow a consistent, configurable execution model across datasets.
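Of the patterns listed above, SCD Type 2 is the least obvious, so a small illustration may help. The following is a minimal pure-Python sketch of the general technique (close the current version of a changed row, then open a new one). It is not HanduFlow's implementation; `DimRow`, `apply_scd2`, and the sample values are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                          # business key
    value: str                        # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(history: list[DimRow], incoming: dict[str, str], as_of: date) -> list[DimRow]:
    """Return a new history with changed keys closed out and re-opened."""
    current = {row.key: row for row in history if row.is_current}
    result = []
    for row in history:
        changed = (
            row.is_current
            and row.key in incoming
            and incoming[row.key] != row.value
        )
        # Close the current version when the tracked value changed.
        result.append(replace(row, valid_to=as_of) if changed else row)
    for key, value in incoming.items():
        old = current.get(key)
        # Open a new version for brand-new keys and for changed values.
        if old is None or old.value != value:
            result.append(DimRow(key, value, valid_from=as_of))
    return result


history = [DimRow("c1", "Berlin", date(2025, 1, 1))]
updated = apply_scd2(history, {"c1": "Munich", "c2": "Oslo"}, date(2026, 2, 8))
for row in updated:
    print(row)
```

On Spark and Delta Lake the same idea is usually expressed as a merge against the target table rather than as list manipulation; the sketch only shows the row-level logic.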

Architecture-Agnostic Design

HanduFlow works naturally with Medallion-style architectures, but it is not dependent on any specific architectural pattern.

It can be used with:

Bronze / Silver / Gold layers
Hub-and-spoke models
Custom layered designs
Single-layer analytical tables

Transformation Framework

Clear separation of ingestion, validation, transformation, and persistence
Reusable transformation logic
Declarative and programmatic execution styles

Schema & Data Quality Enforcement

Schema alignment and enforcement at ingestion
Built-in standard data quality checks
Support for custom, query-based validations
Pre-load and post-load validation stages

Lineage Generation

HanduFlow can generate feed-level lineage, including:

Source datasets
Intermediate transformations
Target tables

Lineage output can be exported for visualization and governance use cases.
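To make "feed-level lineage" concrete, here is a rough sketch of how lineage could be represented as source-to-target edges and exported as JSON. The mapping, the table names, and the `upstream` helper are all hypothetical; the format HanduFlow actually exports is not described here.

```python
import json

# Hypothetical feed-level lineage: each target maps to its direct sources.
lineage = {
    "silver.orders": ["bronze.orders_raw"],
    "silver.customers": ["bronze.customers_raw"],
    "gold.daily_sales": ["silver.orders", "silver.customers"],
}


def upstream(target, graph):
    """Return every dataset that feeds `target`, directly or indirectly."""
    seen = []
    stack = list(graph.get(target, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(graph.get(node, []))
    return sorted(seen)


# Flatten the mapping into an edge list, a common input for graph tools.
edges = [
    {"source": source, "target": target}
    for target, sources in lineage.items()
    for source in sources
]

print(json.dumps(edges, indent=2))
print(upstream("gold.daily_sales", lineage))
```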

Technology Stack

HanduFlow is designed for distributed, production-grade environments:

Apache Spark
Delta Lake
Cloud object storage (S3 / ADLS / GCS)
Databricks (tested environment)

About the Project

HanduFlow is created and maintained by Harsh Handoo, a data engineer, which is where the name "handuflow" (pronounced "hundooh-flow") comes from.

The framework was built to standardize common data movement patterns, reduce boilerplate, and improve reliability in real-world Spark and Delta Lake workloads.

Installation

pip install handuflow

Requirements

Cluster Resources (Typical)

Workload	Minimum	Recommended
Local development	4 vCPU, 8 GB RAM	8 vCPU, 16 GB RAM
Small datasets (<10M rows)	2 executors × 4 GB	4 executors × 8 GB
Medium datasets (10–100M rows)	4 executors × 8 GB	8 executors × 16 GB
Large datasets (>100M rows)	8+ executors × 16 GB	Cluster-specific tuning
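As a rough illustration, the table above can be turned into Spark settings. `suggested_spark_conf` is a hypothetical helper written for this example, not part of HanduFlow; `spark.executor.instances` and `spark.executor.memory` are standard Spark properties, and the thresholds simply restate the table.

```python
def suggested_spark_conf(row_count: int, recommended: bool = True) -> dict[str, str]:
    """Map an approximate row count to executor settings from the table above."""
    if row_count < 10_000_000:
        minimum, preferred = (2, "4g"), (4, "8g")
    elif row_count <= 100_000_000:
        minimum, preferred = (4, "8g"), (8, "16g")
    else:
        # The table gives no fixed recommendation for large datasets
        # ("cluster-specific tuning"), so the minimum is used for both.
        minimum, preferred = (8, "16g"), (8, "16g")
    instances, memory = preferred if recommended else minimum
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.memory": memory,
    }


print(suggested_spark_conf(5_000_000))
print(suggested_spark_conf(50_000_000, recommended=False))
```

On managed platforms the executor count is often controlled by the cluster definition instead, so treat these values as a starting point.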

Recommended Production Setup

Linux-based Spark cluster
Spark FAIR scheduler enabled
Delta Lake tables on cloud object storage
Versioned releases via PyPI and GitHub

Supported Storage

Local filesystem (development only)
HDFS / ADLS / S3 / GCS (recommended)
DBFS (Databricks)

Operating Systems

Linux (recommended)
macOS
Windows (WSL recommended)

⚠️ Production deployments are strongly recommended on Linux-based systems.

Note: HanduFlow is currently tested on Databricks.

Usage

Prerequisites

Create a dedicated directory for HanduFlow configuration and metadata. Example:
```
/handuflow_dir/
```

Configure config.ini

[DEFAULT]
outbound_directory_name=handuflow_outbound
log_directory_name=handuflow_logs
temp_log_location=/handuflow_dir/temp
file_hunt_path=/handuflow_dir/
log_retention_policy_in_days=7
max_concurrent_batches=4

[FILES]
master_spec_name=master_specs.xlsx

[LINEAGE_DIAGRAM]
BOX_WIDTH=4.4
BOX_HEIGHT=2.2
X_GAP=2.0
Y_GAP=2.5
ROOT_GAP=2.0
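The settings above can be read with Python's standard `configparser`. The sketch below uses a shortened copy of the file and shows two behaviours worth knowing: values come back as strings unless a typed getter is used, and keys in `[DEFAULT]` are visible from every other section. How HanduFlow itself interprets each key is not shown here.

```python
import configparser

# Shortened copy of the config.ini above, inlined so the example is self-contained.
SAMPLE = """
[DEFAULT]
log_retention_policy_in_days=7
max_concurrent_batches=4

[FILES]
master_spec_name=master_specs.xlsx

[LINEAGE_DIAGRAM]
BOX_WIDTH=4.4
BOX_HEIGHT=2.2
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)

# Typed getters convert the raw strings.
retention_days = cfg["DEFAULT"].getint("log_retention_policy_in_days")
max_batches = cfg.getint("DEFAULT", "max_concurrent_batches")

# Option names are case-insensitive by default, so BOX_WIDTH and box_width match.
box_width = cfg["LINEAGE_DIAGRAM"].getfloat("BOX_WIDTH")

# Keys from [DEFAULT] are inherited by every other section.
inherited = cfg["FILES"].getint("max_concurrent_batches")

print(retention_days, max_batches, box_width, inherited)
```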

Master Specification

The master specification file (master_specs.xlsx) defines feeds and dependencies.

Required fields include:

feed_id
system_name
subsystem_name
category
data_flow_direction
residing_layer
feed_name
load_type
target_schema_name
target_table_name
parent_feed_id
is_active
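The `feed_id` and `parent_feed_id` fields suggest that feeds form a dependency graph. As an illustration only, the sketch below derives a run order in which parents come before children, using the standard-library `graphlib`. The sample rows, the assumption of a single parent per feed, and the `run_order` helper are hypothetical; HanduFlow's own scheduling logic may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical rows, reduced to the dependency columns plus is_active.
rows = [
    {"feed_id": 3, "parent_feed_id": 2, "is_active": True},
    {"feed_id": 1, "parent_feed_id": None, "is_active": True},
    {"feed_id": 2, "parent_feed_id": 1, "is_active": True},
    {"feed_id": 4, "parent_feed_id": 1, "is_active": False},
]


def run_order(rows):
    """Return active feed ids ordered so that each parent precedes its child."""
    active = {row["feed_id"]: row for row in rows if row["is_active"]}
    sorter = TopologicalSorter()
    for feed_id, row in active.items():
        parent = row["parent_feed_id"]
        if parent in active:
            sorter.add(feed_id, parent)   # feed_id depends on parent
        else:
            sorter.add(feed_id)           # root feed, or parent is inactive
    # static_order() raises graphlib.CycleError if the feeds form a cycle.
    return list(sorter.static_order())


print(run_order(rows))
```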

Feed Specification (JSON)

Each feed defines schema, quality checks, and load behavior.

{
  "primary_key": "col1",
  "partition_keys": [],
  "vacuum_hours": 168,
  "source_table_name": "test.test",
  "selection_schema": {
    "type": "struct",
    "fields": [
      { "name": "col1", "type": "string", "nullable": true },
      { "name": "col2", "type": "string", "nullable": true }
    ]
  },
  "standard_checks": [
    {
      "check_sequence": ["_check_primary_key"],
      "column_name": "col1",
      "threshold": 0
    }
  ]
}
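To give a feel for what a check such as `_check_primary_key` with a `threshold` might do, here is a pure-Python sketch. It assumes the check counts null or duplicated key values and passes when that count does not exceed the threshold. That reading, the `primary_key_violations` helper, and the sample rows are assumptions made for illustration, not HanduFlow's documented behaviour.

```python
import json
from collections import Counter

# Reduced copy of the feed specification above.
feed_spec = json.loads("""
{
  "primary_key": "col1",
  "standard_checks": [
    {"check_sequence": ["_check_primary_key"], "column_name": "col1", "threshold": 0}
  ]
}
""")


def primary_key_violations(rows, column):
    """Count rows whose key is null or repeats an earlier row's key."""
    counts = Counter(row.get(column) for row in rows)
    nulls = counts.pop(None, 0)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return nulls + duplicates


rows = [{"col1": "a"}, {"col1": "a"}, {"col1": None}, {"col1": "b"}]
check = feed_spec["standard_checks"][0]
violations = primary_key_violations(rows, check["column_name"])
passed = violations <= check["threshold"]

print(f"violations={violations}, passed={passed}")
```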

Spark Configuration (FAIR Scheduler)

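Set the scheduler mode as a Spark property (for example in spark-defaults.conf or the cluster's Spark config):

spark.scheduler.mode FAIR

Alternatively, set it when building the session in Python:

from pyspark.sql import SparkSession

# Enable FAIR scheduling so that concurrently submitted jobs share resources.
spark = (
    SparkSession.builder
        .appName("HanduFlow")
        .config("spark.scheduler.mode", "FAIR")
        .getOrCreate()
)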
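The scheduler mode matters when several jobs are submitted from different threads of the same application: Spark's default is FIFO, while FAIR lets concurrent jobs share resources. The sketch below shows the threading pattern only, with a stand-in function instead of Spark work. It assumes `max_concurrent_batches` caps how many feeds run in parallel, which is a guess at the setting's meaning; `run_feed` and the feed names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_BATCHES = 4  # mirrors max_concurrent_batches in config.ini


def run_feed(feed_name: str) -> str:
    # Stand-in for real work; in a Spark application this is where a thread
    # would trigger Spark actions for one feed.
    return f"{feed_name}: done"


feeds = ["orders", "customers", "payments", "refunds", "inventory"]

# At most MAX_CONCURRENT_BATCHES feeds run at once; map() keeps input order.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_BATCHES) as pool:
    results = list(pool.map(run_feed, feeds))

print(results)
```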

Execution

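import configparser
from handuflow import Orchestrator

# Load the HanduFlow settings from the config.ini described above.
cfg = configparser.ConfigParser()
cfg.read("/handuflow_dir/config.ini")

# `spark` is the SparkSession created in the Spark configuration step.
orchestrator = Orchestrator(spark, config=cfg)
orchestrator.run()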

Logging

Logs are written to the directory defined in config.ini
Log retention and rotation are configurable
Execution-level and feed-level logs are supported

License

Apache License (Version 2.0, January 2004) http://www.apache.org/licenses/


Release history

0.1.21: Feb 8, 2026
0.1.20: Feb 8, 2026
0.1.19 (this version): Feb 8, 2026
0.1.18: Feb 8, 2026
0.1.17: Feb 8, 2026

Download files

Source Distribution

handuflow-0.1.19.tar.gz (48.1 kB)

Uploaded Feb 8, 2026 Source

Built Distribution

handuflow-0.1.19-py3-none-any.whl (62.9 kB)

Uploaded Feb 8, 2026 Python 3

File details

Details for the file handuflow-0.1.19.tar.gz.

File metadata

Download URL: handuflow-0.1.19.tar.gz
Upload date: Feb 8, 2026
Size: 48.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for handuflow-0.1.19.tar.gz
Algorithm	Hash digest
SHA256	`1f2f79dd4c4c2e8edde2f0b4970dfa65dbd2288f7a140bbd30ec334412767c66`
MD5	`932fb1a04ab98e8e9203551b1a7dbbc1`
BLAKE2b-256	`aacdc956fe9b97ba30b2e7cf4866ada839673eade6d95a2960666f76fd85c618`

File details

Details for the file handuflow-0.1.19-py3-none-any.whl.

File metadata

Download URL: handuflow-0.1.19-py3-none-any.whl
Upload date: Feb 8, 2026
Size: 62.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for handuflow-0.1.19-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6ee25f585c6f449c425279aa0a36cb9161869ed8c22d5a0cae7b7c4abcf35a87`
MD5	`7a134cc336e9a004d8ee6fb5ea7ce51a`
BLAKE2b-256	`b997f75d9a2b18e0532e384482dc650141b41f4f8146201a9c12ceb98ea4aa06`
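The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch using the standard-library `hashlib` follows; the `sha256_of` and `verify` helpers are hypothetical names, and the local file path assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest of handuflow-0.1.19-py3-none-any.whl, copied from the table above.
EXPECTED_WHEEL_SHA256 = "6ee25f585c6f449c425279aa0a36cb9161869ed8c22d5a0cae7b7c4abcf35a87"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.lower()


wheel = Path("handuflow-0.1.19-py3-none-any.whl")  # assumed download location
if wheel.exists():
    print("digest matches:", verify(wheel, EXPECTED_WHEEL_SHA256))
else:
    print(f"{wheel} not found; download it first")
```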
