Data pipeline framework for Slurm-managed HPC clusters

TigerFlow

TigerFlow is a Python framework that simplifies the creation and execution of data pipelines on Slurm-managed HPC clusters. It supports data pipelines where:

Each task performs embarrassingly parallel file processing. That is, files are processed independently of one another.
The task dependency graph forms a rooted tree. That is, there is a single root task, and every other task has exactly one parent.
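The rooted-tree constraint above can be checked mechanically. The following is a minimal sketch, assuming the pipeline is described as a mapping from each task name to its parent task name (with `None` for the root). That mapping and the function name `is_rooted_tree` are hypothetical and chosen for illustration only; they are not part of TigerFlow's API or configuration format.

```python
from collections import deque
from typing import Dict, Optional


def is_rooted_tree(parents: Dict[str, Optional[str]]) -> bool:
    """Return True if the task graph is a rooted tree.

    `parents` maps each task name to its parent task name, or to None
    for the root. This is a hypothetical representation used only for
    illustration, not TigerFlow's configuration format.
    """
    # Exactly one task may have no parent.
    roots = [task for task, parent in parents.items() if parent is None]
    if len(roots) != 1:
        return False

    # Every named parent must itself be a known task.
    if any(p is not None and p not in parents for p in parents.values()):
        return False

    # Build child lists, then walk down from the root.
    children = {task: [] for task in parents}
    for task, parent in parents.items():
        if parent is not None:
            children[parent].append(task)

    seen = set()
    queue = deque(roots)
    while queue:
        task = queue.popleft()
        seen.add(task)
        queue.extend(children[task])

    # Tasks caught in a cycle are never reached from the root.
    return len(seen) == len(parents)
```

Because the mapping allows at most one parent per task, the only remaining ways to violate the constraint are multiple roots, unknown parents, or cycles detached from the root, which is what the three checks cover.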

Designed as a continuously running service with dynamic scaling, TigerFlow minimizes the need for users to manually plan and allocate resources in advance.

Why TigerFlow Matters

HPC clusters are an invaluable asset for researchers who require significant computational resources. For example, computational social scientists may need to extract features (e.g., transcription embeddings) from a large volume of TikTok videos and store them in databases for downstream analysis and modeling. However, the architecture of HPC clusters can present challenges for such workflows:

Compute nodes often lack internet access. This prevents direct access to external APIs (e.g., LLM services provided by Google) or remote data sources (e.g., Amazon S3), requiring such tasks to be executed on a login or head node instead.
Compute nodes often have restricted access to file systems. Certain file systems (e.g., cold storage) may not be mounted on compute nodes. This necessitates moving or copying data to accessible locations (e.g., scratch space) before processing can occur on compute nodes.

These constraints make it difficult to design and implement end-to-end data pipelines, especially when some steps require external API calls (restricted to login/head nodes) while others depend on high-performance compute resources (available only on compute nodes). TigerFlow addresses these challenges by offering a simple, unified framework for defining and running data pipelines across different types of cluster nodes.

Additional Advantages

TigerFlow further streamlines HPC workflows by addressing common inefficiencies in traditional Slurm-based job scheduling:

No need to pre-batch workloads. Each Slurm task in TigerFlow runs a dynamically scalable worker cluster that automatically adapts to the incoming workload, eliminating the need for manual batch planning and tuning.
No need to start a new Slurm job for each file. In TigerFlow, a single Slurm job runs as a long-lived worker process that handles multiple files. It performs common operations (e.g., setup and teardown) only once, while applying the actual file-processing logic individually to each file. This reduces idle time and resource waste from launching a separate Slurm job for every file.
No need to wait for all files to complete a pipeline step. In TigerFlow, files are processed individually as they arrive, supporting more flexible and dynamic workflows.
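The long-lived worker pattern described above (one-time setup, per-file processing, one-time teardown) can be sketched in plain Python. This is an illustrative sketch only, not TigerFlow's implementation: the function `run_worker` and its `setup`, `process`, and `teardown` parameters are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_worker(
    files: Iterable[Path],
    setup: Callable[[], dict],
    process: Callable[[dict, Path], None],
    teardown: Callable[[dict], None],
) -> int:
    """Process every file with one shared context; return the file count.

    All names here are hypothetical and only illustrate the pattern.
    """
    context = setup()  # expensive one-time work, e.g. loading a model
    handled = 0
    try:
        for path in files:  # each file is handled independently, in arrival order
            process(context, path)
            handled += 1
    finally:
        teardown(context)  # runs once, even if processing a file raises
    return handled
```

Since `files` can be any iterable, including a generator that yields paths as they appear, the same loop covers both a fixed batch and a continuously arriving stream. The setup cost is paid once per worker rather than once per file.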

These features make TigerFlow especially well-suited for running large-scale or real-time data pipelines on HPC systems.

How to Use TigerFlow

TigerFlow can be run on any HPC cluster managed by Slurm. Since it is written in Python, the system must have Python (version 3.10 or higher) installed.

Installation

TigerFlow can be installed using pip:

pip install tigerflow

Or install the package with the additional dependencies for running the examples:

pip install "tigerflow[examples]"

It can also be installed using other package managers such as uv and poetry.
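To confirm that an environment meets the stated requirements, a short check using only the Python standard library may help. This is a sketch: the helper names `python_is_supported` and `installed_version` are hypothetical, and the minimum version is taken from the requirement stated above.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_is_supported(version: Tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Return True if the given (major, minor, ...) version is new enough."""
    return version[:2] >= MIN_PYTHON


def installed_version(package: str = "tigerflow") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("tigerflow version:", installed_version())
```

Running the script in the environment used on the cluster reports whether the interpreter is recent enough and which version of the package, if any, is installed.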

Quick Start

Once the package is installed, the tigerflow command will be available:

tigerflow --help

Running the above will display an overview of the tool, including supported subcommands.

For instance, run is a subcommand for running a user-defined pipeline, and its details can be viewed by running:

tigerflow run --help

Try running the examples, starting with a simple pipeline consisting of two local tasks.

What Next

Please check out the user guides for more detailed instructions and examples.

Release history

0.3.0: Apr 9, 2026
0.2.1: Mar 26, 2026
0.2.0: Mar 5, 2026
0.1.0a3 (pre-release): Feb 4, 2026
0.1.0a2 (pre-release, this version): Sep 25, 2025
0.1.0a1 (pre-release): Sep 23, 2025
0.1.0a0 (pre-release): Aug 6, 2025

Download files

Source Distribution

tigerflow-0.1.0a2.tar.gz (6.8 MB)

Uploaded Sep 25, 2025 Source

Built Distribution

tigerflow-0.1.0a2-py3-none-any.whl (24.2 kB)

Uploaded Sep 25, 2025 Python 3

File details

Details for the file tigerflow-0.1.0a2.tar.gz.

File metadata

Download URL: tigerflow-0.1.0a2.tar.gz
Upload date: Sep 25, 2025
Size: 6.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.15

File hashes

Hashes for tigerflow-0.1.0a2.tar.gz
Algorithm	Hash digest
SHA256	`8ed9d914cfd060bfc3d554a2c3e1b54df4e4a9181ef494d0df701f2e0526e1ab`
MD5	`5bd275f12df89e07ddfe65be4c83d782`
BLAKE2b-256	`0ab04a8f3bee9c8e60a22f00924e497ee798364c64344b7f800fa777460f31c9`

File details

Details for the file tigerflow-0.1.0a2-py3-none-any.whl.

File metadata

Download URL: tigerflow-0.1.0a2-py3-none-any.whl
Upload date: Sep 25, 2025
Size: 24.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.15

File hashes

Hashes for tigerflow-0.1.0a2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cc46bca4bedc68b4af1b9707b131512c92d7375b5ed3af480501ab8afd66839e`
MD5	`74627c11e1cc491b1123ffa73ed02238`
BLAKE2b-256	`4a96208c3375d6685262d05fc7d9fdc2aa3a7352f525031faa5e6e27e43ae0c4`
