Framework to organize processing code outputs to/from disk, processing chaining and versionning with a common easy to use api

These details have not been verified by PyPI

Project links

Project description

Pypelines

Python Version from PEP 621 TOML coverage

Installation

pip install processing-pypelines

or with pdm

pdm add processing-pypelines

Intro

This package aims at providing a lightweight yet powerfull framework for chaining processing tasks, on experimental sessions. It does so in a "data agnostic" way, which means it is interfaceable with already existing packages, that themselves provide some sort of pipelining (such as suite2p for example).

Out of the box, it most easily handles processing steps that takes diven set of parameters and/or files as input, and outputs built-in python dictionnaries, or pandas DataFrames. To interface if with more complex or intricated input/output data structures, you need to implement custom versions of DiskObject, wich are the readers/writers that define how to check the existence of a Step output (to know when to skip it if the output exists) how to save such output, and how to load it in case a Step further in the Pipeline wants to access this output for further processing stages.

Concepts

As so, I identified that chopping tasks needed two key concepts that may facilitate the life of the developpers : Pipe and Step

Step A step is a processing stage. It takes an input 0 to virtually unlimited inputs and performs python execution to generate an output. The existance of it's output can be verified with the atached DiskObject.
Pipe A pipe is a collection of Steps. In real life situations, it is very often that the same data structure can be having several Steps that add some data to it. For example, imaging data can be obtained, and segmented into tables of neurons fluorescence over time. As the execution goes further in the processing pipeline, some of the processing steps may calculate the responsivness of the neurons to one or another type of stimulation, obtained in another step. It then seems logical to only add this responsiveness information to the tables data, to not overcrowd the disk with duplicated stages with increasingly detailed/further processed data. The pipe serves this purpose. When a given Step requires the output of another Step, it looks up wich Pipe (e.g. general data structure) it is attached to, and loads the most advanced one available in that Pipe. As such, it means a Step output at level 6 of a Pipe, has to also be valid for the data that it holds, to a Step that requires the level 1 of this Pipe. (meaning : as a rule of thumb, don't delete data from a Pipe output with increasing steps, but only add new fields/columns to it.)

Below, is the example of a full Pipeline graph : Pipeline Graph

On the same column, the dots represents the different Steps of a Pipe. A Pipe is then a single column on this graph. Links between steps represent the dependancies between them.

The most usefull parts that this package allows you do to is :

to auto-resolve this graph simply based on what you declared that a Step needs as input requirement (it will raise Exceptions to you about errors with cyclic dependancies you might have made, when creating the Pipeline at runtime, to help sort them out)
to automatically run the required upstream Steps, whenever you decide to get a specific Step arbitrarily positionned in the tree.

Examples :

To implement a Pipeline, you have to define at least a Step, attached inside a Pipe. Here is a simple example. You can test this example, and see the result on your own computer. (Two csv files located in the test/data folder of this repository will be used for demonstration purposes.)

from pypelines import Pipeline, BasePipe, BaseStep, Session, pickle_backend
from pathlib import Path
import pandas, numpy, json

ROIS_URL = "https://raw.githubusercontent.com/JostTim/pypelines/refs/heads/main/tests/data/rois_df.csv"
TRIALS_URL = "https://raw.githubusercontent.com/JostTim/pypelines/refs/heads/main/tests/data/trials_df.csv"

pipeline = Pipeline("my_neurophy_pipeline")

@pipeline.register_pipe
class ROIsTablePipe(BasePipe):
    pipe_name = "rois_df"
    disk_class = pickle_backend.PickleDiskObject
    
    class InitialCalculation(BaseStep):
        step_name = "read"
        
        def worker(self, session, extra=""):
            rois_data = pandas.read_csv(ROIS_URL).set_index("roi#")
            rois_data["F_norm"] = rois_data["F_norm"].apply(json.loads)
            return rois_data
        
@pipeline.register_pipe
class TrialsTablePipe(BasePipe):
    pipe_name = "trials_df"
    disk_class = pickle_backend.PickleDiskObject

    class InitialRead(BaseStep):
        step_name = "read"

        def worker(self, session, extra = ""):
            trials_data = pandas.read_csv(TRIALS_URL).set_index("trial#")            
            return trials_data
        
    class AddFrameTimes(BaseStep):
        step_name = "frame_times"
        requires = "trials_df.read"

        def worker(self, session, extra = "", sample_frequency_ms = 1000/30):
            def get_frame(time_ms):
                return int(numpy.round(time_ms / sample_frequency_ms))
            trials_data = self.load_requirement("trials_df",session)     
            trials_data["trial_start_frame"] = trials_data["trial_start_global_ms"].apply(get_frame)
            trials_data["stimulus_start_frame"] = trials_data["stimulus_start_ms"].apply(get_frame)
            trials_data["stimulus_change_frame"] = trials_data["stimulus_change_ms"].apply(get_frame)
            return trials_data

@pipeline.register_pipe
class TrialsCrossRoisTablePipe(BasePipe):
    pipe_name = "trials_rois_df"
    disk_class = pickle_backend.PickleDiskObject
    
    class InitialMerge(BaseStep):
        step_name = "merge"
        requires = ["rois_df.read", "trials_df.frame_times"]

        def worker(self, session, extra = ""):
            trials_data = self.load_requirement("trials_df",session)  
            rois_data = self.load_requirement("rois_df",session)

            trials_starts = trials_data["trial_start_frame"].to_list() + [len(rois_data["F_norm"].iloc[0])]

            trials_rois_data = []
            for roi_id, roi_details in rois_data.iterrows():
                roi_details = roi_details.to_dict()
                roi_fluorescence = roi_details.pop("F_norm")
                for trial_nb, (trial_id, trial_details) in enumerate(trials_data.iterrows()):
                    new_row = {"roi#": roi_id, "trial#": trial_id}
                    new_row["F_norm"] = roi_fluorescence[trials_starts[trial_nb]:trials_starts[trial_nb+1]]
                    new_row.update(trial_details.to_dict())
                    new_row.update(roi_details)
                    trials_rois_data.append(new_row)

            return pandas.DataFrame(trials_rois_data).set_index(["roi#", "trial#"])

pipeline.graph.draw()

session = Session(subject="test",date="2025-05-15",number=1,path=".",auto_path=True)

trials_roi_df = pipeline.trials_rois_df.merge.generate(session = session, check_requirements=True)

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.0.98

Dec 17, 2025

0.0.97

Nov 26, 2025

0.0.96

Sep 11, 2025

0.0.95

Aug 27, 2025

0.0.94

Jul 24, 2025

0.0.93

Jun 30, 2025

0.0.92

May 16, 2025

0.0.91

May 16, 2025

0.0.90

May 16, 2025

0.0.89

May 15, 2025

0.0.88

May 15, 2025

0.0.87

May 15, 2025

0.0.86

May 15, 2025

0.0.85

May 15, 2025

This version

0.0.84

May 15, 2025

0.0.83

May 15, 2025

0.0.82

May 13, 2025

0.0.81

May 13, 2025

0.0.80

Mar 20, 2025

0.0.79

Mar 20, 2025

0.0.78

Mar 19, 2025

0.0.77

Nov 7, 2024

0.0.66

Jun 27, 2024

0.0.65

Jun 3, 2024

0.0.63

Jun 3, 2024

0.0.62

May 31, 2024

0.0.61

May 30, 2024

0.0.60

May 29, 2024

0.0.59

May 28, 2024

0.0.58

May 14, 2024

0.0.57

May 3, 2024

0.0.56

Apr 10, 2024

0.0.55

Apr 5, 2024

0.0.54

Apr 4, 2024

0.0.53

Apr 4, 2024

0.0.52

Apr 4, 2024

0.0.51

Apr 4, 2024

0.0.50

Apr 4, 2024

0.0.49

Apr 4, 2024

0.0.48

Apr 4, 2024

0.0.47

Apr 4, 2024

0.0.46

Apr 4, 2024

0.0.45

Apr 4, 2024

0.0.44

Mar 28, 2024

0.0.43

Mar 28, 2024

0.0.42

Mar 28, 2024

0.0.41

Mar 28, 2024

0.0.40

Mar 28, 2024

0.0.39

Mar 28, 2024

0.0.38

Mar 28, 2024

0.0.37

Mar 27, 2024

0.0.36

Mar 27, 2024

0.0.35

Mar 27, 2024

0.0.34

Mar 27, 2024

0.0.33

Mar 27, 2024

0.0.32

Mar 27, 2024

0.0.30

Mar 27, 2024

0.0.29

Mar 26, 2024

0.0.28

Mar 19, 2024

0.0.27

Mar 16, 2024

0.0.26

Mar 14, 2024

0.0.25

Mar 6, 2024

0.0.24

Mar 5, 2024

0.0.23

Feb 27, 2024

0.0.22

Feb 17, 2024

0.0.21

Feb 8, 2024

0.0.20

Feb 5, 2024

0.0.19

Jan 11, 2024

0.0.18

Jan 9, 2024

0.0.17

Jan 9, 2024

0.0.16

Jan 9, 2024

0.0.15

Jan 9, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

processing_pypelines-0.0.84.tar.gz (9.0 MB view details)

Uploaded May 15, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

processing_pypelines-0.0.84-py3-none-any.whl (61.4 kB view details)

Uploaded May 15, 2025 Python 3

File details

Details for the file processing_pypelines-0.0.84.tar.gz.

File metadata

Download URL: processing_pypelines-0.0.84.tar.gz
Upload date: May 15, 2025
Size: 9.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for processing_pypelines-0.0.84.tar.gz
Algorithm	Hash digest
SHA256	`427eaf7551650cf7dab368cfdf0485fb911df48aa616f5ae674535cc546046a9`
MD5	`ae481c4d7d76be3904858752e7eb935a`
BLAKE2b-256	`b4aa12f03741ce3d952b2c522b5aed254a11e5e82c0dbe6ffd6e4582d84d159f`

See more details on using hashes here.

File details

Details for the file processing_pypelines-0.0.84-py3-none-any.whl.

File metadata

Download URL: processing_pypelines-0.0.84-py3-none-any.whl
Upload date: May 15, 2025
Size: 61.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for processing_pypelines-0.0.84-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9b500a1479c37b99fbbe537ed9d8efa64d6dde2241037ec1238b0438b52428eb`
MD5	`6225bbacf66845612376fb0221cc5922`
BLAKE2b-256	`04e43772acde210ce93370f37b5ead388890cd41cf89cab4459a669b93d9b982`

See more details on using hashes here.

processing-pypelines 0.0.84

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Pypelines

Installation

Intro

Concepts

Examples :

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes