lightweight but versatile python-framework for multi-stage information processing

data-plumber

data-plumber is a lightweight but versatile Python framework for multi-stage information processing. It allows constructing processing pipelines both from atomic building blocks and by recombining existing pipelines. Forks enable more complex (i.e., non-linear) orders of execution. Pipelines can also be collected into arrays that are executed at once with the same input data.
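The ideas of building blocks, recombination, and arrays can be illustrated independently of the library. The following is a minimal plain-Python sketch, not data-plumber's API: the names `run_pipeline`, `run_array`, `strip`, `lower`, and `count` are hypothetical and chosen only for illustration, and forks are left out for brevity.

```python
from typing import Callable, Dict, List

# Hypothetical type: a step takes a dict and returns a new dict.
Step = Callable[[dict], dict]


def run_pipeline(steps: List[Step], data: dict) -> dict:
    """Apply each step in order, feeding each result into the next step."""
    for step in steps:
        data = step(data)
    return data


def run_array(pipelines: Dict[str, List[Step]], data: dict) -> Dict[str, dict]:
    """Run several pipelines, each on its own copy of the same input."""
    return {
        name: run_pipeline(steps, dict(data))
        for name, steps in pipelines.items()
    }


# Atomic building blocks (illustrative only).
def strip(d: dict) -> dict:
    return {**d, "text": d["text"].strip()}


def lower(d: dict) -> dict:
    return {**d, "text": d["text"].lower()}


def count(d: dict) -> dict:
    return {**d, "length": len(d["text"])}


# A pipeline built from atomic blocks ...
normalize = [strip, lower]
# ... and a second pipeline built by recombining an existing one.
analyze = normalize + [count]

# An "array" of pipelines executed with the same input data.
results = run_array(
    {"normalize": normalize, "analyze": analyze},
    {"text": "  Hello "},
)
```

Here `results` maps each pipeline name to its own output, so one input can be processed in several ways in a single call.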

Minimal usage example

Consider a scenario where the contents of a dictionary have to be validated and a suitable error message has to be generated. Specifically, a valid input dictionary is expected to have a key "data" whose value is a list of integers. A suitable pipeline might look like this:

>>> from data_plumber import Stage, Pipeline, Previous
>>> pipeline = Pipeline(
...     # stage 1: the key "data" must be present
...     Stage(
...         primer=lambda **kwargs: "data" in kwargs,
...         status=lambda primer, **kwargs: 0 if primer else 1,
...         message=lambda primer, **kwargs: "" if primer else "missing key"
...     ),
...     # stage 2: runs only if the previous stage returned status 0
...     Stage(
...         requires={Previous: 0},
...         primer=lambda data, **kwargs: isinstance(data, list),
...         status=lambda primer, **kwargs: 0 if primer else 1,
...         message=lambda primer, **kwargs: "" if primer else "bad type"
...     ),
...     # stage 3: every element of the list must be an integer
...     Stage(
...         requires={Previous: 0},
...         primer=lambda data, **kwargs: all(isinstance(i, int) for i in data),
...         status=lambda primer, **kwargs: 0 if primer else 1,
...         message=lambda primer, **kwargs: "validation success" if primer else "bad type in data"
...     ),
...     # stop the run as soon as any stage returns status 1
...     exit_on_status=1
... )
>>> pipeline.run(**{}).stages
[('missing key', 1)]
>>> pipeline.run(**{"data": 1}).stages
[('', 0), ('bad type', 1)]
>>> pipeline.run(**{"data": [1, "2", 3]}).stages
[('', 0), ('', 0), ('bad type in data', 1)]
>>> pipeline.run(**{"data": [1, 2, 3]}).stages
[('', 0), ('', 0), ('validation success', 0)]
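
To make the control flow of the example concrete, the same validation logic can be written in plain Python without data-plumber. This is a sketch under stated assumptions, not the library's implementation: the helper `run_stages` and the tuple layout `(primer, success message, failure message)` are hypothetical, and the behavior is inferred only from the outputs shown above (each stage records a `(message, status)` pair and the run stops at the first status equal to `exit_on_status`).

```python
from typing import Callable, List, Tuple

# Hypothetical stage layout: (primer, message on success, message on failure).
StageSpec = Tuple[Callable[[dict], bool], str, str]

STAGES: List[StageSpec] = [
    (lambda kw: "data" in kw, "", "missing key"),
    (lambda kw: isinstance(kw["data"], list), "", "bad type"),
    (
        lambda kw: all(isinstance(i, int) for i in kw["data"]),
        "validation success",
        "bad type in data",
    ),
]


def run_stages(
    stages: List[StageSpec], exit_on_status: int = 1, **kwargs
) -> List[Tuple[str, int]]:
    """Run stages in order and collect (message, status) records."""
    records = []
    for primer, ok_message, fail_message in stages:
        passed = primer(kwargs)
        status = 0 if passed else 1
        records.append((ok_message if passed else fail_message, status))
        # Stopping here means a later stage only ever runs after a
        # status of 0, which mirrors `requires={Previous: 0}` above.
        if status == exit_on_status:
            break
    return records


print(run_stages(STAGES))                     # missing key
print(run_stages(STAGES, data=1))             # wrong container type
print(run_stages(STAGES, data=[1, "2", 3]))   # wrong element type
print(run_stages(STAGES, data=[1, 2, 3]))     # valid input
```

Compared with this hand-written loop, the `Pipeline` version keeps the check (`primer`), the status code, and the message of each stage as separate callables, which makes individual stages easier to reuse in other pipelines.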

Changelog

[1.8.0] - 2024-02-03

Changed

  • refactored Fork and Stage to transform string/integer-references to Stages into StageRefs (7ba677b)

Added

  • added decorator-factory Pipeline.run_for_kwargs to generate kwargs for function calls (fe616b2)
  • added optional Stage-callable to export kwargs into Pipeline.run (8eca1bc)
  • added even more types of StageRefs: PreviousN, NextN (576820c)
  • added py.typed-marker to package (04a2e1d)
  • added more types of StageRefs: StageById, StageByIndex, StageByIncrement (92d57ad)

[1.4.0] - 2024-02-01

Changed

  • refactored internal modules (cf7045f)

Added

  • added StageRefs Next, Last, and Skip (14abaa7)
  • added optional finalizer-Callable to Pipeline (d95e5b6)
  • added support for Callable in Pipeline-argument exit_on_status (154c67b)

Fixed

  • PipelineOutput.last_X-methods now return None in case of empty records

[1.0.0] - 2024-01-31

Changed

  • Breaking: refactor PipelineOutput and related types (1436ca1)
  • Breaking: replaced forwarding kwargs of Pipeline.run as dictionary in_ into Stage/Fork-Callables by forwarding directly (f2710fa, b569bb9)

Added

  • added missing information in module- and class-docstrings (7896742)

[0.1.0] - 2024-01-31

initial release
