Framework to build data pipelines

Project Status: Archived

This repository is no longer actively maintained. Due to shifting priorities and limited resources, we have decided to archive the repository and discontinue further development and maintenance.

What this means:

  • No new features or updates will be added.
  • Issues and pull requests will no longer be reviewed or responded to.
  • You are welcome to fork the project and continue development under your own maintenance.

luisy

This tool is an extension for the Python framework luigi that helps to build reproducible and complex data pipelines for batch jobs. Visit our docs to learn more!


How to use?

This is what an end-to-end luisy pipeline may look like:

    import luisy
    import pandas as pd
    
    @luisy.raw
    @luisy.csv_output(delimiter=',')
    class InputFile(luisy.ExternalTask):
        label = luisy.Parameter()
    
        def get_file_name(self):
            return f"file_{self.label}"
    
    @luisy.interim
    @luisy.requires(InputFile)
    class ProcessedFile(luisy.Task):
        def run(self):
            df = self.input().read()
            # Further preprocessing steps go here
            # ...
            # Write the result to disk
            self.write(df)
    
    @luisy.final
    class MergedFile(luisy.ConcatenationTask):
        def requires(self):
            for label in ['a', 'b', 'c', 'd']:
                yield ProcessedFile(label=label)
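
To make the data flow of the example concrete, here is a minimal plain-Python sketch of what the three tasks conceptually do: read one CSV file per label, process each one, and concatenate the results. It uses neither luisy nor luigi and is not how luisy is implemented. The helper names (`input_file`, `process`, `merge`, `demo`), the `.csv` suffix, the `processed` column, and the temporary directory layout are assumptions made for illustration only.

```python
import csv
import tempfile
from pathlib import Path

LABELS = ["a", "b", "c", "d"]


def input_file(raw_dir, label):
    # Mirrors the naming in InputFile.get_file_name; the ".csv" suffix
    # is an assumption of this sketch.
    return Path(raw_dir) / f"file_{label}.csv"


def read_csv(path):
    # Read a comma-delimited file into a list of dict rows.
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter=","))


def process(rows):
    # Hypothetical stand-in for the preprocessing done in ProcessedFile.run.
    return [dict(row, processed="yes") for row in rows]


def merge(raw_dir, labels=LABELS):
    # Stand-in for MergedFile: concatenate the processed rows of all labels.
    merged = []
    for label in labels:
        merged.extend(process(read_csv(input_file(raw_dir, label))))
    return merged


def demo():
    # Create one small raw file per label, then run the whole chain.
    with tempfile.TemporaryDirectory() as raw_dir:
        for i, label in enumerate(LABELS):
            with open(input_file(raw_dir, label), "w", newline="") as fh:
                writer = csv.DictWriter(fh, fieldnames=["label", "value"])
                writer.writeheader()
                writer.writerow({"label": label, "value": i})
        return merge(raw_dir)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In the luisy version, the decorators and the `requires` declarations take over the file naming, reading, and writing that this sketch spells out by hand.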

How to install?

Stable Branch: main

Minimum Python version: 3.8

Install luisy with:

    pip install luisy
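
A quick way to check the prerequisites from within Python is sketched below. It compares the running interpreter against the minimum version stated above and looks up the installed luisy version, if any, using the standard library's `importlib.metadata`. The helper names `python_ok` and `installed_version` are hypothetical and not part of luisy.

```python
import sys
from importlib import metadata

MINIMUM = (3, 8)  # minimum Python version stated above


def python_ok(version_info=sys.version_info):
    # Compare only (major, minor) against the minimum.
    return tuple(version_info[:2]) >= MINIMUM


def installed_version(package="luisy"):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("luisy version:", installed_version())
```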

How to test?

To run all unit tests in the tests directory, use the following command:

    pytest

How to contribute?

Please have a look at our contribution guide.

Third-Party Licenses

Runtime dependencies

Name                 License               Type
numpy                BSD-3-Clause License  Dependency
pandas               BSD 3-Clause License  Dependency
networkx             BSD-3-Clause License  Dependency
luigi                Apache License 2.0    Dependency
distlib              Python license        Dependency
matplotlib           Other                 Dependency
azure-storage-blob   MIT License           Dependency
tables               BSD license           Dependency
pipdeptree           MIT License           Dependency
requirements-parser  Apache License 2.0    Dependency
pyarrow              Apache License 2.0    Dependency
spark                Apache License 2.0    Dependency

Development dependencies

Name              License               Type
sphinx            BSD-2-Clause          Dependency
sphinx_rtd_theme  MIT License           Dependency
flake8            MIT License           Dependency
pytest            MIT License           Dependency
pytest-flake8     BSD License           Dependency
pytest-cov        MIT License           Dependency
pip-tools         BSD 3-Clause License  Dependency

Download files

Source Distribution

luisy-1.4.7.tar.gz (41.1 kB)

Uploaded Source

Built Distribution

luisy-1.4.7-py2.py3-none-any.whl (47.1 kB)

Uploaded Python 2, Python 3

File details

Details for the file luisy-1.4.7.tar.gz.

File metadata

  • Download URL: luisy-1.4.7.tar.gz
  • Upload date:
  • Size: 41.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for luisy-1.4.7.tar.gz

Algorithm    Hash digest
SHA256       b1209b4749cbfd1f48304b753b6484b5e8c1fd871eb225b4e8fb8b228280c18a
MD5          e56db64a4344b44407c0ed7262c0d9f6
BLAKE2b-256  fb9600961d2f6e9d3a55f9ca5609cbdbee777401d87bdb4d04391e44a997d19d

File details

Details for the file luisy-1.4.7-py2.py3-none-any.whl.

File metadata

  • Download URL: luisy-1.4.7-py2.py3-none-any.whl
  • Upload date:
  • Size: 47.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for luisy-1.4.7-py2.py3-none-any.whl

Algorithm    Hash digest
SHA256       edea81f7d50855e85270d4616acf15554657fb168cb64917598412783ff2c4b3
MD5          518bb30af59764934c7ba56f564a862d
BLAKE2b-256  8442a9baf8638adbed103a37062be11e0a10696a6f73b9b18e21254f01911a45
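
The SHA256 digests listed for the two files can be checked against downloaded copies with Python's standard `hashlib` module. The following is a minimal sketch that assumes the files sit in the current working directory under the names shown above; the helper names `sha256_of` and `matches` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed above for the two distribution files.
EXPECTED = {
    "luisy-1.4.7.tar.gz":
        "b1209b4749cbfd1f48304b753b6484b5e8c1fd871eb225b4e8fb8b228280c18a",
    "luisy-1.4.7-py2.py3-none-any.whl":
        "edea81f7d50855e85270d4616acf15554657fb168cb64917598412783ff2c4b3",
}


def sha256_of(path, chunk_size=1 << 20):
    # Hash the file in chunks so large files need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    # Compare case-insensitively, ignoring surrounding whitespace.
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, skipping")
        elif matches(path, expected):
            print(f"{name}: OK")
        else:
            print(f"{name}: MISMATCH")
```

A mismatch means the local file differs from the one these digests were computed for, for example because of a corrupted or incomplete download.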
