A wrapper of luigi. This makes it easy to define tasks.

gokart

A wrapper of the data pipeline library "luigi".

Getting Started

Run pip install gokart to install the latest version from PyPI. Documentation for the latest release is hosted on readthedocs.

How to Use

Please use gokart.TaskOnKart instead of luigi.Task to define your tasks.

Basic Task with gokart.TaskOnKart

import gokart


class BasicTask(gokart.TaskOnKart):
    def requires(self):
        # TaskA is another task, defined elsewhere.
        return TaskA()

    def output(self):
        # Use TaskOnKart.make_target to create the Target.
        return self.make_target('basic_task.csv')

    def run(self):
        # Load the data that TaskA dumped.
        texts = self.load()

        # Do something with texts to produce results.
        results = texts  # placeholder for the real processing

        # Save results to {self.workspace_directory}/basic_task_{unique_id}.csv
        self.dump(results)
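The comment in run() above states that the output is written to {self.workspace_directory}/basic_task_{unique_id}.csv. The following is a minimal sketch of how such a path could be composed. The helper name `path_with_unique_id` and the choice of a SHA-256 digest over the task parameters are assumptions made for illustration; they are not taken from gokart's source.

```python
import hashlib
import os


def path_with_unique_id(workspace_directory, relative_path, params):
    """Hypothetical helper: insert a parameter-derived id before the extension."""
    # Assumption: the id is a hash of the task parameters, so changing a
    # parameter yields a different output file.
    digest_source = repr(sorted(params.items())).encode("utf-8")
    unique_id = hashlib.sha256(digest_source).hexdigest()
    stem, extension = os.path.splitext(relative_path)
    return os.path.join(workspace_directory, f"{stem}_{unique_id}{extension}")


path_a = path_with_unique_id("./resources", "basic_task.csv", {"seed": 1})
path_b = path_with_unique_id("./resources", "basic_task.csv", {"seed": 2})
print(path_a)
print(path_a != path_b)  # different parameters give different paths
```

Under this assumption, rerunning a task with the same parameters resolves to the same file, which is what lets a pipeline skip work that is already done.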

Details of base functions

Make Target with TaskOnKart

TaskOnKart.make_target determines the Target type from the extension of the given path. The following extensions are supported.

  • pkl
  • txt
  • csv
  • tsv
  • gz
  • json
  • xml

Make Target for models that generate multiple files when saved

TaskOnKart.make_model_target and TaskOnKart.dump are designed to save and load models such as gensim.models.Word2Vec.

import gensim
import gokart


class TrainWord2Vec(gokart.TaskOnKart):
    def output(self):
        # Use the 'zip' extension for model targets.
        return self.make_model_target(
            'model.zip',
            save_function=gensim.models.Word2Vec.save,
            load_function=gensim.models.Word2Vec.load)

    def run(self):
        # Train the model and bind it to `word2vec` (omitted here), then dump it.
        self.dump(word2vec)
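One way to read the 'zip' requirement: a model whose save function writes several files can be saved into a temporary directory, and that directory can then be archived as a single file. The sketch below illustrates this idea with the standard library only. `ToyModel`, `dump_model`, and `load_model` are hypothetical names, and nothing here is taken from gokart's implementation.

```python
import os
import shutil
import tempfile
import zipfile


class ToyModel:
    """Stand-in for a model whose save() writes more than one file."""

    def __init__(self, vocab, weights):
        self.vocab = vocab
        self.weights = weights

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(self.vocab))
        with open(path + ".weights", "w", encoding="utf-8") as f:
            f.write(",".join(str(w) for w in self.weights))

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            vocab = f.read().split("\n")
        with open(path + ".weights", encoding="utf-8") as f:
            weights = [float(w) for w in f.read().split(",")]
        return cls(vocab, weights)


def dump_model(model, zip_path, save_function):
    """Save into a temporary directory, then pack that directory into one zip."""
    with tempfile.TemporaryDirectory() as model_dir:
        save_function(model, os.path.join(model_dir, "model"))
        # make_archive appends ".zip" itself, so pass the path without it.
        shutil.make_archive(os.path.splitext(zip_path)[0], "zip", model_dir)


def load_model(zip_path, load_function):
    """Unpack the zip into a temporary directory and load the model from it."""
    with tempfile.TemporaryDirectory() as model_dir:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(model_dir)
        return load_function(os.path.join(model_dir, "model"))


with tempfile.TemporaryDirectory() as workspace:
    zip_path = os.path.join(workspace, "model.zip")
    dump_model(ToyModel(["a", "b"], [0.5, 1.5]), zip_path, ToyModel.save)
    restored = load_model(zip_path, ToyModel.load)

print(restored.vocab, restored.weights)  # ['a', 'b'] [0.5, 1.5]
```

As in the Word2Vec example, the save function is passed unbound and called with the model and a path, while the load function is called with the path alone.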

Load input data

Pattern 1: Load input data individually.

def requires(self):
    return dict(data=LoadItemData(), model=LoadModel())

def run(self):
    # Pass a key of the dictionary returned by `self.requires()`.
    data = self.load('data')
    model = self.load('model')

Pattern 2: Load input data at once.

def run(self):
    input_data = self.load()
    # The above line is equivalent to:
    # input_data = dict(data=self.load('data'), model=self.load('model'))
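The relationship between the two patterns can be shown with a small stand-in that runs without gokart. `FakeTarget` and the free function `load` are hypothetical; they only mimic the behaviour described above and are not gokart's code.

```python
class FakeTarget:
    """Stand-in for a Target that already holds dumped data."""

    def __init__(self, data):
        self._data = data

    def load(self):
        return self._data


def load(inputs, key=None):
    """Hypothetical helper mirroring the two loading patterns."""
    if key is not None:
        # Pattern 1: load a single input by its key.
        return inputs[key].load()
    if isinstance(inputs, dict):
        # Pattern 2: load everything, keeping the dictionary structure.
        return {name: target.load() for name, target in inputs.items()}
    # A single required task: return its data directly.
    return inputs.load()


inputs = dict(data=FakeTarget([1, 2, 3]), model=FakeTarget("trained-model"))
data = load(inputs, "data")
everything = load(inputs)

print(data)        # [1, 2, 3]
print(everything)  # {'data': [1, 2, 3], 'model': 'trained-model'}
```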

Load input data as pd.DataFrame

def requires(self):
    return LoadDataFrame()

def run(self):
    data = self.load_data_frame(required_columns={'id', 'name'})  

Advanced

Inherit task parameters with decorator

Description

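import gokart
import luigi
# Assumed import path; adjust it if your gokart version exposes the decorator elsewhere.
from gokart.config_params import inherits_config_params


class MasterConfig(luigi.Config):
    param: str = luigi.Parameter()
    param2: str = luigi.Parameter()


@inherits_config_params(MasterConfig)
class SomeTask(gokart.TaskOnKart):
    param: str = luigi.Parameter()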

This is useful when multiple tasks share the same parameter, since the parameter settings of MasterConfig are inherited by every task decorated with @inherits_config_params(MasterConfig).

Note that only parameters that exist in both MasterConfig and SomeTask are inherited. In the above example, param2 is not available in SomeTask, since SomeTask does not declare a param2 parameter.
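The "only shared parameters" rule can be illustrated with a plain-Python decorator. This is a sketch of the idea, not gokart's implementation: `inherits_shared_attributes` is a hypothetical name, and ordinary class attributes stand in for luigi.Parameter objects.

```python
def inherits_shared_attributes(config_class):
    """Hypothetical decorator: copy values only for names both classes declare."""

    def decorator(task_class):
        for name, value in vars(config_class).items():
            if name.startswith("_"):
                continue  # skip dunder and private attributes
            if name in vars(task_class):
                # The task declares this name, so it takes the config's value.
                setattr(task_class, name, value)
        return task_class

    return decorator


class MasterConfig:
    param = "shared-value"
    param2 = "config-only"


@inherits_shared_attributes(MasterConfig)
class SomeTask:
    param = None  # declared here, so it is filled from MasterConfig


print(SomeTask.param)              # shared-value
print(hasattr(SomeTask, "param2")) # False
```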
