
A wrapper of luigi that makes it easy to define tasks.


gokart


A wrapper of the data pipeline library "luigi".

Getting Started

Run pip install gokart to install the latest version from PyPI. Documentation for the latest release is hosted on readthedocs.

How to Use

Please use gokart.TaskOnKart instead of luigi.Task to define your tasks.

Basic Task with gokart.TaskOnKart

import gokart

class BasicTask(gokart.TaskOnKart):
    def requires(self):
        # TaskA is another task, defined elsewhere, whose output this task consumes.
        return TaskA()

    def output(self):
        # Use TaskOnKart.make_target to create the Target.
        return self.make_target('basic_task.csv')

    def run(self):
        # Load the data that TaskA output.
        texts = self.load()

        # Do something with texts and build results (placeholder: pass through unchanged).
        results = texts

        # Save results to {self.workspace_directory}/basic_task_{unique_id}.csv
        self.dump(results)
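
The comment in the task above says the output is written to {self.workspace_directory}/basic_task_{unique_id}.csv. The following is a minimal sketch of how such a path could be assembled from its parts. The helper name `build_target_path` and the sample values are hypothetical, and this is not gokart's own implementation.

```python
import os


def build_target_path(workspace_directory, relative_path, unique_id):
    """Insert unique_id before the file extension and prepend the workspace directory.

    Hypothetical helper for illustration only; gokart builds the path internally.
    """
    stem, extension = os.path.splitext(relative_path)
    return os.path.join(workspace_directory, f'{stem}_{unique_id}{extension}')


# Sample values are made up for the example.
path = build_target_path('resources', 'basic_task.csv', 'abc123')
```

With these sample values the file name component of `path` is basic_task_abc123.csv, matching the pattern in the comment.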

Details of base functions

Make Target with TaskOnKart

TaskOnKart.make_target determines the Target type from the extension of the given path. The following extensions are supported.

  • pkl
  • txt
  • csv
  • tsv
  • gz
  • json
  • xml
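
As a rough illustration of choosing behavior from a path extension, the sketch below checks a path against the list above. The function name `detect_extension` is hypothetical, and raising an error for an unlisted extension is an assumption; gokart's actual handling may differ.

```python
import os

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = {'pkl', 'txt', 'csv', 'tsv', 'gz', 'json', 'xml'}


def detect_extension(path):
    """Return the extension of `path` if it is in the supported list.

    Hypothetical helper; raising ValueError for other extensions is an assumption.
    """
    extension = os.path.splitext(path)[1].lstrip('.')
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f'unsupported extension: {extension!r}')
    return extension
```

For example, `detect_extension('basic_task.csv')` returns `'csv'`, while a path ending in an unlisted extension raises `ValueError` in this sketch.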

Make Target for models that generate multiple files when saved

TaskOnKart.make_model_target and TaskOnKart.dump are designed to save and load models such as gensim.models.Word2Vec.

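import gensim
import gokart

class TrainWord2Vec(gokart.TaskOnKart):
    def output(self):
        # Please use the 'zip' extension for the model target.
        return self.make_model_target(
            'model.zip',
            save_function=gensim.models.Word2Vec.save,
            load_function=gensim.models.Word2Vec.load)

    def run(self):
        # Build and train the Word2Vec model here, assigning it to `word2vec`.
        self.dump(word2vec)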

Load input data

Pattern 1: Load input data individually.

def requires(self):
    return dict(data=LoadItemData(), model=LoadModel())

def run(self):
    # Pass a key of the dictionary returned by `self.requires()`.
    data = self.load('data')
    model = self.load('model')

Pattern 2: Load input data at once.

def run(self):
    input_data = self.load()
    # The above line is equivalent to the following:
    # input_data = dict(data=self.load('data'), model=self.load('model'))
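
The equivalence between the two patterns can be mimicked in plain Python. This is only a sketch: `load_all` and `fake_outputs` are hypothetical stand-ins for the task's load method and for the outputs of LoadItemData and LoadModel.

```python
def load_all(keys, load_one):
    """Mimic a no-argument load(): build a dict with one loaded value per key."""
    return {key: load_one(key) for key in keys}


# Hypothetical stand-ins for what the two required tasks produced.
fake_outputs = {'data': [1, 2, 3], 'model': 'trained-model'}

# Pattern 1: load each input by key.
data = fake_outputs['data']
model = fake_outputs['model']

# Pattern 2: load everything at once, keyed the same way as requires().
input_data = load_all(fake_outputs.keys(), fake_outputs.__getitem__)
```

In this sketch `input_data` holds the same values as `dict(data=data, model=model)`.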

Load input data as pd.DataFrame

def requires(self):
    return LoadDataFrame()

def run(self):
    data = self.load_data_frame(required_columns={'id', 'name'})  
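
The text above does not spell out what required_columns does. Assuming it is a check that the loaded DataFrame contains the named columns, the idea can be sketched without pandas as follows. The helper `check_required_columns` is hypothetical and is not part of gokart.

```python
def check_required_columns(columns, required_columns):
    """Raise ValueError if any required column is absent from `columns`.

    Hypothetical helper illustrating an assumed meaning of required_columns.
    """
    missing = set(required_columns) - set(columns)
    if missing:
        raise ValueError(f'missing required columns: {sorted(missing)}')
    return True


# Column names of an imaginary loaded table.
loaded_columns = ['id', 'name', 'price']
ok = check_required_columns(loaded_columns, {'id', 'name'})
```

Under this assumption, a table lacking `id` or `name` would fail early instead of causing an error later in run().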

Files for gokart, version 0.3.7