
A wrapper of luigi that makes it easy to define tasks.


**Note:** For the latest source, discussion, etc., please visit the [GitHub repository](https://github.com/m3dev/gokart).


# gokart

[![Build Status](https://travis-ci.org/m3dev/gokart.svg)](https://travis-ci.org/m3dev/gokart)

A wrapper of the data pipeline library "luigi".


## Getting Started
Run `pip install gokart` to install the latest version from PyPI. [Documentation](https://gokart.readthedocs.io/en/latest/) for the latest release is hosted on readthedocs.

## How to Use
Please use `gokart.TaskOnKart` instead of `luigi.Task` to define your tasks.


### Basic Task with gokart.TaskOnKart
```python
import gokart


class BasicTask(gokart.TaskOnKart):
    def requires(self):
        # TaskA is another task defined elsewhere; its output becomes this task's input.
        return TaskA()

    def output(self):
        # please use TaskOnKart.make_target to make Target.
        return self.make_target('basic_task.csv')

    def run(self):
        # load the data that TaskA output
        texts = self.load()

        # do something with texts, and make results.
        results = texts  # placeholder: replace with your own processing

        # save results with the file path {self.workspace_directory}/basic_task_{unique_id}.csv
        self.dump(results)
```
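The comment above says the result is saved to `{self.workspace_directory}/basic_task_{unique_id}.csv`. The standard-library sketch below illustrates the idea: the unique id is derived from the task's parameters, so runs with different parameters write to different files. The helper `make_output_path` and the use of an MD5 digest over JSON-serialized parameters are assumptions for illustration, not gokart's actual implementation.

```python
import hashlib
import json
import os


def make_output_path(workspace_directory: str, relative_file_path: str, params: dict) -> str:
    """Hypothetical helper: append a parameter-derived unique id to the file stem."""
    # Serialize parameters deterministically so equal parameters give the same id.
    serialized = json.dumps(params, sort_keys=True)
    unique_id = hashlib.md5(serialized.encode('utf-8')).hexdigest()
    stem, extension = os.path.splitext(relative_file_path)
    return os.path.join(workspace_directory, f'{stem}_{unique_id}{extension}')


path_a = make_output_path('./resources', 'basic_task.csv', {'date': '2019-01-01'})
path_b = make_output_path('./resources', 'basic_task.csv', {'date': '2019-01-02'})
print(path_a)
print(path_a != path_b)  # True: different parameters map to different files
```

Because the id depends only on the parameters, rerunning a task with the same parameters points at the same file, which is what lets a finished task be recognized as complete.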

### Details of base functions
#### Make Target with TaskOnKart
`TaskOnKart.make_target` determines the `Target` type from the extension of the passed path. The following extensions are supported:

- pkl
- txt
- csv
- tsv
- gz
- json
- xml

#### Make Target for models which generate multiple files when saved
`TaskOnKart.make_model_target` and `TaskOnKart.dump` are designed to save and load models such as `gensim.models.Word2Vec`.
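```python
import gensim
import gokart


class TrainWord2Vec(gokart.TaskOnKart):
    def output(self):
        # please use a '.zip' file name: the saved model files are stored in this archive.
        return self.make_model_target(
            'model.zip',
            save_function=gensim.models.Word2Vec.save,
            load_function=gensim.models.Word2Vec.load)

    def run(self):
        # make word2vec; replace this toy corpus with your own tokenized sentences.
        sentences = [['hello', 'world'], ['hello', 'gokart']]
        word2vec = gensim.models.Word2Vec(sentences, min_count=1)
        self.dump(word2vec)
```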
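Because a model such as Word2Vec may write several files when saved, they have to be bundled into the single `.zip` target. The standard-library sketch below shows one way this can work, assuming `save_function(obj, path)` may write side files next to `path` and `load_function(path)` reads them back. `dump_model_zip`, `load_model_zip`, and `ToyModel` are hypothetical names, not gokart's internals.

```python
import os
import pickle
import tempfile
import zipfile


def dump_model_zip(obj, zip_path, save_function):
    """Hypothetical: save a possibly multi-file model into one zip archive."""
    with tempfile.TemporaryDirectory() as work_dir:
        # save_function may write side files next to the main path (as gensim can).
        save_function(obj, os.path.join(work_dir, 'model'))
        with zipfile.ZipFile(zip_path, 'w') as archive:
            for name in sorted(os.listdir(work_dir)):
                archive.write(os.path.join(work_dir, name), arcname=name)


def load_model_zip(zip_path, load_function):
    """Hypothetical: extract the archive and hand the main path to load_function."""
    with tempfile.TemporaryDirectory() as work_dir:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(work_dir)
        # load_function must read everything before work_dir is deleted.
        return load_function(os.path.join(work_dir, 'model'))


class ToyModel:
    """Stands in for a model whose save() writes two files."""

    def __init__(self, vocab, weights):
        self.vocab = vocab
        self.weights = weights

    def save(self, path):
        with open(path, 'wb') as f:
            pickle.dump(self.vocab, f)
        with open(path + '.weights', 'wb') as f:
            pickle.dump(self.weights, f)

    @classmethod
    def load(cls, path):
        with open(path, 'rb') as f:
            vocab = pickle.load(f)
        with open(path + '.weights', 'rb') as f:
            weights = pickle.load(f)
        return cls(vocab, weights)


with tempfile.TemporaryDirectory() as out_dir:
    zip_path = os.path.join(out_dir, 'model.zip')
    dump_model_zip(ToyModel(['a', 'b'], [0.1, 0.2]), zip_path, save_function=ToyModel.save)
    restored = load_model_zip(zip_path, load_function=ToyModel.load)
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()

print(restored.vocab, restored.weights, members)  # ['a', 'b'] [0.1, 0.2] ['model', 'model.weights']
```

One consequence of this design is that `load_function` must fully read the model before the temporary directory is removed, so lazily memory-mapped loading would not work with this sketch.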

#### Load input data
##### Pattern 1: Load input data individually.
```python
def requires(self):
    return dict(data=LoadItemData(), model=LoadModel())

def run(self):
    # pass a key of the dictionary returned by `self.requires()`
    data = self.load('data')
    model = self.load('model')
```

##### Pattern 2: Load input data at once
```python
def run(self):
    input_data = self.load()
    # The above line is equivalent to the following:
    # input_data = dict(data=self.load('data'), model=self.load('model'))
```
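Both patterns can be pictured as walking the structure returned by `requires()`: a key selects one entry, while no key loads every entry in place. The stand-alone sketch below assumes each input target exposes a `load()` method; `InMemoryTarget` and `load_input` are hypothetical stand-ins, and the handling of lists and single targets is an assumption beyond the dictionary case shown above.

```python
class InMemoryTarget:
    """Hypothetical stand-in for a target that already holds its data."""

    def __init__(self, value):
        self.value = value

    def load(self):
        return self.value


def load_input(inputs, key=None):
    # inputs mirrors the structure returned by requires().
    # Pattern 1: a key selects one entry, and only that target is loaded.
    if key is not None:
        return _load_all(inputs[key])
    # Pattern 2: no key loads everything, keeping the container's shape.
    return _load_all(inputs)


def _load_all(inputs):
    if isinstance(inputs, dict):
        return {name: _load_all(value) for name, value in inputs.items()}
    if isinstance(inputs, (list, tuple)):
        return [_load_all(value) for value in inputs]
    return inputs.load()


inputs = dict(data=InMemoryTarget([1, 2, 3]), model=InMemoryTarget('trained-model'))
data = load_input(inputs, 'data')
model = load_input(inputs, 'model')
input_data = load_input(inputs)
print(input_data == dict(data=data, model=model))  # True
```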


#### Load input data as pd.DataFrame
```python
def requires(self):
    return LoadDataFrame()

def run(self):
    data = self.load_data_frame(required_columns={'id', 'name'})
```
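`required_columns` lets the task state which columns it depends on. The standard-library sketch below mimics that contract on CSV text, assuming a missing column should fail fast with an error. `load_rows` is hypothetical, returns a list of dicts rather than a `pd.DataFrame`, and gokart's actual check and error type may differ.

```python
import csv
import io


def load_rows(csv_text, required_columns):
    """Hypothetical: parse CSV text and fail fast if required columns are missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    available = set(reader.fieldnames or [])
    missing = set(required_columns) - available
    if missing:
        raise ValueError(f'missing required columns: {sorted(missing)}')
    return list(reader)


rows = load_rows('id,name,age\n1,alice,30\n2,bob,25\n', required_columns={'id', 'name'})
print([row['name'] for row in rows])  # ['alice', 'bob']
```

Checking columns at load time surfaces a schema change in an upstream task as an immediate error instead of a confusing failure later in `run()`.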


## Download files


Source distribution: `gokart-0.1.20.tar.gz` (25.4 kB)

### File details

Details for the file `gokart-0.1.20.tar.gz`:

- Download URL: gokart-0.1.20.tar.gz
- Size: 25.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

### File hashes

Hashes for `gokart-0.1.20.tar.gz`:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 84d9244f933b5f44c0fb3fe6946a038714c8a3a4edc6852527e3e91d5b1736ce |
| MD5 | 183a90c3c086d6077de706d685c20b8c |
| BLAKE2b-256 | a4526a39ef3bd91e76e9b4650fa76897003187c7a784d7acf9730fe6c3713aea |
