A wrapper of luigi that makes it easy to define tasks.
For the latest source, discussion, etc., please visit the
[GitHub repository](https://github.com/m3dev/gokart).
# gokart
[![Build Status](https://travis-ci.org/m3dev/gokart.svg)](https://travis-ci.org/m3dev/gokart)
A wrapper of the data pipeline library "luigi".
## Getting Started
Run `pip install gokart` to install the latest version from PyPI. [Documentation](https://gokart.readthedocs.io/en/latest/) for the latest release is hosted on readthedocs.
## How to Use
Use `gokart.TaskOnKart` instead of `luigi.Task` to define your tasks.
### Basic Task with gokart.TaskOnKart
```python
import gokart


class BasicTask(gokart.TaskOnKart):
    def requires(self):
        # TaskA is another task defined elsewhere in your pipeline.
        return TaskA()

    def output(self):
        # Use TaskOnKart.make_target to create the Target.
        return self.make_target('basic_task.csv')

    def run(self):
        # Load the data that TaskA wrote.
        texts = self.load()
        # Do something with texts to produce results.
        results = texts  # placeholder: replace with your own processing
        # Results are saved to {self.workspace_directory}/basic_task_{unique_id}.csv
        self.dump(results)
```
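The comment in `run` says the result is written to `{self.workspace_directory}/basic_task_{unique_id}.csv`. The sketch below illustrates how such a path could be composed from the relative path passed to `make_target`. The helper names `make_unique_id` and `build_output_path`, the `./resources` directory, and the choice of an MD5 hash over the task name and parameters are assumptions made for illustration; they are not taken from gokart's source.

```python
import hashlib
import os


def make_unique_id(task_name, params):
    """Hypothetical: derive a stable id from a task name and its parameters."""
    # Sort the parameters so the id does not depend on dict ordering.
    key = task_name + str(sorted(params.items()))
    return hashlib.md5(key.encode('utf-8')).hexdigest()


def build_output_path(workspace_directory, relative_path, unique_id):
    """Hypothetical: insert the unique id before the file extension."""
    stem, extension = os.path.splitext(relative_path)
    return os.path.join(workspace_directory, f'{stem}_{unique_id}{extension}')


unique_id = make_unique_id('BasicTask', {'param': 1})
path = build_output_path('./resources', 'basic_task.csv', unique_id)
```

Under these assumptions, two runs with the same parameters map to the same file, while a change in any parameter produces a different file name.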
### Details of base functions
#### Make Target with TaskOnKart
`TaskOnKart.make_target` determines the `Target` type from the extension of the path passed to it. The following extensions are supported:
- pkl
- txt
- csv
- tsv
- gz
- json
- xml
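As a rough illustration of extension-based dispatch, the sketch below checks a path against the list above. The helper `guess_extension` and the constant `SUPPORTED_EXTENSIONS` are hypothetical names; this is not how gokart itself is implemented, only a minimal sketch of the idea.

```python
import os

# The extensions listed above.
SUPPORTED_EXTENSIONS = {'pkl', 'txt', 'csv', 'tsv', 'gz', 'json', 'xml'}


def guess_extension(path):
    """Hypothetical helper: return the supported extension of `path`."""
    # splitext only looks at the last suffix, so 'a.txt.gz' yields 'gz'.
    extension = os.path.splitext(path)[1].lstrip('.').lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f'Unsupported extension: {extension!r}')
    return extension
```

A caller could then choose a reader and writer based on the returned extension.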
#### Make Target for models that generate multiple files when saved
`TaskOnKart.make_model_target` and `TaskOnKart.dump` are designed to save and load models such as `gensim.models.Word2Vec`.
```python
import gensim
import gokart


class TrainWord2Vec(gokart.TaskOnKart):
    def output(self):
        # Use the 'zip' extension.
        return self.make_model_target(
            'model.zip',
            save_function=gensim.models.Word2Vec.save,
            load_function=gensim.models.Word2Vec.load)

    def run(self):
        # Build the word2vec model here and assign it to `word2vec`.
        self.dump(word2vec)
```
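To show why a zip file suits models that write several files, here is a minimal sketch of the idea using only the standard library. The functions `dump_model`, `load_model`, `toy_save`, and `toy_load` are hypothetical and do not reflect gokart's internals; the toy save function stands in for something like `Word2Vec.save`, which takes the model and a file path.

```python
import os
import tempfile
import zipfile


def dump_model(model, zip_path, save_function):
    """Hypothetical: bundle every file the save function writes into one zip."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # The save function may create extra files next to this path.
        save_function(model, os.path.join(tmp_dir, 'model'))
        with zipfile.ZipFile(zip_path, 'w') as archive:
            for name in os.listdir(tmp_dir):
                archive.write(os.path.join(tmp_dir, name), arcname=name)


def load_model(zip_path, load_function):
    """Hypothetical: unpack the zip and hand the main file to the load function."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp_dir)
        return load_function(os.path.join(tmp_dir, 'model'))


def toy_save(model, path):
    # Stand-in for a save function that writes two files.
    with open(path, 'w') as f:
        f.write(model['name'])
    with open(path + '.vectors', 'w') as f:
        f.write(model['vectors'])


def toy_load(path):
    # Reads back both files written by toy_save.
    with open(path) as f:
        name = f.read()
    with open(path + '.vectors') as f:
        vectors = f.read()
    return {'name': name, 'vectors': vectors}
```

Because everything is packed into a single archive, the task still has exactly one output file to track.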
#### Load input data
##### Pattern 1: Load input data individually
```python
def requires(self):
    return dict(data=LoadItemData(), model=LoadModel())

def run(self):
    # Pass a key of the dictionary returned by `self.requires()`.
    data = self.load('data')
    model = self.load('model')
```
##### Pattern 2: Load input data at once
```python
def run(self):
    input_data = self.load()
    # The line above is equivalent to:
    # input_data = dict(data=self.load('data'), model=self.load('model'))
```
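The relationship between the two patterns can be sketched with a toy class. `ToyTask` is a hypothetical stand-in that holds already-loaded inputs in a dictionary; it is not gokart code and only mirrors the behaviour described above.

```python
class ToyTask:
    """Hypothetical stand-in for a task whose requires() returns a dict."""

    def __init__(self, inputs):
        # Maps each key of requires() to the data the upstream task produced.
        self._inputs = inputs

    def load(self, key=None):
        if key is None:
            # Pattern 2: load every input, keeping the keys of requires().
            return {name: self.load(name) for name in self._inputs}
        # Pattern 1: load a single input by its key.
        return self._inputs[key]


task = ToyTask({'data': [1, 2, 3], 'model': 'trained-model'})
everything = task.load()
```

Here `everything` equals `dict(data=task.load('data'), model=task.load('model'))`.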
#### Load input data as pd.DataFrame
```python
def requires(self):
    return LoadDataFrame()

def run(self):
    data = self.load_data_frame(required_columns={'id', 'name'})
```
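One plausible reading of `required_columns` is that the loaded data must contain those columns. The sketch below shows such a check as a plain set comparison. The function `check_required_columns` is hypothetical, and how `load_data_frame` actually reacts to a missing column is an assumption not confirmed here.

```python
def check_required_columns(columns, required_columns):
    """Hypothetical: raise if any required column is missing."""
    missing = set(required_columns) - set(columns)
    if missing:
        raise ValueError(f'Missing required columns: {sorted(missing)}')


# Passes silently because both required columns are present.
check_required_columns(['id', 'name', 'price'], {'id', 'name'})
```

With a pandas DataFrame `df`, the first argument would be `df.columns`.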