Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline. [Documentation](https://gokart.readthedocs.io/en/latest/)
gokart
Documentation for the latest release is hosted on readthedocs.
About gokart
Here are some good things about gokart.
- The following metadata for each Task is stored separately in a `pkl` file with a hash value:
  - task output data
  - versions of all imported modules
  - task processing time
  - random seed in task
  - displayed log
  - all parameters set as class variables in the task
- Automatically rerun the pipeline if parameters of Tasks are changed.
- Support GCS and S3 as a data store for intermediate results of Tasks in the pipeline.
- The above output is exchanged between tasks as an intermediate file, which is memory-friendly
- `pandas.DataFrame` type and column checking during I/O
- Directory structure of saved files is automatically determined from structure of script
- Seeds for numpy and random are automatically fixed
- Lets you write code that adheres to SOLID principles as much as possible
- Tasks are locked via redis even if they run in parallel
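To make the hash-keyed storage and automatic rerun ideas above concrete, here is a minimal standard-library sketch. It is not gokart's implementation: `unique_id`, `run_cached`, and `scale` are hypothetical names, and the file naming scheme is an assumption chosen for illustration. The result of a task is pickled under a name derived from a hash of its parameters, so a changed parameter maps to a new file and triggers recomputation.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def unique_id(task_name: str, params: dict) -> str:
    """Hash the task name and its parameters into a stable identifier."""
    payload = json.dumps({"task": task_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(workspace: Path, task_name: str, params: dict, compute):
    """Return (result, was_recomputed), loading from disk when possible."""
    path = workspace / f"{task_name}_{unique_id(task_name, params)}.pkl"
    if path.exists():
        # Same parameters as an earlier run: reuse the stored result.
        with path.open("rb") as f:
            return pickle.load(f), False
    # New or changed parameters: compute and store under the new hash.
    result = compute(**params)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result, True


def scale(values, factor):
    # Hypothetical task body used only for this illustration.
    return [v * factor for v in values]


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    first = run_cached(workspace, "scale", {"values": [1, 2], "factor": 2}, scale)
    second = run_cached(workspace, "scale", {"values": [1, 2], "factor": 2}, scale)
    changed = run_cached(workspace, "scale", {"values": [1, 2], "factor": 3}, scale)

print(first, second, changed)
# ([2, 4], True) ([2, 4], False) ([3, 6], True)
```

The second call reuses the stored file, while the third call, with a different `factor`, is computed again.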
All of the features above were built for constructing machine learning batch jobs. Together they provide an excellent environment for reproducibility and team development.
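As a rough illustration of why fixed seeds matter for reproducibility, the sketch below derives a deterministic seed from a task's name and parameters. This is an assumption-laden toy, not gokart's mechanism: `seed_from_task` and `sample_noise` are hypothetical helpers, and a local `random.Random` generator is used here instead of seeding the global `random` and numpy state.

```python
import hashlib
import random


def seed_from_task(task_name: str, params: dict) -> int:
    """Derive a deterministic 32-bit seed from a task's name and parameters."""
    key = task_name + repr(sorted(params.items()))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def sample_noise(task_name: str, params: dict, n: int) -> list:
    """Draw n pseudo-random floats from a generator seeded by the task identity."""
    rng = random.Random(seed_from_task(task_name, params))
    return [rng.random() for _ in range(n)]


# Two runs with identical task name and parameters give identical samples.
run_1 = sample_noise("MakeNoise", {"size": 3}, 3)
run_2 = sample_noise("MakeNoise", {"size": 3}, 3)
print(run_1 == run_2)  # True
```

Because the seed depends only on the task identity, rerunning the same task reproduces the same random draws.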
Here are some non-goals and downsides of gokart.
- Batch execution in parallel is supported, but parallel and concurrent execution of tasks in memory is not.
- Gokart is focused on reproducibility, so I/O and the capacity of data storage can become a bottleneck.
- No support for task visualization.
- Gokart is not an experiment management tool. Management of execution results is split out into a separate tool, Thunderbolt.
- Gokart does not recommend writing pipelines in toml, yaml, json, and similar formats. Gokart prefers that they be written in Python.
Getting Started
Within the activated Python environment, use the following command to install gokart.
    pip install gokart
Quickstart
Minimal Example
A minimal gokart task looks something like this:
    import gokart


    class Example(gokart.TaskOnKart):
        def run(self):
            self.dump('Hello, world!')


    task = Example()
    output = gokart.build(task)
    print(output)
`gokart.build` returns the value that the `gokart.TaskOnKart` passed to `dump`. The example outputs the following.

    Hello, world!
Type-Safe Pipeline Example
Type annotations make a gokart pipeline more robust. The following example shows how to use type annotations with gokart. Before using this feature, make sure the mypy plugin is enabled in your project.
    import gokart


    # `gokart.TaskOnKart[str]` means that the task dumps `str`
    class StrDumpTask(gokart.TaskOnKart[str]):
        def run(self):
            self.dump('Hello, world!')


    # `gokart.TaskOnKart[int]` means that the task dumps `int`
    class OneDumpTask(gokart.TaskOnKart[int]):
        def run(self):
            self.dump(1)


    # `gokart.TaskOnKart[int]` means that the task dumps `int`
    class TwoDumpTask(gokart.TaskOnKart[int]):
        def run(self):
            self.dump(2)


    class AddTask(gokart.TaskOnKart[int]):
        # `a` requires a task to dump `int`
        a: gokart.TaskOnKart[int] = gokart.TaskInstanceParameter()
        # `b` requires a task to dump `int`
        b: gokart.TaskOnKart[int] = gokart.TaskInstanceParameter()

        def requires(self):
            return dict(a=self.a, b=self.b)

        def run(self):
            # loading by instance parameter,
            # `a` and `b` are treated as `int`
            # because they are declared as `gokart.TaskOnKart[int]`
            a = self.load(self.a)
            b = self.load(self.b)
            self.dump(a + b)


    valid_task = AddTask(a=OneDumpTask(), b=TwoDumpTask())

    # the next line will show type error by mypy
    # because `StrDumpTask` dumps `str` and `AddTask` requires `int`
    invalid_task = AddTask(a=OneDumpTask(), b=StrDumpTask())
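The type check in the example above relies on generic type parameters. The following toy model, written only with the standard `typing` module, may help show the idea. `ToyTask` and `ToyAdd` are hypothetical stand-ins and say nothing about how gokart or its mypy plugin are implemented; they only illustrate how a type parameter on a task class lets a static checker compare what one task produces with what another expects.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class ToyTask(Generic[T]):
    """Hypothetical stand-in for a task whose output has type T."""

    def __init__(self, value: T) -> None:
        self._value = value

    def load(self) -> T:
        # The return type follows the task's type parameter.
        return self._value


class ToyAdd(ToyTask[int]):
    """Adds the outputs of two upstream tasks that must both produce int."""

    def __init__(self, a: ToyTask[int], b: ToyTask[int]) -> None:
        self.a = a
        self.b = b
        super().__init__(a.load() + b.load())


one: ToyTask[int] = ToyTask(1)
two: ToyTask[int] = ToyTask(2)
text: ToyTask[str] = ToyTask("Hello, world!")

total = ToyAdd(one, two).load()
print(total)  # 3

# A static checker such as mypy should reject the next line, because
# ToyTask[str] is not compatible with the declared ToyTask[int]:
# ToyAdd(one, text)
```

The mismatch is reported by the type checker before the code runs, which is the benefit the annotated pipeline aims for.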
This is an introduction to only some of gokart's features. There are many more useful ones.
Please see the [Documentation](https://gokart.readthedocs.io/en/latest/).
Have a good gokart life.
Achievements
Gokart is a proven product.
- It's actually been used by m3.inc for over 3 years
- 2nd prize in the Natural Language Processing Competition by Nishika.inc: Solution Repository
Thanks
gokart is a wrapper for luigi. Thanks to luigi and dependent projects!