Lightweight file-to-file build tool built for production workloads


Project description

tinybaker: Lightweight file-to-file build tool built for production workloads

Install with pip: pip install tinybaker

Brief Example

Suppose we want to define a transformation from one set of files to another. Tinybaker lets a developer refer to the input and output files by "tags", and the actual file paths behind those tags are configured later.
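The tag idea can be illustrated without tinybaker at all. The sketch below is hypothetical: the `TaggedStep` class and its `configure` method are invented for this example and are not part of tinybaker. A step declares its tags once, and paths are bound to them afterwards, with a check that every declared tag gets a path and that no unknown tag is supplied.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class TaggedStep:
    """Hypothetical stand-in for a build step (not tinybaker's API)."""

    input_tags: FrozenSet[str]
    output_tags: FrozenSet[str]
    input_paths: Dict[str, str] = field(default_factory=dict)
    output_paths: Dict[str, str] = field(default_factory=dict)

    def configure(self, input_paths: Dict[str, str], output_paths: Dict[str, str]) -> "TaggedStep":
        # Bind concrete paths to the declared tags; reject missing or unknown tags.
        for kind, declared, given in (
            ("input", self.input_tags, input_paths),
            ("output", self.output_tags, output_paths),
        ):
            missing = set(declared) - set(given)
            unknown = set(given) - set(declared)
            if missing or unknown:
                raise ValueError(
                    f"{kind} tags do not match: missing={sorted(missing)}, unknown={sorted(unknown)}"
                )
        self.input_paths = dict(input_paths)
        self.output_paths = dict(output_paths)
        return self


# The step is defined once in terms of tags...
train = TaggedStep(frozenset({"train_csv", "test_csv"}), frozenset({"pickled_model"}))

# ...and bound to real paths later, e.g. from command-line arguments.
train.configure(
    {"train_csv": "some/path/train.csv", "test_csv": "some/path/test.csv"},
    {"pickled_model": "another/path/some_model.pkl"},
)
```

Separating declaration from binding is what allows the same step definition to be reused with different file locations.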

For example, consider a transformation from the two files some/path/train.csv and some/path/test.csv to a pickled ML model another/path/some_model.pkl. With tinybaker, you can specify this individual configurable step as follows:

# train_step.py
import pickle

import pandas as pd
from tinybaker import StepDefinition
# some_cool_ml_library is a placeholder for the ML library of your choice
from some_cool_ml_library import train_model, test_model

class TrainModelStep(StepDefinition):
    input_file_set = {"train_csv", "test_csv"}
    output_file_set = {"pickled_model"}

    def script(self):
        # Read the inputs through the tags declared above
        with self.input_files["train_csv"].open() as f:
            train_data = pd.read_csv(f)
        with self.input_files["test_csv"].open() as f:
            test_data = pd.read_csv(f)
        # Split the training data into features and label
        X = train_data.drop(columns=["label"])
        Y = train_data["label"]
        model = train_model(X, Y, depth_or_something=self.config["depth"])
        # Evaluate with the placeholder library's test helper
        test_model(model, test_data)
        # pickle.dump takes (object, file)
        with self.output_files["pickled_model"].open() as f:
            pickle.dump(model, f)

# script.py
import sys

from train_step import TrainModelStep

# Usage: python script.py TRAIN_CSV TEST_CSV PICKLED_MODEL
_, train_csv_path, test_csv_path, pickled_model_path = sys.argv
TrainModelStep.build(
    input={
        "train_csv": train_csv_path,
        "test_csv": test_csv_path,
    },
    output={
        "pickled_model": pickled_model_path,
    },
    config={"depth": 5},
)

Building the step performs standard error handling, such as raising an error early if an input file is missing.
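A hypothetical sketch of such an early check, not tinybaker's actual code: before a step's script runs, every input path is tested with `os.path.exists`, and a single `FileNotFoundError` lists all missing tags, so a long job never starts with incomplete inputs. Only local paths are handled, and `check_inputs_exist` is a name invented for this example.

```python
import os
import tempfile
from typing import Dict


def check_inputs_exist(input_paths: Dict[str, str]) -> None:
    """Hypothetical pre-flight check: raise before running if any input file is absent."""
    missing = {tag: path for tag, path in input_paths.items() if not os.path.exists(path)}
    if missing:
        details = ", ".join(f"{tag} -> {path}" for tag, path in sorted(missing.items()))
        raise FileNotFoundError(f"missing input files: {details}")


# Usage: one real file and one absent file in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    train_csv = os.path.join(tmp, "train.csv")
    with open(train_csv, "w"):
        pass  # create an empty file
    check_inputs_exist({"train_csv": train_csv})  # passes silently
    try:
        check_inputs_exist({"train_csv": train_csv, "test_csv": os.path.join(tmp, "test.csv")})
    except FileNotFoundError as err:
        print(err)  # names only the test_csv tag
```

Collecting every missing tag before raising gives one actionable error instead of a series of failures discovered one at a time.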

Combining several build steps

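Suppose you have a sequence of steps. You can compose build steps with the functions merge and sequence: sequence chains steps so that each step's outputs feed the next step's inputs by tag, and merge combines independent steps into one.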

from tinybaker import StepDefinition, merge, sequence

class CleanLogs(StepDefinition):
    input_file_set = {"raw_logfile"}
    output_file_set = {"cleaned_logfile"}
    ...

class BuildDataframe(StepDefinition):
    input_file_set = {"cleaned_logfile"}
    output_file_set = {"dataframe"}
    ...

class BuildLabels(StepDefinition):
    input_file_set = {"cleaned_logfile"}
    output_file_set = {"labels"}
    ...

class TrainModelFromDataframe(StepDefinition):
    input_file_set = {"dataframe", "labels"}
    output_file_set = {"trained_model"}
    ...

# CleanLogs runs first, then BuildDataframe and BuildLabels, then training
TrainFromRawLogs = sequence(
    CleanLogs,
    merge(BuildDataframe, BuildLabels),
    TrainModelFromDataframe,
)

task = TrainFromRawLogs(
    input_paths={"raw_logfile": "/path/to/raw.log"},
    output_paths={"trained_model": "/path/to/model.pkl"},
)

task.build()
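To make the composition rules concrete, here is a toy model of how tags might flow through sequence and merge. It is a hypothetical sketch, not tinybaker's code, and it rests on stated assumptions: merge unions the tag sets of independent steps (and, as an extra assumption, rejects two steps producing the same tag), while sequence exposes as inputs only the tags no earlier step produces and as outputs only the final step's tags. The `Step` record and `step` helper are invented for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Step:
    """Hypothetical step described only by its tag sets."""

    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def step(name, inputs, outputs) -> Step:
    return Step(name, frozenset(inputs), frozenset(outputs))


def merge(*steps: Step) -> Step:
    # Independent steps side by side: union their tags (assumed: no duplicate outputs).
    outputs = [tag for s in steps for tag in s.outputs]
    if len(outputs) != len(set(outputs)):
        raise ValueError("merged steps must not produce the same output tag")
    return Step(
        name="merge(" + ", ".join(s.name for s in steps) + ")",
        inputs=frozenset().union(*(s.inputs for s in steps)),
        outputs=frozenset(outputs),
    )


def sequence(*steps: Step) -> Step:
    # Tags produced upstream satisfy later inputs; all other inputs must come from outside.
    produced: set = set()
    external: set = set()
    for s in steps:
        external |= s.inputs - produced
        produced |= s.outputs
    return Step(
        name="sequence(" + ", ".join(s.name for s in steps) + ")",
        inputs=frozenset(external),
        outputs=steps[-1].outputs,
    )


train_from_raw_logs = sequence(
    step("CleanLogs", {"raw_logfile"}, {"cleaned_logfile"}),
    merge(
        step("BuildDataframe", {"cleaned_logfile"}, {"dataframe"}),
        step("BuildLabels", {"cleaned_logfile"}, {"labels"}),
    ),
    step("TrainModelFromDataframe", {"dataframe", "labels"}, {"trained_model"}),
)
print(sorted(train_from_raw_logs.inputs), sorted(train_from_raw_logs.outputs))
# ['raw_logfile'] ['trained_model']
```

Under this model, intermediate tags such as cleaned_logfile stay internal, which matches the example supplying paths only for raw_logfile and trained_model.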

Mapping

Currently, files are passed from one step to the next by matching tag names. To rename a step's tags, use map_tags:

from tinybaker import map_tags

# SomeStep stands for any StepDefinition subclass
MappedStep = map_tags(
    SomeStep,
    input_mapping={"old_input_name": "new_input_name"},
    output_mapping={"old_output_name": "new_output_name"},
)
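One way to picture tag renaming, again as a hypothetical sketch rather than tinybaker's implementation: renaming is a pure function over a step's tag sets, tags not mentioned in a mapping keep their names, and (an assumption of this sketch) mapping a tag the step does not declare raises `KeyError`. The `Step` record and this `map_tags` function are minimal stand-ins defined only for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical step described only by its input and output tags."""

    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def map_tags(
    step: Step,
    input_mapping: Optional[Dict[str, str]] = None,
    output_mapping: Optional[Dict[str, str]] = None,
) -> Step:
    input_mapping = input_mapping or {}
    output_mapping = output_mapping or {}
    # Refuse to rename tags the step never declared.
    unknown = (set(input_mapping) - step.inputs) | (set(output_mapping) - step.outputs)
    if unknown:
        raise KeyError(f"cannot map tags the step does not declare: {sorted(unknown)}")
    # Unmapped tags pass through unchanged.
    return replace(
        step,
        inputs=frozenset(input_mapping.get(t, t) for t in step.inputs),
        outputs=frozenset(output_mapping.get(t, t) for t in step.outputs),
    )


some_step = Step(frozenset({"old_input_name", "other_input"}), frozenset({"old_output_name"}))
mapped = map_tags(
    some_step,
    input_mapping={"old_input_name": "new_input_name"},
    output_mapping={"old_output_name": "new_output_name"},
)
print(sorted(mapped.inputs), sorted(mapped.outputs))
# ['new_input_name', 'other_input'] ['new_output_name']
```

Returning a new step instead of mutating the original means the same step can be wired into several pipelines under different tag names.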

That's it!

Project details


Release history

0.4.0: Apr 29, 2021
0.3.2: Dec 29, 2020
0.3.1: Dec 29, 2020
0.3.0: Dec 18, 2020
0.2.5: Dec 15, 2020
0.2.4: Dec 13, 2020
0.2.3: Dec 12, 2020
0.2.2: Dec 6, 2020
0.2.1: Dec 6, 2020
0.2.0: Dec 4, 2020
0.1.1: Nov 29, 2020
0.1.0: Nov 28, 2020 (this version)
0.0.1: Nov 25, 2020

Download files


Source Distribution

tinybaker-0.1.0.tar.gz (7.2 kB), uploaded Nov 28, 2020

Built Distribution

tinybaker-0.1.0-py3-none-any.whl (8.2 kB, Python 3), uploaded Nov 28, 2020

Hashes for tinybaker-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0b872e1186542fbf5a758ff4c3006256830da36695f4663a8b03d719f7ea239e`
MD5	`0da0864dfe4f772abfd8caedfbbdae3f`
BLAKE2b-256	`86fdbed2026046984aa30db6f314170b34b276f65980908015962affd1551f73`

Hashes for tinybaker-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`095b5b79d1468773f987ed8d33fddb8280734482dc558becdf41bf2ff1fb0c71`
MD5	`c56ee67c74d3d9b2b48b8aef221687eb`
BLAKE2b-256	`f80ffcc56a9ffa548b010ecfb8822dffdfdd742b3f3597bd4e036d48da14330e`