Micro serialization utilities for Python with CLI support.

Micro Serialization Utilities for Python

uv pip install msup

With no required dependencies and only 496 LOC (cloc ./msup), this library enables you to:

  • create a CLI application from nested dataclass definitions (see example below)
  • serialize/deserialize dataclasses or regular Python classes to/from JSON and Python dictionaries without dependencies

Yes, the small LOC is an intentional feature.
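
To picture the dataclass round trip mentioned in the list above, here is a minimal standard-library sketch. It is not msup's implementation: the Point and Shape classes and the from_dict helper are hypothetical names invented for this illustration, and the helper only handles fields annotated directly with a dataclass type.

```python
import json
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class Point:  # hypothetical example class
    x: int = 0
    y: int = 0


@dataclass
class Shape:  # hypothetical example class with a nested dataclass field
    name: str = "shape"
    origin: Point = field(default_factory=Point)


def from_dict(cls, data: dict):
    """Rebuild a dataclass from a dict, recursing into nested dataclass fields."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # keep the field's default
        value = data[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


shape = Shape(name="square", origin=Point(x=1, y=2))
encoded = json.dumps(asdict(shape))             # dataclass -> dict -> JSON string
decoded = from_dict(Shape, json.loads(encoded))  # JSON string -> dict -> dataclass
```

The sketch relies on dataclasses.asdict for encoding, so only the decoding direction needs custom code.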

design philosophy

This library is designed around the following principles:

  • simplicity
  • minimal LOC
  • no dependencies by default, i.e. dependencies are opt-in
  • opinionated to reduce boilerplate

feature list

Serialization and de-serialization of:

  • dataclasses
    • validating types
    • basic primitives: float, str, int
    • optionals
    • unions if there is no ambiguity
    • nested dataclasses
    • callables defined as a string
    • sub-objects can be loaded from a string representing a:
      • JSON, e.g. '{"x": 3, "name": "abc"}'
      • a path to a JSON file, e.g. myfile.json
      • TODO: in a future version, hooks will be added to the library to support other serialization formats, such as YAML
  • other python classes with __init__, e.g. torch.optim.Adam (see examples/pt_basic.py)

TODOs

  • test optional robustly
    • Callable | None does not work
  • parameter sweep example
  • hooks to support other serialization formats, e.g. YAML
  • basic SQLite ORM, supporting:
    • schema generation with support to mark fields as a PK, FK and an index
    • encode/decode from SQLite
  • dataclass serialization
    • renaming fields
    • enum
    • union tests (aside from Optional)
  • CI tests
    • iterate over all examples/tests and run them

examples

  • simple CLI: examples/simple.py
  • multiple CLI commands with nested config (see below): examples/multicli.py
  • create a pytorch model and optimizer from config: examples/pt_basic.py
    • This example constructs Python classes, such as torch.optim.Adam or a user-provided optimizer class, e.g.
      python examples/pt_basic.py test_optim_advanced --lr 0.42 --optim torch.optim.SGD
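
Passing a class as a dotted string, as in --optim torch.optim.SGD above, can be sketched with importlib. This is an assumption about the general technique rather than msup's actual code: resolve and build are hypothetical helper names, and fractions.Fraction stands in for an optimizer class so the sketch runs without PyTorch.

```python
import importlib
import inspect


def resolve(dotted_path: str):
    """Import 'pkg.module.attr' and return the attribute it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {dotted_path!r}")
    return getattr(importlib.import_module(module_name), attr)


def build(dotted_path: str, **kwargs):
    """Instantiate the class at dotted_path, rejecting unknown keyword arguments."""
    cls = resolve(dotted_path)
    params = inspect.signature(cls).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    unknown = [k for k in kwargs if k not in params]
    if unknown and not accepts_var_kw:
        raise TypeError(f"{dotted_path} does not accept {unknown}")
    return cls(**kwargs)


# stdlib stand-in for something like torch.optim.SGD
frac = build("fractions.Fraction", numerator=3, denominator=4)
sqrt = resolve("math.sqrt")  # the same lookup works for plain functions
```

Checking keyword names against the constructor signature before calling it gives an early, readable error for misspelled config keys.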
      

The following example automatically creates a multi-command CLI and serializes a dataclass to JSON. You can find it in examples/multicli.py.

import math
import os
from dataclasses import dataclass
from typing import Callable
from msup.cli import cli, cliarg, to_json

# Schedule constants used by cosine_warmup_lr_step (illustrative values).
WARMUP_ITER = 100
N_ITER = 1000

@dataclass
class ModelConfig:
    n_layers: int = cliarg(help="number of layers for the model", default=10)
    checkpoint_path: str | None = cliarg(short="-chkpt", help="path of the checkpoint", default=None)

def identity_step_fn(i: int, base_lr: float) -> float:
    return base_lr

def cosine_warmup_lr_step(i: int, base_lr: float) -> float:
    # linear warmup for the first WARMUP_ITER steps, then cosine decay to zero at N_ITER
    if WARMUP_ITER and i < WARMUP_ITER:
        return ((i + 1) / WARMUP_ITER) * base_lr
    t = (i - WARMUP_ITER) / (N_ITER - WARMUP_ITER)
    t = min(max(t, 0.0), 1.0)
    return base_lr * 0.5 * (1 + math.cos(math.pi * t))

@dataclass
class TrainArgs:
    # default_factory is called to produce the default, so it must return an instance
    model_config: ModelConfig = cliarg(default_factory=lambda: ModelConfig())
    lr: float = 0.01
    name: str = cliarg(help="name of experiment", default="example")
    lr_step_fn: Callable[[int, float], float] = cliarg(help="", default=cosine_warmup_lr_step)
    num_workers: int = -1
    cont: bool = cliarg(help="continue training from last known iter?", default=False)
    config_root_dir: str = cliarg(help="root directory where configuration is serialized to", default="./configs")

@dataclass
class EvalArgs:
    model_config: ModelConfig = cliarg(default_factory=lambda: ModelConfig())
    num_workers: int = -1
    # ...

def train(args: TrainArgs):
    print("train args:")
    print(to_json(args))
    os.makedirs(args.config_root_dir, exist_ok=True)
    config_out_path = os.path.join(args.config_root_dir, args.name + ".json")

    print(f"\nwriting config to: {config_out_path}")
    to_json(args, config_out_path)

def eval(args: EvalArgs):
    print("eval args:")
    print(to_json(args))

if __name__ == "__main__":
    cli({
        train: "train a model",
        eval: "evaluate a trained model",
    })

With this example, you can run the train or eval function via python <script> {train,eval} [optional-args...], e.g.:

python examples/multicli.py train
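
For intuition about how a command line like the one above can be derived from a dataclass, here is a small argparse-based sketch. It is not how msup is implemented: DemoArgs, parse_dataclass, and run are hypothetical names, and the sketch only covers int, float, and str fields with plain defaults or no default.

```python
import argparse
import typing
from dataclasses import MISSING, dataclass, fields


@dataclass
class DemoArgs:  # hypothetical config, not part of msup
    lr: float = 0.01
    name: str = "example"
    num_workers: int = -1


def parse_dataclass(cls, argv: list[str]):
    """Create one --flag per dataclass field and parse argv into an instance."""
    parser = argparse.ArgumentParser(prog=cls.__name__)
    for f in fields(cls):
        has_default = f.default is not MISSING
        parser.add_argument(
            f"--{f.name}",
            type=f.type,  # the annotation doubles as the string converter
            default=f.default if has_default else None,
            required=not has_default,
        )
    return cls(**vars(parser.parse_args(argv)))


def run(commands: dict, argv: list[str]):
    """Dispatch argv[0] to the function of that name, parsing the rest for it."""
    by_name = {fn.__name__: fn for fn in commands}
    fn = by_name[argv[0]]
    # the first annotated parameter is taken to be the args dataclass
    args_cls = next(iter(typing.get_type_hints(fn).values()))
    return fn(parse_dataclass(args_cls, argv[1:]))


def train(args: DemoArgs) -> str:
    return f"training {args.name} with lr={args.lr}"


result = run({train: "train a model"}, ["train", "--lr", "0.1"])
```

The dictionary passed to run mirrors the shape of the mapping given to cli in the example, with functions as keys and help strings as values.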

Here's how to provide a custom Python callable to use a different step function:

python examples/multicli.py train --lr_step_fn examples.multicli.identity_step_fn --lr 0.1 --name identity

# and now we can reproduce this config via:
python examples/multicli.py train configs/identity.json

# or provide --Args (or --TrainArgs) & optionally override args
python examples/multicli.py train --Args configs/identity.json --lr 0.2
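
One plausible reading of "load a saved config, then override individual args" is sketched below with the standard library. The precedence rule (explicit overrides win over file values, which win over dataclass defaults) is an assumption of this sketch, and DemoArgs and load_with_overrides are hypothetical names.

```python
import json
import tempfile
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass
class DemoArgs:  # hypothetical flat config
    lr: float = 0.01
    name: str = "example"


def load_with_overrides(cls, path, **overrides):
    """Load a flat dataclass from a JSON file, then apply overrides that were given."""
    base = cls(**json.loads(Path(path).read_text()))
    # None means "flag not passed on the command line" in this sketch
    given = {k: v for k, v in overrides.items() if v is not None}
    return replace(base, **given)


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "identity.json"
    config_path.write_text(json.dumps({"lr": 0.1, "name": "identity"}))
    args = load_with_overrides(DemoArgs, config_path, lr=0.2, name=None)
```

Here lr comes from the override while name is kept from the file.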

We can also read nested dataclasses from a file (e.g. JSON), or from a string in the encoded format (e.g. JSON), on the CLI, e.g.

python examples/multicli.py train --model_config configs/models/small.json

# or via a JSON object defined on the CLI
python examples/multicli.py train --model_config '{"n_layers": 1}'
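
Accepting either a file path or an inline JSON object for the same argument can be done with a simple check on the first character. The sketch below shows that idea under the assumption that anything not starting with "{" is a path; load_json_arg is a hypothetical helper, not an msup function.

```python
import json
import tempfile
from pathlib import Path


def load_json_arg(arg: str) -> dict:
    """Treat arg as inline JSON if it starts with '{', otherwise as a path to a JSON file."""
    text = arg.strip()
    if text.startswith("{"):
        return json.loads(text)
    with open(arg, encoding="utf-8") as fh:
        return json.load(fh)


inline = load_json_arg('{"n_layers": 1}')

with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "small.json"
    small.write_text(json.dumps({"n_layers": 2}), encoding="utf-8")
    from_file = load_json_arg(str(small))
```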
