
Project description

Overview

Minimal framework for ML modeling, supporting advanced dataset operations and streamlined training workflows.

Install

The trainlib package can be installed from PyPI:

pip install trainlib

Development

  • Initialize/synchronize the project with uv sync, creating a virtual environment with base package dependencies.
  • Depending on needs, install the development dependencies with uv sync --extra dev.

Testing

  • To run the unit tests, first install the test dependencies with uv sync --extra test, then run make test.
  • For notebook testing, run make install-kernel to expose the environment as a Jupyter kernel (select it when running the notebooks).

Documentation

  • Install the documentation dependencies with uv sync --extra doc.
  • Run make docs-build (optionally preceded by make docs-clean), and serve locally with make docs-serve.

Development remarks

  • Across Trainer / Estimator / Dataset, I've considered a ParamSpec-based typing scheme to better orchestrate alignment in the Trainer.train() loop, e.g., so we can statically check whether a dataset fulfills the argument requirements of the estimator's loss() / metrics() methods. Something like

    from collections.abc import Generator

    from torch import Tensor, nn

    class Estimator[**P](nn.Module):
        def loss(
            self,
            input: Tensor,
            *args: P.args,      # extra loss inputs described by P
            **kwargs: P.kwargs,
        ) -> Generator:
            ...

    class Trainer[**P]:
        def __init__(
            self,
            estimator: Estimator[P],  # ties the trainer to the estimator's P
            ...
        ): ...

    might be how we begin threading signatures. But ensuring dataset items can match P is challenging. One option is a "packed" object that encapsulates the data passed through P-typed signatures:

    from abc import abstractmethod
    from collections.abc import Callable, Iterator

    class PackedItem[**P]:
        def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
            self._args = args
            self._kwargs = kwargs

        def apply[R](self, func: Callable[P, R]) -> R:
            return func(*self._args, **self._kwargs)


    class BatchedDataset[U, R, I, **P](Dataset):  # Dataset is trainlib's base class
        @abstractmethod
        def _process_item_data(
            self,
            item_data: I,
            item_index: int,
        ) -> PackedItem[P]:
            ...

        def __iter__(self) -> Iterator[PackedItem[P]]:
            ...
    

    What remains is meaningfully shaping those signatures, and that's where ordinary type expressions fall short. For instance, when trying to appropriately type my base TupleDataset:

    class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]):
        ...
    
    class TupleDataset[I](SequenceDataset[tuple[I, ...], "?"]):
        ...
    

    Here there's no way to shape a ParamSpec that means "arbitrarily many arguments of a fixed type" (I in this case), which is what unpacking my item tuples into an appropriate PackedItem would require.
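    To illustrate the gap: a plain variadic parameter can express "any number of arguments, all of type I", but no ParamSpec binds to that whole family of signatures. The helper below is purely illustrative:

```python
from typing import TypeVar

I = TypeVar("I")

# An ordinary variadic parameter *can* say "arbitrarily many arguments,
# each of type I" ...
def as_packed_args(*items: I) -> tuple[I, ...]:
    return items

print(as_packed_args(1, 2, 3))  # (1, 2, 3)

# ... but a ParamSpec always binds to one concrete parameter list, so there
# is no P for which PackedItem[P] covers this whole family of
# homogeneous-varargs signatures; Callable[..., object] would type-check
# but discards I entirely.
```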

    Until this (among other issues) becomes clearer, I'm setting up around a simpler TypedDict type variable. We won't get particularly strong static checks for item alignment inside Trainer, but this seems about as good as the current typing infrastructure allows.



Download files

Download the file for your platform.

Source Distribution

trainlib-0.3.0.tar.gz (45.2 kB)

Uploaded Source

Built Distribution


trainlib-0.3.0-py3-none-any.whl (51.5 kB)

Uploaded Python 3

File details

Details for the file trainlib-0.3.0.tar.gz.

File metadata

  • Download URL: trainlib-0.3.0.tar.gz
  • Upload date:
  • Size: 45.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Arch Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for trainlib-0.3.0.tar.gz
Algorithm Hash digest
SHA256 3fc56fd25ca9df6e5fff52edfa8d75100c5453c29428e6fab82d4b207658450b
MD5 9b3221afd371f218257b10168a0d47d5
BLAKE2b-256 232e95d3c06a9e82e176c4dea17b414df92a6aa34fb8d3208c010faa7d82035f


File details

Details for the file trainlib-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: trainlib-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 51.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Arch Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for trainlib-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3adbc1270d8b43dfc72b019a5b5a34977a76469a251d16ea6fb8c3851ab6f9b7
MD5 2970741024e6f1d68140eb1a220b23cb
BLAKE2b-256 f40f170e5fb14fc7b625dfa6af7b054938719f83d26f762e230ed2efb9338ea1

