
Project description

Overview

Minimal framework for ML modeling, supporting advanced dataset operations and streamlined training workflows.

Install

The trainlib package can be installed from PyPI:

pip install trainlib

Development

  • Initialize/synchronize the project with uv sync, creating a virtual environment with base package dependencies.
  • Depending on needs, install the development dependencies with uv sync --extra dev.

Testing

  • To run the unit tests, make sure to first have the test dependencies installed with uv sync --extra test, then run make test.
  • For notebook testing, run make install-kernel to make the environment available as a Jupyter kernel (to be selected when running notebooks).

Documentation

  • Install the documentation dependencies with uv sync --extra doc.
  • Run make docs-build (optionally preceded by make docs-clean), and serve locally with make docs-serve.

Development remarks

  • Across Trainer / Estimator / Dataset, I've considered a ParamSpec-based typing scheme to better orchestrate alignment in the Trainer.train() loop, e.g., so we can statically check whether a dataset fulfills the argument requirements of the estimator's loss() / metrics() methods. Something like

    from collections.abc import Generator

    from torch import Tensor, nn

    class Estimator[**P](nn.Module):
        def loss(
            self,
            input: Tensor,
            *args: P.args,
            **kwargs: P.kwargs,
        ) -> Generator:
            ...

    class Trainer[**P]:
        def __init__(
            self,
            estimator: Estimator[P],
            ...
        ): ...
    

    might be how we begin threading signatures. But ensuring dataset items can match P is challenging. One option is a "packed" object that hides the data behind a P-signature:

    from collections.abc import Callable

    class PackedItem[**P]:
        def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
            self._args = args
            self._kwargs = kwargs

        def apply[R](self, func: Callable[P, R]) -> R:
            return func(*self._args, **self._kwargs)
    
    
    from abc import abstractmethod
    from collections.abc import Iterator

    class BatchedDataset[U, R, I, **P](Dataset):
        @abstractmethod
        def _process_item_data(
            self,
            item_data: I,
            item_index: int,
        ) -> PackedItem[P]:
            ...

        def __iter__(self) -> Iterator[PackedItem[P]]:
            ...
    

    What remains is meaningfully shaping those signatures, and current type expressions aren't flexible enough to do so. For instance, when trying to appropriately type the base TupleDataset:

    class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]):
        ...
    
    class TupleDataset[I](SequenceDataset[tuple[I, ...], "?"]):
        ...
    

    Here there's no way to specialize the ParamSpec to mean "arbitrarily many arguments of a fixed type" (I in this case), which is what unpacking item tuples into an appropriate PackedItem would require.

    Until this (among other issues) becomes clearer, I'm building around a simpler TypedDict-bound type variable. This won't give particularly strong static checks for item alignment inside Trainer, but it seems about as good as the current typing infrastructure allows.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trainlib-0.3.1.tar.gz (46.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

trainlib-0.3.1-py3-none-any.whl (52.8 kB view details)

Uploaded Python 3

File details

Details for the file trainlib-0.3.1.tar.gz.

File metadata

  • Download URL: trainlib-0.3.1.tar.gz
  • Upload date:
  • Size: 46.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Arch Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for trainlib-0.3.1.tar.gz
Algorithm Hash digest
SHA256 fc1744cf442f6e85b5e820e44e9962cff1a724e27adec19492973c2b91b2e203
MD5 363017c253bcb1ce2c04c5701db7a9a8
BLAKE2b-256 77cca6c60a96c7f12e3230acac58e876ab57ddc7ae0c198bd1f27ec1a4b3c913

See more details on using hashes here.

File details

Details for the file trainlib-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: trainlib-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 52.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Arch Linux","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for trainlib-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3790757d411c81d3113757930f8b6b212669aabdf35bb0137e1d579b77d0b7ef
MD5 2fab0f93d4416b985b04ee37eb660360
BLAKE2b-256 bdb6170f53211b619d9c2992c5feacadc4dd0a7c563812555932423f86e9824a

See more details on using hashes here.
