
dmlx

Declarative Machine Learning eXperiments

Introduction

dmlx is a declarative framework for machine learning (ML) experiments. Typically, ML codebases use the standard Python library argparse to parse parameters from the command line and pass them deep into models and other components. dmlx standardizes this process and provides an elegant framework for experiment declaration and basic management, with the following main features:

  • Declarative Experiment Components: Declarative interfaces for defining reusable and reproducible experiment components and hyperparameters, such as a model path, a dataset getter, or a random seed.
  • click-powered Command Line Interface: click is integrated to provide powerful command-line functionality, including parameter properties.
  • Automatic Parameter Collection: Parameter properties are wired to command-line inputs and collected for experiment reproducibility.
  • Experiment Archive Management: Archive directories are automatically created to hold experiment data for further analysis.
  • ML Framework Independent: dmlx is independent of ML frameworks, so you can use whichever ML framework you like (PyTorch/TensorFlow/scikit-learn/...).

Example

An example ML codebase using dmlx is illustrated below:

  • my_innovative_approach/
    • model/
      • baseline.py
      • ours.py
    • dataset/
      • dataset_foo.py
      • dataset_bar.py
    • experiments/
      • ...
    • approach.py
    • train.py
    • analyze.py
  1. First, models are defined as submodules of the model module, and dataset loaders are defined as submodules of the dataset module. These components should expect normal Python arguments; the component factories defined later via component() will parse the command-line parameters and pass the resulting arguments to the real components.

    # model/xxx.py
    
    class Model:
        def __init__(self, alpha: float, beta: float, ...) -> None: ...
    
    # dataset/dataset_yyy.py
    
    def get_dataset_yyy(...): ...
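For instance, a concrete model module might look like the following. This is only an illustrative sketch: the `alpha`/`beta` parameters and the toy forward pass are assumptions, since dmlx is framework-independent and places no constraints on what a component actually does.

```python
# model/ours.py -- a hypothetical concrete model (framework-independent sketch)

class Model:
    def __init__(self, alpha: float = 0.1, beta: float = 0.9) -> None:
        # Plain keyword arguments; the component factory fills these in
        # from the command-line locator, e.g. 'ours?alpha=0.1'.
        self.alpha = alpha
        self.beta = beta

    def __call__(self, x):
        # Toy forward pass: scale the input by alpha.
        return self.alpha * x
```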
    
  2. Second, the components (models/datasets) and other parameters can be declared as properties on a composed approach using dmlx. Parameter properties, declared with argument() and option(), define the corresponding command-line parameters and store their values as instance attributes. Component properties, declared with component(), create the actual component objects and store them as instance attributes.

    # approach.py
    
    from dmlx.context import argument, option, component
    
    
    class Approach:
        model = component(
            argument("model_locator", default="ours"),  # click argument
            "model",  # module base
            "Model",  # default factory name
        )
        dataset = component(
            option("dataset_locator", "-d", "--dataset"),  # click option
            "dataset",  # module base
        )
        epochs = option("-e", "--epochs", type=int, default=800)  # click option
    
        def run(self):
            for epoch in range(self.epochs):
                for x, y_true in self.dataset:
                    y_pred = self.model(x)
                    yield x, y_true, y_pred
    
  3. Third, dmlx.experiment.Experiment can be used to declare your experiment. The experiment object creates an underlying click command, and the experiment context collects the parameters (model_locator, dataset_locator, and epochs) and wires them to command-line inputs.

    # train.py
    
    from dmlx.experiment import Experiment
    
    experiment = Experiment()
    
    with experiment.context():
        from approach import Approach
    
    @experiment.main
    def main(**args):
        experiment.init()
    
        approach = Approach()
        with (experiment.path / "train.log").open("w") as log_file:
            for x, y_true, y_pred in approach.run():
                metrics = compute_metrics(y_pred, y_true)
                log_file.write(repr(metrics) + "\n")
    
        approach.model.save(experiment.path / "model.bin")
    
    experiment.run()
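Note that train.py above calls a compute_metrics helper that dmlx itself does not provide; you define whatever metrics suit your task. A minimal sketch, assuming scalar predictions and targets, might be:

```python
# A hypothetical compute_metrics helper as used in train.py above.
# dmlx does not ship one -- this is only an illustrative placeholder.

def compute_metrics(y_pred, y_true):
    # Squared error for a single scalar prediction/target pair.
    error = y_pred - y_true
    return {"squared_error": error * error}
```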
    
  4. Finally, you can invoke train.py from the command line to actually run the experiment. Component parameters accept string locators of the form path.to.module[:factory_name][?[k_0=v_0][;k_n=v_n...]], with values parsed by json.loads.

    python train.py 'ours?alpha=0.1' \
        --dataset 'dataset_foo:get_dataset_foo?
            version = "2.0";
            shots = 5;
            # ...
        ' \
        --epochs 500
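dmlx's actual locator parsing is not shown here, but a locator of this form could be decomposed roughly as follows. This is a non-authoritative sketch: the parse_locator name and the comment-skipping behavior are assumptions, not dmlx's real implementation.

```python
import json

def parse_locator(locator: str):
    # Split 'path.to.module[:factory_name][?k_0=v_0[;k_n=v_n...]]'
    # into (module_path, factory_name, kwargs); values go through json.loads.
    module_part, _, query = locator.partition("?")
    module_path, _, factory_name = module_part.partition(":")
    kwargs = {}
    for pair in query.split(";"):
        pair = pair.strip()
        if not pair or pair.startswith("#"):
            continue  # skip blank segments and (assumed) comment segments
        key, _, value = pair.partition("=")
        kwargs[key.strip()] = json.loads(value.strip())
    return module_path, factory_name or None, kwargs
```

Applied to the invocation above, 'ours?alpha=0.1' would resolve to the module ours with kwargs {"alpha": 0.1}, and 'dataset_foo:get_dataset_foo?...' would additionally select the get_dataset_foo factory.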
    
  5. After experiment.init() is called, an experiment directory is created in experiments/ (experiment.path points to it), and the experiment meta is dumped to meta.json in that directory. Extra data can also be saved to the experiment directory, as shown in train.py, which creates a log file train.log holding epoch metrics and a model archive model.bin. Such an experiment archive can then be loaded for extensive inspection, such as visualization and further statistical analysis, with the properties defined on Approach automatically restored:

    # analyze.py
    
    from dmlx.experiment import Experiment
    
    experiment = Experiment()
    
    with experiment.context():
        from approach import Approach
    
    
    @experiment.main
    def main(**args):
        print("Loaded args:", args)
        print("Loaded meta:", experiment.meta)
    
        approach = Approach()
        approach.model.load(experiment.path / "model.bin")
    
        # Now, `args`, `approach.model`, `approach.dataset` and other properties
        # are all restored, ready for extensive inspections.
    
    
    experiment.load("/path/to/the/experiment")
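Inside main(), the epoch metrics written by train.py can also be read back for plotting or statistics. A minimal sketch, assuming the repr-of-dict log format that train.py used (the load_metrics name is hypothetical):

```python
import ast

def load_metrics(log_path):
    # Each line of train.log is the repr() of a metrics dict, as written
    # by train.py; ast.literal_eval safely reverses that formatting.
    with open(log_path) as log_file:
        return [ast.literal_eval(line) for line in log_file if line.strip()]
```

For example, load_metrics(experiment.path / "train.log") would yield a list of per-step metrics dicts ready for further analysis.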
    
