
A library for modular, fast, and reproducible ML experimentation built for R&D.

ModularML is a flexible, backend-agnostic framework for designing, training, and evaluating machine learning pipelines, tailored to research and scientific workflows. It enables rapid experimentation with complex model architectures, supports domain-specific feature engineering, and makes runs reproducible through configuration-driven declaration.

ModularML provides a plug-and-play ecosystem of interoperable components for data preprocessing, sampling, modeling, training, and evaluation, all wrapped in a unified experiment container.

Figure 1. Overview of the ModularML framework, highlighting the three core abstractions: feature set preprocessing and splitting, modular model graph construction, and staged training orchestration.

Key Concepts and Features

FeatureSet & FeatureSetView

  • FeatureSet is the primary user-facing container for structured data. It tracks features/targets/tags, reversible transforms, and named splits.
  • FeatureSetView gives a lightweight view into a FeatureSet (rows + selected columns) so you can feed exactly the slices required for a training phase.
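
The two ideas above, reversible transforms and lightweight views over named splits, can be illustrated in plain Python without ModularML itself. This is a minimal sketch only: `TinyFeatureSet`, `TinyView`, and `StandardScale` are hypothetical names invented for illustration and are not ModularML's API, and the column values are made up.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class StandardScale:
    """Reversible z-score transform (hypothetical helper, not ModularML API)."""
    mu: float = 0.0
    sigma: float = 1.0

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant columns
        return self

    def apply(self, values):
        return [(v - self.mu) / self.sigma for v in values]

    def invert(self, values):
        return [v * self.sigma + self.mu for v in values]


@dataclass
class TinyFeatureSet:
    """Columns of equal length plus named splits stored as row-index lists."""
    columns: dict
    splits: dict = field(default_factory=dict)

    def view(self, split, cols):
        return TinyView(self, self.splits[split], list(cols))


@dataclass
class TinyView:
    """Stores only row indices and column names; the data stays in the parent."""
    parent: TinyFeatureSet
    rows: list
    cols: list

    def materialize(self):
        return {c: [self.parent.columns[c][i] for i in self.rows] for c in self.cols}


fs = TinyFeatureSet(
    columns={"voltage": [3.0, 3.5, 4.0, 4.5], "soh": [0.9, 0.8, 0.7, 0.6]},
    splits={"train": [0, 1, 2], "test": [3]},
)
scaler = StandardScale().fit(fs.columns["voltage"])
scaled = scaler.apply(fs.columns["voltage"])
restored = scaler.invert(scaled)          # undoes the scaling
train_view = fs.view("train", ["voltage"])
```

Because the view holds indices rather than copies, several training phases can share one underlying table while each sees only its own rows and columns.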

Splitters & Samplers

  • Built-in splitters (e.g., random, rule-based) generate labeled splits from any FeatureSet.
  • Samplers consume FeatureSets or views and emit BatchViews in the shape required by the model. They support stratification, grouping, triplets/pairs, and custom roles so you can express experiment-specific batching without re-implementing the training loop.
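
As a rough illustration of what a random splitter and a stratified sampler do, the sketch below uses only the standard library. The functions `random_split` and `stratified_batches` are hypothetical and do not reflect ModularML's actual classes or signatures; they only show the underlying idea of labeled splits and label-balanced batches.

```python
import random
from collections import defaultdict


def random_split(n_rows, ratios, seed=0):
    """Shuffle row indices and cut them into named splits by ratio."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    splits, start = {}, 0
    names = list(ratios)
    for k, name in enumerate(names):
        # The last split takes the remainder so no row is dropped.
        end = n_rows if k == len(names) - 1 else start + int(ratios[name] * n_rows)
        splits[name] = idx[start:end]
        start = end
    return splits


def stratified_batches(rows, labels, batch_size, seed=0):
    """Yield batches that draw rows round-robin from each label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[labels[r]].append(r)
    for members in groups.values():
        rng.shuffle(members)
    # Interleave the groups so each batch mixes labels as evenly as possible.
    interleaved = []
    while any(groups.values()):
        for members in groups.values():
            if members:
                interleaved.append(members.pop())
    for i in range(0, len(interleaved), batch_size):
        yield interleaved[i:i + batch_size]


labels = ["a", "b"] * 5                      # 10 rows, two balanced classes
splits = random_split(len(labels), {"train": 0.8, "test": 0.2})
batches = list(stratified_batches(splits["train"], labels, batch_size=4))
```

Fixing the seed makes both the split and the batch order repeatable, which is the property a reproducibility-focused framework needs from its samplers.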

Models & Wrappers

  • Use your own PyTorch or TensorFlow models, select from pre-existing templates, or wrap third-party estimators. ModularML provides backend wrappers (Torch, TensorFlow, scikit-learn) so any supported model exposes a consistent forward API and reports its backend.
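
The wrapper idea, different kinds of models behind one `forward` call plus a backend tag, can be sketched in a few lines. Everything here is hypothetical: `CallableWrapper`, `EstimatorWrapper`, and `MeanEstimator` are stand-ins written for this example and are not the wrappers ModularML ships.

```python
class CallableWrapper:
    """Wraps any plain callable, such as a neural network's __call__."""
    backend = "callable"

    def __init__(self, fn):
        self.fn = fn

    def forward(self, x):
        return self.fn(x)


class EstimatorWrapper:
    """Wraps an object with a scikit-learn style fit/predict interface."""
    backend = "estimator"

    def __init__(self, estimator):
        self.estimator = estimator

    def forward(self, x):
        return self.estimator.predict(x)


class MeanEstimator:
    """Stand-in estimator used only for this demo."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


models = [
    CallableWrapper(lambda xs: [2 * x for x in xs]),
    EstimatorWrapper(MeanEstimator().fit([[0], [1]], [1.0, 3.0])),
]
# Downstream code only ever calls .forward() and reads .backend.
outputs = [(m.backend, m.forward([1, 2])) for m in models]
```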

ModelGraph and Node-based Connectivity

  • ModelNode attaches a wrapped model to an upstream FeatureSet or node and handles building, freezing, and optimizer wiring.
  • MergeNode (e.g., ConcatNode) combines outputs from multiple nodes when you need multi-branch architectures.
  • ModelGraph is the DAG that ties everything together. It resolves head/tail nodes, executes topological forward/backward passes, mixes backends, and lets you switch between stage-wise and global training with a single call.
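
To make the "topological forward pass" concrete, here is a minimal sketch of running a two-branch graph in dependency order using the standard library's `graphlib` (Python 3.9+). The node names and the `forward` helper are invented for illustration; the sketch covers the forward pass only and says nothing about how ModularML handles backward passes or mixed backends.

```python
from graphlib import TopologicalSorter

# node name -> (list of upstream names, function of the upstream outputs)
nodes = {
    "encoder_a": (["input"], lambda x: [v * 2 for v in x]),
    "encoder_b": (["input"], lambda x: [v + 1 for v in x]),
    "concat": (["encoder_a", "encoder_b"], lambda a, b: a + b),  # list concat
    "head": (["concat"], lambda x: sum(x)),
}


def forward(nodes, inputs):
    """Run each node once all of its upstream outputs are available."""
    deps = {name: set(upstream) for name, (upstream, _) in nodes.items()}
    outputs = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name in outputs:          # source data such as "input"
            continue
        upstream, fn = nodes[name]
        outputs[name] = fn(*(outputs[u] for u in upstream))
    return outputs


result = forward(nodes, {"input": [1.0, 2.0]})
```

`TopologicalSorter` raises `CycleError` if the dependencies contain a cycle, which is a convenient way to reject graphs that are not DAGs.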

AppliedLoss

  • AppliedLoss instances bind user-defined loss functions to nodes within the ModelGraph. They carry labels, weights, and node scopes so multi-objective training is easy to configure from a phase or experiment.
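
One way to picture binding losses to nodes is a small record holding a label, a target node, a loss function, and a weight, with the total objective computed as a weighted sum. The sketch below assumes exactly that; `LossBinding` and `total_loss` are hypothetical names, and the real AppliedLoss interface may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LossBinding:
    """Ties a loss function to one node's output (hypothetical helper)."""
    label: str
    node: str
    fn: Callable
    weight: float = 1.0


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def total_loss(bindings, outputs, targets):
    """Return the weighted sum plus a per-label breakdown for logging."""
    parts = {
        b.label: b.weight * b.fn(outputs[b.node], targets[b.node])
        for b in bindings
    }
    return sum(parts.values()), parts


bindings = [
    LossBinding("recon", node="decoder", fn=mse, weight=1.0),
    LossBinding("reg", node="head", fn=mae, weight=0.5),
]
outputs = {"decoder": [1.0, 2.0], "head": [0.0]}   # made-up node outputs
targets = {"decoder": [1.0, 4.0], "head": [2.0]}
loss, parts = total_loss(bindings, outputs, targets)
```

Keeping the per-label breakdown alongside the total makes it easy to log each objective separately during multi-objective training.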

Experiment Phases

  • TrainPhase runs iterative training with your sampler schedule, losses, callbacks, and optimizer configuration.
  • FitPhase (single-pass) is ideal for algorithms that expect a one-shot .fit() (e.g., scikit-learn estimators) after upstream neural components are frozen.
  • EvalPhase executes forward passes and records losses/metrics on held-out splits without touching gradients.
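
The difference between the three phase types can be shown on a toy one-parameter model, y ≈ w·x. This is a conceptual sketch in plain Python, not ModularML code: `train_phase`, `fit_phase`, and `eval_phase` are hypothetical functions that mirror iterative training, a one-shot fit, and gradient-free evaluation respectively.

```python
def train_phase(w, data, lr=0.1, epochs=50):
    """Iterative: repeated gradient steps on mean squared error."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fit_phase(data):
    """Single pass: closed-form least squares for the same model."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def eval_phase(w, data):
    """Forward only: report mean squared error, no parameter updates."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


train = [(1.0, 2.0), (2.0, 4.0)]     # made-up data with true slope 2
held_out = [(3.0, 6.0)]
w_iter = train_phase(0.0, train)     # converges toward 2.0
w_fit = fit_phase(train)             # exactly 2.0 in one step
held_out_mse = eval_phase(w_fit, held_out)
```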

Experiment Class

  • The Experiment binds FeatureSets, ModelGraph, and all phases. It owns execution order, logging, callbacks, and results objects so every run is reproducible. Execution strategies (e.g., cross validation) simply wrap an Experiment to replay the same plan across folds.
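
The notion of an execution strategy that "replays the same plan across folds" can be sketched as a function that takes a plan and calls it once per fold. The names `k_fold_indices`, `cross_validate`, and `demo_plan` are assumptions made for this example; they are not part of ModularML.

```python
def k_fold_indices(n_rows, k):
    """Split range(n_rows) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(run_plan, n_rows, k):
    """Replay the same plan once per fold and collect the scores."""
    folds = k_fold_indices(n_rows, k)
    scores = []
    for i, val_rows in enumerate(folds):
        train_rows = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(run_plan(train_rows, val_rows))
    return scores


def demo_plan(train_rows, val_rows):
    """Stand-in for train + eval: reports the validation fraction."""
    return len(val_rows) / (len(train_rows) + len(val_rows))


scores = cross_validate(demo_plan, n_rows=10, k=5)
```

In a real run, `demo_plan` would be replaced by the full sequence of phases, with only the train/validation rows changing between folds.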

Serialization

  • A core focus of ModularML is reproducibility. To that end, all major classes (FeatureSets, ModelGraph, phases, experiments, losses, samplers, optimizers, callbacks) implement configuration/state serialization.
  • All model definitions, training/sampling logic, evaluation, etc., are structured under a single Experiment object, so an experiment can be exported and shared as a single .mml file.
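
Configuration serialization generally means that an object can be written out as plain data and rebuilt into an equal object. The sketch below shows that round trip with dataclasses and JSON. It is an illustration of the principle only: the `.mml` file format is not described here, and `PhaseConfig` and `OptimizerConfig` are hypothetical classes, not ModularML's.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class OptimizerConfig:
    name: str
    lr: float


@dataclass
class PhaseConfig:
    label: str
    epochs: int
    optimizer: OptimizerConfig

    def to_json(self):
        # sort_keys gives a stable text form that diffs cleanly.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        raw["optimizer"] = OptimizerConfig(**raw["optimizer"])
        return cls(**raw)


phase = PhaseConfig("pretrain", epochs=20, optimizer=OptimizerConfig("adam", 1e-3))
restored = PhaseConfig.from_json(phase.to_json())   # equal to the original
```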

Callbacks & Checkpointing

  • Built-in callbacks (EarlyStopping, Evaluation + metrics, custom progress hooks) plug directly into Train/Fit/Eval phases, allowing for fully flexible workflows while retaining a structured experiment API.
  • Checkpointing can be attached at any major experiment or training execution step to persist model weights, optimizer states, FeatureSet transforms, and sampler cursors, making restarts seamless.
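
For readers unfamiliar with early stopping, the logic is small enough to show in full. This is a generic patience-based sketch, not ModularML's EarlyStopping callback: the class name `EarlyStopper`, its parameters, and the validation losses are all made up for illustration.

```python
class EarlyStopper:
    """Stops when the monitored value has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, value):
        """Record one epoch; return True if training should stop now."""
        if value < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = value, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


val_losses = [0.9, 0.7, 0.65, 0.66, 0.68, 0.5]   # made-up values
stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

`best_epoch` identifies which saved checkpoint to restore, which is why early stopping and checkpointing are usually configured together.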

Getting Started

Requires Python >= 3.9

Installation

Install from PyPI:

pip install modularml

To install the latest development version:

pip install git+https://github.com/REIL-UConn/modular-ml.git
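
After installing, a quick sanity check is to confirm the interpreter meets the Python >= 3.9 requirement and that the distribution is visible to it. The snippet below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is our own, and the check assumes the distribution is registered under the name `modularml`, as in the `pip install` command above.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info >= (3, 9)
version = installed_version("modularml")
print(f"Python >= 3.9: {python_ok}; modularml: {version or 'not installed'}")
```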

Explore More

  • Explanation – Conceptual material that explains why ModularML is structured the way it is.
  • How-To – Deep dives into core components of the ModularML framework.
  • Tutorial – Explore complete walkthroughs of solving common machine learning tasks with ModularML.
  • Full API Reference – API reference, component explanations, configuration guides, and tutorials.
  • Discussions – Join the community, ask questions, suggest features, or share use cases.

License

Apache 2.0
