An EXPERIMENTAL checkpoint decorator for Metaflow

Metaflow Checkpoint

Machine learning training jobs and other data processing tasks can run for hours or even days. A failure partway through should not force you to start over and lose that progress, and collaborating with others should not put your models and checkpoints at risk of being overwritten. Checkpointing lets a long-running job resume after a failure, and versioning keeps models and checkpoints separate in multi-user environments.

This repository introduces three new Metaflow decorators that address these challenges:

  • @checkpoint: Simplifies saving and reloading checkpoints within your Metaflow flows.
  • @huggingface_hub: Enables efficient loading and caching of large models from Hugging Face Hub.
  • @model: Allows for easy saving and loading of models created during your Metaflow flows.

Examples for these decorators can be found in this repository.

Features

@checkpoint Decorator

The @checkpoint decorator handles saving and reloading the state of a Metaflow @step. It also isolates checkpoints per user and run, which keeps versions separate in multi-user settings. Whether the state is a checkpoint written by a machine learning model or intermediate data needed after a crash, the decorator simplifies state management and failure recovery.

  • Checkpointing: Save the state of your @step at designated points.
  • Seamless Recovery: Restart your job from the last checkpoint upon retries without any manual intervention.
  • User Isolation: Checkpoints are managed per user to prevent overwriting in collaborative environments.
  • Ease of Use: Minimal code changes required to implement checkpointing.
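The save-and-resume behavior described above can be illustrated with a small standard-library sketch. It does not use this package's API: the `train` function, the `state.json` file name, and the `fail_at` parameter are all hypothetical, chosen only to show how a retried task can pick up from the last saved checkpoint instead of starting over.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def train(checkpoint_dir: Path, total_epochs: int, fail_at: Optional[int] = None) -> dict:
    """Toy training loop that resumes from the last checkpoint in checkpoint_dir."""
    ckpt_file = checkpoint_dir / "state.json"  # hypothetical file name

    # Resume from the latest checkpoint if an earlier attempt left one behind.
    if ckpt_file.exists():
        state = json.loads(ckpt_file.read_text())
    else:
        state = {"epoch": 0, "loss": 1.0}

    for epoch in range(state["epoch"], total_epochs):
        if fail_at is not None and epoch == fail_at:
            raise RuntimeError(f"simulated crash in epoch {epoch}")
        state = {"epoch": epoch + 1, "loss": state["loss"] * 0.5}
        # Write to a temporary file, then rename, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp_file = ckpt_file.with_suffix(".tmp")
        tmp_file.write_text(json.dumps(state))
        tmp_file.replace(ckpt_file)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        try:
            train(directory, total_epochs=5, fail_at=3)  # first attempt fails
        except RuntimeError as err:
            print("attempt 1:", err)
        # The retry continues from the saved checkpoint rather than epoch 0.
        print("attempt 2:", train(directory, total_epochs=5))
```

In this sketch the retry only repeats the epochs that were not checkpointed. Per the description above, the decorator additionally scopes checkpoints per user and run, which the sketch leaves out.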

@huggingface_hub Decorator

The @huggingface_hub decorator loads large models from Hugging Face Hub and caches them so that later runs avoid repeated downloads. It also ensures that models are versioned and managed appropriately in multi-user environments.

  • Efficient Model Loading: Load models on-the-fly from Hugging Face Hub.
  • Caching Mechanism: Cache models locally to avoid redundant downloads.
  • Version Control: Manages different versions of models to prevent conflicts.
  • Integration with Metaflow: Easily incorporate models across your Metaflow flows.
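The caching idea can be sketched in plain Python. This is not the decorator's implementation and does not call Hugging Face Hub: `cached_fetch`, the `download` callback, and the `.complete` marker file are hypothetical names used to show a cache keyed by repository and revision, where the download runs only on a cache miss.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(
    repo_id: str,
    revision: str,
    cache_root: Path,
    download: Callable[[str, str, Path], None],
) -> Path:
    """Return a local directory for (repo_id, revision); download only on a miss."""
    # Key the cache by repository and revision so versions never collide.
    target = cache_root / repo_id.replace("/", "--") / revision
    marker = target / ".complete"  # hypothetical marker file
    if not marker.exists():
        target.mkdir(parents=True, exist_ok=True)
        download(repo_id, revision, target)
        # The marker is written last, so an interrupted download is retried.
        marker.touch()
    return target


if __name__ == "__main__":
    def fake_download(repo_id: str, revision: str, target: Path) -> None:
        # Stand-in for a real download; writes a placeholder file.
        print(f"downloading {repo_id}@{revision}")
        (target / "weights.bin").write_bytes(b"placeholder")

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cached_fetch("org/model", "v1", root, fake_download)  # cache miss
        cached_fetch("org/model", "v1", root, fake_download)  # cache hit
        cached_fetch("org/model", "v2", root, fake_download)  # new revision
```

Keeping the revision in the cache key is what lets two versions of the same model coexist without conflict.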

@model Decorator

The @model decorator provides a simple way to save and load models and checkpoints created as part of your Metaflow flow.

  • Simplified Model Loading: Automatically load models based on references and identifiers created by decorators such as @model, @checkpoint, and @huggingface_hub.
  • Model Identity: Assigns each model a unique identity, so that different versions are clearly distinguished and their lineage is easy to track.
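As a rough illustration of reference-based loading and model identity, the sketch below keeps models in an in-memory store and hands back a small reference object on save. Everything here is hypothetical (`ModelStore`, `ModelRef`, and the key format) and is not the package's API; it only shows how a unique key plus lineage fields lets a later step load the exact version it was given.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelRef:
    """Reference passed between steps instead of the model itself."""
    key: str     # unique identity of this model version
    run_id: str  # lineage: the run that produced the model
    step: str    # lineage: the step that produced the model


class ModelStore:
    """In-memory stand-in for a model datastore (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def save(self, payload: bytes, run_id: str, step: str) -> ModelRef:
        # Combine lineage with a content digest so each version gets its own key.
        digest = hashlib.sha256(payload).hexdigest()[:16]
        key = f"{run_id}/{step}/{digest}"
        self._blobs[key] = payload
        return ModelRef(key=key, run_id=run_id, step=step)

    def load(self, ref: ModelRef) -> bytes:
        # Loading needs only the reference, not knowledge of storage paths.
        return self._blobs[ref.key]


if __name__ == "__main__":
    store = ModelStore()
    first = store.save(b"weights-v1", run_id="run-1", step="train")
    second = store.save(b"weights-v2", run_id="run-1", step="train")
    print(first.key != second.key)  # two versions, two identities
    print(store.load(first))
```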
