
Ensure code traceability in ML experiments

Project description

CodeStamper

CodeStamper helps ensure traceability between ML experiments and the code that produced them.

Description

When running ML experiments, you want to be able to replicate any past experiment at any point in time. One requirement for this (though not the only one) is the ability to run exactly the same code version.

Things can go wrong: an ML experiment is started, but it may not be reproducible later because:

  • The experiment carries no information about the code version that produced it
  • Code modifications were staged but not committed
  • A modified file was missed when committing
  • The code was committed but never pushed
  • The experiment carries no exact information about the Python environment used. Even if all the code is versioned, re-running the same experiment 8 months later may behave differently if Python package versions have changed (the APIs or implementations of algorithms may have changed).

CodeStamper to the rescue. It can:

  • Log information about the last git commit
  • Log any local changes not captured in a commit
  • Log the contents of commits that have not been pushed
  • Log the current Python environment state
  • Prevent experiments from running until all local modifications are versioned in git

Installing

pip install CodeStamper

Examples

Enforce a clean workspace

from codestamper import GitStamp

GitStamp().raise_if_dirty()
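The check above refuses to run while the workspace holds unversioned work. CodeStamper's own implementation is not shown on this page; the following is a minimal sketch, assuming `git` is on the `PATH`, of how such a check can be built on `git status --porcelain=v2 --branch`, which reports local changes as well as how many commits the branch is ahead of its upstream. The names `WorkspaceState`, `parse_status`, `read_state` and `check_clean` are hypothetical, and treating untracked files and unpushed commits as "dirty" is a design choice of this sketch, not documented CodeStamper behaviour.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List

# Number of space-separated fields before the path in each porcelain v2 entry type:
# "1" = ordinary change, "2" = rename/copy, "u" = unmerged.
_FIELDS_BEFORE_PATH = {"1": 8, "2": 9, "u": 10}


@dataclass
class WorkspaceState:
    """Hypothetical summary of `git status --porcelain=v2 --branch` output."""
    changed: List[str] = field(default_factory=list)    # tracked files with staged/unstaged edits
    untracked: List[str] = field(default_factory=list)  # new files never added to git
    ahead: int = 0                                      # local commits not on the upstream
    has_upstream: bool = False                          # False e.g. for a branch never pushed


def parse_status(output: str) -> WorkspaceState:
    """Parse porcelain v2 text (without -z, so unusual paths may appear quoted)."""
    state = WorkspaceState()
    for line in output.splitlines():
        if line.startswith("# branch.ab "):
            # Header of the form "# branch.ab +<ahead> -<behind>"
            state.ahead = int(line.split()[2].lstrip("+"))
            state.has_upstream = True
        elif line.startswith("? "):
            state.untracked.append(line[2:])
        elif line[:1] in _FIELDS_BEFORE_PATH:
            path = line.split(" ", _FIELDS_BEFORE_PATH[line[0]])[-1]
            state.changed.append(path.split("\t")[0])  # renames read "new<TAB>old"
        # other "# branch.*" headers and empty lines are ignored
    return state


def read_state(repo: str = ".") -> WorkspaceState:
    """Run git in `repo` and parse its machine-readable status."""
    result = subprocess.run(
        ["git", "status", "--porcelain=v2", "--branch"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout)


def check_clean(state: WorkspaceState, allow_untracked: bool = False) -> None:
    """Raise if the run could not be tied to committed and pushed code."""
    if state.changed:
        raise RuntimeError(f"uncommitted changes: {state.changed}")
    if state.untracked and not allow_untracked:
        raise RuntimeError(f"untracked files: {state.untracked}")
    if not state.has_upstream:
        raise RuntimeError("no upstream branch; cannot tell whether commits are pushed")
    if state.ahead:
        raise RuntimeError(f"{state.ahead} commit(s) not pushed")
```

Calling `check_clean(read_state())` at the top of a training script follows the same idea as `raise_if_dirty()`. Parsing the porcelain format rather than the human-readable output keeps the check stable across git versions and locales.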

Log the current code state

from codestamper import GitStamp

GitStamp().log_state('./experiments/code_log', modified_as_patch=True, unpushed_as_patch=True)
📁experiments/code_log
|--🗎 code_state.json
|--🗎 mod.patch
|--🗎 unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
  • code_state.json
{
  "date": "03/08/2022 21:10:34",
  "git": {
    "hash": "75c88ba",
    "user": "git-username",
    "email": "your-email-here@gmail.com"
  },
  "node": {
    "username": "gitpod",
    "node": "bmsan-gitstamp",
    "system": "Linux",
    "version": "#44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022",
    "release": "5.15.0-41-generic"
  },
  "python": {
    "version": "3.8.13 (default, Jul 26 2022, 01:36:30) \n[GCC 9.4.0]",
    "pip_packages": {
      "argon2-cffi": "21.3.0",
      "argon2-cffi-bindings": "21.2.0",
        .....
    }
  }
}
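The `python.pip_packages` map is what lets you check, months later, whether the current environment still matches the one an experiment ran in. A minimal sketch, assuming only the `code_state.json` layout shown above (nothing about CodeStamper's internals), that reports recorded packages whose installed version has changed, using the standard library's `importlib.metadata`. The function names are hypothetical.

```python
import json
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def normalize(name: str) -> str:
    """PEP 503 name normalization, so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_packages() -> Dict[str, str]:
    """Normalized distribution name -> version for the running interpreter."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs with broken metadata
            versions[normalize(name)] = dist.version
    return versions


def package_drift(
    recorded: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Recorded packages whose version differs now; None means no longer installed."""
    current_norm = {normalize(k): v for k, v in current.items()}
    drift = {}
    for name, version in recorded.items():
        now = current_norm.get(normalize(name))
        if now != version:
            drift[name] = (version, now)
    return drift


def drift_from_state_file(path: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Compare a logged code_state.json against the current interpreter."""
    with open(path) as fh:
        state = json.load(fh)
    return package_drift(state["python"]["pip_packages"], installed_packages())
```

Only packages present in the log are compared; anything installed since is not reported, so an empty result means "nothing recorded has changed", not "identical environment".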
  • mod.patch

Contains the modifications (staged or unstaged) to git-tracked files.

The modifications can be applied to a workspace on top of the commit whose hash is recorded in code_state.json:

# Make sure we are at the right commit
git checkout <git.hash from code_state.json>

# Add the uncommitted changes to the workspace
git apply mod.patch
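The two commands above can also be driven from the logged files. A minimal sketch, assuming `git` is on the `PATH` and the log directory layout shown earlier; `restore_commands` and `restore` are hypothetical helpers. It runs `git apply --check` first so that a patch which no longer applies fails before any file is modified.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def restore_commands(log_dir: str) -> List[List[str]]:
    """Git commands that recreate the logged workspace (hypothetical helper)."""
    log = Path(log_dir)
    state = json.loads((log / "code_state.json").read_text())
    commands = [["git", "checkout", state["git"]["hash"]]]
    patch = log / "mod.patch"
    if patch.exists():  # absent if no mod.patch was logged for this run
        patch_path = str(patch.resolve())  # absolute, so it works from any cwd
        commands.append(["git", "apply", "--check", patch_path])  # dry run: fail early
        commands.append(["git", "apply", patch_path])
    return commands


def restore(log_dir: str, repo: str = ".") -> None:
    """Run the commands inside `repo`, stopping at the first failing step."""
    for cmd in restore_commands(log_dir):
        subprocess.run(cmd, cwd=repo, check=True)
```

Checking out a bare hash leaves the repository in a detached HEAD state; create a branch there if you plan to commit on top of the restored code.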
  • unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch

Contains the delta between the current commit and the last pushed commit. Use it only in the unlikely event that the unpushed commits are lost, and treat it as an experimental, last-resort feature.

# Make sure we are at the right commit
git checkout <last_pushed_commit_hash>

# Re-apply the changes from the unpushed commits to the workspace
git apply unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
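Because the file name encodes both ends of the commit range, the commit to check out can be recovered from the name alone. A minimal sketch, assuming the name follows exactly the `unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch` pattern described above with lowercase hexadecimal hashes; `parse_unpushed_name` and `recovery_commands` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed name pattern: unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
_UNPUSHED = re.compile(r"^unpushed(?P<base>[0-9a-f]+)-(?P<tip>[0-9a-f]+)\.patch$")


def parse_unpushed_name(path: str) -> Tuple[str, str]:
    """Return (last_pushed_commit_hash, last_unpushed_commit_hash) from the file name."""
    match = _UNPUSHED.match(Path(path).name)
    if match is None:
        raise ValueError(f"not an unpushed patch file name: {path}")
    return match.group("base"), match.group("tip")


def recovery_commands(path: str) -> List[List[str]]:
    """Check out the last pushed commit, then apply the combined unpushed delta."""
    base, _tip = parse_unpushed_name(path)
    patch_path = str(Path(path).resolve())  # absolute, so it works from any cwd
    return [["git", "checkout", base], ["git", "apply", patch_path]]
```

`git apply` leaves the recovered changes uncommitted in the working tree, so the separate unpushed commits come back as a single change set; commit them again if you need them in history.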

Download files


Source Distribution

codestamper-0.1.0.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

codestamper-0.1.0-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file codestamper-0.1.0.tar.gz.

File metadata

  • Download URL: codestamper-0.1.0.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for codestamper-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1a7a8ccbbd80bed1e724074945f8c7091028f849711d561654c8672c06898898
MD5 2abbc66d4fb7c1fdf4c6be90bf13e4d3
BLAKE2b-256 59cc9622937d4ff468af3fd4a00d9c9c6e11c480c6fec0e8c66aa6b0332fad7f

File details

Details for the file codestamper-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: codestamper-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for codestamper-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6ed638dd1cf48b845e685eafe81afa5260d0c5ed9312bf95bc6942cb2772d92a
MD5 efd72c288fa463299d29fdfdb8ab021d
BLAKE2b-256 08eafd29b90c866929cc7cd32db14f79c52bea7289034da5fd6d2e35b7612293
