Ensure code traceability in ML experiments

CodeStamper

CodeStamper helps ensure traceability between ML experiments and the code that produced them.

1.1. Description

When running ML experiments, one wants to be able to replicate a past experiment at any point in time. One requirement for this (although not the only one) is being able to run the exact same code version.

1.1.1. When things go wrong

An ML experiment is started, but it might not be reproducible in the future because:

| Issue | CodeStamper's solution |
|---|---|
| The experiment contains no information about the code that produced it | ✅ Logs information about the last git commit |
| Code modifications were staged but not committed, or not all modified files were committed | ✅ Logs any local changes not captured in a commit as patches that can be restored. ✅ Can prevent running experiments until all local modifications are versioned in git. |
| The code is committed but never pushed | ✅ Can log the contents of commits that have not been pushed |
| The experiment does not contain exact information about the Python environment used. Even if all the code is versioned, re-running the same experiment 8 months from now might not behave the same if the Python package versions have changed (APIs or implementations of algorithms might have changed). | ✅ Logs the current Python environment state |

1.2. Installing

pip install codestamper

1.3. Examples

1.3.1. Enforce a clean workspace

from codestamper import GitStamp

GitStamp().raise_if_dirty()
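
To make the idea of a "dirty" workspace concrete, the sketch below shows one way such a check could be done with plain git. It is not CodeStamper's implementation: `is_dirty` and `workspace_is_dirty` are hypothetical helper names, and the sketch assumes that any output from `git status --porcelain` means the workspace is dirty. Note that this also counts untracked files, which may differ from what CodeStamper checks.

```python
import subprocess


def is_dirty(porcelain_output: str) -> bool:
    """Return True if `git status --porcelain` output lists any change."""
    # Each non-empty line describes one modified, staged, or untracked path.
    return any(line.strip() for line in porcelain_output.splitlines())


def workspace_is_dirty(repo_dir: str = ".") -> bool:
    """Run git in repo_dir and report whether it has uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git itself fails
    )
    return is_dirty(result.stdout)
```

A training script could call `workspace_is_dirty()` at startup and abort when it returns True, which mirrors the intent of `raise_if_dirty()`.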

1.3.2. Log the current code state

from codestamper import GitStamp

GitStamp().log_state('./experiment/code_log', modified_as_patch=True, unpushed_as_patch=True)

Depending on the project setup, the call writes files such as:

📁experiment/code_log
|--🗎 code_state.json
|--🗎 mod.patch
|--🗎 unpushed<git-commit>-<git-commit>.patch
|--🗎 pip-packages.txt
|--🗎 conda_env.yaml
|--🗎 poetry.lock
  • code_state.json
{
  "date": "03/08/2022 21:10:34",
  "git": {
    "hash": "75c88ba",
    "user": "git-username",
    "email": "your-email-here@gmail.com"
  },
  "node": {
    "username": "gitpod",
    "node": "bmsan-gitstamp",
    "system": "Linux",
    "version": "#44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022",
    "release": "5.15.0-41-generic"
  },
  "python": {
    "version": "3.8.13 (default, Jul 26 2022, 01:36:30) \n[GCC 9.4.0]",
    "pip_packages": {
      "argon2-cffi": "21.3.0",
      "argon2-cffi-bindings": "21.2.0",
        
    }
  }
}
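
As an illustration of how the logged state might be used later, the sketch below loads a `code_state.json` file and reports which recorded package versions differ from a given environment. The helper names (`load_code_state`, `package_drift`, `installed_packages`) are hypothetical and not part of CodeStamper; the sketch only assumes the JSON layout shown above. Package names are compared verbatim, so differences in name normalization would show up as drift.

```python
import json
from importlib import metadata
from pathlib import Path


def load_code_state(path):
    """Load a code_state.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def package_drift(recorded, installed):
    """Map package name -> (recorded, installed) for versions that differ."""
    drift = {}
    for name, version in recorded.items():
        current = installed.get(name)  # None if the package is missing now
        if current != version:
            drift[name] = (version, current)
    return drift


def installed_packages():
    """Collect name -> version for packages in the current environment."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
    }
```

For example, after `state = load_code_state("experiment/code_log/code_state.json")`, the commit is available as `state["git"]["hash"]`, and `package_drift(state["python"]["pip_packages"], installed_packages())` lists the packages whose versions no longer match.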
  • mod.patch

Contains modifications (staged or unstaged) of git-tracked files.

The modifications can be applied in a workspace on top of the commit hash recorded in code_state.json:

# Make sure we are at the right commit
git checkout <git.hash from code_state.json>

# Add uncommitted changes to the workspace
git apply mod.patch
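
The two steps above can also be derived from the log directory itself. The following is a minimal sketch, assuming the directory contains `code_state.json` and, optionally, `mod.patch` as described here; `restore_commands` is a hypothetical helper, not a CodeStamper function. It only builds the commands and leaves running them to the caller.

```python
import json
from pathlib import Path


def restore_commands(log_dir):
    """Return the git commands that rebuild the logged workspace state."""
    log_dir = Path(log_dir)
    state = json.loads(
        (log_dir / "code_state.json").read_text(encoding="utf-8")
    )
    # First go back to the commit recorded at experiment time.
    commands = [["git", "checkout", state["git"]["hash"]]]
    patch = log_dir / "mod.patch"
    # Apply the patch only if local changes were actually logged.
    if patch.exists() and patch.stat().st_size > 0:
        commands.append(["git", "apply", str(patch.resolve())])
    return commands
```

Each returned command can be executed from the repository root, for instance with `subprocess.run(cmd, check=True)`.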
  • unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch

Contains the delta between the current commit and the last pushed commit. It should be used only in the unlikely event that the unpushed commits are lost, and should be considered an experimental, last-resort feature.

# Make sure we are at the right commit
git checkout <last_pushed_commit_hash>

# Add the changes from the unpushed commits to the workspace
git apply unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
  • pip-packages.txt

Contains a list of pip packages and their versions as reported by the pip freeze command. If the project uses conda or poetry, the environment files generated for them are preferred.
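
Because the file follows the `pip freeze` format, it can be turned back into a name-to-version mapping with a few lines of Python. This is a minimal sketch with a hypothetical helper name (`parse_pip_freeze`); it only handles plain `name==version` lines and skips everything else, such as editable installs or direct URL references.

```python
def parse_pip_freeze(text):
    """Parse `pip freeze` style text into a name -> version dict."""
    packages = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and options such as "-e <url>".
        if not line or line.startswith(("#", "-")):
            continue
        # Only pinned requirements of the form name==version are kept.
        if "==" in line:
            name, version = line.split("==", 1)
            packages[name.strip()] = version.strip()
    return packages
```

The resulting dict can be compared against another run's pip-packages.txt to see which versions changed between two experiments.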

  • conda_env.yaml

If Conda is used, this file will be present and will contain the exported Conda environment in YAML format. The environment can be recreated using: conda env create -n ENVNAME --file conda_env.yaml

  • poetry.lock

This file will be generated if the project uses poetry as its package manager.
