
This package is aimed at machine/deep learning practitioners. It helps keep track of experiments by offering a consistent, automated way to store experiment results, code snapshot backups and more. It also exposes a terminal user interface to search and view previously run experiments.

Reason this release was yanked:

error: requires python >= 3.8 instead of 3.7

Project description

That Metric Timeline (TMT) ⌚

That Metric Timeline (TMT) is a Python library aimed at the machine/deep learning practitioner/researcher. This library aims to:

  • Help the user to keep track of experiments, their metrics, their results and the code used to produce them;
  • Provide an easy-to-use, KISS-based library to the user;
  • Do the bare minimum (i.e., no fancy web interfaces, plotting and so on), but do it (hopefully) right;
  • Give an open-source experiment tracking library to the community which is free both as in "free beer" and "free speech".

Many times when working on a research project I've seen the number of experiments skyrocket in the blink of an eye... and then, when we come back to the project after "some time off", we may have lost track of where the results are, which code produced them and so on.
This library can hopefully be a solution to all of this :)

Installation

The library is available on PyPI, and can be installed with

pip install ThatMetricTimeline

After a successful installation, a binary tmt_tui will be available in your path. This is the library terminal user interface (TUI). More on this later.

Note

The library is currently in its first development stages. Everything should work (please, open an issue if something doesn't) but documentation is lacking (this README is basically the only documentation there is at the moment).

Usage

tmt can keep track of your experiments. Every experiment is saved as an entry in a JSON database, with results and code snapshot backups[^1] saved in separate folders. By default, tmt will:

  • Create a .tmt directory in your current working directory;
  • Create a .tmt/tmt_db.json file which will be used as a database;
  • Create a .tmt/snapshots directory, where code snapshot backups will be saved. A symlink .tmt/snapshots/last will also be created and will always point to the last snapshot taken. See the Snapshots section for more details.

Should you want to change where all of this is saved, see the Custom configuration section.
Code examples are provided in the examples folder in this repository. More examples will be added in the future.
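
The default layout described above can be checked with a few lines of standard-library Python. This is only a sketch: the helper names default_tmt_layout and report_layout are made up for illustration and are not part of tmt; only the paths themselves come from the list above.

```python
from pathlib import Path


def default_tmt_layout(cwd: Path) -> dict:
    """Return the default paths listed above, rooted at `cwd`."""
    tmt_dir = cwd / ".tmt"
    return {
        "tmt_dir": tmt_dir,
        "json_db": tmt_dir / "tmt_db.json",
        "snapshots": tmt_dir / "snapshots",
        "last_snapshot": tmt_dir / "snapshots" / "last",
    }


def report_layout(cwd: Path) -> dict:
    """Map each expected path name to whether it currently exists."""
    # is_symlink() is checked too, so a dangling `last` link still counts.
    return {
        name: path.exists() or path.is_symlink()
        for name, path in default_tmt_layout(cwd).items()
    }


if __name__ == "__main__":
    for name, present in report_layout(Path.cwd()).items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Running it from a project folder before and after the first tracked experiment should show which of the default paths have been created.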

[^1]: Code backup is taken as a "snapshot". It means that the first time it will be a standard copy of all your files (you can provide a .gitignore file to ignore files). The following backups will only copy new and changed files, while everything else is copied as a hard-link (therefore not occupying space on your disk).

Tracking experiments

While more optional features will probably come in the future, the library's goal is to be simple, both for the user and for the library developer.
The main function exposed by tmt is the tmt_recorder decorator. This is what we use to store and keep track of experiments.

The decorator takes a name parameter (plus a few optional ones). The experiment will be saved, and can later be searched, under this name.
The decorated function may return a dictionary with the metrics the user wishes to save for later retrieval. If you don't want to save any metrics, the function must return None or an empty object (e.g. {}).

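# scikit-learn imports used by this example
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

from tmt import tmt_recorder

@tmt_recorder(name="some_experiment")
def train_and_predict(x_tr, y_tr, x_te, y_te):
    lr = LogisticRegression()
    lr.fit(x_tr, y_tr)
    preds = lr.predict(x_te)
    return {'f1': f1_score(y_te, preds), 'accuracy': accuracy_score(y_te, preds)}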
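
To get a feeling for what a recorder-style decorator does, here is a minimal sketch built only on the standard library. It is not tmt's implementation: toy_recorder, its db_path parameter and the entry fields (name, timestamp, metrics) are assumptions made up for illustration. It only shows the general pattern of running the function, treating None or {} as "no metrics", and appending an entry to a JSON file.

```python
import functools
import json
import time
from pathlib import Path


def toy_recorder(name, db_path="toy_db.json"):
    """Hypothetical stand-in for a recorder decorator (not tmt's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            metrics = func(*args, **kwargs)
            # None and {} both mean "no metrics to save".
            entry = {
                "name": name,
                "timestamp": time.time(),
                "metrics": metrics or {},
            }
            db_file = Path(db_path)
            entries = json.loads(db_file.read_text()) if db_file.exists() else []
            entries.append(entry)
            db_file.write_text(json.dumps(entries, indent=2))
            return metrics
        return wrapper
    return decorator
```

Note that in this sketch the metric values must be JSON serializable (plain floats, ints, strings and so on).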

The other key function tmt exposes is tmt_save. Call it at any time to save any kind of picklable object.
If we wanted to save the predictions in the example above, we would do:

from tmt import tmt_recorder, tmt_save

@tmt_recorder(name="some_experiment_with_data")
def train_and_predict(...):
    ...
    preds = lr.predict(x_te)
    tmt_save(preds, name='lr_predictions')
    return {'f1': f1_score(y_te, preds), 'accuracy': accuracy_score(y_te, preds)}

As you can see, we give a name to the saved object as well. This should make it easier to recognize what this pickled object refers to.
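
The idea of saving a named, pickled object can be sketched with the standard pickle module. The helpers save_named and load_named and the toy_results folder are hypothetical names used only for this illustration; where tmt actually stores the files and how it names them may differ.

```python
import pickle
from pathlib import Path


def save_named(obj, name, results_dir="toy_results"):
    """Pickle `obj` to <results_dir>/<name>.pkl and return the path."""
    target_dir = Path(results_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{name}.pkl"
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_named(name, results_dir="toy_results"):
    """Load an object previously stored with save_named."""
    with open(Path(results_dir) / f"{name}.pkl", "rb") as f:
        return pickle.load(f)
```

Keep in mind that not everything can be pickled: lambdas and open file handles, for example, cannot.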

TUI

Searching and looking at experiments

tmt offers a terminal user interface (TUI) which should be installed in your path when you pip install the library. You can access the TUI by typing:

tmt_tui

You will be presented with the following old-fashioned interface (who doesn't love the 90s?):

[screenshot: main_tui]

You can move around with the arrow keys (or with tab and shift+tab). You can then search by name in this interface:

[screenshot: search_tui]

Once you select an experiment you can see some details about it:

[screenshot: experiment_tui]

The Search experiment by date functionality is not implemented yet and will come in a future release. You can however use the TmtManager (see the next section).

Loading and using tracked experiments in your code

tmt offers a minimalistic TmtManager helper class, which can help you load an experiment in your code, load pickled results, see metrics etc.
Once you have the ID (or a unique name for your experiment) you can:

import pickle

from tmt import TmtManager

# Let's say we know there is an experiment with id "example".
# An Entry is a row in the database, i.e. an experiment that was tracked.
manager = TmtManager()
manager.set_entry_by_id('example')

# load the results and unpickle them manually
for name, path in manager.results_paths():
    with open(path, 'rb') as f:
        # do stuff with your results. If it's a pickle it's
        # more convenient to use the code block below this one
        res = pickle.load(f)

# load the already unpickled results
for name, res in manager.load_results():
    # do something with your results.
    # if res is a numpy array...
    print(res.mean())

# print the metrics saved for this experiment
for name, val in manager.get_metrics():
    print(f"{name}: {val}")

Should you need it, you can access the "low level" database manager from the manager.db member.

    # If you need to do other stuff, like searching for 
    # experiments between two datetimes and so on
    # you can access the `db` member like
    manager.db.get_entries_greater_than_date(date_or_timestamp)
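
The date_or_timestamp argument above is not defined in the snippet. The sketch below shows how either kind of value could be built with the standard library. It assumes that the method accepts a datetime object or an epoch timestamp, as its argument name suggests, so check the library code if in doubt.

```python
from datetime import datetime, timedelta

one_week_ago = datetime.now() - timedelta(days=7)
as_timestamp = one_week_ago.timestamp()  # seconds since the epoch, as a float

# With a TmtManager instance available, the call from the snippet above
# would then look like this (left commented out so the block runs alone):
# recent = manager.db.get_entries_greater_than_date(one_week_ago)
```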

Snapshots

Every time you track an experiment with tmt_recorder, a code snapshot backup will be saved (by default in .tmt/snapshots). This means that:

  • the first time you use the library in your project, a simple copy of your project is made (by default, this is the current working directory (cwd) from which you launch the experiment);
  • subsequent backups will only copy new and different files, while hard-linking all other files. This limits the space taken on your disk;
  • by default, the library will look for a .gitignore file in your cwd and ignore (i.e., not copy) all files listed in there (the PathSpec library is used for gitignore parsing);
  • a symlink pointing to the last snapshot taken is created (and updated every time) in .tmt/snapshots/last.

You can change the default paths by using a Custom configuration file.
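
The "copy what changed, hard-link the rest" idea can be sketched as follows. This is not tmt's implementation: toy_snapshot and point_last_to are hypothetical helpers, the sketch does no .gitignore filtering, and it assumes that the target folder lives outside the source folder and that the file system supports hard links and symlinks.

```python
import filecmp
import os
import shutil
from pathlib import Path


def toy_snapshot(source, target, previous=None):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    source, target = Path(source), Path(target)
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = target / rel
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        old_file = Path(previous) / rel if previous else None
        if old_file and old_file.is_file() and filecmp.cmp(src_file, old_file, shallow=False):
            os.link(old_file, dst_file)       # unchanged: share the old copy's data
        else:
            shutil.copy2(src_file, dst_file)  # new or changed: make a real copy
    return target


def point_last_to(snapshot, link):
    """Create or update a symlink that points to the given snapshot."""
    link = Path(link)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(Path(snapshot).resolve(), target_is_directory=True)
```

Since a hard link is just another name for the same data on disk, an unchanged file takes up space only once, no matter how many snapshots contain it.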

Custom configuration

tmt can be used as-is and does not require any configuration file. By default, everything the library needs or saves is stored in a hidden .tmt directory in the current working directory (cwd). If your cwd changes often between experiments, or if you want to specify which folder is backed up, you may want to create and specify a custom configuration file.

To do so, create a config.json file. If you're fine with using the cwd/.tmt directory, place the file at cwd/.tmt/config.json: this way, you won't have to pass the path to the configuration file to library functions.
The configuration file has the following structure:

{
    // tmt_dir specifies the path where code snapshots and 
    // results will be saved. You may use an absolute 
    // path as well
    "tmt_dir": ".example",

    // this is the folder we will take a snapshot of 
    // for every experiment  
    "snapshot_source": ".", 

    // snapshot_target is where code snapshots will be 
    // saved. It will be joined with tmt_dir. So in this 
    // case the target will be .example/snapshot_example
    "snapshot_target": "snapshot_example",

    // this path will be a symlink to the last snapshot
    // taken. Same rules as for snapshot_target apply
    "last_snapshot_link": "snapshot_example/last",

    // this might actually be any file with a .gitignore 
    // syntax. These files will be ignored and not backed up
    "gitignore_path": "path/to/.gitignore",

    // the two paths below are for the db and the results
    // directory, respectively. Same rules apply as for 
    // snapshot_target, so path will be .example/tmt_db.json
    "json_db_path": "tmt_db.json",
    "results_path": "results"
}
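
To make the "joined with tmt_dir" rule concrete, the sketch below resolves the relative entries of the example configuration. It is an illustration, not tmt's loading code: resolve_paths and JOINED_KEYS are hypothetical names. Python's standard json module rejects // comments, so the sketch uses a comment-free copy of the example; how tmt itself parses the file is not covered here.

```python
import json
import os

# Same keys as the example above, without comments so that
# Python's strict json parser accepts it.
CONFIG_TEXT = """
{
    "tmt_dir": ".example",
    "snapshot_source": ".",
    "snapshot_target": "snapshot_example",
    "last_snapshot_link": "snapshot_example/last",
    "gitignore_path": "path/to/.gitignore",
    "json_db_path": "tmt_db.json",
    "results_path": "results"
}
"""

# Keys that the comments above describe as being joined with tmt_dir.
JOINED_KEYS = ("snapshot_target", "last_snapshot_link", "json_db_path", "results_path")


def resolve_paths(config):
    """Return a copy of `config` with the joined keys prefixed by tmt_dir."""
    resolved = dict(config)
    for key in JOINED_KEYS:
        resolved[key] = os.path.join(config["tmt_dir"], config[key])
    return resolved


resolved = resolve_paths(json.loads(CONFIG_TEXT))
for key in JOINED_KEYS:
    print(f"{key}: {resolved[key]}")
```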

As mentioned, if you save this file in .tmt/config.json, no other action is necessary and tmt will pick it up and use it for its configuration.
If instead you save it somewhere else, say /config/path/config.json, you will have to specify this path in the code. When recording experiments:

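# scikit-learn imports used by this example
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

from tmt import tmt_recorder

@tmt_recorder('custom_config', config_path='/config/path/config.json')
def with_custom_config():
    x, y = make_classification()
    lr = LogisticRegression()
    lr.fit(x, y)
    preds = lr.predict(x)
    return {'f1': f1_score(y, preds), 'accuracy': accuracy_score(y, preds)}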

And when managing experiments:

manager = TmtManager('/config/path/config.json')
# do your stuff

Final remarks

This project was mainly developed because I wanted a simple library to keep track of the machine learning experiments I run for my papers during my PhD (and also because I had fun :D).
I'm well aware that there are other options at the moment, such as Weights & Biases and ModelChimp, but I wanted something as straightforward and simple as possible.
The aim of this project is to keep a KISS approach, making it easier for other researchers to tinker with the library code or to make adjustments.
For this reason, there is not much automation provided. For instance, navigating snapshots is left to the user: when you want to quickly check the code for many experiments this might be a pain, and I do plan to add a better and more automated way to do that.
That said, most of the other library functionalities will remain as they are now.

If you have any suggestions, feedback or problems, feel free to open an issue or a pull request here on GitHub.
Cheers!
