Yet Another ML Experiment Tracking Tool
yamlett - Yet Another Machine Learning Experiment Tracking Tool
What is yamlett?
yamlett provides a simple but flexible way to track your ML experiments.
It has a simple interface with only two primitives: Run and Experiment.
- A Run is used to store information about one iteration of your Experiment. You can use it to record any (BSON-serializable) information you want such as model parameters, metrics, or pickled artifacts.
- An Experiment is a collection of Run objects. It has a name and it is a wrapper around a pymongo.collection.Collection object, meaning that you can query it using find or aggregate. Think of it as a way to collect all the modeling iterations for a specific project.
The main difference from other tracking tools (e.g. MLflow) is that yamlett lets you save complex structured information using dictionaries or lists and filter on it later using MongoDB queries.
yamlett is particularly useful if your experiments are configuration-driven. Once your configuration is loaded as a Python object, storing it is as easy as run.store("config", config).
Installation
yamlett can be installed with pip:
pip install yamlett
It also requires a MongoDB instance that you can connect to. If you don’t have one and just want to try out yamlett, we provide a docker compose file that starts a MongoDB instance available at localhost:27017 (along with instances of Presto and Metabase).
Getting started
In yamlett, MongoClient connection parameters can be passed as keyword arguments to both Run and Experiment to specify which MongoDB instance you want to connect to. If you don’t pass anything, the default arguments (localhost:27017) will be used. If you have a custom MongoDB instance, you can specify its host and port when creating a Run using run = Run(host="mymongo.host.com", port=27017).
Example
In this section, we compare the same model run tracked with two different approaches: MLflow-like vs yamlett.
Set up the experiment
First, let’s load a dataset for a simple classification problem that ships with scikit-learn.
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
Then, we create a logistic regression model and train that model on the iris dataset, increasing the number of iterations and changing the regularization strength.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=200, C=0.1)
model.fit(X, y)
MLflow-like tracking
With yamlett, you are free to organize your tracking information, so you could decide to store it using a “flat” approach similar to MLflow, where each key has an associated value and there can be no nesting.
from yamlett import Run
from sklearn.metrics import f1_score

run = Run()

# store some information about your trained model: its class and its parameters
run.store("params_model_class", model.__class__.__name__)
for param_name, param_value in model.get_params().items():
    run.store(f"params_model_{param_name}", param_value)

# store information about your data (X has shape (n_observations, n_features))
run.store("data_n_features", X.shape[1])
run.store("data_n_observations", X.shape[0])

# store the F1 score on the train data
run.store("metrics_train_f1_score", f1_score(y, model.predict(X), average="weighted"))

# you could even store a pickled version of your model (requires `import pickle`)
# run.store("model", pickle.dumps(model))

After running this code, we can retrieve the stored information by calling run.data:

{'_id': '901c6823493d429cae4ddb84c91a7768',
 '_yamlett': {'created_at': datetime.datetime(2020, 12, 5, 21, 36, 14, 17000),
              'last_modified_at': datetime.datetime(2020, 12, 5, 21, 36, 14, 461000)},
 'data_n_features': 4,
 'data_n_observations': 150,
 'metrics_train_f1_score': 0.9599839935974389,
 'params_model_C': 0.1,
 'params_model_class': 'LogisticRegression',
 'params_model_class_weight': None,
 'params_model_dual': False,
 'params_model_fit_intercept': True,
 'params_model_intercept_scaling': 1,
 'params_model_l1_ratio': None,
 'params_model_max_iter': 200,
 'params_model_multi_class': 'auto',
 'params_model_n_jobs': None,
 'params_model_penalty': 'l2',
 'params_model_random_state': None,
 'params_model_solver': 'lbfgs',
 'params_model_tol': 0.0001,
 'params_model_verbose': 0,
 'params_model_warm_start': False}
This approach is straightforward: one scalar for each key in the document. However, one downside is that you need to maintain your own namespace convention. For example, here we used underscores to separate the different levels of information (params, data, metrics, etc.), but this can quickly get confusing if the separator is chosen poorly: is it params/model/fit_intercept or params/model_fit/intercept? It is also more work than needed when information already comes nicely organized (e.g. model.get_params()).
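The ambiguity can be reproduced without any tracking tool. The sketch below uses a hypothetical flatten helper (not part of yamlett or MLflow) to join nested keys with underscores, and shows that two different hierarchies collapse to the same flat key:

```python
def flatten(doc, sep="_", prefix=""):
    """Flatten a nested dict into a single-level dict with joined keys.

    Hypothetical helper for illustration only; it is not part of yamlett.
    """
    flat = {}
    for key, value in doc.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the key path along.
            flat.update(flatten(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


# Two different hierarchies...
hierarchy_a = {"params": {"model": {"fit_intercept": True}}}
hierarchy_b = {"params": {"model_fit": {"intercept": True}}}

flat_a = flatten(hierarchy_a)
flat_b = flatten(hierarchy_b)

# ...produce exactly the same flat key, so the structure cannot be recovered.
print(flat_a)            # {'params_model_fit_intercept': True}
print(flat_a == flat_b)  # True
```

Once the keys are flattened, nothing in the stored document says where one level ends and the next begins.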
yamlett tracking
The method we propose in this package leverages Python dictionaries / NoSQL DB documents to automatically store your information in a structured way. Let’s see what it looks like using the same run as above:
from yamlett import Run
from sklearn.metrics import f1_score

run = Run()

# store your model information
model_info = {
    "class": model.__class__.__name__,
    "params": model.get_params(),
}
run.store("model", model_info)

# store information about your data (X has shape (n_observations, n_features))
run.store("data", {"n_features": X.shape[1], "n_observations": X.shape[0]})

# store the F1 score on your train data
run.store("metrics.f1_score", f1_score(y, model.predict(X), average="weighted"))

# you could even store a pickled version of your model (requires `import pickle`)
# run.store("model.artifact", pickle.dumps(model))

Once again, let’s call run.data and see what information we stored:

{'_id': 'b7736c7b3cc3439ca379e3e6a2b6d9b8',
 '_yamlett': {'created_at': datetime.datetime(2020, 12, 5, 22, 43, 2, 446000),
              'last_modified_at': datetime.datetime(2020, 12, 5, 22, 43, 2, 529000)},
 'data': {'n_features': 4, 'n_observations': 150},
 'metrics': {'f1_score': 0.9599839935974389},
 'model': {'class': 'LogisticRegression',
           'params': {'C': 0.1,
                      'class_weight': None,
                      'dual': False,
                      'fit_intercept': True,
                      'intercept_scaling': 1,
                      'l1_ratio': None,
                      'max_iter': 200,
                      'multi_class': 'auto',
                      'n_jobs': None,
                      'penalty': 'l2',
                      'random_state': None,
                      'solver': 'lbfgs',
                      'tol': 0.0001,
                      'verbose': 0,
                      'warm_start': False}}}
The run information is now stored in a document that can be easily parsed based on its structure. The top-level keys we stored are data, metrics, and model, and we argue this makes it easier to find information than with long keys in a flat dictionary. For instance, you may want to look at all the metrics for a given run using run.data["metrics"]:
{'f1_score': 0.9599839935974389}
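The dotted key "metrics.f1_score" used earlier ends up as a nested field, which matches how MongoDB dot notation addresses embedded documents. As a rough mental model, here is a minimal pure-Python sketch of that mapping. The store_nested helper is hypothetical and is not yamlett's implementation; the values are placeholders.

```python
def store_nested(doc, dotted_key, value):
    """Set `value` in `doc` at the path given by a dot-separated key.

    Hypothetical helper mimicking dot-notation updates; intermediate
    sub-documents are created when they do not exist yet.
    """
    *parents, leaf = dotted_key.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return doc


doc = {}
# Storing a whole dictionary under a top-level key.
store_nested(doc, "model", {"class": "LogisticRegression", "params": {"C": 0.1}})
# Storing a scalar under a dotted key creates the "metrics" sub-document.
store_nested(doc, "metrics.f1_score", 0.96)
# A dotted key can also add a field to an existing sub-document.
store_nested(doc, "model.params.max_iter", 200)

print(doc["metrics"])          # {'f1_score': 0.96}
print(doc["model"]["params"])  # {'C': 0.1, 'max_iter': 200}
```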
Note that yamlett does not impose the document hierarchy, so you are free to organize your run data as you see fit. Additionally, because yamlett is a light abstraction layer on top of MongoDB, you can query runs in an Experiment using find or aggregate. For example, we can retrieve all runs in the default experiment for which:
- the model was fit with a bias term
- on a dataset with at least 3000 data points
- that yielded an F1 score of at least 0.9
from yamlett import Experiment

e = Experiment()
e.find(
    {
        "model.params.fit_intercept": True,
        "data.n_observations": {"$gte": 3000},
        "metrics.f1_score": {"$gte": 0.9},
    }
)
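To get a feel for what this filter selects without a running MongoDB instance, the sketch below evaluates the same query against a few in-memory documents. The get_path and matches helpers are hypothetical and handle only exact equality and $gte; real MongoDB queries support many more operators and have their own rules for arrays and missing fields. The run documents are made-up examples, not real results.

```python
def get_path(doc, dotted_key):
    """Return the value at a dot-separated path, or None if it is missing."""
    node = doc
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def matches(doc, query):
    """Check a document against a tiny subset of the MongoDB filter syntax."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if isinstance(condition, dict) and "$gte" in condition:
            if value is None or not value >= condition["$gte"]:
                return False
        elif value != condition:
            return False
    return True


query = {
    "model.params.fit_intercept": True,
    "data.n_observations": {"$gte": 3000},
    "metrics.f1_score": {"$gte": 0.9},
}

# Made-up run documents for illustration.
runs = [
    {"_id": "run_a", "model": {"params": {"fit_intercept": True}},
     "data": {"n_observations": 5000}, "metrics": {"f1_score": 0.93}},
    {"_id": "run_b", "model": {"params": {"fit_intercept": True}},
     "data": {"n_observations": 150}, "metrics": {"f1_score": 0.96}},
    {"_id": "run_c", "model": {"params": {"fit_intercept": False}},
     "data": {"n_observations": 10000}, "metrics": {"f1_score": 0.95}},
]

selected = [run["_id"] for run in runs if matches(run, query)]
print(selected)  # ['run_a']
```

Only run_a satisfies all three conditions: run_b has too few observations and run_c was fit without a bias term.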