A framework for managing machine learning experiments.
Keep track of your machine learning experiments with ScalarStop.
ScalarStop is a Python framework for reproducible machine learning research.
It was written and open-sourced at Neocrym, where it is used to train thousands of models every week.
ScalarStop can help you:
organize datasets and models with content-addressable names.
save/load datasets and models to/from the filesystem.
record hyperparameters and metrics to a relational database.
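To make the idea of a content-addressable name concrete, here is a minimal sketch that derives a name from a hash of the hyperparameters. The function name, the truncation length, and the name format are assumptions made for illustration; this is not necessarily how ScalarStop computes its names.

```python
import hashlib
import json


def content_addressable_name(group_name: str, hyperparams: dict) -> str:
    """Build a name such as 'mnist_cnn-<hash>' from a group name and hyperparameters.

    Hypothetical helper for illustration only; not part of ScalarStop.
    """
    # Serialize with sorted keys so that key order does not change the hash.
    canonical = json.dumps(hyperparams, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Keep only a prefix of the digest so that names stay readable.
    return f"{group_name}-{digest[:16]}"


name_a = content_addressable_name("mnist_cnn", {"learning_rate": 0.001, "layers": 3})
name_b = content_addressable_name("mnist_cnn", {"layers": 3, "learning_rate": 0.001})
name_c = content_addressable_name("mnist_cnn", {"learning_rate": 0.01, "layers": 3})

print(name_a == name_b)  # True: same hyperparameters, different key order
print(name_a == name_c)  # False: the learning rate differs
```

Because the name depends only on the hyperparameters, two experiments with identical settings map to the same name, which makes it easy to detect work that has already been done.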
System requirements
ScalarStop is a Python package that requires Python 3.8 or newer.
Currently, ScalarStop only supports tracking tf.data.Dataset datasets and tf.keras.Model models. As such, ScalarStop requires TensorFlow 2.5.0 or newer.
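If you want to check these requirements before installing, the following sketch compares the running interpreter and the installed TensorFlow distribution against the minimum versions stated above. The helper names are hypothetical, and the check assumes the distribution is published under the name `tensorflow`; some platforms install it under a different distribution name.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)
MIN_TENSORFLOW = (2, 5, 0)


def parse_version(text: str) -> tuple:
    """Turn '2.5.0' or '2.11.0rc1' into a three-part tuple of integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.5' so that tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def tensorflow_is_new_enough() -> bool:
    """Return True if a 'tensorflow' distribution at or above the minimum is installed."""
    try:
        installed = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= MIN_TENSORFLOW


python_ok = sys.version_info[:2] >= MIN_PYTHON
print("Python OK:", python_ok)
print("TensorFlow OK:", tensorflow_is_new_enough())
```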
We welcome contributions that add support for other machine learning frameworks to ScalarStop. :)
Installation
ScalarStop is available on PyPI.
If you are using TensorFlow on a CPU, you can install ScalarStop with the command:
python3 -m pip install scalarstop[tensorflow]
If you are using TensorFlow with GPUs, you can install ScalarStop with the command:
python3 -m pip install scalarstop[tensorflow-gpu]
Development
If you would like to make changes to ScalarStop, you can clone the repository from GitHub.
git clone https://github.com/scalarstop/scalarstop.git
cd scalarstop
python3 -m pip install .
Usage
Read the ScalarStop Tutorial to learn the core concepts behind ScalarStop and how to structure your datasets and models.
Afterwards, you might want to dig deeper into the ScalarStop Documentation. In general, a typical ScalarStop workflow involves four steps:
1. Organize your datasets with scalarstop.datablob.
2. Describe your machine learning model architectures using scalarstop.model_template.
3. Load, train, and save machine learning models with scalarstop.model.
4. Save hyperparameters and training metrics to a SQLite or PostgreSQL database using scalarstop.train_store.
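The idea behind step 4 can be sketched with nothing but the standard library. The example below is not the scalarstop.train_store API; the table names, column names, and helper functions are hypothetical and only illustrate what recording hyperparameters and per-epoch metrics in a relational database looks like.

```python
import json
import sqlite3

# Hypothetical schema for illustration; ScalarStop's TrainStore defines its own tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS model (
    model_name TEXT PRIMARY KEY,
    hyperparams TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS model_epoch (
    model_name TEXT NOT NULL REFERENCES model (model_name),
    epoch_num INTEGER NOT NULL,
    metrics TEXT NOT NULL,
    PRIMARY KEY (model_name, epoch_num)
);
"""


def record_model(conn, model_name, hyperparams):
    """Store a model's hyperparameters once, keyed by the model's name."""
    conn.execute(
        "INSERT OR IGNORE INTO model (model_name, hyperparams) VALUES (?, ?)",
        (model_name, json.dumps(hyperparams, sort_keys=True)),
    )


def record_epoch(conn, model_name, epoch_num, metrics):
    """Store the metrics measured at the end of one training epoch."""
    conn.execute(
        "INSERT INTO model_epoch (model_name, epoch_num, metrics) VALUES (?, ?, ?)",
        (model_name, epoch_num, json.dumps(metrics, sort_keys=True)),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_model(conn, "small_cnn-0001", {"learning_rate": 0.001})
record_epoch(conn, "small_cnn-0001", 0, {"loss": 0.92})
record_epoch(conn, "small_cnn-0001", 1, {"loss": 0.71})
conn.commit()

rows = conn.execute(
    "SELECT epoch_num, metrics FROM model_epoch WHERE model_name = ? ORDER BY epoch_num",
    ("small_cnn-0001",),
).fetchall()
# Metrics are stored as JSON text and decoded in Python in this sketch.
losses = [json.loads(metrics)["loss"] for _, metrics in rows]
print(losses)  # [0.92, 0.71]
```

Keeping hyperparameters and metrics in separate tables means a model's settings are written once while its training history grows one row per epoch.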
Contributing to ScalarStop
We warmly welcome contributions to ScalarStop. Here are the technical details for getting started with adding code to ScalarStop.
Getting started
First, clone this repository from GitHub. All development happens on the main branch.
git clone https://github.com/scalarstop/scalarstop.git
Then, run make install to install Python dependencies in a Poetry virtualenv.
You can run make help to see the other commands that are available.
Checking your code
Run make fmt to automatically format code.
Run make lint to run Pylint and MyPy to check for errors.
Generating documentation
Documentation is important! Here is how to add to it.
Generating Sphinx documentation
Run make docs to generate a local copy of the Sphinx documentation that is published at scalarstop.com.
The generated documentation is written to docs/_build/dirhtml. To view it, start an HTTP server in that directory, for example:
make docs
cd docs/_build/dirhtml
python3 -m http.server 5000
Then visit http://localhost:5000 in your browser to preview changes to the documentation.
If you want to use Sphinx’s ability to automatically generate hyperlinks to the Sphinx documentation of other Python projects, configure the intersphinx settings in docs/conf.py. If you need to download an objects.inv file, make sure to update the make update-sphinx command in the Makefile.
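As a rough guide, an intersphinx configuration in a Sphinx conf.py generally looks like the fragment below. This is a sketch of the standard Sphinx settings, not a copy of ScalarStop's actual docs/conf.py; the projects listed and the local inventory path are assumptions chosen for illustration.

```python
# Fragment for a Sphinx conf.py. Add the extension to the existing list
# rather than replacing whatever the project already enables.
extensions = [
    "sphinx.ext.intersphinx",
]

intersphinx_mapping = {
    # name: (base URL of the target documentation, inventory location)
    # None tells Sphinx to fetch objects.inv from the base URL at build time.
    "python": ("https://docs.python.org/3", None),
    # A local path makes Sphinx read a previously downloaded objects.inv.
    # The path is hypothetical and is resolved relative to the Sphinx source directory.
    "numpy": ("https://numpy.org/doc/stable", "_intersphinx/numpy-objects.inv"),
}
```

Pointing an entry at a local inventory file keeps documentation builds working without network access, at the cost of having to refresh the file when the upstream documentation changes.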
Editing the tutorial notebook
The main ScalarStop tutorial is in a Jupyter notebook. If you have made changes to ScalarStop, you should rerun the Jupyter notebook on your machine with your changes to make sure that it still runs without error.
Running unit tests
Run make test to run all unit tests.
If you want to run a specific unit test, try running python3 -m poetry run python -m unittest -k {name of your test}.
Unit tests with SQLite3
If you are running tests using a Python interpreter that does not have the SQLite3 JSON1 extension, then TrainStore unit tests involving SQLite3 will be skipped. This is likely to happen if you are using Python 3.8 on Windows. If you suspect that you are missing the SQLite3 JSON1 extension, the Django documentation has some suggestions for how to fix it.
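One way to find out whether your interpreter has the JSON1 extension is to try calling a JSON function in an in-memory database. The helper below is a hypothetical sketch and is not necessarily the check that ScalarStop's test suite performs.

```python
import sqlite3


def sqlite_has_json1() -> bool:
    """Return True if this interpreter's SQLite library can evaluate JSON1 functions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("SELECT json_extract('{\"a\": 1}', '$.a')")
        return True
    except sqlite3.OperationalError:
        # SQLite reports "no such function" when JSON1 is not compiled in.
        return False
    finally:
        conn.close()


print("SQLite version:", sqlite3.sqlite_version)
print("JSON1 available:", sqlite_has_json1())
```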
Unit tests with PostgreSQL
By default, tests involving PostgreSQL are skipped. To enable them, run make test in a shell where the environment variable TRAIN_STORE_CONNECTION_STRING is set to a SQLAlchemy database connection URL, which looks something like "postgresql://scalarstop:changeme@localhost:5432/train_store". The connection URL should point to a running PostgreSQL server where the database and user already exist.
The docker-compose.yml file in the root of this repository can set up a PostgreSQL instance on your local machine. If you have Docker and Docker Compose installed, you can start the PostgreSQL database by running docker-compose up in the same directory as the docker-compose.yml file.
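Before running the PostgreSQL tests, it can help to confirm that TRAIN_STORE_CONNECTION_STRING is set and has the shape you expect. The sketch below uses only the standard library to split the URL into its parts; the helper name is hypothetical, and it deliberately leaves the password out of its output.

```python
import os
from urllib.parse import urlsplit

ENV_VAR = "TRAIN_STORE_CONNECTION_STRING"


def describe_connection(url: str) -> dict:
    """Split a database connection URL into its parts, omitting the password."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# The example URL from the text above.
example = describe_connection(
    "postgresql://scalarstop:changeme@localhost:5432/train_store"
)
print(example)

url = os.environ.get(ENV_VAR)
if url is None:
    print(f"{ENV_VAR} is not set, so PostgreSQL tests would be skipped.")
else:
    print(describe_connection(url))
```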
Measuring test coverage
You can run make test-with-coverage to collect Python line and branch coverage information. Afterwards, run make coverage-html to generate an HTML report of unit test coverage. You can view the report in a web browser at the path htmlcov/index.html.
Credits
ScalarStop’s documentation is built with Sphinx using @pradyunsg’s Furo theme and is hosted by Read the Docs.