scoach
A setup for training TensorFlow models on SLURM clusters.
How does it work?
- Inputs needed (see the examples directory):
  - A `.json` file with the parameters for training
  - A `.json` file with the model definition
  - A `.py` file with the training code
- There's a CLI app for interacting with scoach:
  - Run `scoach init` to set up your configuration file, such as the one in `config_example.yaml`.
  - On the login machine of the SLURM cluster, run `scoach start`. This starts a daemon that launches jobs as they are requested.
  - On any machine, run `scoach run submit` to submit jobs.
    - This uploads the Python script to MinIO and submits the configurations to the database.
  - The new runs are consumed by the daemon process, which uses Jinja2 to render the training script and submits it to the cluster.
  - The training script then runs on the cluster using Dask workers, whose number grows as needed.
- A
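The daemon step described above (consume new runs, render a script, hand it to the cluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not scoach's actual code: the run fields, the template text, and the function names are hypothetical, an in-memory queue stands in for the database, and the standard library's `string.Template` stands in for Jinja2 so the sketch has no dependencies.

```python
import json
from collections import deque
from string import Template

# Hypothetical batch-script template. scoach itself renders with Jinja2;
# string.Template is used here only as a dependency-free stand-in.
BATCH_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "#SBATCH --time=$time_limit\n"
    "python $script --params '$params'\n"
)


def render_batch_script(run):
    """Fill the template with the fields of one queued run."""
    return BATCH_TEMPLATE.substitute(
        job_name=run["name"],
        time_limit=run["time_limit"],
        script=run["script"],
        params=json.dumps(run["params"], sort_keys=True),
    )


def consume(queue):
    """Drain the queue and return one rendered script per pending run."""
    rendered = []
    while queue:
        run = queue.popleft()
        rendered.append(render_batch_script(run))
    return rendered


# In-memory stand-in for the database of submitted runs (assumed fields).
pending = deque([
    {
        "name": "demo-run",
        "time_limit": "01:00:00",
        "script": "train.py",
        "params": {"epochs": 5, "batch_size": 32},
    },
])

scripts = consume(pending)
print(scripts[0])
```

In a real deployment the rendered text would be written to a file and passed to `sbatch`; the sketch stops at rendering so that it can run anywhere.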
To do
- Add option `--local` on `scoach start` for launching runs locally
- Add support for uploading/managing datasets
- No Python script duplicates
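As a rough illustration of the first item, a `--local` switch could choose between submitting through SLURM and running on the current machine. The sketch below is hypothetical: the option is not implemented yet, and `build_parser` and `launch_command` are names invented for this example, not part of scoach.

```python
import argparse


def build_parser():
    """Hypothetical parser with a `start` subcommand and a `--local` flag."""
    parser = argparse.ArgumentParser(prog="scoach")
    subcommands = parser.add_subparsers(dest="command", required=True)
    start = subcommands.add_parser("start", help="start the daemon")
    start.add_argument(
        "--local",
        action="store_true",
        help="launch runs on this machine instead of through SLURM",
    )
    return parser


def launch_command(script_path, local):
    """Return the command line used to launch a rendered script."""
    if local:
        # Run the script directly on the current machine.
        return ["bash", script_path]
    # Hand the script to the SLURM scheduler.
    return ["sbatch", script_path]


args = build_parser().parse_args(["start", "--local"])
print(launch_command("run.sh", args.local))
```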
Download files
Source Distribution
scoach-0.1.9.tar.gz (25.7 kB)
Built Distribution
scoach-0.1.9-py3-none-any.whl (41.6 kB)
File details
Details for the file scoach-0.1.9.tar.gz.
File metadata
- Download URL: scoach-0.1.9.tar.gz
- Upload date:
- Size: 25.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.11 CPython/3.8.11 Linux/5.10.0-8-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 676247719174ab68e0c020da2297af79dc77e782f0042ae4bb617bee5f0dcc8f |
| MD5 | 45344c3874ad257cafc72a58e5cc6820 |
| BLAKE2b-256 | cc08029bcc7b4a131b52d4efcc5980a04a4701e9d7151ff2c5d7f68a89da22c2 |
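A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. The following is a minimal sketch; the helper names are chosen for this example, and the archive is assumed to sit in the current directory under its published file name.

```python
import hashlib
import os

# SHA256 digest published for scoach-0.1.9.tar.gz.
EXPECTED_SHA256 = "676247719174ab68e0c020da2297af79dc77e782f0042ae4bb617bee5f0dcc8f"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


archive = "scoach-0.1.9.tar.gz"  # assumed location of the download
if os.path.exists(archive):
    print("OK" if matches(archive, EXPECTED_SHA256) else "MISMATCH")
```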
File details
Details for the file scoach-0.1.9-py3-none-any.whl.
File metadata
- Download URL: scoach-0.1.9-py3-none-any.whl
- Upload date:
- Size: 41.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.11 CPython/3.8.11 Linux/5.10.0-8-amd64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 581856a87072a4d00fbbedfa4a78dc0800ab019e61cb8aed4873c258b51ddf81 |
| MD5 | 2351f3861287b4d8293e44ac1dfa7f39 |
| BLAKE2b-256 | a2c4257558efc68c55a36b01f6e51b61c443233a45aa5a45d9b8e6ecb52b452f |