# Smart Train (smartrain)

YOLO datasets, training queue, runs analytics — workspace-first CLI.

A CLI toolkit for preparing YOLO datasets, training models, running queues, and analyzing runs.

Russian version: docs/ru/README.md
## Quick start

Requirements: Python 3.10+.

```bash
git clone <repo-url>
cd smart-train
pip install -e .
```

Work from the project root (current directory):

```bash
smartrain deploy
smartrain scan
smartrain fusion --dataset ds_a --dataset ds_b --classes "class_a,class_b"
smartrain train --data 2026-01-01_12-00-00-merged --device 0 -y
```
## What's included

- Single entry point: `smartrain` (module `smartrain.cli`).
- Single-workspace model: `raw_data/`, `datasets/`, `runs/`, `analytics/`, `models/`, `inference/`, `tmp/`.
- Pipeline support: `scan -> fusion -> train -> analyze`.
- Additional tools: `queue`, `registry`, `report`, `model`, `normalize-data-yaml`, `migrate-models`, `clearml-upload`, `plot`, `cvat`, `sahi`, `heatmap`, `orient`.
## How it works

smartrain uses a single workspace root and builds a process around file contracts:

- `scan` synchronizes sources and updates the dataset catalog;
- `fusion` generates the final dataset for training;
- `train` creates a run directory with metrics and metadata;
- `analyze` and `registry` work on artifacts in `runs/`.
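The workspace layout behind these file contracts can be approximated with a few `mkdir` calls. This is an illustrative sketch, not what `smartrain deploy` literally runs: the directory names come from the workspace model listed above, and the real command may create additional files (catalogs, configs).

```bash
#!/bin/sh
# Sketch: create the top-level workspace layout that smartrain expects.
# WORKSPACE defaults to the current directory, mirroring "work from the
# project root".
WORKSPACE="${WORKSPACE:-.}"
for d in raw_data datasets runs analytics models inference tmp; do
  mkdir -p "$WORKSPACE/$d"
done
```

After this, `scan` and `fusion` have the directories they read from and write to, and `runs/` exists for `train` to populate.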
## Key commands

| Command | Purpose |
|---|---|
| `smartrain deploy` | Initialize the workspace structure |
| `smartrain scan` | Synchronize sources and update the dataset catalog |
| `smartrain fusion` | Build the final training dataset |
| `smartrain train` | Train and validate YOLO models |
| `smartrain inference` | Run inference on a folder or dataset split and save a JSON report |
| `smartrain queue` / `smartrain queue-run` | Manage and run the command queue |
| `smartrain analyze` | Summaries, run comparison, PR curves, and inference benchmarks |
| `smartrain registry` | Catalog run artifacts and promoted models |
## Documentation

Current documentation is organized into sections in docs/:

- Documentation navigation
- Getting started and core workflows
- CLI guide
- API and format reference
- Architecture and diagrams
## Testing

```bash
pip install -e ".[dev]"
pytest
```
## Important details

- Interactive mode starts only when a command is launched with zero arguments (a TTY is required).
- Interactive dataset commands: `fusion`, `augment`, `balance`, `stats`, `roi`, `orient`, `inference`; plus `train`.
- Dataset cleanup command: `prune` (`prune empty` for empty pairs, `prune dedup` for duplicate images by content).
- If any arguments are provided but required ones are missing, commands return a clear "incomplete arguments" error instead of interactive prompts.
- Command help now includes practical Examples / Quick examples blocks for common workflows.
- `smartrain balance` presets:
  - `--preset weights-safe` for conservative balancing
  - `--preset rfs-aggressive` for stronger tail upsampling
  - `--preset hybrid-default` as a general default
- `smartrain balance` eval splits: `--eval-coverage` is on by default (keeps `val`/`test` non-empty when possible and improves class coverage there); use `--no-eval-coverage` to disable. The interactive wizard asks for this option.
- Exit codes for `hash --validate`: `0` for a match, `1` for a mismatch, `2` for an error.
- By default, the workspace queue uses `queue.txt` and `tmp/status.txt`.
- Device selection in `train` and `inference`:
  - `--device 0` to force GPU 0
  - `--device cpu` to force CPU
  - if `--device` is omitted, the default is GPU 0 when CUDA is available, otherwise `cpu`
- `train resume` recovery behavior:
  - failed resume attempts are persisted in `training_metadata.json` (`resume_attempts`)
  - if `train/weights/last.pt` is still present after a failure, the run remains resumable for the next retry
  - run discovery for `resume`/`analyze`/`registry` includes runs with core train artifacts even when metadata is missing
- PyTorch CUDA policy:
  - the default target is CUDA 12.8 wheels (`cu128`)
  - if the current environment already has `torch` with CUDA 13.x, smartrain keeps it and does not downgrade
  - to apply the policy in the current environment: `smartrain deps sync-torch`
- Dependency extras:
  - `pip install -e ".[dev]"` for development and testing
  - `pip install -e ".[clearml]"` for ClearML
  - `pip install -e ".[sahi]"` for SAHI
## Common workflows

Scanning with an explicit source list:

```bash
smartrain scan --datasets-list /path/to/workspace/raw_data/datasets_list.txt
```

Check a dataset hash:

```bash
smartrain hash --dataset my_dataset
smartrain hash /path/to/dataset --validate a1b2c3d4
```
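The `hash --validate` exit codes (`0` match, `1` mismatch, `2` error, as listed under Important details) make the command easy to script. A sketch, using `sh -c 'exit N'` as a stand-in for the real `smartrain hash ... --validate ...` invocation:

```bash
#!/bin/sh
# Map hash-validation exit codes to human-readable results.
# Replace the stand-in command with: smartrain hash /path --validate <hex>
report_hash() {
  "$@"
  case $? in
    0) echo "match" ;;
    1) echo "mismatch" ;;
    2) echo "error" ;;
    *) echo "unexpected" ;;
  esac
}

report_hash sh -c 'exit 0'   # -> match
report_hash sh -c 'exit 1'   # -> mismatch
```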
Starting a queue without opening a GUI terminal:

```bash
smartrain queue run --no-gui
```
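The file contract behind the queue (`queue.txt` in, `tmp/status.txt` out, per Important details) can be illustrated with a minimal loop. The real queue format and status fields are richer than this; the sketch simply treats each non-empty line as a shell command and records its result:

```bash
#!/bin/sh
# Minimal queue loop: run each non-empty line of queue.txt and append an
# ok/failed record to tmp/status.txt. Illustrative only; the actual
# smartrain queue runner has its own format and bookkeeping.
printf '%s\n' 'echo hello' 'false' > queue.txt   # sample queue for the sketch
mkdir -p tmp
: > tmp/status.txt
while IFS= read -r cmd; do
  [ -z "$cmd" ] && continue
  if sh -c "$cmd"; then
    echo "ok: $cmd" >> tmp/status.txt
  else
    echo "failed: $cmd" >> tmp/status.txt
  fi
done < queue.txt
```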
Quick run overview:

```bash
smartrain analyze scan
smartrain analyze export-table -o runs_summary.csv
```

Train and inference with an explicit device:

```bash
smartrain train --data my_dataset --model yolo11n.pt --device 0
smartrain inference --model-name my_model --data-mode folder --source-dir ./images --device cpu
```
## Running long jobs over SSH (tmux)

For long training runs on a remote server, use tmux so the job survives SSH disconnects.

Install tmux once (Ubuntu/Debian example):

```bash
sudo apt-get update
sudo apt-get install -y tmux
```

Minimal workflow:

```bash
tmux new -s smartrain-train
smartrain train --data my_dataset --model yolo11n.pt --device 0
```

- Detach without stopping the training: `Ctrl+B`, then `D`
- Re-attach after reconnecting: `tmux attach -t smartrain-train`
- Stop training gracefully from the attached session: `Ctrl+C`
- Close an unused session: `tmux kill-session -t smartrain-train`

You can also use helper scripts from scripts/:

```bash
./scripts/tmux_train_start.sh --session smartrain-train -- smartrain train --data my_dataset --model yolo11n.pt --device 0
./scripts/tmux_train_attach.sh --session smartrain-train
./scripts/tmux_train_stop.sh --session smartrain-train
```

Optional: keep a file log while preserving live console output:

```bash
./scripts/tmux_train_start.sh --session smartrain-train -- bash -lc 'smartrain train --data my_dataset --model yolo11n.pt --device 0 2>&1 | tee -a runs/train.log'
```
## Operations quick recipes

Check active tmux sessions:

```bash
tmux ls
```

See whether the training process is still alive in a session:

```bash
tmux list-panes -t smartrain-train -F '#{pane_current_command} #{pane_pid}'
```

Recover live console output after reconnecting:

```bash
tmux attach -t smartrain-train
```

If already attached elsewhere, force a re-attach:

```bash
tmux attach -d -t smartrain-train
```

Graceful stop and cleanup:

```bash
./scripts/tmux_train_stop.sh --session smartrain-train
tmux kill-session -t smartrain-train
```
## FAQ (tmux over SSH)

**A session exists, but no new output appears. What should I check first?**

- Re-attach with force detach: `tmux attach -d -t smartrain-train`
- Check the current pane command: `tmux list-panes -t smartrain-train -F '#{pane_current_command} #{pane_pid}'`
- If your training wrote logs via `tee`, inspect the log file (for example `runs/train.log`).

**I accidentally closed SSH. Did training stop?**

- Usually no, if it was started inside tmux.
- Reconnect and run `tmux ls`, then `tmux attach -t smartrain-train`.

**Ctrl+C does not stop the run from my current shell.**

- Ensure you are attached to the right tmux session/window first.
- Or send the interrupt explicitly: `./scripts/tmux_train_stop.sh --session smartrain-train`.

**How do I quickly find the latest training logs?**

- `ls -lt runs | head`
- `tail -n 200 runs/train.log` (if you used `tee -a runs/train.log`)
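The log-hunting tips above boil down to sorting by modification time. A self-contained sketch (it fabricates two stand-in log files so the commands have something to find; `touch -d` is GNU coreutils):

```bash
#!/bin/sh
# Create two stand-in log files with different mtimes, then pick the newest.
mkdir -p runs
touch -d '1 hour ago' runs/older.log
touch runs/newest.log
ls -t runs/*.log | head -n 1   # -> runs/newest.log
```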
**How do I clean up stale tmux sessions?**

- List sessions: `tmux ls`
- Remove one: `tmux kill-session -t <session>`
- Remove all server sessions (careful): `tmux kill-server`