# Smart Train (smartrain)

YOLO datasets, training queue, runs analytics — workspace-first CLI.

A CLI toolkit for preparing YOLO datasets, training models, running queues, and analyzing runs.

Russian version: docs/ru/README.md
## Quick start

Requirements: Python 3.10+.

```bash
git clone <repo-url>
cd smart-train
pip install -e .
```

Work from the project root (current directory):

```bash
smartrain deploy
smartrain scan
smartrain fusion --dataset ds_a --dataset ds_b --classes "class_a,class_b"
smartrain train --data 2026-01-01_12-00-00-merged --device 0 -y
```
## What's included

- Single entry point: `smartrain` (module `smartrain.cli`).
- Single-workspace model: `raw_data/`, `datasets/`, `runs/`, `analytics/`, `models/`, `inference/`, `tmp/`.
- Pipeline support: `scan -> fusion -> train -> analyze`.
- Additional tools: `queue`, `registry`, `report`, `model`, `normalize-data-yaml`, `migrate-models`, `clearml-upload`, `plot`, `cvat`, `sahi`, `heatmap`, `orient`.
## How it works

smartrain uses a single workspace root and builds a process around file contracts:

- `scan` synchronizes sources and updates the dataset catalog;
- `fusion` generates the final dataset for training;
- `train` creates a run directory with metrics and metadata;
- `analyze` and `registry` work on artifacts in `runs/`.
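The workspace layout behind these file contracts can be approximated with a few `mkdir` calls. This is an illustrative sketch, not what `smartrain deploy` literally runs: the directory names come from the workspace model listed above, and the real command may create additional files (catalogs, configs).

```bash
#!/bin/sh
# Sketch: create the top-level workspace layout that smartrain expects.
# WORKSPACE defaults to the current directory, mirroring "work from the
# project root".
WORKSPACE="${WORKSPACE:-.}"
for d in raw_data datasets runs analytics models inference tmp; do
  mkdir -p "$WORKSPACE/$d"
done
```

After this, `scan` and `fusion` have the directories they read from and write to, and `runs/` exists for `train` to populate.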
## Key commands

| Command | Purpose |
|---|---|
| `smartrain deploy` | Initialize the workspace structure |
| `smartrain scan` | Synchronize sources and update the dataset catalog |
| `smartrain fusion` | Build the final training dataset |
| `smartrain train` | Train and validate YOLO models |
| `smartrain inference` | Run inference on a folder or dataset split and save a JSON report |
| `smartrain queue` / `smartrain queue-run` | Manage and run the command queue |
| `smartrain analyze` | Summaries, run comparison, PR curves, and inference benchmarks |
| `smartrain registry` | Catalog run artifacts and promoted models |
## Documentation

Current documentation is organized into sections in docs/:

- Documentation navigation
- Getting started and core workflows
- CLI guide
- API and format reference
- Architecture and diagrams
## Testing

```bash
pip install -e ".[dev]"
pytest
```
## Important details

- Interactive mode starts only when a command is launched with zero arguments (a TTY is required).
- Interactive dataset commands: `fusion`, `augment`, `balance`, `stats`, `roi`, `orient`, `inference`; plus `train`.
- Dataset cleanup command: `prune` (`prune empty` for empty pairs, `prune dedup` for duplicate images by content).
- If any arguments are provided but required ones are missing, commands return a clear "incomplete arguments" error instead of interactive prompts.
- Command help now includes practical Examples / Quick examples blocks for common workflows.
- `smartrain balance` presets:
  - `--preset weights-safe` for conservative balancing
  - `--preset rfs-aggressive` for stronger tail upsampling
  - `--preset hybrid-default` as a general default
- `smartrain balance` eval splits: `--eval-coverage` is on by default (keeps `val`/`test` non-empty when possible and improves class coverage there); use `--no-eval-coverage` to disable. The interactive wizard asks for this option.
- Exit codes for `hash --validate`: `0` for a match, `1` for a mismatch, `2` for an error.
- By default, the workspace queue uses `queue.txt` and `tmp/status.txt`.
- Device selection in `train` and `inference`:
  - `--device 0` to force GPU 0
  - `--device cpu` to force CPU
  - if `--device` is omitted, the default is GPU 0 when CUDA is available, otherwise `cpu`
- `train resume` recovery behavior:
  - failed resume attempts are persisted in `training_metadata.json` (`resume_attempts`)
  - if `train/weights/last.pt` is still present after a failure, the run remains resumable for the next retry
  - run discovery for `resume`/`analyze`/`registry` includes runs with core train artifacts even when metadata is missing
- PyTorch CUDA policy:
  - the default target is CUDA 12.8 wheels (`cu128`)
  - if the current environment already has `torch` with CUDA 13.x, smartrain keeps it and does not downgrade
  - to apply the policy in the current environment: `smartrain deps sync-torch`
- Dependency extras:
  - `pip install -e ".[dev]"` for development and testing
  - `pip install -e ".[clearml]"` for ClearML
  - `pip install -e ".[sahi]"` for SAHI
## Common workflows

Scanning with an explicit source list:

```bash
smartrain scan --datasets-list /path/to/workspace/raw_data/datasets_list.txt
```

Check a dataset hash:

```bash
smartrain hash --dataset my_dataset
smartrain hash /path/to/dataset --validate a1b2c3d4
```
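The `hash --validate` exit codes (`0` match, `1` mismatch, `2` error, as listed under Important details) make the command easy to script. A sketch, using `sh -c 'exit N'` as a stand-in for the real `smartrain hash ... --validate ...` invocation:

```bash
#!/bin/sh
# Map hash-validation exit codes to human-readable results.
# Replace the stand-in command with: smartrain hash /path --validate <hex>
report_hash() {
  "$@"
  case $? in
    0) echo "match" ;;
    1) echo "mismatch" ;;
    2) echo "error" ;;
    *) echo "unexpected" ;;
  esac
}

report_hash sh -c 'exit 0'   # -> match
report_hash sh -c 'exit 1'   # -> mismatch
```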
Starting a queue without opening a GUI terminal:

```bash
smartrain queue run --no-gui
```
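The file contract behind the queue (`queue.txt` in, `tmp/status.txt` out, per Important details) can be illustrated with a minimal loop. The real queue format and status fields are richer than this; the sketch simply treats each non-empty line as a shell command and records its result:

```bash
#!/bin/sh
# Minimal queue loop: run each non-empty line of queue.txt and append an
# ok/failed record to tmp/status.txt. Illustrative only; the actual
# smartrain queue runner has its own format and bookkeeping.
printf '%s\n' 'echo hello' 'false' > queue.txt   # sample queue for the sketch
mkdir -p tmp
: > tmp/status.txt
while IFS= read -r cmd; do
  [ -z "$cmd" ] && continue
  if sh -c "$cmd"; then
    echo "ok: $cmd" >> tmp/status.txt
  else
    echo "failed: $cmd" >> tmp/status.txt
  fi
done < queue.txt
```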
Quick run overview:

```bash
smartrain analyze scan
smartrain analyze export-table -o runs_summary.csv
```

Train and inference with an explicit device:

```bash
smartrain train --data my_dataset --model yolo11n.pt --device 0
smartrain inference --model-name my_model --data-mode folder --source-dir ./images --device cpu
```
## Running long jobs over SSH (tmux)

For long training runs on a remote server, use tmux so the job survives SSH disconnects.

Install tmux once (Ubuntu/Debian example):

```bash
sudo apt-get update
sudo apt-get install -y tmux
```

Minimal workflow:

```bash
tmux new -s smartrain-train
smartrain train --data my_dataset --model yolo11n.pt --device 0
```

- Detach without stopping the training: `Ctrl+B`, then `D`
- Re-attach after reconnecting: `tmux attach -t smartrain-train`
- Stop training gracefully from the attached session: `Ctrl+C`
- Close an unused session: `tmux kill-session -t smartrain-train`

You can also use helper scripts from scripts/:

```bash
./scripts/tmux_train_start.sh --session smartrain-train -- smartrain train --data my_dataset --model yolo11n.pt --device 0
./scripts/tmux_train_attach.sh --session smartrain-train
./scripts/tmux_train_stop.sh --session smartrain-train
```

Optional: keep a file log while preserving live console output:

```bash
./scripts/tmux_train_start.sh --session smartrain-train -- bash -lc 'smartrain train --data my_dataset --model yolo11n.pt --device 0 2>&1 | tee -a runs/train.log'
```
## Operations quick recipes

Check active tmux sessions:

```bash
tmux ls
```

See whether the training process is still alive in a session:

```bash
tmux list-panes -t smartrain-train -F '#{pane_current_command} #{pane_pid}'
```

Recover live console output after reconnecting:

```bash
tmux attach -t smartrain-train
```

If already attached elsewhere, force a re-attach:

```bash
tmux attach -d -t smartrain-train
```

Graceful stop and cleanup:

```bash
./scripts/tmux_train_stop.sh --session smartrain-train
tmux kill-session -t smartrain-train
```
## FAQ (tmux over SSH)

**A session exists, but no new output appears. What should I check first?**

- Re-attach with force detach: `tmux attach -d -t smartrain-train`
- Check the current pane command: `tmux list-panes -t smartrain-train -F '#{pane_current_command} #{pane_pid}'`
- If your training wrote logs via `tee`, inspect the log file (for example `runs/train.log`).

**I accidentally closed SSH. Did training stop?**

- Usually no, if it was started inside tmux.
- Reconnect and run `tmux ls`, then `tmux attach -t smartrain-train`.

**Ctrl+C does not stop the run from my current shell.**

- Ensure you are attached to the right tmux session/window first.
- Or send the interrupt explicitly: `./scripts/tmux_train_stop.sh --session smartrain-train`.

**How do I quickly find the latest training logs?**

- `ls -lt runs | head`
- `tail -n 200 runs/train.log` (if you used `tee -a runs/train.log`)
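The log-hunting tips above boil down to sorting by modification time. A self-contained sketch (it fabricates two stand-in log files so the commands have something to find; `touch -d` is GNU coreutils):

```bash
#!/bin/sh
# Create two stand-in log files with different mtimes, then pick the newest.
mkdir -p runs
touch -d '1 hour ago' runs/older.log
touch runs/newest.log
ls -t runs/*.log | head -n 1   # -> runs/newest.log
```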
**How do I clean up stale tmux sessions?**

- List sessions: `tmux ls`
- Remove one: `tmux kill-session -t <session>`
- Remove all server sessions (careful): `tmux kill-server`