Flash — managed LoRA post-training (SFT/GRPO) for Freesolo environments, driven by the `flash` CLI
Flash
Managed LoRA post-training service: SFT and GRPO on managed GPUs across multiple providers — RunPod Flash (serverless queue; RTX 4090/5090 classes) and Vast.ai (rented verified-datacenter instances; L40S / RTX Pro 4000 / A100 classes). The allocator picks the cheapest GPU class that fits the run across both providers.
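The selection rule in the last sentence can be sketched in a few lines of Python. This is only an illustration of "cheapest class that fits", not Flash's allocator: the `GpuClass` type, the `pick_cheapest` helper, and every VRAM and price figure below are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GpuClass:
    provider: str        # e.g. "runpod" or "vast"
    name: str
    vram_gb: int         # placeholder capacity, not a Flash catalog value
    usd_per_hour: float  # placeholder price, not a real quote


def pick_cheapest(classes: list[GpuClass], required_vram_gb: float) -> Optional[GpuClass]:
    """Return the cheapest class with enough VRAM, or None if nothing fits."""
    fitting = [c for c in classes if c.vram_gb >= required_vram_gb]
    return min(fitting, key=lambda c: c.usd_per_hour, default=None)


# Hypothetical offers across both providers; all numbers are illustrative only.
OFFERS = [
    GpuClass("runpod", "RTX 4090", 24, 0.70),
    GpuClass("runpod", "RTX 5090", 32, 0.95),
    GpuClass("vast", "L40S", 48, 0.85),
    GpuClass("vast", "A100", 80, 1.40),
]
```

With these placeholder numbers, a run needing 30 GB would land on the L40S offer, because the filter is applied across providers before prices are compared.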
Scope
- flash train <cfg.toml> / control-plane POST /runs: submit a training job; one dedicated GPU per run, supervised server-side (stall watchdog, bounded auto-retry resuming from the last streamed checkpoint, endpoint GC).
- flash deploy, flash chat: serving for trained adapters.
- Freesolo SDK environments. Every run names a Freesolo environment id.
Scaffold environment.py, upload the current folder (.) or another folder with flash env push --name <name> <folder>, then reference the returned id. The worker loads it through freesolo.environments. There are no built-in task environments. Single-turn and bounded multi-turn environments are supported.
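The "bounded auto-retry resuming from the last streamed checkpoint" behaviour listed under Scope can be pictured with a small loop. This is a sketch under stated assumptions, not the code in flash/runner.py: `WorkerFailed`, `supervise`, and the `run_once(resume_from)` callback are hypothetical names, and the retry budget of 2 is arbitrary.

```python
from typing import Callable, Optional


class WorkerFailed(Exception):
    """Raised by the hypothetical worker; carries the last streamed checkpoint."""

    def __init__(self, last_checkpoint: Optional[str]):
        super().__init__("worker failed")
        self.last_checkpoint = last_checkpoint


def supervise(run_once: Callable[[Optional[str]], str], max_retries: int = 2) -> str:
    """Call run_once(resume_from) until it succeeds or retries are exhausted."""
    resume_from: Optional[str] = None
    for attempt in range(max_retries + 1):
        try:
            return run_once(resume_from)
        except WorkerFailed as err:
            # Keep the newest known checkpoint so earlier progress is not lost.
            resume_from = err.last_checkpoint or resume_from
            if attempt == max_retries:
                raise
    raise AssertionError("unreachable")
```

The bound matters for cost: once the budget is spent the last failure propagates instead of renting GPUs indefinitely.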
Layout
- flash/catalog.py: curated model catalog (Qwen3 dense supported tier; Qwen3.5/3.6 experimental tier) + model_policy = "allow" VRAM-fit check + each model's thinking capability (opt-in reasoning mode via thinking = true)
- flash/schema.py, flash/spec.py: TOML → JobSpec
- flash/runner.py: server-side run supervisor (durable job handle, retries, cost guard, endpoint GC)
- flash/providers/: RunPod Flash + Vast.ai provider subtrees (pricing, gpus, durable submit/poll, preflight) behind one base.Provider protocol, with a cross-provider allocator.py that picks the cheapest fitting class
- flash/engine/: the on-GPU worker (TRL + colocated vLLM rollouts) and the shared recipe; SFT targets and RL rewards route through the active environment (task-specific grading lives with its example, not in the engine)
- flash/envs/: environment machinery, i.e. the registry and the adapter that loads Freesolo SDK environments onto the worker's interface
- flash env setup: scaffold a starter local Freesolo env and a ready-to-run config to start from
- flash/serve/, flash/server/: adapter serving and the FastAPI control plane (run operator-side via the separate flash-server command)
- flash/mcp/: stdio MCP bridge for coding agents
- Dockerfile: the control-plane image (used by the repo docker-compose)
- tests/: pytest suite (CPU-only; offline-by-default, no GPU/network)
Local commands
cd flash
uv sync --extra server
uv run pytest # CPU tests (offline-by-default, no GPU/network)
uv run ruff check . && uv run ruff format .
uv run flash --help
uv run flash-server # control plane (operator-side, run once)
The control plane owns provider credentials: RUNPOD_API_KEY is always required
(RunPod is the default substrate), VAST_API_KEY is opt-in (only checked when set),
plus the shared HF_TOKEN.
The artifact repo is per-run (the run TOML's [train] hf_repo), not an
operator-wide env var. Clients authenticate with their freesolo API key (flash login).
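The credential rules above can be expressed as a small startup check. This is a sketch, not the control plane's own preflight: `check_provider_env` is a hypothetical helper, and treating HF_TOKEN as mandatory is an assumption (the text only says it is shared).

```python
from typing import Mapping


def check_provider_env(env: Mapping[str, str]) -> dict[str, bool]:
    """Return which providers are usable; raise if a required variable is unset."""
    # Assumption: HF_TOKEN is treated as required alongside RUNPOD_API_KEY.
    missing = [name for name in ("RUNPOD_API_KEY", "HF_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    # Vast.ai is opt-in: it is enabled only when its key is present.
    return {"runpod": True, "vast": bool(env.get("VAST_API_KEY"))}
```

An operator script could call `check_provider_env(os.environ)` before launching flash-server to fail fast on a missing key.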
Serving From an API
flash chat is a CLI wrapper around the Flash control-plane chat endpoint. To call a
deployed adapter from your own app, deploy the finished run once and then POST chat
requests with your freesolo API key:
export FLASH_API_URL=https://flash.freesolo.co
export FREESOLO_API_KEY=fslo_...
export RUN_ID=flash-1782194170-ce1cfcff
curl -X POST "$FLASH_API_URL/v1/runs/$RUN_ID/deploy" \
  -H "Authorization: Bearer $FREESOLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"dry_run": false}'
curl -X POST "$FLASH_API_URL/v1/runs/$RUN_ID/chat" \
  -H "Authorization: Bearer $FREESOLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a two-sentence summary of the run."}
    ],
    "temperature": 0.0,
    "max_tokens": 256
  }'
The response uses the OpenAI chat-completions shape:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      }
    }
  ]
}
Use choices[0].message.content for the generated text. The run id is the adapter id
for serving. If the run is not deployed yet, /v1/runs/<run_id>/chat returns 409
with a hint to deploy first.
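For apps written in Python, the same chat call can be made with the standard library alone. The sketch below mirrors the curl request and response shape shown above; the function names are hypothetical, the 60-second timeout is an arbitrary choice, and nothing beyond the documented path, headers, body fields, and 409 behaviour is assumed about the API.

```python
import json
import urllib.error
import urllib.request


def build_chat_request(api_url: str, run_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request for /v1/runs/<run_id>/chat without sending it."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/v1/runs/{run_id}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Pull the generated text out of an OpenAI-style chat-completions payload."""
    return response["choices"][0]["message"]["content"]


def chat(api_url: str, run_id: str, api_key: str, prompt: str) -> str:
    """Send one chat turn; a 409 means the run has not been deployed yet."""
    request = build_chat_request(api_url, run_id, api_key, prompt)
    try:
        with urllib.request.urlopen(request, timeout=60) as resp:
            return extract_text(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 409:
            raise RuntimeError(f"run {run_id} is not deployed; deploy it first") from err
        raise
```

A caller would read FLASH_API_URL, FREESOLO_API_KEY, and RUN_ID from the environment and pass them to `chat(...)`.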
Operators can also call the Modal serving app directly after the adapter is registered.
The default serving app is https://clado-ai--freesolo-lora-serving.modal.run, and
operators can point Flash at another serving app by setting FREESOLO_SERVING_URL.
Use that same base URL when calling the app directly; pass the run id as model:
export FREESOLO_SERVING_URL=https://clado-ai--freesolo-lora-serving.modal.run
curl -X POST "$FREESOLO_SERVING_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flash-1782194170-ce1cfcff",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.0,
    "max_tokens": 256
  }'
Prefer the Flash control-plane endpoint for user apps because it enforces run ownership and forwards per-run serving options such as thinking-mode parity.