Open-source framework for defining, running, and benchmarking robot training missions.
Project description
Odyssey
Status: alpha (v0.1.0-alpha.1). The API, CLI, schemas, and wire protocols are still subject to change without notice. See
docs/for the design refs.
Install
[!TIP] Linux only — install build dependencies before proceeding (needed by
.[all]):sudo apt update && sudo apt install build-essential python3-dev -y
git clone https://github.com/lovellai-dev/odyssey.git
cd odyssey
python3 -m venv .venv
source .venv/bin/activate
pip install -e . # CLI, validate, mock runs (lightweight)
pip install -e ".[all]" # real training + evaluation (torch, robosuite…)
pip install -e ".[all,dev]" # + pytest, ruff, mypy
The base install pulls in pydantic, click, pyyaml, and aiosqlite — enough to
run validate, list, status, and run --use-mock-runner against any
mission spec without a GPU. .[all] adds everything needed for real training
and evaluation runs.
Quick start (no GPU, no network)
# Validate the mission spec
$ odyssey validate examples/quickstart-openvla/mission.yaml
OK examples/quickstart-openvla/mission.yaml
spec version : 0.1
tasks : 1 training, 1 evaluation
# Run the full mission with a CPU mock (no GPU needed)
$ odyssey run examples/quickstart-openvla/mission.yaml --use-mock-runner
...
{"ts": "...", "event": "mission.completed", "overall_grade": 1.0}
COMPLETED c1756bad855e45cc9a95b5b0566c948b
overall_grade : 1.000
--use-mock-runner swaps in the CPU mock for every task, so this works on a
laptop without a GPU. Inspect runs afterward with odyssey list and
odyssey status <id>. State is persisted to ~/.odyssey/missions.db;
artifacts under ~/.odyssey/runs/<mission-id>/<task-id>/.
What it is
You train an agent by describing a mission in YAML — a robot, a model, a dataset to train on, an
evaluation benchmark to score against — and odyssey run walks it through the
full lifecycle: load → validate → execute training tasks → execute the
evaluation task → persist results. Local-mode by default; the hosted Lovell
services (leaderboard, learning graph, hosted runners) are optional layers
that land in later releases.
Launching a training mission
Two training paths ship today: GR00T (NVIDIA Isaac GR00T) and OpenVLA.
Both run through odyssey run <mission.yaml> — pick the quickstart that matches
your model.
GR00T (Isaac-GR00T + Isaac Lab)
Fine-tunes nvidia/GR00T-N1.7-3B on the LeRobot-format demo set that ships
inside the Isaac-GR00T repo (no separate download), evaluated in the Isaac Lab
cube-lift environment.
Prerequisites:
- Install the upstream Isaac-GR00T package — it carries the training entry
point (
gr00t.experiment.launch_finetune) and the demo dataset. Accept NVIDIA's weight license:git clone https://github.com/NVIDIA/Isaac-GR00T.git /srv/Isaac-GR00T pip install -e /srv/Isaac-GR00T export ISAAC_GR00T_REPO_PATH=/srv/Isaac-GR00T # resolves the demo dataset
- For the Isaac Lab evaluation, install Isaac Lab and point Odyssey at its
launcher:
export ISAACLAB_PATH=/srv/IsaacLab # provides isaaclab.sh
Run:
odyssey run examples/quickstart-gr00t/mission.yaml
The mission routes its training task to the GR00T runner with
config: { runner: gr00t } — OpenVLA and GR00T both serve wildcard training
tasks, so the family is selected explicitly.
OpenVLA (Bridge V2 + Robosuite)
Prerequisites:
- Install the training extras:
pip install -e ".[huggingface,openvla,robosuite]"
- Clone the upstream OpenVLA repo and install its dependencies (needed for
draccusand the fine-tuning script):git clone https://github.com/openvla/openvla.git /srv/openvla pip install -e /srv/openvla export OPENVLA_REPO_PATH=/srv/openvla
- Download the Bridge V2 dataset in RLDS format (~124 GB):
wget -r -nH --cut-dirs=4 --reject="index.html*" \ https://rail.eecs.berkeley.edu/datasets/bridge_release/data/tfds/bridge_dataset/ mv bridge_dataset bridge_orig
Set--data_root_dirto the parent directory containingbridge_orig/.
Run:
odyssey run examples/quickstart-openvla/mission.yaml
Hardware: 24 GB GPU (RTX 4090-class or better) for the OpenVLA LoRA fine-tune.
[!NOTE] GCP users: single-GPU VMs require
export NCCL_NET=Socketbefore running, to bypass Google's NCCL plugin. See issue #5 for details.
[!NOTE] Evaluation: the Robosuite runner auto-wires an OpenVLA→robosuite-action adapter (
make_openvla_policyinrunners/models/openvla.py) when no custompolicy_factoryis injected — it loads either a LoRA adapter or a full merged checkpoint, so eval works without extra glue. Full episode-completion validation on a real GPU is still in progress.
Known-good OpenVLA stack
The fine-tune runs through the cloned OpenVLA repo, which carries its own
dependency set — most onboarding friction comes from there, not from Odyssey.
Mixing versions surfaces as protobuf / TensorFlow / tensorflow-metadata
conflicts or draccus import errors. Known-good versions (from OpenVLA's own
requirements — treat its repo as the source of truth):
Python 3.10
torch 2.2.0
torchvision 0.17.0
transformers 4.40.1
tokenizers 0.19.1
timm 0.9.10
flash-attn 2.5.5
To avoid re-downloading the 7B base model each run, point its path env var at a
local copy (HF id upper-cased, / and - → _, suffixed _PATH):
export OPENVLA_OPENVLA_7B_PATH=/path/to/openvla-7b # for base: openvla/openvla-7b
Dataset: how source: oxe / ref: bridge_orig resolves
Odyssey does not download the dataset — oxe is a pass-through. The runner
forwards two values to OpenVLA's finetune.py, which loads via TFDS/RLDS:
| mission.yaml | becomes the flag | meaning |
|---|---|---|
dataset.ref: bridge_orig |
--dataset_name bridge_orig |
the OXE registry key OpenVLA looks up |
config.data_root_dir: <path> |
--data_root_dir <path> |
the parent dir containing the RLDS dataset folder |
⚠️ Naming gotcha: the registry key and the on-disk folder name can differ.
In validation, ref: bridge_orig resolved to data under ~/bridge_dataset/1.0.0/,
so data_root_dir had to point at the parent of that folder — not the key
name. Check where your download actually landed and set data_root_dir to its
parent.
Weights & Biases (W&B)
OpenVLA's finetune.py calls wandb.init() unconditionally, so a run stalls or
fails if W&B isn't reachable. Control it yourself:
# Disable for local / smoke runs:
export WANDB_MODE=disabled
# Or log to your account, then pass project/entity via mission config:
# config: { wandb_project: my-project, wandb_entity: my-entity }
Any config: key Odyssey doesn't consume is forwarded verbatim as
--<key> <value> to finetune.py.
What to expect during a run
Timing varies widely with hardware, disk, and network — treat these as orientation, not promises:
- Base model download —
openvla-7b(~14 GB) on first run, unlessOPENVLA_OPENVLA_7B_PATHis set. - Dataset load / indexing — Bridge V2 (~124 GB); RLDS indexing on a cold cache takes a while.
- Training startup — model load + LoRA wrap, then steps begin.
- Steady state — throughput logs as
it/s(~1.49 it/s on an NVIDIA L4 for the quickstart config).
If a stage seems stuck, it's almost always a download in progress or a dataset-path / W&B issue rather than a training bug — check those first.
Multi-agent (PILOT + SPECIALIST)
Show setup & how it works
HuggingFace login (gated models)
The models pulled from the Hub are gated — you must accept each model's
license on its HuggingFace page, then authenticate on the machine before the
first run, or the download fails with 401/403:
openvla/openvla-7b— the PILOTgoogle/gemma-4-E2B-it— the SPECIALIST in the multi-agent example (Apache-2.0, no gating)
huggingface-cli login # paste a token from https://huggingface.co/settings/tokens
# or, non-interactive (CI / headless VM):
export HF_TOKEN=hf_xxx # a read token on an account that accepted the licenses
A mission with a SPECIALIST agent (a task planner) in addition to the PILOT runs a plan-then-execute loop during eval: the SPECIALIST decomposes the instruction into sub-steps once per episode, and the PILOT executes each. Only the PILOT produces actions and only the PILOT is trained — the SPECIALIST is inference-only (it runs its base checkpoint to plan and has no training task).
robot:
agents:
- id: pilot
role: PILOT
model: { source: huggingface, base: openvla/openvla-7b }
- id: task-planner
role: SPECIALIST
model:
source: huggingface
base: google/gemma-4-E2B-it
quantization: int4
modality: multimodal
The SPECIALIST is a vision-grounded multimodal Gemma 4 planner: it sees the
first camera frame of each episode and grounds its plan in the scene. Gemma 4
needs a modern transformers + torchvision, which conflicts with OpenVLA's
pinned transformers==4.40.1, so the SPECIALIST must run out of process in a
separate venv. The PILOT stays in the main venv; the two talk over a JSON-lines
subprocess protocol (the planner runs once per episode, off the per-step hot loop).
Setting up the out-of-process SPECIALIST
-
Create the specialist venv (modern transformers + torchvision + Gemma deps):
python -m venv ~/specialist-venv ~/specialist-venv/bin/pip install -e ".[specialist]" \ -c constraints/specialist-known-good.txt
-
Point Odyssey at that venv's python. It is read per-process from the environment, so export it in every shell that runs a mission — or add it to your shell profile / VM startup script so it persists:
export ODYSSEY_SPECIALIST_PYTHON=~/specialist-venv/bin/python
ODYSSEY_SPECIALIST_PYTHONis required for any mission with a SPECIALIST. The planner is launched in that venv (RemotePlanner→python -m odyssey.runners.agents.planner_server). If it is unset, multi-agent eval fails fast with a clearRuntimeError: the multimodal Gemma 4 planner cannot load in the main venv, which pinstransformers==4.40.1for OpenVLA.
Quick check without a simulator (launches the planner in the specialist venv and prints a decomposition — no OpenVLA or simulator needed):
python tests/manual/smoke_remote_planner.py
Why Gemma 4, not Gemma 3, for multimodal. Gemma 3 4B emits NaN logits under int4 bitsandbytes on this stack (verified across eager/sdpa attention, text-only and with-image), so it can't run quantized here. Gemma 4 (Apache-2.0, ungated) loads cleanly in int4 and grounds plans in the scene image.
VRAM note. Both models still share the GPU — the venv split solves the dependency conflict, not VRAM. The SPECIALIST is pinned to GPU 0 (
device_map={"": 0}) so bitsandbytes never silently offloads layers to CPU. Gemma 4 E4B-it int4 (~9.3 GB) alongside bf16 OpenVLA (~14 GB) peaks at ~23 GB — tight on a 24 GB L4; drop to E2B-it for headroom (this is what the multimodal example mission uses).
Two known-good stacks. The main venv pins OpenVLA's stack (
constraints/openvla-known-good.txt: torch 2.2.0, transformers 4.40.1); the specialist venv pins a modern one with torchvision (constraints/specialist-known-good.txt). They no longer need to be mutually compatible.
CLI reference
| Command | What it does |
|---|---|
odyssey init [DIR] |
Scaffold a new mission directory. --template openvla|cpu_mock. |
odyssey validate <mission.yaml> |
Parse + validate a spec. Exits 0 if clean. |
odyssey run <mission.yaml> |
Execute end-to-end. --use-mock-runner for no-GPU smoke. |
odyssey list |
Recent missions from the local SQLite DB. --status to filter. |
odyssey status <mission_id> |
One mission's detail. Accepts an id prefix. |
All commands respect --db and --working-dir to override the
~/.odyssey/ defaults.
Status snapshot (v0.1.0-alpha.1)
| Area | Done | Deferred |
|---|---|---|
| Spec + validate | ✓ | — |
| Engine + lifecycle | ✓ | watchdog timers, materialized profiles |
| In-memory + SQLite persistence | ✓ | — |
| Provider ABCs + Local + HF | ✓ | OXE, Lovell-mode |
| CPU mock runner | ✓ | — |
| OpenVLA training runner | ✓ (validated on L4) | — |
| GR00T training runner | ✓ (validated on H100); task-level runner: gr00t routing |
— |
| Robosuite eval runner | ✓ (auto-wired OpenVLA adapter) | full GPU end-to-end validation |
| Isaac Lab eval runner | ✓ skeleton + tests, subprocess launch + ODYSSEY_* stdout protocol |
blessed eval script (GR00T/VLA recipe), real-Isaac smoke |
| Multi-agent eval (PILOT + SPECIALIST) | ✓ (out-of-process Gemma 4 planner) | full GPU end-to-end validation |
odyssey init / run / list / status / validate |
✓ | logs, publish |
| Leaderboard publish, Learning Graph, Anonymizer, Auth | — | post-v0.1.0-alpha.1 |
License
Apache License 2.0. See LICENSE.
Contributing
See CONTRIBUTING.md. DCO sign-off required on every commit. Open an issue before non-trivial PRs — the API surface is moving weekly until v0.1.0-alpha.1 freezes.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lovell_odyssey-0.1.0a1.tar.gz.
File metadata
- Download URL: lovell_odyssey-0.1.0a1.tar.gz
- Upload date:
- Size: 5.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d296822478d1448225c80769cccdfe090d768a452b7b8d2612273a17df2ff072
|
|
| MD5 |
f46c621a1ca3492a0bdbc0db103629f6
|
|
| BLAKE2b-256 |
e893bc7d2c0b8101d06ecdc29d2ee64aec0e8ad980259819f50b795f216d158a
|
File details
Details for the file lovell_odyssey-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: lovell_odyssey-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 106.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5e3db36a32207ec25efa0d2648125dfdb5eca02a08805eb17db4798916ae7d28
|
|
| MD5 |
74bb50a3f2fc1baf1e85ed9d803c70c0
|
|
| BLAKE2b-256 |
6222c378f0bf4aa90279778475b6a3aec595285a4217884473adaf70db98cffc
|