Heterogeneous Multi-Robot Ad-Hoc Teamwork — benchmark + safety stack.
Contact-rich coordination with opaque, heterogeneous teammates —
with explicit safety assumptions and conformal-CBF reporting.
CONCERTO is the method.
CHAMBER is the benchmark.
We evaluate CONCERTO on CHAMBER.
Status: pre-release, Phase 0. Architecture is locked in 15 ADRs (13 Accepted, 2 RFC) under the working policy recorded in
adr/ADR-INDEX.md; the staged Phase-0 spike protocol (ADR-007) is the validation gate that promotes Accepted ADRs to Validated with per-axis ≥20 pp evidence. The Stage-1 (AS + OM) preregistrations are the next launch; the leaderboard fills with M5. The public API is on 0.x, so MINOR bumps may break it per SemVer §4. See Roadmap.
TL;DR. CONCERTO is a three-layer safety stack (exponential
CBF‑QP, conformal-slack overlay, OSCBF inner filter) backed by a hard
braking fallback, for robots that must work with opaque,
heterogeneous teammates they were never trained with. CHAMBER is
the matching benchmark: six heterogeneity sub-axes above
ManiSkill v3, fixed-format communication with URLLC-anchored
degradation profiles, a partner zoo, and an ISO 10218-2:2025-aware
safety-reporting format. The project is open from day one, with an ADR-tracked design
contract, preregistered spikes, and byte-identical CPU determinism via
uv.lock + a root_seed.
Quickstart
30-second smoke test.
git clone https://github.com/fsafaei/concerto.git
cd concerto
pip install uv && uv sync --group dev --group train
# Smoke test the rig (ADR-001 acceptance criterion).
uv run pytest -m smoke -x -v
Install groups.
--group dev pulls the developer toolchain (ruff, pyright, pytest). HARL ships as the harl-aht distribution (the CONCERTO fork at fsafaei/harl-fork; see ADR-002 §Revision-history 2026-05-19 and #132) and is pulled automatically as a runtime dependency, so the ego-AHT trainer and the frozen-HARL partner work out of the box from a source checkout; no separate train-group install is needed. The concerto-multirobot distribution ships to TestPyPI for the 0.x line; the production-PyPI debut is staged in the Release workflow.
Compose a factory-floor channel (URLLC-anchored degradation profile
from ADR-006) and round-trip a packet through encode → decode:
from chamber.comm import (
    CommDegradationWrapper,
    FixedFormatCommChannel,
    URLLC_3GPP_R17,
)
channel = CommDegradationWrapper(
    FixedFormatCommChannel(),
    URLLC_3GPP_R17["factory"],
    tick_period_ms=1.0,
    root_seed=0,
)
state = {
    "pose": {
        "ego": {"xyz": (0.0, 0.0, 0.0), "quat_wxyz": (1.0, 0.0, 0.0, 0.0)},
    },
    "task_state": {"ego": {"grasp_side": "left"}},
}
# The factory profile delays each packet by ~5 ticks; drain the queue so
# the visible packet carries the freshly-encoded state.
for _ in range(10):
    packet = channel.encode(state)
decoded = channel.decode(packet)
print("decoded payload:", decoded)
Save the snippet to quickstart.py and run uv run python quickstart.py.
The six pre-registered URLLC profiles — ideal, urllc,
factory, wifi, lossy, saturation — are the Stage‑2
CM sweep table. See
docs/how-to/run-spike.md for the full
flow. For the bigger picture, jump to
Architecture at a glance.
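To make the "drain the queue" step in the quickstart concrete, here is a toy, stdlib-only model of a degraded channel: a fixed per-packet delay in ticks plus Bernoulli drops, driven by a seeded RNG. Everything in it (`ToyDegradedChannel`, `delay_ticks`, `drop_prob`, returning `None` until the first packet arrives) is a hypothetical illustration, not the chamber.comm implementation or its profile values.

```python
import random
from collections import deque
from typing import Any, Optional


class ToyDegradedChannel:
    """Hypothetical stand-in for a latency + drop channel (not chamber.comm)."""

    def __init__(self, delay_ticks: int, drop_prob: float, root_seed: int) -> None:
        self.delay_ticks = delay_ticks
        self.drop_prob = drop_prob
        self.rng = random.Random(root_seed)  # seeded, so runs are reproducible
        self.queue: deque = deque()  # (release_tick, payload) pairs, FIFO
        self.tick = 0
        self.last_visible: Optional[Any] = None

    def step(self, payload: Any) -> Optional[Any]:
        """Send `payload` this tick; return the freshest packet released so far."""
        if self.rng.random() >= self.drop_prob:  # packet survives the link
            self.queue.append((self.tick + self.delay_ticks, payload))
        # Release every packet whose constant delay has elapsed.
        while self.queue and self.queue[0][0] <= self.tick:
            self.last_visible = self.queue.popleft()[1]
        self.tick += 1
        return self.last_visible


channel = ToyDegradedChannel(delay_ticks=5, drop_prob=0.0, root_seed=0)
seen = [channel.step({"t": t}) for t in range(10)]
print(seen)
```

The first five calls see nothing; after that, each visible packet is five ticks stale. Because the quickstart encodes the same state on every tick, any packet that becomes visible once the delay has elapsed carries that state.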
Architecture at a glance
Two top-level packages, one wheel. CHAMBER (benchmark) wraps
ManiSkill v3 and provides the six heterogeneity axes, the
communication stack, the partner zoo, and the evaluation harness.
CONCERTO (method) provides the safety stack and the ego-AHT training
loop. Dependency direction is one-way: chamber → concerto.
flowchart LR
subgraph CHAMBER["CHAMBER · benchmark"]
direction TB
ENVS["envs<br/>ManiSkill v3 wrappers<br/>AS · OM · CR"]
COMM["comm<br/>fixed-format channel +<br/>URLLC degradation · CM"]
PART["partners<br/>partner zoo · PF<br/>heuristic / frozen-RL / VLA"]
EVAL["evaluation<br/>HRS · prereg · leaderboard"]
BENCH["benchmarks<br/>Stage-0/1/2/3 spike runners"]
end
subgraph CONCERTO["CONCERTO · method"]
direction TB
SAFETY["safety<br/>exp CBF-QP + conformal +<br/>OSCBF + braking · SA"]
TRAIN["training<br/>ego-AHT loop +<br/>deterministic seeding"]
API["api<br/>public Protocols"]
end
ENVS --> BENCH
COMM --> BENCH
PART --> BENCH
BENCH --> EVAL
BENCH -- ego-policy --> TRAIN
TRAIN -- filtered actions --> SAFETY
SAFETY -. consumes .-> API
CHAMBER -- "depends on (one-way)" --> CONCERTO
The axis symbols in the node labels (AS, OM, CR, CM, PF, SA) tie each module to the heterogeneity sub-axis it exercises; see the six heterogeneity axes for the per-axis pre-registered ≥20 pp gap rule.
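The one-way rule (chamber → concerto) is the kind of invariant that is easy to check mechanically. A minimal sketch, assuming you simply scan import statements with the standard `ast` module; the helpers `imports_of` and `violates_one_way_rule` and the inline source strings are hypothetical, not the project's actual lint.

```python
import ast


def imports_of(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the same package; skip them.
            names.add(node.module.split(".")[0])
    return names


def violates_one_way_rule(package: str, source: str) -> bool:
    """Hypothetical check: concerto must never import chamber; chamber may import concerto."""
    return package == "concerto" and "chamber" in imports_of(source)


chamber_module = "from concerto.safety.api import SafetyFilter\nimport numpy as np\n"
concerto_module = "from chamber.comm import FixedFormatCommChannel\n"

print(violates_one_way_rule("chamber", chamber_module))
print(violates_one_way_rule("concerto", concerto_module))
```

The sources are only parsed, never executed, so the check runs without either package installed.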
Why this exists
Real factories already pair robots that were never trained together. A 500 Hz industrial arm next to a 50 Hz mobile base; a vision-only manipulator next to a force-feedback one; a vendor‑A controller next to a vendor‑B controller under binding ISO 10218-2:2025. At deployment time, your robot's teammate is opaque (no policy access), heterogeneous (different morphology and action frequency), and ad hoc (no prior joint training). Hospitals and warehouses present the same picture.
Most multi-robot benchmarks assume identical embodiments and shared training. The few that don't focus on planning or navigation, not on contact-rich physical manipulation. The intersection of Heterogeneity × Black-box partner × Safety × Manipulation is empty in the published literature. CHAMBER is built to fill it, and CONCERTO is the first method designed against this four-aspect contract; empirical validation is staged through CHAMBER spikes (Stage 1 → Stage 3) per ADR-007 §Decision.
How we sit relative to the closest prior work
Every prior precedent covers at most three of the four aspects. The table below lists the closest precedent for each pair of aspects; no published row hits all four.
| Method | Heterogeneous | Black-box partner | Safety bound | Contact-rich manipulation |
|---|---|---|---|---|
| Liu 2024 RSS (LLM‑AHT) | ✓ | ✓ | ||
| COHERENT (LLM‑MR planning) | ✓ | ✓ | ||
| Huriot & Sibai 2025 (conformal CBF) | ✓ | ✓ | ||
| HetGPPO / HARL (heterogeneous MARL) | ✓ | |||
| Wang et al. 2017 (multi‑robot CBFs) | ✓ | ✓ | ||
| RoCoBench (multi‑robot manipulation) | ✓ | | | ✓ |
| SafeBimanual (safe bimanual manip.) | | | ✓ | ✓ |
| CONCERTO + CHAMBER | ✓ | ✓ | ✓ | ✓ |
Reading the table. Heterogeneous here is the coarse aspect used in the four-aspect literature-gap analysis; CHAMBER's six measurable sub-axes (AS, OM, CR, CM, PF, SA) decompose it further per ADR-007.
Read the table by columns to see what each aspect covers in isolation, and by rows to see what no single line of work has yet combined. Contact-rich manipulation appears with multi-robot coordination (RoCoBench) and with safety (SafeBimanual), but never with black-box ad-hoc partners under explicit safety assumptions at the same time. CONCERTO + CHAMBER occupy the four-aspect intersection at the design-contract level (ADRs, scaffold, smoke test); empirical validation across the six heterogeneity sub-axes is the staged Phase-0 spike protocol's job (Stage 1: AS + OM → Stage 2: CR + CM → Stage 3: PF + SA), with results landing on the leaderboard from M5 onward.
See adr/ADR-007 for the
six-axis taxonomy that defines "heterogeneous" precisely, and the
docs/explanation/why-aht.md page for
the long-form positioning.
The six heterogeneity axes CHAMBER measures
| Axis | Symbol | What it varies | Where the priors come from |
|---|---|---|---|
| Action space | AS | 7‑DOF arm vs 2‑DOF mobile base on shared task | HARL, HetGPPO |
| Observation modality | OM | vision-only vs vision + force/torque + proprioception | Visual-tactile peg-in-hole literature |
| Control rate | CR | 500 Hz arm vs 50 Hz base, chunk size held constant | RTC, A2C2, FAVLA |
| Communication | CM | latency 1–100 ms, jitter µs–10 ms, drop 10⁻⁶–10⁻² | 3GPP R17, URLLC |
| Partner familiarity | PF | trained-with vs frozen-novel partner, mid-episode swap | FCP, MEP |
| Safety | SA | mixed-vendor force-limit / SIL-PL pairs, contact-rich | ISO 10218-2:2025 |
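The CR row pairs a 500 Hz arm with a 50 Hz base, so the base issues one command per 10 arm ticks. A minimal sketch of that bookkeeping, assuming a zero-order hold (the slow robot's last command is repeated until its next control tick); `multirate_schedule` and the hold policy are illustrative assumptions, not CHAMBER's wrapper.

```python
def multirate_schedule(fast_hz: int, slow_hz: int, n_fast_ticks: int) -> list[tuple[int, int]]:
    """Hypothetical helper: for each fast tick, return (fast_tick, slow command index in effect).

    Assumes fast_hz is an integer multiple of slow_hz and a zero-order hold.
    """
    if fast_hz % slow_hz != 0:
        raise ValueError("sketch assumes an integer rate ratio")
    ratio = fast_hz // slow_hz  # 500 // 50 == 10 arm ticks per base tick
    return [(t, t // ratio) for t in range(n_fast_ticks)]


schedule = multirate_schedule(fast_hz=500, slow_hz=50, n_fast_ticks=25)
# The base command index advances only once every 10 arm ticks.
print([slow for _, slow in schedule])
```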
Every surviving axis is required to clear a pre-registered
≥20 pp homogeneous-vs-heterogeneous gap before it ships in
the v1 benchmark. See
adr/ADR-007 for the
staged Phase‑0 spike protocol (Stage 1: AS + OM,
Stage 2: CR + CM, Stage 3: PF + SA).
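The ≥20 pp gate can be read as a difference of two success rates expressed in percentage points. A minimal sketch, assuming the gap is measured on episode success rate over equal episode counts and computed as homogeneous minus heterogeneous; `gap_pp`, `axis_survives`, and the counts below are made up for illustration. The actual metric, aggregation, and any confidence-interval requirement are fixed by ADR-007 and the per-axis preregistration YAMLs, not by this snippet.

```python
def gap_pp(homogeneous_successes: int, heterogeneous_successes: int, episodes: int) -> float:
    """Hypothetical helper: homogeneous-minus-heterogeneous success-rate gap, in pp."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    return 100.0 * (homogeneous_successes - heterogeneous_successes) / episodes


def axis_survives(gap: float, threshold_pp: float = 20.0) -> bool:
    """Hypothetical gate: an axis ships only if its gap clears the threshold."""
    return gap >= threshold_pp


# Illustrative counts only, not results.
gap = gap_pp(homogeneous_successes=82, heterogeneous_successes=55, episodes=100)
print(gap, axis_survives(gap))
```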
Repository layout
src/
├── concerto/ # the METHOD (cite this)
│ ├── safety/ # exp CBF-QP + conformal overlay + OSCBF + braking fallback
│ ├── training/ # ego-AHT training loop + deterministic seeding
│ ├── policies/ # Phase-1 trained checkpoints
│ └── api/ # public Protocols
└── chamber/ # the BENCHMARK (run this)
    ├── envs/        # ManiSkill v3 wrappers
    ├── comm/        # fixed-format channel + URLLC degradation
    ├── partners/    # partner zoo (heuristic / frozen-RL / VLA stubs)
    ├── tasks/       # CHAMBER-Solo / Duo / Quartet (Phase 1+)
    ├── evaluation/  # HRS, pre-registration, leaderboard renderer
    └── benchmarks/  # Stage-0/1/2/3 spike runners
adr/ # 15 Architecture Decision Records (the design rationale)
docs/ # Diátaxis: tutorials / how-to / reference / explanation
tests/ # unit / property / integration / smoke / reproduction
spikes/ # pre-registration YAMLs + result archives
Leaderboard
Stage‑0 acceptance results; rendered by
chamber-render-tables after each tagged spike.
Stage 1 (AS + OM) rows land with M5 — see
Roadmap.
| Method | Stage 0 success | Inter-robot collision | Force-limit violation | Conformal λ mean | Reference |
|---|---|---|---|---|---|
| MAPPO (homogeneous baseline) | pending | pending | pending | n/a | M5 |
| HetGPPO + naive CBF | pending | pending | pending | n/a | M5 |
| CONCERTO | pending | pending | pending | pending | M5 |
Submit a new entry: docs/how-to/submit-leaderboard.md.
Who this is for
Multi-robot RL researchers — CHAMBER is the first benchmark to
score ad-hoc teamwork at the manipulation tier with a measurable
heterogeneity-robustness score (HRS). Start with
docs/tutorials/hello-spike.md.
Safe-control researchers — CONCERTO's safety module is a
research-grade reference implementation of the
exp CBF + conformal + OSCBF stack with a hard
braking fallback. The unresolved theoretical question
(average-loss → per-step bound) is documented in
adr/ADR-004.
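For readers new to CBF filters: a CBF-QP with a single affine constraint has a closed-form solution, namely the projection of the nominal action onto the half-space ∇h·u + α·h ≥ 0. A minimal sketch for a 2-D single integrator (ẋ = u) that keeps a robot outside a disc around a teammate, with a crude braking fallback when the projected action exceeds the actuator bound. The dynamics, the ordinary (relative-degree-one) barrier h, α, u_max, and the fallback rule are illustrative assumptions; this is not CONCERTO's exponential CBF-QP, conformal overlay, or OSCBF.

```python
import math


def cbf_filter(x, x_obs, u_nom, radius=0.5, alpha=1.0, u_max=1.0):
    """Sketch: minimally modify u_nom so h(x) = |x - x_obs|^2 - radius^2 stays >= 0.

    Assumes single-integrator dynamics x_dot = u. The QP
        min |u - u_nom|^2  s.t.  grad_h . u + alpha * h >= 0
    has one affine constraint, so its solution is a half-space projection.
    Returns (u, used_fallback).
    """
    dx, dy = x[0] - x_obs[0], x[1] - x_obs[1]
    h = dx * dx + dy * dy - radius * radius
    a = (2.0 * dx, 2.0 * dy)  # gradient of h
    b = -alpha * h            # constraint in the form a . u >= b
    slack = a[0] * u_nom[0] + a[1] * u_nom[1] - b
    norm_sq = a[0] * a[0] + a[1] * a[1]
    if slack >= 0.0:
        u = (u_nom[0], u_nom[1])  # nominal action already satisfies the CBF condition
    elif norm_sq == 0.0:
        return (0.0, 0.0), True   # degenerate gradient: brake
    else:
        step = -slack / norm_sq   # move just onto the constraint boundary
        u = (u_nom[0] + step * a[0], u_nom[1] + step * a[1])
    if math.hypot(*u) > u_max:
        # Crude fallback instead of solving the joint QP with the actuator bound.
        return (0.0, 0.0), True
    return u, False


# Robot at (1, 0), teammate at the origin, nominal command heads straight at it.
print(cbf_filter((1.0, 0.0), (0.0, 0.0), (-1.0, 0.0)))
```

The radial component toward the teammate is scaled back until ḣ = −α·h holds exactly; a command pointing away from the teammate passes through unchanged.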
Robotics practitioners and integrators — CHAMBER's
communication profiles are anchored to 3GPP Release 17 URLLC and
5G-TSN industrial-trial data, and the safety axis references
ISO 10218-2:2025 directly. See
docs/explanation/threat-model.md.
Documentation
Full documentation: fsafaei.github.io/concerto
- Tutorials — step-by-step walkthroughs.
- How-tos — add a partner, add a safety filter, run a spike.
- API reference — generated from docstrings.
- ADR index — 15 design decisions with full rationale.
- Glossary — HRS, AoI, OSCBF, FCP/MEP, all defined.
- Literature — five-cluster bibliography (AHT/ZSC, safe control, conformal prediction, benchmarks, reproducibility).
- Standards — ISO 10218-2:2025 + IEC 62061 + IEEE TSN + 3GPP R17 references, with the axis → standard → metric → report-table flowchart.
- Evaluation — the multi-seed and rliable reporting contract for the leaderboard.
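The rliable-style reporting contract centres on robust aggregates such as the interquartile mean (IQM) with bootstrap confidence intervals. A stdlib-only sketch of both, assuming per-run scores in a flat list; `iqm`, `bootstrap_ci`, and the made-up scores are illustrative. The real contract (which runs are pooled, how many bootstrap resamples, stratification) is defined in the Evaluation docs, and the rliable library itself provides these estimators.

```python
import random
import statistics


def iqm(scores: list[float]) -> float:
    """Interquartile mean: trim 25% from each end, average the rest."""
    ordered = sorted(scores)
    k = int(0.25 * len(ordered))  # same cut rule as scipy.stats.trim_mean(x, 0.25)
    return statistics.fmean(ordered[k:len(ordered) - k])


def bootstrap_ci(scores, stat=iqm, reps=2000, level=0.95, seed=0):
    """Hypothetical helper: percentile-bootstrap CI for `stat`, seeded for reproducibility."""
    rng = random.Random(seed)
    draws = sorted(stat(rng.choices(scores, k=len(scores))) for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return draws[lo_idx], draws[hi_idx]


scores = [0.1, 0.4, 0.5, 0.55, 0.6, 0.62, 0.7, 0.95]  # made-up per-seed scores
print(iqm(scores))
print(bootstrap_ci(scores))
```

IQM discards the best and worst quarter of runs, so a single lucky or unlucky seed moves it far less than it moves the mean.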
Roadmap
The project advances in three phases. Phase 0 (current) locks the design contract and runs the staged heterogeneity-axis spikes. Phase 1 ships the partner zoo and the populated leaderboard. Phase 2 expands tasks and adds the real-robot demo platform.
Now — Phase 0, design contract live, spikes about to start.
15 ADRs (13 Accepted, 2 RFC) under the status taxonomy in
adr/ADR-INDEX.md; open follow-up work is
tracked per-ADR via the footnote column. M1 (platform), M2 (comm),
and M4b (training stack) are merged on main. The chamber-spike
CLI runs the ego-AHT loop end-to-end against a Hydra config.
Next. Stage-1 spikes (AS + OM) — preregistered, launched, first leaderboard rows. arXiv design-report preprint (priority defence on the four-aspect framing). Stage-2 spikes (CR + CM).
Later. Stage-3 spikes (PF + SA) — possibly HIL for SA. Phase-1 leaderboard v1 (CONCERTO + 3 baselines on Tier-1 / Tier-2 tasks). Phase 2 (Tier-3 long-object tasks, real-robot demo platform).
Day-to-day progress: CHANGELOG.md and the issues board.
FAQ
How does CHAMBER differ from RoCoBench, SafeBimanual, or BiGym?
RoCoBench covers Heterogeneity × Manipulation on MuJoCo with multi-arm LLM-dialectic coordination but does not address black-box partners or formal safety bounds. SafeBimanual covers Safety × Manipulation on a single bimanual platform. BiGym is single-embodiment. CHAMBER targets the four-aspect intersection (H × B × S × M) at the substrate level — thin wrapper layers above ManiSkill v3, a fixed-format communication stack, and a partner zoo — rather than as a curated task set. See ADR-001 and ADR-005 for the simulator-base decision.
Is the safety guarantee per-step or asymptotic?
The conformal slack overlay (Huriot & Sibai 2025 Theorem 3) gives a distribution-free ε + o(1) long-term average-loss bound, not a per-step bound. For contact-rich manipulation where a single violation can be irreversible, the hard braking fallback (Wang‑Ames‑Egerstedt 2017 eq. 17) is the per-step backstop. Sharpening the average-loss bound to per-step is the project's headline open theoretical question; see ADR-004 Open Questions. The conformal layer's average-loss bound is an Accepted claim under the ADR status taxonomy with the per-step refinement flagged as Open work in ADR-004 (see also ADR-INDEX footnote a); promoting the layer to Validated is gated on the Stage-1 AS spike and the follow-up safety-stack refactor.
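The "ε + o(1) long-term average-loss" shape of the guarantee can be seen in a generic online update λ_{t+1} = λ_t + η(ℓ_t − ε). Telescoping gives (1/T)·Σℓ_t = ε + (λ_{T+1} − λ_1)/(ηT), so if λ stays bounded the average loss approaches ε, while nothing constrains any single ℓ_t; that gap is exactly why the braking fallback is needed. This is a textbook adaptive-conformal-style update chosen for illustration, not a transcription of Huriot & Sibai's Theorem 3 or of CONCERTO's overlay; `run_overlay`, `synthetic_loss`, and the loss model are hypothetical.

```python
import random


def run_overlay(losses_fn, eps=0.1, eta=0.05, lam0=0.0, steps=5000, seed=0):
    """Generic online update lam <- lam + eta * (loss - eps).

    `losses_fn(lam, rng)` is a hypothetical environment returning a loss in [0, 1]
    at margin `lam` (larger margin -> fewer violations).
    Returns (average loss, final lam, initial lam).
    """
    rng = random.Random(seed)
    lam, total = lam0, 0.0
    for _ in range(steps):
        loss = losses_fn(lam, rng)
        total += loss
        lam += eta * (loss - eps)  # raise the margin after losses above eps
    return total / steps, lam, lam0


def synthetic_loss(lam, rng):
    # Purely synthetic: violation probability shrinks as the margin grows.
    p_violation = max(0.0, 0.5 - lam)
    return 1.0 if rng.random() < p_violation else 0.0


avg, lam_T, lam_0 = run_overlay(synthetic_loss)
eps, eta, T = 0.1, 0.05, 5000
# Telescoping identity: avg == eps + (lam_T - lam_0) / (eta * T), up to float error.
print(avg, eps + (lam_T - lam_0) / (eta * T))
```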
Can I plug in my own partner or safety filter?
Yes. Partners implement the FrozenPartner Protocol in
chamber.partners.api; register with the @register_partner
decorator from chamber.partners. See
Add a partner.
Safety filters implement the SafetyFilter Protocol in
concerto.safety.api; see
Add a filter.
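The plug-in pattern described above (a structural Protocol plus a registration decorator) can be sketched with `typing.Protocol`. The method names (`reset`, `act`), the `_REGISTRY` dict, `FrozenPartnerLike`, and `register_toy_partner` are hypothetical stand-ins; check chamber.partners.api and the Add a partner how-to for the real FrozenPartner signature and the real @register_partner behaviour.

```python
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class FrozenPartnerLike(Protocol):
    """Hypothetical shape of a frozen partner (not the real FrozenPartner)."""

    def reset(self, seed: int) -> None: ...

    def act(self, observation: dict[str, Any]) -> list[float]: ...


_REGISTRY: dict[str, type] = {}  # hypothetical registry


def register_toy_partner(name: str) -> Callable[[type], type]:
    """Hypothetical decorator: record a partner class under a unique name."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise KeyError(f"partner {name!r} already registered")
        _REGISTRY[name] = cls
        return cls

    return decorator


@register_toy_partner("hold-still")
class HoldStillPartner:
    """Toy heuristic partner: always commands zero velocity."""

    def reset(self, seed: int) -> None:
        self.seed = seed

    def act(self, observation: dict[str, Any]) -> list[float]:
        return [0.0, 0.0]


partner = _REGISTRY["hold-still"]()
print(isinstance(partner, FrozenPartnerLike), partner.act({}))
```

Structural typing means a partner never has to inherit from the Protocol; matching method names is enough for `isinstance` under `@runtime_checkable` (which checks presence, not signatures).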
When will the leaderboard be populated?
Stage-1 (AS + OM) rows land with M5. The remaining rows fill as the staged spikes complete; see Roadmap.
What's the relationship between CONCERTO and CHAMBER?
CONCERTO is the method (safety stack + ego-AHT training); CHAMBER is
the benchmark (env wrappers + comm + partner zoo + evaluation). Two
top-level packages in one wheel, with a one-way dependency:
chamber → concerto. Canonical sentence: we evaluate CONCERTO on
CHAMBER.
Is this reproducible bit-for-bit?
CPU runs are byte-identical under uv.lock + a root_seed via the
determinism harness in concerto.training.seeding. GPU runs are
deterministic up to the underlying CUDA non-determinism in PyTorch
reductions; the rliable-style aggregate metrics defined in
docs/reference/evaluation
are the canonical way to compare across seeds.
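One common way to get byte-identical CPU runs from a single root_seed is to derive every component's seed deterministically from (root_seed, component name) and never touch global RNG state. A stdlib-only sketch, assuming a SHA-256-based derivation; `derive_seed`, `rollout`, and the component names are hypothetical and need not match concerto.training.seeding.

```python
import hashlib
import random


def derive_seed(root_seed: int, component: str) -> int:
    """Hypothetical helper: map (root_seed, component) to a stable 64-bit seed."""
    digest = hashlib.sha256(f"{root_seed}:{component}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rollout(root_seed: int) -> list[float]:
    # Each component owns a private RNG; none touches random's global state.
    env_rng = random.Random(derive_seed(root_seed, "env"))
    comm_rng = random.Random(derive_seed(root_seed, "comm"))
    return [env_rng.random() + comm_rng.random() for _ in range(3)]


print(rollout(0) == rollout(0), rollout(0) == rollout(1))
```

Hashing rather than using Python's built-in `hash()` matters here: string hashing is salted per process, whereas SHA-256 yields the same seed in every process and on every machine, and a component's seed does not depend on the order in which components are created.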
Why ManiSkill v3 and not Isaac Lab?
ADR-001's contingent rule was "extend the simulator if its abstractions admit the heterogeneity-axis controls without monkey-patching." ManiSkill v3 passes that test at ≈230 LOC of wrappers; Isaac Lab would have required a 3-month standalone build. Isaac Lab remains a viable secondary path if upstream API constraints force a migration — the env-adapter layer is intentionally thin so the swap is Type-2 reversible. See ADR-001 and ADR-005.
Non-goals
CHAMBER is not a navigation, planning, or generic-RL benchmark; the four-aspect intersection requires contact-rich physical manipulation. CONCERTO is not a certified safety product — it is a research-grade reference implementation of the exp CBF + conformal + OSCBF stack and is not a substitute for safety engineering in production deployments. The project does not ship pretrained partner checkpoints in Phase 0; the partner zoo construction lands in Phase 1.
Contributing
This is a research project, but it is open from the first commit. We welcome PRs.
- Read CONTRIBUTING.md for the development flow.
- Look at issues labelled good-first-issue.
- Sign your commits (-S). DCO (Signed-off-by:) is required.
- External contributors: the CLA bot will guide you through your first PR.
- Every PR cites the ADR section it touches (e.g. ADR-004 §6.2). We treat the ADRs as the design contract; if your PR motivates a change to them, propose a new ADR rather than editing an Accepted one.
Code of Conduct: CODE_OF_CONDUCT.md.
Security policy: SECURITY.md.
Stability & versioning
This project follows Semantic Versioning.
Under 0.x, MINOR-version bumps may break the public API per
SemVer §4. The public API surfaces are concerto.api,
concerto.safety.api, and chamber.comm; everything else is
implementation detail and subject to change without notice. The
wire-format chamber.comm.SCHEMA_VERSION constant is the single
source of truth for the fixed-format packet shape; bumping it is a
breaking change and requires a new ADR.
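Because chamber.comm.SCHEMA_VERSION is the single source of truth for the packet shape, a decoder should refuse packets stamped with any other version rather than guess. A minimal sketch of that guard; the packet layout (a dict with `schema_version` and `payload` keys), the `SchemaMismatch` exception, the toy `encode`/`decode` functions, and the version value are hypothetical, not the library's wire format.

```python
from typing import Any

SCHEMA_VERSION = 1  # hypothetical value; the real constant lives in chamber.comm


class SchemaMismatch(ValueError):
    """Hypothetical error: packet was encoded under a different wire-format version."""


def encode(payload: dict[str, Any]) -> dict[str, Any]:
    return {"schema_version": SCHEMA_VERSION, "payload": payload}


def decode(packet: dict[str, Any]) -> dict[str, Any]:
    version = packet.get("schema_version")
    if version != SCHEMA_VERSION:
        # Refuse rather than reinterpret: a version bump is a breaking change.
        raise SchemaMismatch(f"expected schema {SCHEMA_VERSION}, got {version!r}")
    return packet["payload"]


print(decode(encode({"grasp_side": "left"})))
```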
Citing CONCERTO & CHAMBER
If you use CONCERTO or CHAMBER in your research, please cite the preprint. Until the preprint is on arXiv (target: 2026‑06), cite the archived software release via its Zenodo DOI:
@software{safaei2026concerto,
author = {Safaei, Farhad},
title = {{CONCERTO} and {CHAMBER}: Contact-rich Coordination
with Opaque, Heterogeneous Teammates},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.20128469},
url = {https://doi.org/10.5281/zenodo.20128469},
note = {arXiv preprint forthcoming},
}
Citation entries are also in CITATION.cff so GitHub
renders a "Cite this repository" button.
Acknowledgments
CONCERTO stands on the shoulders of prior work. The safety stack composes Wang et al. 2017 (decentralised exponential CBFs), Huriot & Sibai 2025 (conformal CBFs), and Morton & Pavone 2025 (OSCBF). CHAMBER is a wrapper layer over ManiSkill v3 and depends on a fork of HARL for the training stack. Corrections, acknowledgements, and contributions are welcome via PRs and issues.
License
Apache 2.0. See LICENSE and NOTICE.
The full Software Bill of Materials is at
sbom.spdx.json and is regenerated on every release.
Download files
File details
Details for the file concerto_multirobot-0.7.0.tar.gz.
File metadata
- Download URL: concerto_multirobot-0.7.0.tar.gz
- Upload date:
- Size: 975.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 835984f168461343d3b94fbadf20ca9685e98220a6988ecea3c5cfdd951fd47a |
| MD5 | 906d7fd651794cf7415f60ae69831133 |
| BLAKE2b-256 | a22eaf0bfec2d574d06fe4cb152473391d3ed70f9aa65c6405d9e03a9d34f98f |
File details
Details for the file concerto_multirobot-0.7.0-py3-none-any.whl.
File metadata
- Download URL: concerto_multirobot-0.7.0-py3-none-any.whl
- Upload date:
- Size: 277.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2f23d4ae2bb770fb4b1c8e5c77c13686788418f9d2537a6890c3e01b9d0d1d9 |
| MD5 | 0d3e0472c03b1073b930f79aacaca178 |
| BLAKE2b-256 | 9cb821889ef56c6b7f15095284066c763008fe7ecded345e3d686fb6f27b3108 |