
Heterogeneous Multi-Robot Ad-Hoc Teamwork — benchmark + safety stack.


CONCERTO

Contact-rich coordination with opaque, heterogeneous teammates — with explicit safety assumptions and conformal-CBF reporting.
CONCERTO is the method. CHAMBER is the benchmark. We evaluate CONCERTO on CHAMBER.


Status — pre-release, Phase 0. Architecture is locked in 15 ADRs (13 Accepted, 2 RFC) under the working policy recorded in adr/ADR-INDEX.md; the staged Phase-0 spike protocol (ADR-007) is the validation gate that promotes Accepted ADRs to Validated with per-axis ≥20 pp evidence. The Stage-1 (AS + OM) preregistrations are the next launch; the leaderboard fills with M5. The public API is on 0.x — MINOR bumps may break it per SemVer §4. See Roadmap.

TL;DR. CONCERTO is a three-layer safety stack (exponential CBF‑QP, conformal-slack overlay, OSCBF inner filter) backed by a hard braking fallback, for robots that must work with opaque, heterogeneous teammates they were never trained with. CHAMBER is the matching benchmark: six heterogeneity sub-axes above ManiSkill v3, fixed-format communication with URLLC-anchored degradation profiles, a partner zoo, and an ISO 10218-2:2025-aware safety-reporting format. Open from day one, with an ADR-tracked design contract, preregistered spikes, and byte-identical CPU determinism via uv.lock + a root_seed.


Quickstart

30-second smoke test.

git clone https://github.com/fsafaei/concerto.git
cd concerto
pip install uv && uv sync --group dev --group train

# Smoke test the rig (ADR-001 acceptance criterion).
uv run pytest -m smoke -x -v

Install groups. --group dev pulls the developer toolchain (ruff, pyright, pytest). HARL ships as the harl-aht distribution (the CONCERTO fork at fsafaei/harl-fork; see ADR-002 §Revision-history 2026-05-19 and #132) and is pulled automatically as a runtime dependency, so the ego-AHT trainer + frozen-HARL partner work out of the box from a source checkout — no separate train-group install needed. The concerto-multirobot distribution ships to TestPyPI for the 0.x line; the production-PyPI debut is staged in the Release workflow.

Compose a factory-floor channel (URLLC-anchored degradation profile from ADR-006) and round-trip a packet through encode and decode:

from chamber.comm import (
    CommDegradationWrapper,
    FixedFormatCommChannel,
    URLLC_3GPP_R17,
)

channel = CommDegradationWrapper(
    FixedFormatCommChannel(),
    URLLC_3GPP_R17["factory"],
    tick_period_ms=1.0,
    root_seed=0,
)

state = {
    "pose": {
        "ego": {"xyz": (0.0, 0.0, 0.0), "quat_wxyz": (1.0, 0.0, 0.0, 0.0)},
    },
    "task_state": {"ego": {"grasp_side": "left"}},
}

# The factory profile delays each packet by ~5 ticks; drain the queue so
# the visible packet carries the freshly-encoded state.
for _ in range(10):
    packet = channel.encode(state)
decoded = channel.decode(packet)
print("decoded payload:", decoded)

Save the snippet to quickstart.py and run uv run python quickstart.py.

The six pre-registered URLLC profiles — ideal, urllc, factory, wifi, lossy, saturation — are the Stage‑2 CM sweep table. See docs/how-to/run-spike.md for the full flow. For the bigger picture, jump to Architecture at a glance.
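
To make the "drain the queue" comment in the snippet above concrete, here is a minimal, self-contained sketch of a tick-based delay line with optional packet drops. It is not the chamber.comm implementation: ToyProfile, ToyDelayChannel, and their fields are hypothetical names invented for illustration, and the real profiles also model jitter, which this sketch omits.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToyProfile:
    """Hypothetical stand-in for a degradation profile (not the chamber.comm type)."""

    delay_ticks: int  # fixed latency, counted in ticks
    drop_prob: float  # independent per-packet drop probability


class ToyDelayChannel:
    """Each push advances one tick and returns the packet that has waited delay_ticks."""

    def __init__(self, profile: ToyProfile, root_seed: int) -> None:
        self._profile = profile
        self._rng = random.Random(root_seed)  # seeded so drops are reproducible
        self._queue: deque = deque()

    def push(self, payload: Any) -> Optional[Any]:
        # A dropped packet still occupies its time slot, but carries nothing.
        dropped = self._rng.random() < self._profile.drop_prob
        self._queue.append(None if dropped else payload)
        if len(self._queue) <= self._profile.delay_ticks:
            return None  # nothing has aged long enough to be visible yet
        return self._queue.popleft()


channel = ToyDelayChannel(ToyProfile(delay_ticks=5, drop_prob=0.0), root_seed=0)
outputs = [channel.push({"tick": t}) for t in range(10)]
print(outputs)
```

With a five-tick delay and no drops, the first five pushes return nothing and the sixth returns the packet from tick 0, which is why a loop of ten encodes is enough to see a delivered packet.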


Architecture at a glance

Two top-level packages, one wheel. CHAMBER (benchmark) wraps ManiSkill v3 and provides the six heterogeneity axes, the communication stack, the partner zoo, and the evaluation harness. CONCERTO (method) provides the safety stack and the ego-AHT training loop. Dependency direction is one-way: chamber → concerto.

flowchart LR
    subgraph CHAMBER["CHAMBER · benchmark"]
        direction TB
        ENVS["envs<br/>ManiSkill v3 wrappers<br/>AS · OM · CR"]
        COMM["comm<br/>fixed-format channel +<br/>URLLC degradation · CM"]
        PART["partners<br/>partner zoo · PF<br/>heuristic / frozen-RL / VLA"]
        EVAL["evaluation<br/>HRS · prereg · leaderboard"]
        BENCH["benchmarks<br/>Stage-0/1/2/3 spike runners"]
    end
    subgraph CONCERTO["CONCERTO · method"]
        direction TB
        SAFETY["safety<br/>exp CBF-QP + conformal +<br/>OSCBF + braking · SA"]
        TRAIN["training<br/>ego-AHT loop +<br/>deterministic seeding"]
        API["api<br/>public Protocols"]
    end
    ENVS --> BENCH
    COMM --> BENCH
    PART --> BENCH
    BENCH --> EVAL
    BENCH -- ego-policy --> TRAIN
    TRAIN -- filtered actions --> SAFETY
    SAFETY -. consumes .-> API
    CHAMBER -- "depends on (one-way)" --> CONCERTO

The six axis labels in the diagram (AS, OM, CR, CM, PF, SA) tie each module to the heterogeneity sub-axis it exercises; see the six heterogeneity axes for the per-axis pre-registered ≥20 pp gap rule.


Why this exists

Real factories already pair robots that were never trained together. A 500 Hz industrial arm next to a 50 Hz mobile base; a vision-only manipulator next to a force-feedback one; a vendor‑A controller next to a vendor‑B controller under binding ISO 10218-2:2025. At deployment time, your robot's teammate is opaque (no policy access), heterogeneous (different morphology and action frequency), and ad hoc (no prior joint training). Hospitals and warehouses are the same picture.

Most multi-robot benchmarks assume identical embodiments and shared training. The few that don't focus on planning or navigation, not on contact-rich physical manipulation. The intersection of Heterogeneity × Black-box partner × Safety × Manipulation is empty in the published literature. CHAMBER is built to fill it, and CONCERTO is the first method designed against this four-aspect contract; empirical validation is staged through CHAMBER spikes (Stage 1 → Stage 3) per ADR-007 §Decision.

How we sit relative to the closest prior work

Every prior precedent covers at most three of the four aspects. The table below lists the closest precedent for each pair of aspects; no published row hits all four. Click any precedent to open the paper.

Method Heterogeneous Black-box partner Safety bound Contact-rich manipulation
Liu 2024 RSS (LLM‑AHT)
COHERENT (LLM‑MR planning)
Huriot & Sibai 2025 (conformal CBF)
HetGPPO / HARL (heterogeneous MARL)
Wang et al. 2017 (multi‑robot CBFs)
RoCoBench (multi‑robot manipulation)
SafeBimanual (safe bimanual manip.)
CONCERTO + CHAMBER

Reading the table. Heterogeneous here is the four-aspect literature-gap level; CHAMBER's six measurable sub-axes (AS, OM, CR, CM, PF, SA) decompose it further per ADR-007.

Read the table by columns to see what each aspect covers in isolation, and by rows to see what no single line of work has yet combined. Contact-rich manipulation appears with multi-robot coordination (RoCoBench) and with safety (SafeBimanual), but never with black-box ad-hoc partners under explicit safety assumptions at the same time. CONCERTO + CHAMBER occupy the four-aspect intersection at the design-contract level (ADRs, scaffold, smoke test); empirical validation across the six heterogeneity sub-axes is the staged Phase-0 spike protocol's job (Stage 1: AS + OM → Stage 2: CR + CM → Stage 3: PF + SA), with results landing on the leaderboard from M5 onward.

See adr/ADR-007 for the six-axis taxonomy that defines "heterogeneous" precisely, and the docs/explanation/why-aht.md page for the long-form positioning.


The six heterogeneity axes CHAMBER measures

| Axis | Symbol | What it varies | Where the priors come from |
| --- | --- | --- | --- |
| Action space | AS | 7‑DOF arm vs 2‑DOF mobile base on shared task | HARL, HetGPPO |
| Observation modality | OM | vision-only vs vision + force/torque + proprioception | Visual-tactile peg-in-hole literature |
| Control rate | CR | 500 Hz arm vs 50 Hz base, chunk size held constant | RTC, A2C2, FAVLA |
| Communication | CM | latency 1–100 ms, jitter µs–10 ms, drop 10⁻⁶–10⁻² | 3GPP R17, URLLC |
| Partner familiarity | PF | trained-with vs frozen-novel partner, mid-episode swap | FCP, MEP |
| Safety | SA | mixed-vendor force-limit / SIL-PL pairs, contact-rich | ISO 10218-2:2025 |

Every surviving axis is required to clear a pre-registered ≥20 pp homogeneous-vs-heterogeneous gap before it ships in the v1 benchmark. See adr/ADR-007 for the staged Phase‑0 spike protocol (Stage 1: AS + OM, Stage 2: CR + CM, Stage 3: PF + SA).
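
The arithmetic behind the gap rule can be sketched in a few lines. This is a minimal illustration that assumes the gap is measured on episode success rate; the actual metric, sample sizes, and any statistical test are defined in adr/ADR-007, not here. The function names are hypothetical.

```python
from statistics import mean

GAP_THRESHOLD_PP = 20.0  # the pre-registered >= 20 pp rule quoted in the text


def success_rate_pct(outcomes):
    """Percentage of successful episodes, where True marks a success."""
    return 100.0 * mean(1.0 if ok else 0.0 for ok in outcomes)


def axis_gap(homogeneous, heterogeneous, threshold_pp=GAP_THRESHOLD_PP):
    """Return (gap in percentage points, whether the gap clears the threshold)."""
    gap = success_rate_pct(homogeneous) - success_rate_pct(heterogeneous)
    return gap, gap >= threshold_pp


# Made-up outcomes for illustration only: 9/10 vs 6/10 successes.
homo = [True] * 9 + [False] * 1
hetero = [True] * 6 + [False] * 4
gap_pp, survives = axis_gap(homo, hetero)
print(f"gap = {gap_pp:.1f} pp, survives = {survives}")
```

Note that the gap is a difference in percentage points, not a relative drop: 90% to 60% is a 30 pp gap.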


Repository layout

src/
├── concerto/      # the METHOD  (cite this)
│   ├── safety/    #   exp CBF-QP + conformal overlay + OSCBF + braking fallback
│   ├── training/  #   ego-AHT training loop + deterministic seeding
│   ├── policies/  #   Phase-1 trained checkpoints
│   └── api/       #   public Protocols
└── chamber/       # the BENCHMARK  (run this)
    ├── envs/      #   ManiSkill v3 wrappers
    ├── comm/      #   fixed-format channel + URLLC degradation
    ├── partners/  #   partner zoo (heuristic / frozen-RL / VLA stubs)
    ├── tasks/     #   CHAMBER-Solo / Duo / Quartet (Phase 1+)
    ├── evaluation/#   HRS, pre-registration, leaderboard renderer
    └── benchmarks/#   Stage-0/1/2/3 spike runners

adr/               # 15 Architecture Decision Records (the design rationale)
docs/              # Diátaxis: tutorials / how-to / reference / explanation
tests/             # unit / property / integration / smoke / reproduction
spikes/            # pre-registration YAMLs + result archives

Leaderboard

Stage‑0 acceptance results; rendered by chamber-render-tables after each tagged spike. Stage 1 (AS + OM) rows land with M5 — see Roadmap.

| Method | Stage 0 success | Inter-robot collision | Force-limit violation | Conformal λ mean | Reference |
| --- | --- | --- | --- | --- | --- |
| MAPPO (homogeneous baseline) | pending | pending | pending | n/a | M5 |
| HetGPPO + naive CBF | pending | pending | pending | n/a | M5 |
| CONCERTO | pending | pending | pending | pending | M5 |

Submit a new entry: docs/how-to/submit-leaderboard.md.


Who this is for

Multi-robot RL researchers: CHAMBER is the first benchmark to score ad-hoc teamwork at the manipulation tier with a measurable heterogeneity-robustness score (HRS). Start with docs/tutorials/hello-spike.md.

Safe-control researchers: CONCERTO's safety module is a research-grade reference implementation of the exp CBF + conformal + OSCBF stack with a hard braking fallback. The unresolved theoretical question (average-loss → per-step bound) is documented in adr/ADR-004.

Robotics practitioners and integrators: CHAMBER's communication profiles are anchored to 3GPP Release 17 URLLC and 5G-TSN industrial-trial data, and the safety axis references ISO 10218-2:2025 directly. See docs/explanation/threat-model.md.


Documentation

Full documentation: fsafaei.github.io/concerto

  • Tutorials — step-by-step walkthroughs.
  • How-tos — add a partner, add a safety filter, run a spike.
  • API reference — generated from docstrings.
  • ADR index — 15 design decisions with full rationale.
  • Glossary — HRS, AoI, OSCBF, FCP/MEP, all defined.
  • Literature — five-cluster bibliography (AHT/ZSC, safe control, conformal prediction, benchmarks, reproducibility).
  • Standards — ISO 10218-2:2025 + IEC 62061 + IEEE TSN + 3GPP R17 references, with the axis → standard → metric → report-table flowchart.
  • Evaluation — the multi-seed and rliable reporting contract for the leaderboard.

Roadmap

The project advances in three phases. Phase 0 (current) locks the design contract and runs the staged heterogeneity-axis spikes. Phase 1 ships the partner zoo and the populated leaderboard. Phase 2 expands tasks and adds the real-robot demo platform.

Now — Phase 0, design contract live, spikes about to start. 15 ADRs (13 Accepted, 2 RFC) under the status taxonomy in adr/ADR-INDEX.md; open follow-up work is tracked per-ADR via the footnote column. M1 (platform), M2 (comm), and M4b (training stack) are merged on main. The chamber-spike CLI runs the ego-AHT loop end-to-end against a Hydra config.

Next. Stage-1 spikes (AS + OM) — preregistered, launched, first leaderboard rows. arXiv design-report preprint (priority defence on the four-aspect framing). Stage-2 spikes (CR + CM).

Later. Stage-3 spikes (PF + SA) — possibly HIL for SA. Phase-1 leaderboard v1 (CONCERTO + 3 baselines on Tier-1 / Tier-2 tasks). Phase 2 (Tier-3 long-object tasks, real-robot demo platform).

Day-to-day progress: CHANGELOG.md and the issues board.


FAQ

How does CHAMBER differ from RoCoBench, SafeBimanual, or BiGym?

RoCoBench covers Heterogeneity × Manipulation on MuJoCo with multi-arm LLM-dialectic coordination but does not address black-box partners or formal safety bounds. SafeBimanual covers Safety × Manipulation on a single bimanual platform. BiGym is single-embodiment. CHAMBER targets the four-aspect intersection (H × B × S × M) at the substrate level — thin wrapper layers above ManiSkill v3, a fixed-format communication stack, and a partner zoo — rather than as a curated task set. See ADR-001 and ADR-005 for the simulator-base decision.

Is the safety guarantee per-step or asymptotic?

The conformal slack overlay (Huriot & Sibai 2025 Theorem 3) gives a distribution-free ε + o(1) long-term average-loss bound, not a per-step bound. For contact-rich manipulation where a single violation can be irreversible, the hard braking fallback (Wang‑Ames‑Egerstedt 2017 eq. 17) is the per-step backstop. Sharpening the average-loss bound to per-step is the project's headline open theoretical question; see ADR-004 Open Questions. The conformal layer's average-loss bound is an Accepted claim under the ADR status taxonomy with the per-step refinement flagged as Open work in ADR-004 (see also ADR-INDEX footnote a); promoting the layer to Validated is gated on the Stage-1 AS spike and the follow-up safety-stack refactor.
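
The difference between an average-loss bound and a per-step bound can be seen in a toy simulation. The sketch below uses a generic online update of a scalar slack, λ ← λ + η(loss − ε), with a made-up binary loss; it is an assumption-laden illustration, not the update rule or loss from Huriot & Sibai 2025 and not the CONCERTO implementation. Because the update telescopes, the running average loss equals ε plus a term that shrinks as 1/T whenever λ stays bounded, yet individual steps still violate.

```python
import random


def run_online_slack(eps, eta, steps, seed):
    """Simulate a scalar slack that grows after violations and shrinks otherwise."""
    rng = random.Random(seed)
    lam0 = 0.0
    lam = lam0
    violations = 0
    for _ in range(steps):
        disturbance = rng.random()  # toy disturbance in [0, 1)
        loss = 1.0 if disturbance > lam else 0.0  # violation: slack was too small
        violations += int(loss)
        lam += eta * (loss - eps)  # generic online update (illustrative only)
    return violations / steps, lam0, lam, violations


eps, eta, steps = 0.05, 0.01, 20000
avg_loss, lam0, lam_T, violations = run_online_slack(eps, eta, steps, seed=0)

# Telescoping the update gives: avg_loss = eps + (lam_T - lam0) / (eta * steps).
residual = (lam_T - lam0) / (eta * steps)
print(f"average loss = {avg_loss:.4f}, eps + residual = {eps + residual:.4f}")
print(f"single-step violations still observed: {violations}")
```

The average is pinned near ε, but the violation count is not zero. That is the gap the hard braking fallback is meant to cover at the per-step level.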

Can I plug in my own partner or safety filter?

Yes. Partners implement the FrozenPartner Protocol in chamber.partners.api; register with the @register_partner decorator from chamber.partners. See Add a partner. Safety filters implement the SafetyFilter Protocol in concerto.safety.api; see Add a filter.

When will the leaderboard be populated?

Stage-1 (AS + OM) rows land with M5. The remaining rows fill as the staged spikes complete; see Roadmap.

What's the relationship between CONCERTO and CHAMBER?

CONCERTO is the method (safety stack + ego-AHT training); CHAMBER is the benchmark (env wrappers + comm + partner zoo + evaluation). Two top-level packages in one wheel, with a one-way dependency: chamber → concerto. Canonical sentence: we evaluate CONCERTO on CHAMBER.

Is this reproducible bit-for-bit?

CPU runs are byte-identical under uv.lock + a root_seed via the determinism harness in concerto.training.seeding. GPU runs are deterministic up to the underlying CUDA non-determinism in PyTorch reductions; the rliable-style aggregate metrics defined in docs/reference/evaluation are the canonical way to compare across seeds.
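
One common way to get reproducible streams from a single root seed is to derive an independent seed per component by hashing. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not a description of concerto.training.seeding; derive_seed and sample are hypothetical helpers.

```python
import hashlib
import random


def derive_seed(root_seed: int, component: str) -> int:
    """Derive a stable 64-bit seed for one named component from the root seed."""
    digest = hashlib.sha256(f"{root_seed}:{component}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample(root_seed: int, component: str, n: int = 3):
    """Draw n values from the component's own generator."""
    rng = random.Random(derive_seed(root_seed, component))
    return [rng.random() for _ in range(n)]


env_a = sample(0, "env")
env_b = sample(0, "env")
partner = sample(0, "partner")
print(env_a == env_b, env_a != partner)
```

Giving each component its own generator means adding or reordering random draws in one component does not shift the stream seen by another.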

Why ManiSkill v3 and not Isaac Lab?

ADR-001's contingent rule was "extend the simulator if its abstractions admit the heterogeneity-axis controls without monkey-patching." ManiSkill v3 passes that test at ≈230 LOC of wrappers; Isaac Lab would have required a 3-month standalone build. Isaac Lab remains a viable secondary path if upstream API constraints force a migration — the env-adapter layer is intentionally thin so the swap is Type-2 reversible. See ADR-001 and ADR-005.


Non-goals

CHAMBER is not a navigation, planning, or generic-RL benchmark; the four-aspect intersection requires contact-rich physical manipulation. CONCERTO is not a certified safety product — it is a research-grade reference implementation of the exp CBF + conformal + OSCBF stack and is not a substitute for safety engineering in production deployments. The project does not ship pretrained partner checkpoints in Phase 0; the partner zoo construction lands in Phase 1.


Contributing

This is a research project, but it is open from the first commit. We welcome PRs.

  • Read CONTRIBUTING.md for the development flow.
  • Look at issues labelled good-first-issue.
  • Sign your commits (-S). DCO (Signed-off-by:) is required.
  • External contributors: the CLA bot will guide you on first PR.
  • Every PR cites the ADR section it touches (e.g. ADR-004 §6.2). We treat the ADRs as the design contract; if your PR motivates a change to them, propose a new ADR rather than editing an Accepted one.

Code of Conduct: CODE_OF_CONDUCT.md. Security policy: SECURITY.md.


Stability & versioning

This project follows Semantic Versioning. Under 0.x, MINOR-version bumps may break the public API per SemVer §4. The public API surfaces are concerto.api, concerto.safety.api, and chamber.comm; everything else is implementation detail and subject to change without notice. The wire-format chamber.comm.SCHEMA_VERSION constant is the single source of truth for the fixed-format packet shape; bumping it is a breaking change and requires a new ADR.
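
A version constant is only a single source of truth if the decoder refuses packets that carry a different version. The sketch below shows that check on a JSON envelope. The envelope layout, the value 1, and the helper names are assumptions for illustration; the real fixed-format packet shape is whatever chamber.comm defines.

```python
import json

SCHEMA_VERSION = 1  # hypothetical value standing in for chamber.comm.SCHEMA_VERSION


class SchemaMismatchError(ValueError):
    """Raised when a packet was encoded under a different schema version."""


def encode_packet(payload: dict, version: int = SCHEMA_VERSION) -> bytes:
    """Wrap the payload in an envelope that records the schema version."""
    envelope = {"schema_version": version, "payload": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode_packet(raw: bytes) -> dict:
    """Return the payload, rejecting packets from any other schema version."""
    envelope = json.loads(raw.decode("utf-8"))
    found = envelope.get("schema_version")
    if found != SCHEMA_VERSION:
        raise SchemaMismatchError(f"expected schema {SCHEMA_VERSION}, got {found}")
    return envelope["payload"]


payload = {"task_state": {"ego": {"grasp_side": "left"}}}
round_tripped = decode_packet(encode_packet(payload))
print(round_tripped == payload)
```

Failing loudly on a mismatch is what makes a version bump a visible breaking change rather than a silent misread of fields.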


Citing CONCERTO & CHAMBER

If you use CONCERTO or CHAMBER in your research, please cite the preprint. Until the preprint is on arXiv (target: 2026‑06), cite the archived software release via its Zenodo DOI:

@software{safaei2026concerto,
  author       = {Safaei, Farhad},
  title        = {{CONCERTO} and {CHAMBER}: Contact-rich Coordination
                  with Opaque, Heterogeneous Teammates},
  year         = {2026},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.20128469},
  url          = {https://doi.org/10.5281/zenodo.20128469},
  note         = {arXiv preprint forthcoming},
}

Citation entries are also in CITATION.cff so GitHub renders a "Cite this repository" button.


Acknowledgments

CONCERTO builds directly on prior work. The safety stack composes Wang et al. 2017 (decentralised exponential CBFs), Huriot & Sibai 2025 (conformal CBFs), and Morton & Pavone 2025 (OSCBF). CHAMBER is a wrapper layer over ManiSkill v3 and depends on a fork of HARL for the training stack. Corrections, acknowledgements, and contributions are welcome via PRs and issues.


License

Apache 2.0. See LICENSE and NOTICE. The full Software Bill of Materials is at sbom.spdx.json and is regenerated on every release.
