Unified Multi-modal Feedback using Amortized Variational Inference
MAVRL — Multi-feedback Amortized Variational Reward Learning
Code release for the ICML 2026 camera-ready of "MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference". Implements an amortized variational reward learner that combines preference, demonstration, rating and stop feedback in a single ELBO.
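A small sketch can show how several feedback types end up in one ELBO. It is an illustration under simplifying assumptions, not the MAVRL implementation. It assumes a linear reward r = w·φ over hand-made trajectory features, a mean-field Gaussian posterior over w with a standard-normal prior, and per-dataset (non-amortized) variational parameters instead of MAVRL's amortized encoders. The likelihoods are common textbook choices picked for illustration: Bradley-Terry for preferences, a Boltzmann choice for demonstrations, and a Gaussian for ratings. Stop feedback is left out. All function names are hypothetical.

```python
import math
import random


def dot(w, phi):
    """Linear reward of a trajectory with features phi (assumed model)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_lik(w, data, beta=1.0, rating_std=1.0):
    """Sum of per-feedback log-likelihoods for one reward sample w."""
    total = 0.0
    # Preferences: (phi_preferred, phi_other), Bradley-Terry on returns.
    for phi_a, phi_b in data.get("preference", []):
        total += log_sigmoid(beta * (dot(w, phi_a) - dot(w, phi_b)))
    # Demonstrations: (phi_demo, [phi_alternatives]), Boltzmann choice over demo + alternatives.
    for phi_demo, alternatives in data.get("demonstration", []):
        scores = [beta * dot(w, p) for p in [phi_demo] + alternatives]
        total += scores[0] - logsumexp(scores)
    # Ratings: (phi, rating), Gaussian noise around the return.
    for phi, rating in data.get("rating", []):
        err = rating - dot(w, phi)
        total += -0.5 * (err / rating_std) ** 2 - math.log(rating_std * math.sqrt(2 * math.pi))
    return total


def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over independent dimensions."""
    return sum(0.5 * (math.exp(2 * ls) + m * m - 1.0) - ls for m, ls in zip(mu, log_sigma))


def elbo(mu, log_sigma, data, n_samples=64, rng=None):
    """Monte Carlo ELBO: E_q[sum of log-likelihoods] - KL(q || prior)."""
    rng = rng or random.Random(0)
    expected = 0.0
    for _ in range(n_samples):
        # Reparameterized sample w = mu + sigma * eps with eps ~ N(0, 1).
        w = [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]
        expected += log_lik(w, data)
    return expected / n_samples - kl_to_standard_normal(mu, log_sigma)
```

In a real learner, `mu` and `log_sigma` would be optimized by gradient ascent on this quantity; in the amortized setting they would instead be produced by an encoder network from the feedback itself.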
Repository layout
- `mavrl/`: the algorithm (encoders, feedback likelihoods, losses, retraining utilities). Importable as `import mavrl`.
- `mavrl_experiments/`: experiment infrastructure (Optuna search, file queues, table printers, CLI). Depends on `mavrl`.
- `scripts/`: entrypoints: `reproduce_paper.sh`, `reproduce_run.py`, per-figure/table renderers, and SLURM launchers for re-running on a cluster.
- `results/`: the canonical artifact bundle (model checkpoints, hyperparameter logs, MCMC results, normalization constants, rendered tables, paper source, published figures).
- `FeedbackInformativeness/`: the Julia MCMC pipeline for the baseline comparison.
- `expert_policies/`: pre-trained external policies used as Boltzmann-rational simulators (inputs, not our outputs; see the README inside).
- `tests/`: pytest suite for the algorithm and feedback models.

Top-level entry-point scripts (`train.py`, `transfer.py`, `evaluate_reward_model.py`) live at the repo root.
Installation
```bash
conda env create -f environment.yml
conda activate mavrl-env
pip install -e .
```
Reproducing the paper
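All tables and figures are produced from `results/` (steps marked manual or committed in the table below are the exceptions). No Optuna journals, cluster access, or retraining is required.

```bash
conda activate mavrl-env
bash scripts/reproduce_paper.sh
```

Each step is skipped if its output already exists; pass `--force` to `reproduce_paper.sh` to rebuild everything.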
| Paper element | Renderer | Output | Runtime |
|---|---|---|---|
| `tab:fixed_allocation_combined_paper` | `python -m mavrl_experiments.fixed_allocation_table --camera-ready-root results` | `results/tables/fixed_allocation.tex` | ~5 s |
| `tab:equal_budget_combined` | `python -m mavrl_experiments.equal_budget_table --camera-ready-root results` | `results/tables/equal_budget.tex` | ~5 s |
| `tab:misspec-excerpt`, `tab:app-full-misspec` | `python scripts/summarize_misspec.py --tex …` | `results/tables/misspec.tex` | ~1 s |
| `tab:baselines` (Post-Hoc Avg row) | `python scripts/baseline_comparison_table.py` | stdout (manual splice) | ~1 s |
| `tab:baselines` (MCMC row) | inspect `results/mcmc/*_combined_*.json` | manual | n/a |
| `fig:robustness-ood` | `python scripts/plot_combined_transfer.py` | `results/figures/transfer_combined.{png,pdf}` | ~10 s |
| `fig:appendix-transfer-*` | `python scripts/plot_individual_transfer.py` | `results/figures/transfer_*_full.{png,pdf}` | ~30 s |
| `fig:qualitative-comparison-app-grid-trap` | `python scripts/visualize_checkpoint_comparison.py --config results/qualitative_configs/grid_trap.json …` | `results/figures/appendix_grids/final-fig_grid-trap.png` | ~5 s |
| `fig:qualitative-comparison-app-grid-{cliff,sparse}` | committed PNG (regeneration deferred to a follow-up checkpoint bundle) | n/a | n/a |
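The skip-if-exists behavior of `reproduce_paper.sh` can be pictured with a short Python sketch. It is an illustration, not the actual script: the step list copies three renderer/output pairs from the table above, and the rule that a step reruns when any of its outputs is missing (or when `force` is set) is an assumption about how the shell script decides. `STEPS` and `plan` are hypothetical names.

```python
import shlex
from pathlib import Path

# Hypothetical subset of the table above: (renderer command, expected outputs).
STEPS = [
    ("python -m mavrl_experiments.fixed_allocation_table --camera-ready-root results",
     ["results/tables/fixed_allocation.tex"]),
    ("python -m mavrl_experiments.equal_budget_table --camera-ready-root results",
     ["results/tables/equal_budget.tex"]),
    ("python scripts/plot_combined_transfer.py",
     ["results/figures/transfer_combined.png", "results/figures/transfer_combined.pdf"]),
]


def plan(steps, repo_root, force=False):
    """Return the commands (as argv lists) that still need to run.

    Assumed rule: a step is up to date only if every listed output exists
    under repo_root; force=True reruns every step regardless.
    """
    root = Path(repo_root)
    todo = []
    for cmd, outputs in steps:
        missing = [o for o in outputs if not (root / o).exists()]
        if force or missing:
            todo.append(shlex.split(cmd))
    return todo
```

To execute the plan from the repo root, one could loop over `plan(STEPS, ".")` and pass each argv list to `subprocess.run(argv, check=True)`.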
Reproducing a single training run
Each cell under `results/models/{equal_budget,fixed_allocation}/<env>/<subset>/` ships per-seed checkpoints plus an `hparams.json` with the full merged training config. To re-train one seed:
```bash
# Grid env (~15 s):
python scripts/reproduce_run.py \
  --cell results/models/fixed_allocation/grid_trap/pdrs --seed 0

# Control env (~15 min):
python scripts/reproduce_run.py \
  --cell results/models/fixed_allocation/cartpole_v1/pdrs --seed 0

# Lunar Lander (~15 min):
python scripts/reproduce_run.py \
  --cell results/models/fixed_allocation/lunar_lander_v3/pdrs --seed 0
```
The script prints the final evaluation metric it produced next to the published `_meta.per_seed_values[seed]`, so you can check the match by eye. Change `--cell` and `--seed` to re-train any other cell and seed.
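To sweep many cells instead of one at a time, a helper can build the `reproduce_run.py` command lines. This is a hypothetical sketch: it assumes each cell sits exactly three levels below `results/models/` (matching the `{equal_budget,fixed_allocation}/<env>/<subset>/` pattern), is recognizable by its `hparams.json`, and stores its checkpoints as `r_model_seed{i}.pt` files whose indices are the valid `--seed` values.

```python
import re
from pathlib import Path

# Assumed checkpoint naming: r_model_seed{i}.pt inside each cell directory.
SEED_RE = re.compile(r"r_model_seed(\d+)\.pt")


def reproduce_commands(models_root):
    """Return argv lists for scripts/reproduce_run.py, one per (cell, seed)."""
    root = Path(models_root)
    commands = []
    # <budget>/<env>/<subset>/hparams.json marks a cell (assumption).
    for hparams in sorted(root.glob("*/*/*/hparams.json")):
        cell = hparams.parent
        seeds = []
        for f in cell.iterdir():
            m = SEED_RE.fullmatch(f.name)
            if m:
                seeds.append(int(m.group(1)))
        for seed in sorted(seeds):
            commands.append(["python", "scripts/reproduce_run.py",
                             "--cell", str(cell), "--seed", str(seed)])
    return commands
```

Each command is meant to be run from the repository root, for example with `subprocess.run(argv, check=True)`. At roughly 15 minutes per seed for the control environments, a full sweep takes a long time.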
Bit-exact reproduction vs. from-scratch retraining
Loading a published `r_model_seed{i}.pt` and re-evaluating it reproduces the per-seed metric bit-identically on any machine. This is the canonical reproducibility path.
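Because re-evaluating a published checkpoint is expected to be bit-identical, an exact comparison is appropriate on that path, whereas from-scratch retraining calls for a tolerance. The helper below is hypothetical: it assumes the published metadata has already been loaded into a dict with a `_meta` entry whose `per_seed_values` is a list indexed by seed (which file holds that entry is not specified here).

```python
import math


def check_seed(published_meta, seed, reproduced, rel_tol=0.0):
    """Compare a re-computed metric with the published per-seed value.

    rel_tol=0.0 demands bit-identical floats (checkpoint re-evaluation);
    a positive rel_tol suits from-scratch retraining, which is not bit-exact.
    Returns (match, published_hex, reproduced_hex).
    """
    published = float(published_meta["_meta"]["per_seed_values"][seed])
    reproduced = float(reproduced)
    if rel_tol == 0.0:
        # float.hex() exposes every mantissa bit, so equal strings mean identical bits.
        same = published.hex() == reproduced.hex()
    else:
        same = math.isclose(published, reproduced, rel_tol=rel_tol)
    return same, published.hex(), reproduced.hex()
```

Printing the two hex strings side by side makes a one-ulp mismatch visible even when the decimal values print identically.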
Re-training the same config from scratch (`reproduce_run.py`) produces metrics from the same distribution, but an individual seed may converge to a different basin. This is the well-known floor of PyTorch non-determinism across hardware and BLAS backends (e.g. Linux x86 with MKL vs. macOS arm64 with Accelerate). Deterministic-algorithm settings only remove run-to-run non-determinism on a single platform; they do not make results agree across platforms. The aggregate means in the paper tables average this out, while individual seeds can differ.
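The cross-platform drift can be illustrated without PyTorch: floating-point addition is not associative, so two libraries that reduce the same numbers in a different order (for example with different blocking or threading) can return slightly different sums. In the sketch below, the block size is a stand-in for such backend-specific choices; it illustrates the mechanism and does not model MKL or Accelerate specifically.

```python
import random


def sequential_sum(xs):
    """Left-to-right accumulation, like a simple single-threaded loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


def blocked_sum(xs, block):
    """Sum fixed-size blocks first, then add the partial sums.

    Tiled or multithreaded kernels reduce in an order like this; the block
    size stands in for a hardware- or library-specific choice.
    """
    partials = [sequential_sum(xs[i:i + block]) for i in range(0, len(xs), block)]
    return sequential_sum(partials)


rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
a = sequential_sum(values)
b = blocked_sum(values, block=64)
print(a, b, abs(a - b))  # the two orders typically disagree in the last few bits
```

A difference this small is harmless in one sum, but compounded over many optimizer steps it can steer a seed toward a different basin, which matches the seed-level variation described above.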
Re-running the full pipeline on a SLURM cluster
For readers who want to redo the experiments end-to-end (not just verify the published numbers), the cluster-side entrypoints are:
- `scripts/launch_fixed_allocation_table.sh`: 66 Optuna studies for the fixed-allocation table (uses `configs/optuna/<env>_fixed_paper.py`).
- `scripts/launch_equal_budget_table.sh`: 66 Optuna studies for the equal-budget appendix table.
- `scripts/launch_transfer_fixalloc.sh`: transfer/perturbation experiments (consumes the published encoders from `results/models/fixed_allocation/`).
- `scripts/launch_misspec.sh`: misspecification robustness sweeps.
- `scripts/pregenerate_datasets.py`: populates the dataset cache before launching the Optuna sweeps.
- `FeedbackInformativeness/scripts/submit_grid_mcmc_euler.sh`: Julia MCMC baseline.
Each launcher documents its usage in a `--help`-style docstring at the top of the file. By default, the launchers write into `$SCRATCH/mavrl/optuna_studies/`.
Testing
```bash
pytest tests/
```
Citation
```bibtex
@inproceedings{baur2026mavrl,
  title={MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference},
  author={Baur, Rapha\"el and Metz, Yannick and Gkoulta, Maria and El-Assady, Mennatallah and Ramponi, Giorgia and Kleine Buening, Thomas},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
}
```
File details
Details for the file mavrl-0.1.tar.gz.
File metadata
- Download URL: mavrl-0.1.tar.gz
- Upload date:
- Size: 126.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fb4718297127f9c5191717ae107a665ebcb492142dee542c55012f4d38c68652` |
| MD5 | `8b6d3767abe6237d0ee8a4d139fa8e78` |
| BLAKE2b-256 | `99f1665baf444355121a62e74fcd0afa4e5ae3076a2c7ae8c6a049abfaa4b286` |
File details
Details for the file mavrl-0.1-py3-none-any.whl.
File metadata
- Download URL: mavrl-0.1-py3-none-any.whl
- Upload date:
- Size: 160.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `370d9e24498f20eba74a00f23dfc65de06c5ffa331c885bae11d43e0bbb5a91a` |
| MD5 | `6883c09fbc44c7a7b8b61c233ce64ab8` |
| BLAKE2b-256 | `8f8630a2570eba2222e7abeb67048cbc1b5c8966af8848819dc615d7f8a96e7a` |