Unified Multi-modal Feedback using Amortized Variational Inference


Development Status: 3 - Alpha
Intended Audience: Science/Research
License: OSI Approved :: MIT License


MAVRL

MAVRL — Multi-feedback Amortized Variational Reward Learning

Code release for the ICML 2026 camera-ready of "MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference". Implements an amortized variational reward learner that combines preference, demonstration, rating and stop feedback in a single ELBO.
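To make "a single ELBO" concrete, the following is a minimal sketch, not the mavrl API, under toy assumptions: a linear reward over hand-made trajectory features, a mean-field Gaussian posterior q(w) with a standard-normal prior, a Bradley-Terry likelihood for preferences, a Boltzmann choice likelihood for demonstrations, and Gaussian noise for ratings. Stop feedback is left out because its likelihood is not described here, and the amortized encoders are replaced by directly given (mu, log_sigma). All function names and the toy data are hypothetical.

```python
import math
import random

# Toy, hypothetical setting (not the mavrl API): a trajectory is summarised by
# a feature vector phi and the reward model is linear, R(tau) = w . phi.

def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def log_softmax_at(scores, i):
    # log(exp(scores[i]) / sum_j exp(scores[j])), shifted by the max for stability.
    m = max(scores)
    return scores[i] - m - math.log(sum(math.exp(s - m) for s in scores))

def log_likelihood(w, data, beta=1.0, rating_std=1.0):
    """Sum of per-feedback-type log-likelihoods for one reward parameter w."""
    ll = 0.0
    # Preferences: Bradley-Terry model, the first trajectory of each pair is preferred.
    for phi_win, phi_lose in data.get("preferences", []):
        ll += log_sigmoid(beta * (dot(w, phi_win) - dot(w, phi_lose)))
    # Demonstrations: Boltzmann choice of the demo index among candidate trajectories.
    for candidates, demo_idx in data.get("demonstrations", []):
        ll += log_softmax_at([beta * dot(w, phi) for phi in candidates], demo_idx)
    # Ratings: Gaussian noise around the predicted return.
    for phi, rating in data.get("ratings", []):
        z = (rating - dot(w, phi)) / rating_std
        ll += -0.5 * z * z - math.log(rating_std * math.sqrt(2.0 * math.pi))
    return ll

def elbo(mu, log_sigma, data, n_samples=64, seed=0):
    """Monte Carlo ELBO for q(w) = N(mu, diag(sigma^2)) with prior p(w) = N(0, I)."""
    rng = random.Random(seed)
    sigma = [math.exp(ls) for ls in log_sigma]
    expected_ll = 0.0
    for _ in range(n_samples):
        # Reparameterised sample: w = mu + sigma * eps, eps ~ N(0, I).
        w = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        expected_ll += log_likelihood(w, data)
    expected_ll /= n_samples
    # Closed-form KL(q || p) for a diagonal Gaussian against a standard normal.
    kl = sum(0.5 * (s * s + m * m - 1.0) - ls for m, s, ls in zip(mu, sigma, log_sigma))
    return expected_ll - kl

# Toy data in which the first feature is what the teacher cares about.
data = {
    "preferences": [([1.0, 0.0], [0.0, 0.0])],
    "demonstrations": [([[1.0, 0.0], [0.0, 0.0]], 0)],
    "ratings": [([1.0, 0.0], 1.0)],
}
print(elbo([1.0, 0.0], [-3.0, -3.0], data) > elbo([-1.0, 0.0], [-3.0, -3.0], data))  # True
```

Because every feedback type contributes an additive log-likelihood term, supporting another type in this sketch only means adding one more loop to log_likelihood.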

Repository layout

mavrl/: the algorithm (encoders, feedback likelihoods, losses, retraining utilities). Importable as `import mavrl`.
mavrl_experiments/: experiment infrastructure (Optuna search, file queues, table printers, CLI). Depends on mavrl.
scripts/: entrypoints, including reproduce_paper.sh, reproduce_run.py, per-figure/table renderers, and SLURM launchers for re-running on a cluster.
results/: the canonical artifact bundle (model checkpoints, hyperparameter logs, MCMC results, normalization constants, rendered tables, paper source, published figures).
FeedbackInformativeness/: the Julia MCMC pipeline for the baseline comparison.
expert_policies/: pre-trained external policies used as Boltzmann-rational simulators (these are inputs, not outputs of this work; see the README inside).
tests/: pytest suite for the algorithm and feedback models.

Top-level entry-point scripts (train.py, transfer.py, evaluate_reward_model.py) live at the repo root.
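The expert policies are described as Boltzmann-rational simulators. As an illustration only, assuming the textbook Boltzmann-rational model in which a simulated teacher chooses among trajectories with probability proportional to exp(beta * return), preference labels and demonstration choices could be simulated as follows. The function names and the use of true returns as direct inputs are assumptions of this sketch; it does not show how the repository actually queries its expert policies.

```python
import math
import random

# Textbook Boltzmann-rational teacher (illustrative; not the repository's code).
# A trajectory with true return r is chosen with probability proportional to
# exp(beta * r). beta = 0 gives uniformly random feedback; large beta approaches
# a perfectly rational teacher. Assumes beta >= 0.

def boltzmann_probs(returns, beta):
    m = max(returns)  # shift by the max so exp() cannot overflow
    weights = [math.exp(beta * (r - m)) for r in returns]
    total = sum(weights)
    return [wt / total for wt in weights]

def simulate_preference(return_a, return_b, beta, rng):
    """Return 0 if the simulated teacher prefers trajectory a, else 1."""
    p_a = boltzmann_probs([return_a, return_b], beta)[0]
    return 0 if rng.random() < p_a else 1

def simulate_demonstration(candidate_returns, beta, rng):
    """Sample the index of the candidate trajectory the teacher demonstrates."""
    probs = boltzmann_probs(candidate_returns, beta)
    return rng.choices(range(len(candidate_returns)), weights=probs, k=1)[0]

rng = random.Random(0)
labels = [simulate_preference(1.0, 0.0, beta=2.0, rng=rng) for _ in range(20000)]
frac_a = labels.count(0) / len(labels)  # close to 1 / (1 + exp(-2)), about 0.88
```

Lowering beta makes the simulated feedback noisier, which is one way a simulator of this kind can control how informative each feedback type is.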

Installation

conda env create -f environment.yml
conda activate mavrl-env
pip install -e .

Reproducing the paper

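Apart from the manually spliced tab:baselines rows and the committed grid-cliff/grid-sparse qualitative PNGs noted in the table, all tables and figures regenerate from results/. No Optuna journals, cluster access, or retraining are required.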

conda activate mavrl-env
bash scripts/reproduce_paper.sh

Each step is skipped if its output exists; pass --force to rebuild everything.

Paper element	Renderer	Output	Runtime
`tab:fixed_allocation_combined_paper`	`python -m mavrl_experiments.fixed_allocation_table --camera-ready-root results`	`results/tables/fixed_allocation.tex`	~5 s
`tab:equal_budget_combined`	`python -m mavrl_experiments.equal_budget_table --camera-ready-root results`	`results/tables/equal_budget.tex`	~5 s
`tab:misspec-excerpt`, `tab:app-full-misspec`	`python scripts/summarize_misspec.py --tex …`	`results/tables/misspec.tex`	~1 s
`tab:baselines` (Post-Hoc Avg row)	`python scripts/baseline_comparison_table.py`	stdout (manual splice)	~1 s
`tab:baselines` (MCMC row)	inspect `results/mcmc/_combined_.json`	manual	—
`fig:robustness-ood`	`python scripts/plot_combined_transfer.py`	`results/figures/transfer_combined.{png,pdf}`	~10 s
`fig:appendix-transfer-*`	`python scripts/plot_individual_transfer.py`	`results/figures/transfer_*_full.{png,pdf}`	~30 s
`fig:qualitative-comparison-app-grid-trap`	`python scripts/visualize_checkpoint_comparison.py --config results/qualitative_configs/grid_trap.json …`	`results/figures/appendix_grids/final-fig_grid-trap.png`	~5 s
`fig:qualitative-comparison-app-grid-{cliff,sparse}`	committed PNG (regeneration deferred to a follow-up checkpoint bundle)	—	—
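reproduce_paper.sh skips any step whose output already exists unless --force is passed. A minimal Python sketch of that skip-if-exists logic follows, assuming each step can be described by a name, an output path, and an action. The real driver is a shell script; run_steps and demo are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical sketch of the skip-if-exists logic (the real driver is the shell
# script scripts/reproduce_paper.sh). A step is (name, output path, action).
Step = Tuple[str, Path, Callable[[], object]]

def run_steps(steps: List[Step], force: bool = False) -> List[str]:
    """Run every step whose output is missing (or all steps if force=True)."""
    ran = []
    for name, output, action in steps:
        if output.exists() and not force:
            print(f"[skip] {name}: {output} already exists")
            continue
        output.parent.mkdir(parents=True, exist_ok=True)
        action()
        ran.append(name)
    return ran

def demo():
    # Usage: a single fake "renderer" that writes a placeholder .tex file.
    with tempfile.TemporaryDirectory() as root:
        out = Path(root) / "tables" / "fixed_allocation.tex"
        steps = [("fixed_allocation", out, lambda: out.write_text("% placeholder\n"))]
        first = run_steps(steps)               # output missing: runs
        second = run_steps(steps)              # output present: skipped
        forced = run_steps(steps, force=True)  # --force equivalent: runs again
    return first, second, forced
```

Keying each step on its output file keeps re-runs cheap: only missing artifacts are rebuilt.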

Reproducing a single training run

Each cell under results/models/{equal_budget,fixed_allocation}/<env>/<subset>/ ships per-seed checkpoints plus a hparams.json with the full merged training config. To re-train one seed:

# Grid env (~15 s):
python scripts/reproduce_run.py \
    --cell results/models/fixed_allocation/grid_trap/pdrs --seed 0

# Control env (~15 min):
python scripts/reproduce_run.py \
    --cell results/models/fixed_allocation/cartpole_v1/pdrs --seed 0

# Lunar Lander (~15 min):
python scripts/reproduce_run.py \
    --cell results/models/fixed_allocation/lunar_lander_v3/pdrs --seed 0

The script prints the final evaluation metric of the re-trained model next to the published _meta.per_seed_values[seed], so you can check the match by eye. Change --cell and --seed to reproduce any other cell or seed.

Bit-exact reproduction vs. from-scratch retraining

Loading a published r_model_seed{i}.pt and re-evaluating it reproduces the per-seed metric bit-identically on any machine. This is the canonical reproducibility path.

Re-training the same config from scratch (reproduce_run.py) yields metrics from the same distribution, but an individual seed may converge to a different basin. This is the well-known floor of PyTorch non-determinism across hardware and BLAS backends (Linux x86 + MKL vs macOS arm64 + Accelerate): deterministic-algorithm settings remove run-to-run non-determinism on a single platform, but not differences across platforms. The aggregate means in the paper tables average this out; individual seeds do not.
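Two toy illustrations (not MAVRL code) of the mechanism behind this. First, floating-point addition is not associative, so backends that reduce the same numbers in a different order can disagree in the last bits, while repeating the same order is bit-identical. Second, on a symmetric double-well loss, a round-off-sized difference in the starting point is enough to send gradient descent to a different minimum. Function names are hypothetical.

```python
import math

# Toy illustrations (not MAVRL code).

# 1) Floating-point addition is not associative: reducing the same numbers in a
#    different order (as different BLAS backends may) changes the last bits.
def sum_left_to_right(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_pairwise(xs):
    # Recursive pairwise reduction: a different, also common, summation order.
    if len(xs) <= 2:
        return sum_left_to_right(xs)
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

xs = [0.1, 0.2, 0.3]
print(sum_left_to_right(xs), sum_pairwise(xs), math.fsum(xs))
# 0.6000000000000001 0.6 0.6

# 2) On the double-well loss f(x) = (x^2 - 1)^2, a difference of order 1e-12 in
#    the starting point decides which minimum gradient descent reaches.
def descend(x, lr=0.05, steps=500):
    for _ in range(steps):
        x -= lr * 4.0 * x * (x * x - 1.0)  # f'(x) = 4x(x^2 - 1)
    return x

print(round(descend(1e-12), 6), round(descend(-1e-12), 6))
# 1.0 -1.0
```

Real training has many such symmetric or near-flat regions, which is why per-seed results can land in different basins while the seed-averaged numbers stay stable.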

Re-running the full pipeline on a SLURM cluster

For readers who want to redo the experiments end-to-end (not just verify the published numbers), the cluster-side entrypoints are:

scripts/launch_fixed_allocation_table.sh — 66 Optuna studies for the fixed-allocation table (uses configs/optuna/<env>_fixed_paper.py).
scripts/launch_equal_budget_table.sh — 66 Optuna studies for the equal-budget appendix table.
scripts/launch_transfer_fixalloc.sh — transfer/perturbation experiments (consumes the published encoders from results/models/fixed_allocation/).
scripts/launch_misspec.sh — misspecification robustness sweeps.
scripts/pregenerate_datasets.py — populates the dataset cache before launching the Optuna sweeps.
FeedbackInformativeness/scripts/submit_grid_mcmc_euler.sh — Julia MCMC baseline.

Each launcher documents its usage in a --help-style docstring at the top of the file. By default, they write to $SCRATCH/mavrl/optuna_studies/.
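For orientation only, here is a hypothetical sketch of how a SLURM array task could pick one study and resolve its output directory under the documented default $SCRATCH/mavrl/optuna_studies/. SLURM_ARRAY_TASK_ID is the standard variable SLURM sets for array jobs; the study names and helper functions are placeholders, not what the launchers actually use, so check each launcher's docstring for its real interface.

```python
import os
from pathlib import Path

# Hypothetical sketch: map a SLURM array task to one study and resolve its
# directory under the documented default $SCRATCH/mavrl/optuna_studies/.
# The study names are placeholders, not the launchers' real study names.

def study_for_task(study_names, environ=os.environ):
    """Pick the study indexed by SLURM_ARRAY_TASK_ID (defaults to task 0)."""
    task_id = int(environ.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(study_names):
        raise IndexError(f"task {task_id} is out of range for {len(study_names)} studies")
    return study_names[task_id]

def study_dir(study_name, environ=os.environ):
    """Resolve $SCRATCH/mavrl/optuna_studies/<study_name>."""
    scratch = environ.get("SCRATCH")
    if not scratch:
        raise RuntimeError("SCRATCH is not set")
    return Path(scratch) / "mavrl" / "optuna_studies" / study_name

env = {"SCRATCH": "/scratch/user", "SLURM_ARRAY_TASK_ID": "1"}
names = ["grid_trap_pdrs", "cartpole_v1_pdrs"]
print(study_dir(study_for_task(names, env), env))
# /scratch/user/mavrl/optuna_studies/cartpole_v1_pdrs
```

Passing the environment as a mapping keeps the helpers testable without a real SLURM job.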

Testing

pytest tests/

Citation

@inproceedings{baur2026mavrl,
  title={MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference},
  author={Baur, Rapha\"el and Metz, Yannick and Gkoulta, Maria and El-Assady, Mennatallah and Ramponi, Giorgia and Kleine Buening, Thomas},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026},
}

Release history

0.1 (this version): May 27, 2026
0.0.1: May 12, 2026

Download files

Source distribution: mavrl-0.1.tar.gz (126.2 kB), uploaded May 27, 2026 via twine/6.2.0 CPython/3.11.14 (not via Trusted Publishing)
Built distribution: mavrl-0.1-py3-none-any.whl (160.3 kB, Python 3), uploaded May 27, 2026 via twine/6.2.0 CPython/3.11.14 (not via Trusted Publishing)

File hashes

File	Algorithm	Hash digest
mavrl-0.1.tar.gz	SHA256	`fb4718297127f9c5191717ae107a665ebcb492142dee542c55012f4d38c68652`
mavrl-0.1.tar.gz	MD5	`8b6d3767abe6237d0ee8a4d139fa8e78`
mavrl-0.1.tar.gz	BLAKE2b-256	`99f1665baf444355121a62e74fcd0afa4e5ae3076a2c7ae8c6a049abfaa4b286`
mavrl-0.1-py3-none-any.whl	SHA256	`370d9e24498f20eba74a00f23dfc65de06c5ffa331c885bae11d43e0bbb5a91a`
mavrl-0.1-py3-none-any.whl	MD5	`6883c09fbc44c7a7b8b61c233ce64ab8`
mavrl-0.1-py3-none-any.whl	BLAKE2b-256	`8f8630a2570eba2222e7abeb67048cbc1b5c8966af8848819dc615d7f8a96e7a`
