Unified Multi-modal Feedback using Amortized Variational Inference

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Programming Language

Project description

MAVRL - Unified Multi-modal Feedback using Amortized Variational Inference

This package implements a variational inference approach for learning reward functions from multiple types of feedback (preferences, demonstrations, etc.).

Repository layout

The repository ships two top-level Python packages:

mavrl/ — the algorithm itself: encoders, feedback models, datasets, losses, environment wrappers, retraining utilities. Importable as import mavrl.
mavrl_experiments/ — the infrastructure that runs the algorithm: Optuna search, distributed file queues, table printers, Slack watchers, CLI entry points, and the experiment configs themselves (mavrl_experiments/configs/{experiments,optuna}/). Importable as import mavrl_experiments and invoked via python -m mavrl_experiments.<module>.

mavrl_experiments depends on mavrl (one-way); mavrl never imports from mavrl_experiments. The split keeps the algorithm package focused and lets infrastructure evolve without touching algorithm code.

Top-level entry-point scripts (train.py, transfer.py, evaluate_reward_model.py, train_online.py) live at the repo root.

Installation

Ensure that you are using Python 3.11.6. On Euler, load the correct Python version using:

module load stack/2024-06 python/3.11.6

Ensure that you are at the root of this project. Create a fresh virtual environment with this exact name:

python -m venv venv/

.gitignore will ignore this virtual environment. Activate the virtual environment:

source venv/bin/activate

Install all required dependencies:

pip install -r requirements.txt
pip install -e .

The first command installs all Python packages except mavrl. The second installs an editable version of mavrl.

Running a single trial

To run a single trial, execute

python train.py

Running an experiment

Instead of running just a single trial, you can run a potentially large number of trials through our CLI. Here is an overview of the process:

1. Specifying all configurations

Specify all experimental configurations using the ExperimentGrid class. This will exhaustively run all valid combinations of the specified parameters. For an example of how to specify a grid of configurations, see mavrl_experiments/configs/experiments/sweep_grid_trap.py. You can specify configurations in four ways:

By passing the base_config to the ExperimentGrid constructor. These are parameters that are shared between all configurations.
By adding a parameter sweep with grid.add. Values are specified as lists.
By adding a conditional parameter with grid.add_conditional. Pass a boolean function as the condition argument; the parameter values are added only to configurations for which it returns true.
By removing invalid configurations with grid.add_validator.
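
The four mechanisms compose roughly as in the following sketch. It is a minimal, self-contained illustration in plain Python and not the ExperimentGrid implementation: the function expand_grid, its argument names, and the parameter use_kl are hypothetical, chosen only to show how a base config, sweeps, a conditional parameter, and a validator combine.

```python
from itertools import product


def expand_grid(base_config, sweeps, conditionals=(), validators=()):
    """Expand a parameter grid into a list of concrete configurations.

    sweeps:       {name: [values]}; every combination is generated.
    conditionals: [(condition, name, [values])]; `name` is swept only for
                  configurations where condition(config) is True.
    validators:   [predicate]; configurations failing any predicate are dropped.
    """
    names = list(sweeps)
    configs = [
        {**base_config, **dict(zip(names, combo))}
        for combo in product(*(sweeps[n] for n in names))
    ]
    for condition, name, values in conditionals:
        expanded = []
        for cfg in configs:
            if condition(cfg):
                expanded.extend({**cfg, name: v} for v in values)
            else:
                expanded.append(cfg)
        configs = expanded
    return [cfg for cfg in configs if all(ok(cfg) for ok in validators)]


configs = expand_grid(
    base_config={"env": "grid_trap"},
    sweeps={"lr": [1e-3, 1e-4], "use_kl": [True, False]},
    conditionals=[(lambda c: c["use_kl"], "kl_weight", [0.1, 1.0])],
    validators=[lambda c: not (c["lr"] == 1e-3 and c.get("kl_weight") == 1.0)],
)
print(len(configs))
```

In this toy grid the 4 base combinations grow to 6 once kl_weight is swept for the use_kl configurations, and the validator removes one of them, leaving 5.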

NOTE: Any paths that are specified in the grid should be absolute paths for the machine that you plan to run the experiment on. Otherwise paths will not be correctly recognized.

Once your grid is set up, populate the database with experiments:

python -m mavrl_experiments.cli add-grid <your_config_name> --seeds 5

This will create a database containing all configuration parameters that will be read out by the workers, but no results yet.

NOTE: Populating the database might take a long time on Euler, while it might only take a few seconds on your local system. Consider populating the database locally and copying it to Euler afterwards.

This command is idempotent: issuing it again does not delete pre-existing entries with equivalent configurations; only new configurations are added.

--seeds specifies the number of trials (differing by seed) that are run per configuration. So if you have 100 distinct configurations, --seeds 5 will result in 500 trials.
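
The idempotency and the configurations × seeds arithmetic can be sketched with the standard library alone. This is an assumption-laden illustration, not the actual queue schema: the table layout, the add_grid function, and the hashing of a canonical JSON form are all hypothetical.

```python
import hashlib
import json
import sqlite3


def add_grid(conn, configs, seeds):
    """Insert one pending trial per (configuration, seed); skip existing ones.

    Returns the number of newly added rows.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "config_hash TEXT, seed INTEGER, config TEXT, status TEXT, "
        "PRIMARY KEY (config_hash, seed))"
    )
    before = conn.execute("SELECT COUNT(*) FROM trials").fetchone()[0]
    for cfg in configs:
        # Canonical JSON so that equivalent configurations hash identically.
        payload = json.dumps(cfg, sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        for seed in range(seeds):
            # The primary key makes a repeated insert a no-op.
            conn.execute(
                "INSERT OR IGNORE INTO trials VALUES (?, ?, ?, 'pending')",
                (key, seed, payload),
            )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM trials").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
configs = [{"lr": lr, "batch_size": bs} for lr in (1e-3, 1e-4) for bs in (32, 64)]
first = add_grid(conn, configs, seeds=5)   # 4 configurations x 5 seeds
second = add_grid(conn, configs, seeds=5)  # same grid again: nothing new
print(first, second)
```

The second call adds nothing because every (configuration, seed) pair already exists, which is the behaviour the text describes for add-grid.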

2. Checking experiment status

At any point during the experiment, you can check progress using

python -m mavrl_experiments.cli status

Since you haven't started yet, you will see something like this.

Experiment Queue Status (rb_experiment_001.db)
========================================
  Pending:    22320
  Running:        0
  Completed:      0
  Failed:         0
----------------------------------------
  Total:      22320

  Progress: [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0.0%

If you use a custom database path, remember to pass it to this command.

3. Submit experiment

Now you can have workers pick up tasks from the queue (see scripts/ for cluster submission scripts).

Hyperparameter search (Optuna)

For finding good multi-modal feedback allocations under a fixed sample budget, use the Optuna-based search in mavrl_experiments/optuna_search.py. It samples a Dirichlet-distributed allocation over modalities (always summing exactly to --budget) and jointly searches over reward-model and PPO retraining hyperparameters defined in the env config.
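
One way to turn a Dirichlet sample into integer sample counts that sum exactly to the budget is largest-remainder rounding. The sketch below shows that technique under stated assumptions; it is not taken from optuna_search.py, and the function name, the alpha parameter, and the rounding scheme are hypothetical. The modality names are the ones used elsewhere in this README.

```python
import random


def sample_allocation(modalities, budget, alpha=1.0, rng=None):
    """Draw a Dirichlet(alpha, ..., alpha) split of `budget` over `modalities`.

    Returns integer counts that sum exactly to `budget`.
    """
    rng = rng or random.Random()
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
    draws = [rng.gammavariate(alpha, 1.0) for _ in modalities]
    total = sum(draws)
    shares = [budget * d / total for d in draws]
    counts = [int(s) for s in shares]  # floor, since shares are non-negative
    # Largest-remainder rounding: give the leftover samples to the
    # modalities with the largest fractional parts.
    leftover = budget - sum(counts)
    order = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in order[:leftover]:
        counts[i] += 1
    return dict(zip(modalities, counts))


allocation = sample_allocation(
    ["pref", "demo", "rating", "stop"], budget=256, rng=random.Random(0)
)
print(allocation, sum(allocation.values()))
```

Rounding each share independently would usually miss the budget by a few samples; distributing the remainder keeps the total fixed, so trials stay comparable.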

Configs live at mavrl_experiments/configs/optuna/<env>.py (override the root via MAVRL_CONFIG_ROOT). Each config defines:

BASE_CONFIG — fixed parameters,
MODALITY_PARAMS — per-modality hyperparameters applied when that modality has samples > 0,
HYPERPARAM_SEARCH_SPACE — the search space (categorical lists or (low, high, log) continuous ranges),
MODALITIES — the ordered list of modality sample-count keys.
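
As a rough picture of how these four objects might fit together, here is a hypothetical config and a resolver. Only the four top-level names come from the description above; every key and value inside them (n_pref, pref_weight, and so on), as well as the helpers kind and resolve_config, are invented for illustration and will differ from the shipped configs.

```python
# Hypothetical config contents; every key and value here is illustrative.
BASE_CONFIG = {"retrain_n_timesteps": 1_000_000}

MODALITIES = ["n_pref", "n_demo", "n_rating", "n_stop"]

MODALITY_PARAMS = {
    "n_pref": {"pref_weight": 1.0},
    "n_demo": {"demo_weight": 1.0},
    "n_rating": {"rating_weight": 1.0},
    "n_stop": {"stop_weight": 1.0},
}

HYPERPARAM_SEARCH_SPACE = {
    "batch_size": [32, 64, 128],  # categorical list
    "lr": (1e-5, 1e-2, True),     # (low, high, log) continuous range
}


def kind(spec):
    """Classify one search-space entry by its shape."""
    if isinstance(spec, list):
        return "categorical"
    if isinstance(spec, tuple) and len(spec) == 3:
        return "continuous"
    raise TypeError(f"unsupported search-space entry: {spec!r}")


def resolve_config(allocation, hyperparams):
    """Merge fixed parameters, active-modality parameters, and sampled values."""
    config = dict(BASE_CONFIG)
    for key in MODALITIES:
        count = allocation.get(key, 0)
        config[key] = count
        if count > 0:  # per-modality parameters apply only to active modalities
            config.update(MODALITY_PARAMS.get(key, {}))
    config.update(hyperparams)
    return config


trial_config = resolve_config(
    {"n_pref": 200, "n_demo": 56}, {"lr": 1e-3, "batch_size": 64}
)
print(trial_config)
```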

Local end-to-end test

The recipe below runs a minimal single-worker search on lunar_lander_v3. A full trial does a complete PPO retrain (1M timesteps by default), which is slow on a laptop. To iterate faster locally, temporarily add "retrain_n_timesteps": 100_000 to BASE_CONFIG in mavrl_experiments/configs/optuna/lunar_lander_v3.py (don't commit that — it's just for testing).

# 1. Pre-generate cached datasets (1 seed is enough for a smoke test).
#    --gen_samples should be >= the budget you plan to test.
python scripts/pregenerate_datasets.py \
    --config lunar_lander_v3 \
    --cache_dir dataset_cache/lander_local \
    --seeds 1 \
    --gen_samples 256 \
    --gen_samples_demo 256

# 2. Run a small search (single worker, few trials, one seed per trial).
python -m mavrl_experiments.optuna_search \
    --study-name lander_b256_local \
    --storage optuna_journal_lander_local.log \
    --env-config lunar_lander_v3 \
    --budget 256 \
    --n-seeds 1 \
    --n-trials 5 \
    --dataset-cache-dir dataset_cache/lander_local

# 3. Inspect the results (passing --env-config enables the normalized-score column).
python -m mavrl_experiments.optuna_search \
    --study-name lander_b256_local \
    --storage optuna_journal_lander_local.log \
    --env-config lunar_lander_v3 \
    --show-results

The journal file (optuna_journal_lander_local.log) is append-only and NFS-safe, so re-running step 2 with the same --study-name and --storage will continue the same study.

Cluster submission

scripts/submit_optuna.sh runs an Optuna worker as a SLURM array task. Every array element is an independent worker; they coordinate through a shared journal file (NFS-safe, append-only), so there is no central scheduler. Each worker fits its own TPE model from the shared trial history and proposes its own next trial.
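
The idea of coordinating through an append-only file, with no central scheduler, can be illustrated with a toy journal. This is a conceptual sketch only: it does not reproduce Optuna's journal format or its file locking, and the record fields and function names are hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def replay(path):
    """Rebuild the shared trial history by reading the journal from the start."""
    trials = {}
    if not os.path.exists(path):
        return trials
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            trials[(rec["worker"], rec["trial"])] = rec["value"]
    return trials


def best_so_far(path, direction="minimize"):
    """Each worker can compute the current best from the journal alone."""
    values = list(replay(path).values())
    if not values:
        return None
    return min(values) if direction == "minimize" else max(values)


with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "study.log")
    # Two independent workers report results without talking to each other.
    append_record(journal, {"worker": 0, "trial": 0, "value": 0.8})
    append_record(journal, {"worker": 1, "trial": 0, "value": 0.5})
    history = replay(journal)
    best = best_so_far(journal)

print(len(history), best)
```

Because every worker derives its view by replaying the same file, any worker can join or leave at any time, which matches the independent array tasks described above.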

Prerequisites

Virtual environment. The script activates venv/ (or ../venv/) automatically. Create it as described in Installation.
Journal directory. Pick a path on a shared filesystem reachable from all compute nodes (e.g. $SCRATCH/mavrl/optuna_studies/). The journal file will be created on first run.
Dataset cache (recommended). Pre-generate datasets once so trials don't redo expensive sample generation. --gen_samples should be at least the budget you intend to search:
```
python scripts/pregenerate_datasets.py \
    --config lunar_lander_v3 \
    --cache_dir $SCRATCH/mavrl/dataset_cache/lander \
    --seeds 3 \
    --gen_samples 256 \
    --gen_samples_demo 256
```
Use the same --seeds value as your trial N_SEEDS (workers seed trials as 0..N_SEEDS-1).

Submission

The script reads its configuration from environment variables. Required:

Variable	Meaning
`STUDY_NAME`	Optuna study name. Use a fresh name per (metric, direction, budget) — `load_if_exists=True` silently reuses an existing study's direction.
`ENV_CONFIG`	Config name under `mavrl_experiments/configs/optuna/` (e.g. `lunar_lander_v3`).
`BUDGET`	Total feedback samples per trial (sum across modalities).
`STORAGE_PATH`	Path to the journal `.log` file.

Optional:

Variable	Default	Meaning
`N_SEEDS`	`3`	Seeds evaluated per trial; the trial value is the mean across seeds.
`N_TRIALS`	`20`	Trials per worker. With a 32-task array, total trials ≈ `32 × N_TRIALS`.
`METRIC`	`eval/regret`	Final-evaluation key to optimize (e.g. `eval/mean_rew`, `eval/discounted_value`).
`DIRECTION`	`minimize`	`minimize` or `maximize`. Pair with `METRIC` correctly.
`SINGLE_MODALITY`	unset	If set to `pref`/`demo`/`rating`/`stop`, the entire `BUDGET` is allocated to that modality. Useful for single-modality baselines.
`WANDB_PROJECT`	unset	Log every trial run to this wandb project.
`DATASET_CACHE_DIR`	unset	Point trials at a pre-generated dataset cache.
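
For checking a set of variables before calling sbatch, the two tables can be mirrored in a small Python helper. This is a hypothetical validation sketch, not part of submit_optuna.sh; read_settings and total_trials are illustrative names, while the variable names and defaults are copied from the tables above.

```python
import os

REQUIRED = ("STUDY_NAME", "ENV_CONFIG", "BUDGET", "STORAGE_PATH")
DEFAULTS = {
    "N_SEEDS": "3",
    "N_TRIALS": "20",
    "METRIC": "eval/regret",
    "DIRECTION": "minimize",
}
UNSET_BY_DEFAULT = ("SINGLE_MODALITY", "WANDB_PROJECT", "DATASET_CACHE_DIR")


def read_settings(env=None):
    """Validate required variables and fill in the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    for name in UNSET_BY_DEFAULT:
        settings[name] = env.get(name)  # None when unset
    if settings["DIRECTION"] not in ("minimize", "maximize"):
        raise ValueError("DIRECTION must be 'minimize' or 'maximize'")
    for name in ("BUDGET", "N_SEEDS", "N_TRIALS"):
        settings[name] = int(settings[name])
    return settings


def total_trials(array_size, n_trials):
    """Approximate total trials: one worker per array task."""
    return array_size * n_trials


settings = read_settings({
    "STUDY_NAME": "lander_b256_meanrew",
    "ENV_CONFIG": "lunar_lander_v3",
    "BUDGET": "256",
    "STORAGE_PATH": "optuna_studies/lander_b256_meanrew.log",
    "METRIC": "eval/mean_rew",
    "DIRECTION": "maximize",
})
print(settings["N_SEEDS"], total_trials(32, settings["N_TRIALS"]))
```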

Combined-modality run (Dirichlet allocation across all modalities):

STUDY_NAME=lander_b256_meanrew \
ENV_CONFIG=lunar_lander_v3 \
BUDGET=256 \
STORAGE_PATH=$SCRATCH/mavrl/optuna_studies/lander_b256_meanrew.log \
METRIC=eval/mean_rew DIRECTION=maximize \
N_SEEDS=3 N_TRIALS=20 \
DATASET_CACHE_DIR=$SCRATCH/mavrl/dataset_cache/lander \
sbatch scripts/submit_optuna.sh

Single-modality baseline (e.g. all-preferences) under the same budget, for comparison:

STUDY_NAME=lander_b256_meanrew_prefonly \
ENV_CONFIG=lunar_lander_v3 \
BUDGET=256 \
STORAGE_PATH=$SCRATCH/mavrl/optuna_studies/lander_b256_meanrew_prefonly.log \
METRIC=eval/mean_rew DIRECTION=maximize \
SINGLE_MODALITY=pref \
N_SEEDS=3 N_TRIALS=20 \
DATASET_CACHE_DIR=$SCRATCH/mavrl/dataset_cache/lander \
sbatch scripts/submit_optuna.sh

Adjusting array size and resources

The script defaults to --array=0-31 (32 workers), 4 CPUs each, 4 hours wall time. Override at submit time:

sbatch --array=0-15 --time=08:00:00 scripts/submit_optuna.sh   # 16 workers, 8h
sbatch --array=0-63 --cpus-per-task=8 scripts/submit_optuna.sh # 64 workers, 8 CPUs each

Logs land in logs/slurm/optuna_<jobid>_<taskid>.out|err.

Monitoring & inspecting results

While running, the journal file is readable:

python -m mavrl_experiments.optuna_search \
    --study-name lander_b256_meanrew \
    --storage $SCRATCH/mavrl/optuna_studies/lander_b256_meanrew.log \
    --env-config lunar_lander_v3 \
    --show-results

This works mid-run (you'll just see partial results) and after completion. Passing --env-config enables a normalized-score column when results/normalization_values.json has entries for the env.

Two main tables: equal-budget and fixed-allocation

There are two pre-built launchers that each submit 66 Optuna studies (6 envs × 11 modality subsets). They answer different questions:

Launcher	Allocation	Question
`launch_equal_budget_table.sh`	Dirichlet over budget	Are modalities complementary when you spend a fixed total budget?
`launch_fixed_allocation_table.sh`	Prescribed per-modality	Can MAVRL combine arbitrary offline feedback datasets to produce gains?

Both share the same 11-subset layout (pref, demo, rating, stop, all 6 pairs, and pdrs = all four). The two are designed to live side-by-side in $STORAGE_ROOT — study suffixes differ (_b<N> vs _fixed), so they don't collide.
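
The 11-subset layout and the two study-name suffixes can be enumerated as follows. This is a sketch under assumptions: the first-letter labels for pairs (for example pd) are a guess extrapolated from pdrs, and the helper names are hypothetical; check the launcher scripts for the actual naming.

```python
from itertools import combinations

MODALITIES = ("pref", "demo", "rating", "stop")


def modality_subsets():
    """4 singles + 6 pairs + the full set = 11 subsets."""
    singles = [(m,) for m in MODALITIES]
    pairs = list(combinations(MODALITIES, 2))
    return singles + pairs + [MODALITIES]


def subset_label(subset):
    """Singles keep their name; larger subsets join first letters (assumed)."""
    return subset[0] if len(subset) == 1 else "".join(m[0] for m in subset)


def study_names(envs, suffix):
    """One study per (env, subset); the suffix separates the two tables."""
    return [
        f"{env}_{subset_label(s)}_{suffix}"
        for env in envs
        for s in modality_subsets()
    ]


equal_budget = study_names(["lunar_lander_v3"], "b256")
fixed = study_names(["grid_trap"], "fixed")
print(len(modality_subsets()), equal_budget[-1], fixed[-1])
```

With 6 envs this yields 6 × 11 = 66 names per launcher, and the differing suffixes keep the two sets disjoint.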

1. Equal-budget table — modality complementarity

For each env, fix a single total feedback budget and let Optuna's Dirichlet allocation split it across whichever modalities are active in the study. Tests whether two modalities together at total budget B beat the best single modality at B.

# Submit all 66 studies (default per-env budgets: grid=64, control=64, lander=256)
bash scripts/launch_equal_budget_table.sh

# Filter to a subset of envs/subsets / dry-run
ENVS="grid_trap"  SUBSETS="pdrs pref"  bash scripts/launch_equal_budget_table.sh
DRY_RUN=1         bash scripts/launch_equal_budget_table.sh

# Override per-env-group budgets
BUDGET_GRID=128   bash scripts/launch_equal_budget_table.sh

Snapshot the current best value of every cell into one printed table (safe mid-optimization; reads the journal files):

python -m mavrl_experiments.equal_budget_table \
    --storage-root $SCRATCH/mavrl/optuna_studies

Cells render as normalized percentages (uniform=0%, optimal=100%) when results/normalization_values.json covers the env. Filter with --envs grid_cliff lunar_lander_v3 to print a subset of rows.
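
A linear rescaling is the natural reading of "uniform=0%, optimal=100%". The sketch below assumes exactly that; the function name and the reference values in the example are hypothetical, and the real table code and the layout of normalization_values.json may differ.

```python
def normalized_percent(value, uniform, optimal):
    """Map `uniform` to 0% and `optimal` to 100%, linearly in between."""
    if optimal == uniform:
        raise ValueError("uniform and optimal reference values must differ")
    return 100.0 * (value - uniform) / (optimal - uniform)


# Illustrative reference values only.
print(normalized_percent(0.0, uniform=-200.0, optimal=200.0))
```

Values outside the [uniform, optimal] range map below 0% or above 100%, so a cell can legitimately show a negative percentage.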

2. Fixed-allocation table — gains from heterogeneous offline data

For each env, prescribe per-modality sample counts in mavrl_experiments/configs/optuna/<env>_fixed.py:FIXED_SAMPLE_COUNTS. Each study uses exactly those counts (no Dirichlet, no shared budget); Optuna instead searches the optimizer/loss hyperparameters that combine the modalities: td_error_weight, kl_weight, use_importance_weights, lr, batch_size, encoder_hidden_sizes (and the PPO retraining hparams for non-tabular envs). Tests the "you have offline data of various kinds lying around — can our method turn it into a better reward model than any single-modality alternative?" story.

# Submit all 66 studies using prescribed counts from <env>_fixed.py
bash scripts/launch_fixed_allocation_table.sh

# Filter / dry-run (same hooks as the equal-budget launcher)
ENVS="grid_trap acrobot_v1"  bash scripts/launch_fixed_allocation_table.sh
DRY_RUN=1                    bash scripts/launch_fixed_allocation_table.sh

Default FIXED_SAMPLE_COUNTS (small values, totals near a power of 2; tune in the <env>_fixed.py config to match your offline-data scenario):

env	pref	demo	rating	stop	total
grid_*	23	2	23	16	64
acrobot_v1	23	2	23	16	64
cartpole_v1	23	2	23	16	64
lunar_lander_v3	92	8	92	64	256
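
The defaults in the table can be written as plain dictionaries to double-check the totals. The sketch also shows one plausible way a subset study could use them, by zeroing the inactive modalities; that behaviour, the dictionary layout, and the helper counts_for_subset are assumptions, not the contents of the <env>_fixed.py configs.

```python
MODALITIES = ("pref", "demo", "rating", "stop")

# Values copied from the table above; the dictionary layout is illustrative.
DEFAULT_COUNTS = {
    "acrobot_v1": {"pref": 23, "demo": 2, "rating": 23, "stop": 16},
    "cartpole_v1": {"pref": 23, "demo": 2, "rating": 23, "stop": 16},
    "lunar_lander_v3": {"pref": 92, "demo": 8, "rating": 92, "stop": 64},
}


def counts_for_subset(counts, active):
    """Keep prescribed counts for active modalities, zero for the rest (assumed)."""
    unknown = set(active) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    return {m: (counts[m] if m in active else 0) for m in MODALITIES}


totals = {env: sum(c.values()) for env, c in DEFAULT_COUNTS.items()}
pair = counts_for_subset(DEFAULT_COUNTS["lunar_lander_v3"], ("pref", "demo"))
print(totals, pair)
```

Under this reading a pair study sees fewer total samples than the four-modality study, which is the point of the fixed-allocation table: no shared budget is enforced across subsets.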

To inspect any individual study's best trial (works for both tables):

python -m mavrl_experiments.optuna_search \
    --study-name grid_trap_pdrs_fixed \
    --storage $SCRATCH/mavrl/optuna_studies/grid_trap/grid_trap_pdrs_fixed.log \
    --env-config grid_trap_fixed --show-results

Plotting a study

scripts/plot_optuna_study.py writes interactive Plotly HTML files (optimization history, param importances, slice, parallel coordinates, contour) under figures/optuna/<study_name>/. Safe to run mid-study — the journal backend tolerates concurrent reads.

# Equal-budget joint study, lunar_lander_v3 (pdrs at budget 256)
python scripts/plot_optuna_study.py \
    --study-name lunar_lander_v3_pdrs_b256 \
    --storage-dir $SCRATCH/mavrl/optuna_studies/lunar_lander_v3

Substitute the study name to plot any other env / subset / budget. To sweep all five "tracked" subsets for one env quickly:

for sub in pref demo rating stop pdrs; do
    python scripts/plot_optuna_study.py \
        --study-name lunar_lander_v3_${sub}_b256 \
        --storage-dir $SCRATCH/mavrl/optuna_studies/lunar_lander_v3
done

Then scp the figures/optuna/ tree back to your laptop and open the HTMLs in a browser. The optimization-history plot is usually the most informative for "is the search still improving or has it plateaued."

Resuming and adding more trials

To add more trials to an existing study, resubmit with the same STUDY_NAME and STORAGE_PATH. Workers will load the existing study (load_if_exists=True), fit TPE on the existing history, and append new trials. The original direction/metric is preserved — you cannot change them mid-study; start a fresh study instead.

Tips

Test the configuration locally with --n-trials 1 --n-seeds 1 before submitting an array job. Most config errors (typos, missing policies, invalid hyperparam ranges) surface in the first trial.
The first few trials in any new study are random startup samples (n_startup_trials); TPE only kicks in after enough completed trials are visible across all workers.
Slurm logs print the resolved per-trial allocation as Allocation: {...} at the end of each --show-results invocation, which is the most useful artifact for downstream sweeps.

Release history

0.1

May 27, 2026

This version

0.0.1

May 12, 2026

Download files

Source Distribution

mavrl-0.0.1.tar.gz (165.2 kB view details)

Uploaded May 12, 2026 Source

Built Distribution

mavrl-0.0.1-py3-none-any.whl (231.2 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file mavrl-0.0.1.tar.gz.

File metadata

Download URL: mavrl-0.0.1.tar.gz
Upload date: May 12, 2026
Size: 165.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for mavrl-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`d53cbaaed43df60a75529bff7041067c28d8a908b3d0f12532d626c38af4d74a`
MD5	`dbacd4161ca56f397babc768c09bff05`
BLAKE2b-256	`06afd609d5f97df9db5b1504f423220eddadc4089674ca2daa40c2ce421e29cf`

File details

Details for the file mavrl-0.0.1-py3-none-any.whl.

File metadata

Download URL: mavrl-0.0.1-py3-none-any.whl
Upload date: May 12, 2026
Size: 231.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for mavrl-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`644d480514e0903527410b75a9dfc69b8807bda909e14190d866e9bf6a977621`
MD5	`051d9da78497fc2ffd1f547613f1e337`
BLAKE2b-256	`606a5def772b0a0418af8d9c8dd09455107487154ee66280179bb1ede429d2d7`
