Auto-tuning reward system for production RL applications (Premium)
RewardGuard Premium
Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction
RewardGuard Premium is the paid tier of RewardGuard. Install it freely from PyPI — just sign in to your RewardGuard account to activate.
Installation
pip install rewardguard-premium
On first use you will be prompted to sign in:
RewardGuard Premium — Sign in to your account
Visit https://rewardguard.dev to create an account
Email: you@example.com
Password: ••••••••
Signed in successfully!
Your session is saved to ~/.rewardguard/session.json and refreshed automatically. You only sign in once per machine.
Requires an active RewardGuard Premium subscription. Subscribe at: https://rewardguard.dev/premium
Sign-in CLI
rewardguard-premium login # sign in (or switch accounts)
rewardguard-premium logout # clear saved session
rewardguard-premium status # show who you are signed in as
CI / automated environments — use env vars instead of the interactive prompt:
export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
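In a CI job it can help to fail early when the credentials are missing, rather than stalling on the interactive prompt. The following is a minimal preflight sketch; only the two variable names come from this page, and the helper `missing_credentials` is hypothetical, not part of the package.

```python
import os

# Variable names taken from the CI instructions above.
REQUIRED_VARS = ("REWARDGUARD_EMAIL", "REWARDGUARD_PASSWORD")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for a CI preflight step; not a rewardguard-premium API.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A pipeline step could call `missing_credentials()` before training starts and abort with a clear message if the returned list is non-empty.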
Quick Start
from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)
        snapshot = monitor.step({"task": r_task, "safety": r_safety})
        if snapshot:
            if snapshot.flag == "critical":
                # Apply auto-corrected weights back to the environment
                env.set_reward_weights(monitor.weights)

monitor.print_report()
What AutoMonitor Does
| Phase | What happens |
|---|---|
| Warm-up (baseline_steps) | Learns the normal ratio for your environment. Returns None from step(). |
| Detection | Computes per-component z-scores against the learned baseline. |
| Alignment score | 0 (misaligned) → 1 (fully aligned), sigmoid-mapped from the max z-score. |
| Flagging | ok (score > 0.75) / warning (0.5 < score ≤ 0.75) / critical (score ≤ 0.5) |
| Auto-correction | Adjusts per-component weight multipliers when flagged. |
| Drift velocity | Linear regression slope over recent scores — distinguishes trends from spikes. |
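The detection, scoring, and flagging phases in the table can be illustrated with a small sketch. The flag thresholds come from the table; everything else is an assumption, since the exact sigmoid used by AutoMonitor is not documented here. In particular, this sketch uses the largest absolute z-score and centers the sigmoid so that the score is 0.5 at `z_threshold`. The function names and the `steepness` parameter are hypothetical.

```python
import math


def z_scores(current, baseline_mean, baseline_std, eps=1e-8):
    """Per-component z-score of the current ratio against a learned baseline."""
    return {
        name: (current[name] - baseline_mean[name]) / max(baseline_std[name], eps)
        for name in current
    }


def alignment_score(z, z_threshold=2.5, steepness=1.0):
    """Map the largest |z| into (0, 1) with a decreasing sigmoid.

    Assumption: the score equals 0.5 exactly when max |z| == z_threshold.
    """
    max_abs_z = max(abs(v) for v in z.values())
    return 1.0 / (1.0 + math.exp(steepness * (max_abs_z - z_threshold)))


def flag_for(score):
    """Apply the thresholds from the Flagging row of the table."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"
```

Under these assumptions, a component sitting exactly at the z-score threshold is flagged critical, and smaller deviations fall into warning or ok depending on how steep the mapping is.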
AlignmentSnapshot
Every call to step() after warm-up returns an AlignmentSnapshot:
snapshot.alignment_score # float 0–1
snapshot.flag # "ok" / "warning" / "critical"
snapshot.z_scores # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity # negative = worsening trend
snapshot.corrections_applied # {"safety": 1.24} — weights changed this step
snapshot.component_ratios # {"task": 68.3, "safety": 31.7}
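To make the sign convention of `drift_velocity` concrete, here is a sketch of a least-squares slope over the most recent alignment scores. It follows the description on this page (linear regression slope over recent scores), but the function itself is a hypothetical stand-in, not the library's implementation.

```python
def drift_velocity(scores, drift_window=30):
    """Least-squares slope of the most recent scores against their index.

    Hypothetical stand-in: negative values mean scores are trending down.
    """
    recent = list(scores)[-drift_window:]
    n = len(recent)
    if n < 2:
        return 0.0  # a slope needs at least two points
    mean_x = (n - 1) / 2.0
    mean_y = sum(recent) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(recent))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var
```

A steady decline such as `[0.9, 0.8, 0.7, 0.6]` gives a clearly negative slope, while a single dip in the middle of an otherwise flat series gives a slope near zero, which is how a slope separates trends from spikes.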
Framework Integrations
Weights & Biases
from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb
wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)
TensorBoard
from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback
writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)
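Because `callbacks` takes plain callables that receive each AlignmentSnapshot, a hand-written callback can sit alongside the built-in ones. The sketch below logs only flagged snapshots. `make_logging_callback` is a hypothetical helper, and a `SimpleNamespace` stands in for a real snapshot so the example runs without the package; it uses only the snapshot fields documented on this page.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("rewardguard.custom")


def make_logging_callback(min_flag="warning"):
    """Build a callback that logs snapshots at or above `min_flag` severity."""
    severity = {"ok": 0, "warning": 1, "critical": 2}
    seen = []  # flags that were logged, kept for inspection

    def callback(snapshot):
        if severity[snapshot.flag] >= severity[min_flag]:
            logger.warning(
                "flag=%s score=%.3f z=%s",
                snapshot.flag, snapshot.alignment_score, snapshot.z_scores,
            )
            seen.append(snapshot.flag)

    callback.seen = seen
    return callback


# Stand-in for an AlignmentSnapshot (same field names, made-up values).
fake = SimpleNamespace(
    flag="critical", alignment_score=0.31, z_scores={"task": -0.4, "safety": 2.8}
)
cb = make_logging_callback()
cb(fake)
```

With the real package, the callback would presumably be passed as `callbacks=[make_logging_callback()]` when constructing the monitor.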
Stable-Baselines3
from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback
monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))
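The SB3 callback reads per-step reward components from the environment's info dict. The sketch below shows one way an environment could populate that entry. It is a duck-typed toy that only mirrors the Gymnasium `reset`/`step` return shapes; a real SB3 environment would subclass `gymnasium.Env` and define observation and action spaces. The class name, the reward formulas, and the assumption that the value is a dict of component name to float are all hypothetical.

```python
class TwoObjectiveEnv:
    """Toy environment that reports reward components in `info`."""

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        r_task = float(action)                      # hypothetical task reward
        r_safety = 1.0 if abs(action) <= 1 else 0.0  # hypothetical safety reward
        # Assumed shape: component name -> float, keyed like `expected`.
        info = {"reward_components": {"task": r_task, "safety": r_safety}}
        terminated = self.t >= 10
        truncated = False
        return 0.0, r_task + r_safety, terminated, truncated, info
```

The component names here match the keys used in `expected` throughout this page, which seems the natural way for the monitor to line them up.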
Your environment must include "reward_components" in its info dict for the SB3 callback.
Save / Load State
# Save after training
monitor.save("run_42_state.json")
# Resume later
monitor = AutoMonitor.load("run_42_state.json")
Export Data
monitor.to_json("results.json") # full state
monitor.to_csv("snapshots.csv") # one row per detection-phase step
AutoMonitor Parameters
| Parameter | Default | Description |
|---|---|---|
| expected | required | Target distribution, e.g. {"task": 0.7, "safety": 0.3} |
| baseline_steps | 300 | Warm-up steps before detection activates |
| z_threshold | 2.5 | Z-score at which a component is flagged |
| auto_correct | True | Automatically adjust weights when flagged |
| correction_rate | 0.2 | Fraction of required correction applied per step |
| tolerance | 5.0 | Percentage-point tolerance for the free-tier check() method |
| window | 200 | Rolling window size for ratio computation |
| drift_window | 30 | Snapshots used to compute drift velocity |
| callbacks | [] | List of callables invoked with each AlignmentSnapshot |
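The `correction_rate` description ("fraction of required correction applied per step") can be read as a damped update. The sketch below is one plausible interpretation, not the documented algorithm. It assumes the fully corrected multiplier is the current one scaled by expected share over observed share, and moves only `correction_rate` of the way there. The function name and signature are hypothetical.

```python
def corrected_multipliers(multipliers, observed, expected, correction_rate=0.2):
    """Move each weight multiplier a fraction of the way to its required value.

    Assumed form: required = current * expected_share / observed_share.
    """
    updated = {}
    for name, current in multipliers.items():
        required = current * expected[name] / max(observed[name], 1e-8)
        updated[name] = current + correction_rate * (required - current)
    return updated


# Example: safety is under-represented (20% observed vs 30% expected).
new = corrected_multipliers(
    {"task": 1.0, "safety": 1.0},
    observed={"task": 0.8, "safety": 0.2},
    expected={"task": 0.7, "safety": 0.3},
)
```

Applying only part of the correction each step trades speed for stability: a single noisy window nudges the weights slightly instead of swinging them to a new extreme.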
Free vs Premium
| Feature | Free (rewardguard) | Premium |
|---|---|---|
| Live in-loop monitoring | ✅ | ✅ |
| Log-file analysis | ✅ | ✅ |
| Imbalance detection | ✅ | ✅ |
| Weight recommendations | ✅ | ✅ |
| Baseline learning | ❌ | ✅ |
| Z-score detection | ❌ | ✅ |
| Alignment score (0–1) | ❌ | ✅ |
| Drift velocity | ❌ | ✅ |
| Auto-correction | ❌ | ✅ |
| WandB / TensorBoard / SB3 | ❌ | ✅ |
| Save / load state | ❌ | ✅ |
License
Proprietary — requires an active RewardGuard Premium subscription. © 2026 RewardGuard | https://rewardguard.dev