
Auto-tuning reward system for production RL applications (Premium)

RewardGuard Premium

Automatic Reward Alignment for Production RL — with Statistical Detection and Auto-Correction

RewardGuard Premium is the paid tier of RewardGuard. The package itself installs from PyPI at no cost; its features activate once you sign in to a RewardGuard account with an active Premium subscription.


Installation

pip install rewardguard-premium

On first use you will be prompted to sign in:

RewardGuard Premium — Sign in to your account
  Visit https://rewardguard.ai to create an account

  Email: you@example.com
  Password: ••••••••
  Signed in successfully!

Your session is saved to ~/.rewardguard/session.json and refreshed automatically. You only sign in once per machine.

Requires an active RewardGuard Premium subscription. Subscribe at: https://rewardguard.ai/premium


Sign-in CLI

rewardguard-premium login    # sign in (or switch accounts)
rewardguard-premium logout   # clear saved session
rewardguard-premium status   # show who you are signed in as

In CI and other non-interactive environments, set these environment variables instead of using the interactive prompt:

export REWARDGUARD_EMAIL=you@example.com
export REWARDGUARD_PASSWORD=yourpassword
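
A CI job can fail fast when the credentials are missing, before any training starts. The sketch below relies only on the two variable names listed above; the helper missing_credentials is hypothetical and not part of RewardGuard.

```python
import os

# Variable names taken from the section above.
REQUIRED_VARS = ("REWARDGUARD_EMAIL", "REWARDGUARD_PASSWORD")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
print(missing_credentials({"REWARDGUARD_EMAIL": "you@example.com"}))
# ['REWARDGUARD_PASSWORD']
```

Calling missing_credentials() with no argument checks the real process environment; a non-empty result is a reasonable signal to stop the job.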

Quick Start

from rewardguard_premium import AutoMonitor

monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    baseline_steps=300,   # warm-up before detection activates
    auto_correct=True,    # adjust weights automatically when flagged
)

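# env, action, num_episodes and max_steps come from your own training setup.
for episode in range(num_episodes):
    for step in range(max_steps):
        r_task, r_safety = env.step(action)

        # step() returns None during warm-up, then an AlignmentSnapshot
        snapshot = monitor.step({"task": r_task, "safety": r_safety})

        if snapshot is not None and snapshot.flag == "critical":
            # Apply auto-corrected weights back to the environment
            env.set_reward_weights(monitor.weights)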

monitor.print_report()

What AutoMonitor Does

| Phase | What happens |
| --- | --- |
| Warm-up (baseline_steps) | Learns the normal ratio for your environment. Returns None from step(). |
| Detection | Computes per-component z-scores against the learned baseline. |
| Alignment score | 0 (misaligned) → 1 (fully aligned), sigmoid-mapped from the max z-score. |
| Flagging | ok (score > 0.75) / warning (> 0.5) / critical (≤ 0.5) |
| Auto-correction | Adjusts per-component weight multipliers when flagged. |
| Drift velocity | Linear regression slope over recent scores; distinguishes trends from spikes. |
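
The detection, scoring and flagging phases can be sketched in a few lines. The flag thresholds below are the ones stated above. Everything else is an assumption made for illustration: the exact sigmoid is not documented here, so this sketch centers it on z_threshold (default 2.5) with unit steepness, and it treats the baseline as a list of ratio samples per component. The function names are hypothetical and this is not the library's implementation.

```python
import math
from statistics import mean, pstdev


def z_scores(baseline, current):
    """Per-component z-score of the current ratio against warm-up samples.

    baseline: {component: [ratio samples collected during warm-up]}
    current:  {component: current ratio}
    """
    scores = {}
    for name, samples in baseline.items():
        mu = mean(samples)
        sigma = pstdev(samples) or 1e-8  # avoid division by zero
        scores[name] = (current[name] - mu) / sigma
    return scores


def alignment_score(z, z_threshold=2.5):
    """Map the largest |z| to (0, 1). ASSUMED form: logistic centered on z_threshold."""
    worst = max(abs(value) for value in z.values())
    return 1.0 / (1.0 + math.exp(worst - z_threshold))


def flag_for(score):
    """Thresholds as documented: ok > 0.75, warning > 0.5, critical otherwise."""
    if score > 0.75:
        return "ok"
    if score > 0.5:
        return "warning"
    return "critical"


baseline = {"task": [69.0, 70.0, 71.0], "safety": [29.0, 30.0, 31.0]}
z = z_scores(baseline, {"task": 66.0, "safety": 34.0})
print(flag_for(alignment_score(z)))  # critical
```

Under this assumed mapping, a component sitting exactly at z_threshold scores 0.5, which is the boundary of the critical flag.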

AlignmentSnapshot

Every call to step() after warm-up returns an AlignmentSnapshot:

snapshot.alignment_score      # float 0–1
snapshot.flag                 # "ok" / "warning" / "critical"
snapshot.z_scores             # {"task": -0.4, "safety": +2.8}
snapshot.drift_velocity       # negative = worsening trend
snapshot.corrections_applied  # {"safety": 1.24}  — weights changed this step
snapshot.component_ratios     # {"task": 68.3, "safety": 31.7}
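
The drift_velocity field is described as a linear regression slope over recent alignment scores. A minimal sketch of that computation follows, assuming an ordinary least-squares fit against the snapshot index and the default drift_window of 30. The function name is hypothetical and the library's actual calculation may differ.

```python
def drift_velocity(scores, window=30):
    """Least-squares slope of the last `window` scores against their index."""
    recent = list(scores)[-window:]
    n = len(recent)
    if n < 2:
        return 0.0  # a slope needs at least two points
    mean_x = (n - 1) / 2
    mean_y = sum(recent) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(recent))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


steady_decline = [0.9, 0.8, 0.7, 0.6]
single_spike = [0.9, 0.9, 0.3, 0.9, 0.9]
print(drift_velocity(steady_decline) < 0)             # True: worsening trend
print(abs(drift_velocity(single_spike)) < 1e-9)       # True: spike, no trend
```

The two inputs show why a slope is useful here: a steady decline yields a clearly negative value, while an isolated dip in the middle of the window leaves the slope near zero.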

Framework Integrations

Weights & Biases

from rewardguard_premium import AutoMonitor, make_wandb_callback
import wandb

wandb.init(project="my-rl-run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_wandb_callback()],
)

TensorBoard

from torch.utils.tensorboard import SummaryWriter
from rewardguard_premium import AutoMonitor, make_tensorboard_callback

writer = SummaryWriter("runs/my_run")
monitor = AutoMonitor(
    expected={"task": 0.7, "safety": 0.3},
    callbacks=[make_tensorboard_callback(writer)],
)

Stable-Baselines3

from stable_baselines3 import PPO
from rewardguard_premium import AutoMonitor, make_sb3_callback

monitor = AutoMonitor(expected={"task": 0.7, "safety": 0.3})
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=500_000, callback=make_sb3_callback(monitor))

Your environment must include "reward_components" in its info dict for the SB3 callback.
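
As an illustration of that requirement, the toy environment below returns a "reward_components" entry in its info dict. The shape of the value (a dict mapping component names to per-step rewards) is an assumption inferred from the expected= argument, not a documented contract. The class is deliberately dependency-free; a real SB3 environment would subclass gymnasium.Env and define observation and action spaces.

```python
class TwoComponentEnv:
    """Toy environment following the Gymnasium step() return convention."""

    def __init__(self, task_weight=0.7, safety_weight=0.3):
        self.weights = {"task": task_weight, "safety": safety_weight}

    def step(self, action):
        # Placeholder reward logic; replace with your environment's own.
        components = {"task": float(action), "safety": 1.0 - float(action)}
        reward = sum(self.weights[name] * value for name, value in components.items())
        # ASSUMED format: component name -> this step's unweighted reward.
        info = {"reward_components": components}
        observation, terminated, truncated = 0.0, False, False
        return observation, reward, terminated, truncated, info


env = TwoComponentEnv()
_, reward, _, _, info = env.step(1.0)
print(info["reward_components"])  # {'task': 1.0, 'safety': 0.0}
```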


Save / Load State

# Save after training
monitor.save("run_42_state.json")

# Resume later
monitor = AutoMonitor.load("run_42_state.json")

Export Data

monitor.to_json("results.json")   # full state
monitor.to_csv("snapshots.csv")   # one row per detection-phase step

AutoMonitor Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| expected | required | Target distribution, e.g. {"task": 0.7, "safety": 0.3} |
| baseline_steps | 300 | Warm-up steps before detection activates |
| z_threshold | 2.5 | Z-score at which a component is flagged |
| auto_correct | True | Automatically adjust weights when flagged |
| correction_rate | 0.2 | Fraction of required correction applied per step |
| tolerance | 5.0 | Percentage-point tolerance for the free-tier check() method |
| window | 200 | Rolling window size for ratio computation |
| drift_window | 30 | Snapshots used to compute drift velocity |
| callbacks | [] | List of callables invoked with each AlignmentSnapshot |
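
Because callbacks are plain callables that receive each AlignmentSnapshot, a custom one can be written without any framework. The sketch below assumes the snapshot is passed as a single positional argument and reads only the flag and alignment_score fields listed earlier. make_alert_callback is a hypothetical helper, and SimpleNamespace objects stand in for real snapshots so the example runs on its own.

```python
from types import SimpleNamespace


def make_alert_callback(sink):
    """Build a callback that records every snapshot whose flag is not "ok"."""

    def on_snapshot(snapshot):
        if snapshot.flag != "ok":
            sink.append((snapshot.flag, round(snapshot.alignment_score, 3)))

    return on_snapshot


alerts = []
callback = make_alert_callback(alerts)

# Stand-ins for AlignmentSnapshot objects, for demonstration only.
callback(SimpleNamespace(flag="ok", alignment_score=0.91))
callback(SimpleNamespace(flag="critical", alignment_score=0.32))
print(alerts)  # [('critical', 0.32)]
```

In a real run the callable would be passed as AutoMonitor(expected=..., callbacks=[callback]), alongside or instead of the bundled integrations.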

Free vs Premium

Feature Free (rewardguard) Premium
Live in-loop monitoring
Log-file analysis
Imbalance detection
Weight recommendations
Baseline learning
Z-score detection
Alignment score (0–1)
Drift velocity
Auto-correction
WandB / TensorBoard / SB3
Save / load state

License

Proprietary — requires an active RewardGuard Premium subscription. © 2026 RewardGuard | https://rewardguard.ai
