Time Robustness Audit for RL agents — measures timing reliance, deployment robustness, and stress resilience

deltatau-audit

Requires Python 3.9+. License: MIT.

Find and fix timing failures in RL agents.

RL agents silently break when deployment timing differs from training — frame drops, variable inference latency, sensor rate changes. deltatau-audit finds these failures and fixes them in one command.

Try it in 30 seconds

pip install "deltatau-audit[demo]"
python -m deltatau_audit demo cartpole

No GPU. No MuJoCo. Just pip install and run. You'll see a Before/After comparison:

Scenario            Before (Baseline)  After (Speed-Randomized)  Change
------------------  -----------------  ------------------------  ------
5x speed            12%                49%                       +37pp
Speed jitter        66%                115%                      +49pp
Observation delay   82%                95%                       +13pp
Mid-episode spike   23%                62%                       +39pp
Deployment          FAIL (0.23)        DEGRADED (0.62)           +0.39

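The standard agent collapses under timing perturbations: it keeps only 12% of its nominal return at 5x speed and 23% under a mid-episode spike. Speed-randomized training improves every scenario and lifts the Deployment score from 0.23 (FAIL) to 0.62 (DEGRADED). Full HTML reports with charts are written to demo_report/.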

The same pattern at MuJoCo scale: HalfCheetah PPO

A PPO agent trained to reward ~990 on HalfCheetah-v5 fails even more severely under timing perturbations. The drop is statistically significant in all 4 scenarios (95% bootstrap CI):

Scenario                      Return (% of nominal)  95% CI            Drop
----------------------------  ---------------------  ----------------  -----
Observation delay (1 step)    3.8%                   [2.4%, 5.2%]      -96%
Speed jitter (2 +/- 1)        25.4%                  [23.5%, 27.8%]    -75%
5x speed (unseen)             -9.3%                  [-10.6%, -8.4%]   -109%
Mid-episode spike (1->5->1)   90.9%                  [86.3%, 97.8%]    -9%

A single step of observation delay destroys 96% of performance. The agent goes negative at 5x speed.

Speed-randomized training fixes the problem

Scenario            Before (Standard)   After (Speed-Randomized)  Change
------------------  ------------------  ------------------------  ------
Observation delay   2%                  148%                      +146pp
Speed jitter        28%                 121%                      +93pp
5x speed (unseen)   -12%                38%                       +50pp
Mid-episode spike   100%                113%                      +13pp
Deployment          FAIL (0.02)         PASS (1.00)
Quadrant            deployment_fragile  deployment_ready

Reproduce HalfCheetah results
pip install "deltatau-audit[sb3,mujoco]"
git clone https://github.com/maruyamakoju/deltatau-audit.git
cd deltatau-audit
python examples/audit_halfcheetah.py              # standard PPO audit (~30 min)
python examples/train_robust_halfcheetah.py        # train robust PPO (~30 min)
python examples/audit_before_after.py              # Before/After comparison

Or download pre-trained models from Releases.

Install

pip install deltatau-audit            # core
pip install "deltatau-audit[demo]"    # + CartPole demo (recommended start)
pip install "deltatau-audit[sb3,mujoco]"  # + SB3 + MuJoCo environments

Find and Fix in One Command

pip install "deltatau-audit[sb3]"
deltatau-audit fix-sb3 --algo ppo --model my_model.zip --env HalfCheetah-v5

This single command:

  1. Audits your model (finds timing failures)
  2. Retrains with speed randomization (the fix)
  3. Re-audits the fixed model (verifies the fix)
  4. Generates Before/After comparison report

Example output:

BEFORE vs AFTER

  Scenario        Before       After      Change
  ------------  ----------  ----------  ----------
  speed_5x           12.7%       76.6%  +    63.9pp
  jitter             43.7%      100.0%  +    56.3pp
  delay             100.0%      100.0%  +     0.0pp
  spike              26.7%       91.9%  +    65.2pp

  Deployment: FAIL (0.27) -> MILD (0.92)
  Quadrant:   deployment_fragile -> deployment_ready

Output: fixed model (.zip) + HTML reports + comparison.md.

Options: --timesteps (training budget), --speed-min/--speed-max (speed range), --ci (pipeline gate).

Audit Your Own SB3 Model

Just want the diagnosis? Use audit-sb3:

deltatau-audit audit-sb3 --algo ppo --model my_model.zip --env HalfCheetah-v5 --out my_report/

No model handy? Try with a sample:

gh release download assets -R maruyamakoju/deltatau-audit -p cartpole_ppo_sb3.zip
deltatau-audit audit-sb3 --algo ppo --model cartpole_ppo_sb3.zip --env CartPole-v1

Supported algorithms: ppo, sac, td3, a2c. Any Gymnasium environment ID works.

Python API (for custom workflows)
# Audit only
from deltatau_audit.adapters.sb3 import SB3Adapter
from deltatau_audit.auditor import run_full_audit
from deltatau_audit.report import generate_report
from stable_baselines3 import PPO
import gymnasium as gym

model = PPO.load("my_model.zip")
adapter = SB3Adapter(model)
result = run_full_audit(
    adapter,
    lambda: gym.make("HalfCheetah-v5"),
    speeds=[1, 2, 3, 5, 8],
    n_episodes=30,
)
generate_report(result, "my_audit/", title="My Agent Audit")

# Full fix pipeline
from deltatau_audit.fixer import fix_sb3_model
result = fix_sb3_model("my_model.zip", "ppo", "HalfCheetah-v5",
                       output_dir="fix_output/")
# result["fixed_model_path"] -> "fix_output/ppo_fixed.zip"

What It Measures

Badge       What it tests                                      How
----------  -------------------------------------------------  ----------------------------------------------------------
Reliance    Does the agent use internal timing?                Tampers with internal Dt, measures value prediction error
Deployment  Does the agent survive realistic timing changes?   Jitter, observation delay, mid-episode speed spikes
Stress      Does the agent survive extreme timing changes?     5x speed (unseen during training)

Agents without internal timing (standard PPO, SAC, etc.) get Reliance: N/A — only Deployment and Stress are tested.

Rating Scale

Rating    Return Ratio  Meaning
--------  ------------  ------------------
PASS      > 95%         Production ready
MILD      > 80%         Minor degradation
DEGRADED  > 50%         Significant loss
FAIL      <= 50%        Agent breaks
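
The thresholds above can be written as a small lookup. The sketch below is illustrative and not taken from the package: the function names are hypothetical, and the rule that the Deployment score is the worst deployment-scenario ratio capped at 1.0 is an assumption that happens to match the tables in this README but is not stated by it.

```python
def rate(return_ratio: float) -> str:
    """Map a return ratio (1.0 = nominal return) to a rating from the scale above."""
    if return_ratio > 0.95:
        return "PASS"
    if return_ratio > 0.80:
        return "MILD"
    if return_ratio > 0.50:
        return "DEGRADED"
    return "FAIL"


def deployment_score(ratios: dict[str, float]) -> float:
    """ASSUMPTION: worst ratio across deployment scenarios, capped at 1.0."""
    return min(1.0, min(ratios.values()))


# Deployment scenarios from the CartPole demo table (Before column).
before = {"jitter": 0.66, "delay": 0.82, "spike": 0.23}
score = deployment_score(before)
print(f"{rate(score)} ({score:.2f})")  # FAIL (0.23)
```

Note that the boundaries are strict: a ratio of exactly 95% rates MILD, and exactly 50% rates FAIL.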

All return ratios include bootstrap 95% confidence intervals with significance testing.
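
For readers unfamiliar with bootstrap intervals, here is a minimal sketch of one common variant, the percentile bootstrap, applied to a return ratio. It is an assumption that the package uses this variant; the function name, the episode returns, and the "interval excludes 1.0" significance rule are all hypothetical and for illustration only.

```python
import random
from statistics import mean


def bootstrap_ratio_ci(nominal, perturbed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(perturbed) / mean(nominal)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample episodes with replacement, independently per condition.
        nom = rng.choices(nominal, k=len(nominal))
        per = rng.choices(perturbed, k=len(perturbed))
        ratios.append(mean(per) / mean(nom))
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_boot)]
    hi = ratios[int((1 - alpha / 2) * n_boot) - 1]
    point = mean(perturbed) / mean(nominal)
    # Treat the change as significant when the interval excludes 1.0.
    significant = not (lo <= 1.0 <= hi)
    return point, lo, hi, significant


# Made-up episode returns, not results from any real agent.
nominal_returns = [500, 480, 510, 495, 505, 490]
perturbed_returns = [120, 90, 150, 110, 100, 130]
point, lo, hi, significant = bootstrap_ratio_ci(nominal_returns, perturbed_returns)
print(f"ratio={point:.1%}  CI=[{lo:.1%}, {hi:.1%}]  significant={significant}")
```

The sketch assumes the nominal mean return is positive; a ratio is not meaningful when the nominal return is near zero.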

CI Mode

python -m deltatau_audit demo cartpole --ci --out ci_report/
# exit 0 = pass, exit 1 = warn (stress), exit 2 = fail (deployment)

Outputs ci_summary.json and ci_summary.md for pipeline gates and PR comments.
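
If you drive the audit from a Python script rather than a shell step, the exit codes above can be interpreted as follows. This is a sketch: `interpret` and `run_gate` are hypothetical helper names, and letting a stress warning pass the gate by default is a policy choice made here for illustration, not behavior of the package.

```python
import subprocess

# Exit-code meanings as documented above for --ci runs.
EXIT_MEANING = {0: "pass", 1: "warn (stress)", 2: "fail (deployment)"}


def interpret(code: int, allow_warn: bool = True):
    """Return (status, ok) for a --ci exit code; ok says whether the gate passes."""
    status = EXIT_MEANING.get(code, f"unexpected exit code {code}")
    ok = code == 0 or (allow_warn and code == 1)
    return status, ok


def run_gate(cmd, allow_warn: bool = True):
    """Run an audit command (a list of arguments) and interpret its exit code."""
    return interpret(subprocess.run(cmd).returncode, allow_warn)


for code in (0, 1, 2):
    print(code, interpret(code))
```

In a pipeline you would call, for example, `run_gate([sys.executable, "-m", "deltatau_audit", "demo", "cartpole", "--ci", "--out", "ci_report/"])` and fail the job when `ok` is false.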

GitHub Action (one line)

- uses: maruyamakoju/deltatau-audit@main
  with:
    command: audit-sb3
    model: model.zip
    algo: ppo
    env: CartPole-v1
    extras: sb3

Outputs status, deployment-score, stress-score for downstream steps. Exit code 0/1/2 for pass/warn/fail.

Full workflow examples

CartPole demo gate (zero config):

- uses: maruyamakoju/deltatau-audit@main

- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: timing-audit
    path: audit_report/

Audit your own SB3 model:

- uses: maruyamakoju/deltatau-audit@main
  id: audit
  with:
    command: audit-sb3
    model: model.zip
    algo: ppo
    env: HalfCheetah-v5
    extras: "sb3,mujoco"

- run: echo "Deployment score: ${{ steps.audit.outputs.deployment-score }}"

Manual install (if you prefer):

- run: pip install "deltatau-audit[sb3]"
- run: deltatau-audit audit-sb3 --algo ppo --model model.zip --env CartPole-v1 --ci

Speed-Randomized Training (the fix)

The fix for timing failures is simple: train with variable speed. Use JitterWrapper during SB3 training:

import gymnasium as gym
from stable_baselines3 import PPO
from deltatau_audit.wrappers import JitterWrapper

# Wrap env with speed randomization (speed 1-5)
env = JitterWrapper(gym.make("CartPole-v1"), base_speed=3, jitter=2)

model = PPO("MlpPolicy", env)
model.learn(total_timesteps=100_000)
model.save("robust_model")

This is exactly what fix-sb3 does under the hood. Use the wrapper directly when you want more control over training.
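
To make the idea of speed randomization concrete, the sketch below shows one way such a wrapper could work. It is not the package's JitterWrapper: it assumes that "speed" means applying each agent action for several environment steps and summing the rewards, which the README does not confirm. `CounterEnv` and `SpeedJitterSketch` are hypothetical names, and the toy environment stands in for a Gymnasium env so the example runs without extra dependencies.

```python
import random


class CounterEnv:
    """Toy stand-in for a Gymnasium env: the observation counts env steps, episode ends at 20."""

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= 20
        return self.t, 1.0, terminated, False, {}


class SpeedJitterSketch:
    """Hypothetical wrapper: each agent action is applied for a random number of env steps."""

    def __init__(self, env, base_speed=3, jitter=2, seed=0):
        self.env = env
        self.low = max(1, base_speed - jitter)
        self.high = base_speed + jitter
        self.rng = random.Random(seed)

    def reset(self, seed=None):
        return self.env.reset(seed=seed)

    def step(self, action):
        speed = self.rng.randint(self.low, self.high)  # inclusive on both ends
        total_reward = 0.0
        for _ in range(speed):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break  # do not step past the end of the episode
        return obs, total_reward, terminated, truncated, {**info, "speed": speed}


env = SpeedJitterSketch(CounterEnv(), base_speed=3, jitter=2)
obs, _ = env.reset()
done, agent_steps = False, 0
while not done:
    obs, reward, terminated, truncated, info = env.step(0)
    done = terminated or truncated
    agent_steps += 1
print(f"{obs} env steps consumed in {agent_steps} agent steps")
```

Because the speed is redrawn on every step, the agent sees a different amount of elapsed world time per decision during training, which is the property the fix relies on.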

Available wrappers: JitterWrapper (random speed), FixedSpeedWrapper (constant speed), PiecewiseSwitchWrapper (scheduled speed changes), ObservationDelayWrapper (sensor delay).
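
Observation delay is the easiest of these perturbations to picture as code. The following is a sketch under assumptions, not the package's ObservationDelayWrapper: it assumes a delay of k steps means the agent receives the observation from k steps earlier, with the initial observation repeated until the buffer fills. `CounterEnv` and `ObservationDelaySketch` are hypothetical names.

```python
from collections import deque


class CounterEnv:
    """Toy stand-in for a Gymnasium env: the observation is the current step index."""

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10, False, {}


class ObservationDelaySketch:
    """Hypothetical wrapper: the agent sees the observation from `delay` steps ago."""

    def __init__(self, env, delay=1):
        self.env = env
        self.delay = delay
        self.buffer = deque(maxlen=delay + 1)

    def reset(self, seed=None):
        obs, info = self.env.reset(seed=seed)
        self.buffer.clear()
        # Pre-fill so the first `delay` steps return the initial observation.
        self.buffer.extend([obs] * (self.delay + 1))
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.buffer.append(obs)  # newest on the right, oldest falls off the left
        return self.buffer[0], reward, terminated, truncated, info


env = ObservationDelaySketch(CounterEnv(), delay=1)
env.reset()
seen = [env.step(0)[0] for _ in range(3)]
print(seen)  # [0, 1, 2] while the true observations were 1, 2, 3
```

Rewards and termination are passed through undelayed here; only the observation lags, which matches the "sensor delay" description but is still a modeling choice.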

Audit CleanRL Agents

CleanRL agents are plain nn.Module subclasses — no framework wrapper needed.

deltatau-audit audit-cleanrl \
  --checkpoint runs/CartPole-v1/agent.pt \
  --agent-module ppo_cartpole.py \
  --agent-class Agent \
  --agent-kwargs obs_dim=4,act_dim=2 \
  --env CartPole-v1

Or via Python API:

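from deltatau_audit.adapters.cleanrl import CleanRLAdapter
from deltatau_audit.auditor import run_full_audit

# `agent` is your loaded CleanRL nn.Module; it must implement get_action_and_value(obs).
# `env_factory` is a zero-argument callable that returns a fresh environment,
# for example: lambda: gym.make("CartPole-v1")
adapter = CleanRLAdapter(agent, lstm=False)
result = run_full_audit(adapter, env_factory, speeds=[1, 2, 3, 5, 8])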

LSTM agents: pass --lstm (CLI) or CleanRLAdapter(agent, lstm=True) (API).

See examples/audit_cleanrl.py for a complete runnable example.

Sim-to-Real Transfer

Timing failures are one of the main causes of sim-to-real gaps. A policy that runs at 50 Hz in simulation may be deployed at 30 Hz or with variable latency in the real world — and collapse.

Simulation    Reality
------------  ----------------------------------------
50 Hz         30 Hz (0.6x control rate)
Fixed dt      Variable dt (jitter)
Instant obs   Observation delay (network/sensor lag)
Stable        Mid-episode spikes (system load)

deltatau-audit measures exactly these failure modes. If your agent passes Deployment ≥ MILD, it is likely to survive real-world timing variation.
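
When choosing audit speeds for a known rate mismatch, it helps to convert control rates into a speed multiplier. The 50 Hz to 30 Hz example is a 0.6x ratio of control rates, but from the policy's point of view the world advances 50/30, or about 1.67 times, as far between two consecutive actions. The helper names below are hypothetical, and rounding to the neighboring integer speeds assumes the audit takes integer speeds, as in the `speeds=[1, 2, 3, 5, 8]` examples in this README.

```python
import math


def effective_speed(train_hz: float, deploy_hz: float) -> float:
    """How far the world advances per action at deployment, relative to training."""
    if train_hz <= 0 or deploy_hz <= 0:
        raise ValueError("control rates must be positive")
    return train_hz / deploy_hz


def bracketing_speeds(speed: float) -> list[int]:
    """Integer speeds on either side of a fractional multiplier (minimum 1)."""
    return sorted({max(1, math.floor(speed)), max(1, math.ceil(speed))})


speed = effective_speed(train_hz=50, deploy_hz=30)
print(f"{speed:.2f}x, audit at speeds {bracketing_speeds(speed)}")  # 1.67x, audit at speeds [1, 2]
```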

IsaacLab / RSL-RL

For policies trained with IsaacLab (RSL-RL format):

from deltatau_audit.adapters.torch_policy import TorchPolicyAdapter
from deltatau_audit.auditor import run_full_audit

# Define your actor/critic architectures (same as training).
# MyActorNet and MyCriticNet are your own nn.Module classes.
actor = MyActorNet(obs_dim=48, act_dim=12)
critic = MyCriticNet(obs_dim=48)

# Loads RSL-RL checkpoint format automatically
adapter = TorchPolicyAdapter.from_checkpoint(
    "model.pt",
    actor=actor,
    critic=critic,
    is_discrete=False,  # continuous actions
)

# env_factory is a zero-argument callable that returns a fresh environment
result = run_full_audit(adapter, env_factory, speeds=[1, 2, 3, 5])

Supported checkpoint formats:

  • {"model_state_dict": {"actor.*": ..., "critic.*": ...}} (RSL-RL)
  • {"actor": state_dict, "critic": state_dict} (explicit split)
  • Raw state_dict (actor-only)

Or use a callable — no checkpoint loading needed:

# Works with any framework's inference API
def my_act(obs):
    action = runner.alg.actor_critic.act(obs)
    value  = runner.alg.actor_critic.evaluate(obs)
    return action, value

adapter = TorchPolicyAdapter(my_act)

See examples/isaaclab_skeleton.py for a complete IsaacLab skeleton.

Custom Adapters

Implement AgentAdapter (see deltatau_audit/adapters/base.py):

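import torch
from deltatau_audit.adapters.base import AgentAdapter

class MyAdapter(AgentAdapter):
    hidden_dim = 64  # placeholder: set to the size of your model's recurrent state

    def reset_hidden(self, batch=1, device="cpu"):
        # Fresh hidden state at the start of each episode
        return torch.zeros(batch, self.hidden_dim, device=device)

    def act(self, obs, hidden):
        # Returns: (action, value, hidden_new, dt_or_None)
        ...
        return action, value, hidden_new, None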

Built-in adapters: SB3Adapter (PPO/SAC/TD3/A2C), SB3RecurrentAdapter (RecurrentPPO), CleanRLAdapter (CleanRL MLP/LSTM), TorchPolicyAdapter (IsaacLab/RSL-RL/custom), InternalTimeAdapter (Dt-GRU models).
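
As a worked illustration of the two-method interface, the sketch below adapts a plain policy function that keeps no recurrent state. It is duck-typed so that it runs without the package installed; in real use you would subclass AgentAdapter, and whether the auditor accepts `None` as a hidden state or plain Python values as actions is an assumption. `CallableAdapterSketch` and `bang_bang` are hypothetical names.

```python
class CallableAdapterSketch:
    """Stateless adapter around a function obs -> (action, value)."""

    def __init__(self, policy_fn):
        self.policy_fn = policy_fn

    def reset_hidden(self, batch=1, device="cpu"):
        # Stateless policy: nothing to carry between steps.
        return None

    def act(self, obs, hidden):
        action, value = self.policy_fn(obs)
        # Same return shape as the interface above: (action, value, hidden_new, dt_or_None).
        return action, value, hidden, None


def bang_bang(obs):
    """Toy CartPole-style policy: push toward the side the pole leans to."""
    pole_angle = obs[2]
    return (1 if pole_angle > 0 else 0), 0.0


adapter = CallableAdapterSketch(bang_bang)
hidden = adapter.reset_hidden()
action, value, hidden, dt = adapter.act([0.0, 0.0, 0.05, 0.0], hidden)
print(action, value, hidden, dt)  # 1 0.0 None None
```

Returning `None` in the last slot corresponds to an agent without internal timing, which the audit reports as Reliance: N/A.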

Comparing Results

python -m deltatau_audit diff before/summary.json after/summary.json --out comparison.md
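
The Change column in the comparison tables is reported in percentage points (pp), that is, the plain difference between two return ratios rather than a relative change. A minimal sketch of that arithmetic, using the numbers from the fix-sb3 output shown earlier, follows. The layout of summary.json is not documented in this README, so the sketch uses plain dictionaries and a hypothetical `pp_change` helper instead of reading those files.

```python
def pp_change(before, after):
    """Per-scenario change in percentage points (after minus before)."""
    return {name: round(after[name] - before[name], 1) for name in before}


# Return ratios in percent, copied from the BEFORE vs AFTER output above.
before = {"speed_5x": 12.7, "jitter": 43.7, "delay": 100.0, "spike": 26.7}
after = {"speed_5x": 76.6, "jitter": 100.0, "delay": 100.0, "spike": 91.9}

for name, delta in pp_change(before, after).items():
    print(f"{name:<10} {before[name]:6.1f}% {after[name]:6.1f}% {delta:+6.1f}pp")
```

For speed_5x this gives +63.9pp, even though the return ratio itself grew roughly sixfold; reporting in percentage points keeps the rows comparable across scenarios with very different baselines.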
