
Multi-echelon inventory management Gymnasium environment with configurable topology, composable demand engines, and endogenous goodwill dynamics.


gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods


A Gymnasium-compatible multi-echelon inventory management environment for reinforcement learning and operations research.

This repository contains the standalone environment package only. The paper benchmark agents, trained weights, result tables, and manuscript source live in the companion benchmark repository.


Tested with:

| Framework | Version |
|---|---|
| Stable Baselines3 | ≥2.0 |
| Gymnasium | ≥0.26 |
| Ray RLlib | ≥2.0 |
| CleanRL | |

The environment simulates a configurable supply chain network with realistic logistics — production capacities, pipeline lead times, holding costs, backlog penalties — driven by a composable demand engine supporting non-stationary patterns and endogenous customer goodwill dynamics.

Multi-Echelon Network Topology


Installation

pip install gym-invmgmt

Alternatively, install the latest development version directly from GitHub:

pip install git+https://github.com/r2barati/gym-invmgmt.git

For development (editable install):

git clone https://github.com/r2barati/gym-invmgmt.git
cd gym-invmgmt
pip install -e .

Release instructions for maintainers are in docs/releasing_to_pypi.md.


Repository Structure

gym-invmgmt/
├── gym_invmgmt/               ← Source code (all Python modules)
│   ├── core_env.py            ←   Gymnasium environment (step, reset, reward)
│   ├── demand_engine.py       ←   Non-stationary demand generation
│   ├── network_topology.py    ←   Graph builder (presets + YAML parser)
│   ├── visualization.py       ←   Network plotting
│   ├── utils.py               ←   Shared helpers (run_episode, compute_kpis)
│   ├── topologies/            ←   YAML network topology definitions
│   │   ├── diamond.yaml
│   │   ├── divergent.yaml
│   │   └── serial.yaml
│   └── wrappers/              ←   Action rounding, episode logging
├── examples/                  ← Runnable example scripts
│   ├── quickstart.py          ←   Minimal env usage
│   ├── run_heuristic.py       ←   Newsvendor base-stock policy
│   ├── run_or_baselines.py    ←   Classical OR policies comparison
│   ├── train_ppo.py           ←   PPO training with SB3
│   ├── benchmark_agents.py    ←   Multi-agent benchmark across scenarios
│   └── generate_videos.py     ←   Dashboard video generator
├── tests/                     ← Test suite
├── docs/                      ← Documentation
│   ├── guides/                ←   Tutorials and walkthroughs
│   └── reference/             ←   Technical reference docs
└── assets/                    ← Images for documentation

Quick Start

import gymnasium as gym
import gym_invmgmt

env = gym.make("GymInvMgmt/MultiEchelon-v0")
obs, info = env.reset(seed=42)

done = False
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

Environments

| Environment ID | Topology | Nodes | Echelons | Action Dim | Obs Dim |
|---|---|---|---|---|---|
| GymInvMgmt/MultiEchelon-v0 | Divergent network | 9 | 4 (Raw → Factory → Distributor → Retail) | 11 | 71 |
| GymInvMgmt/Serial-v0 | Serial chain | 5 | 4 (Raw → Factory → Distributor → Retail) | 3 | 15 |

Both environments default to 30-period episodes with stationary Poisson demand (μ=20).
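As a rough sketch (not the package's internal implementation), the default demand process can be pictured as independent Poisson draws per period:

```python
import numpy as np

# Illustrative sketch of 30 periods of stationary Poisson demand
# with mean mu = 20, matching the documented defaults.
def sample_stationary_demand(num_periods=30, mu=20, seed=42):
    rng = np.random.default_rng(seed)
    return rng.poisson(lam=mu, size=num_periods)

demand = sample_stationary_demand()
```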


Environment Details

Network Topology

The multi-echelon network features factories with production capacities, distributors as intermediary holding points, and retailers facing stochastic customer demand:

Detailed Network Parameters

The serial chain provides a simpler linear topology for focused experiments:

Serial Chain Topology

Observation Space

Box(-inf, inf, shape=(obs_dim,)) — a flat vector containing:

  • Demand: Current realized demand at each retail link
  • Inventory: On-hand inventory at each managed node (distributors + factories)
  • Pipeline: In-transit quantities for each supply link, broken out by lead-time position
  • Extra Features: Current time period t and demand sentiment s (goodwill multiplier)
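Conceptually, the flat vector is just a concatenation of these blocks. The sketch below uses hypothetical sizes and ordering for illustration; consult the MDP formulation reference for the exact layout:

```python
import numpy as np

# Hypothetical example of assembling the flat observation vector
# (block sizes and ordering are illustrative, not the env's exact layout).
demand = np.array([18.0])                   # realized demand per retail link
inventory = np.array([50.0, 40.0, 30.0])    # on-hand at managed nodes
pipeline = np.array([10.0, 0.0, 5.0, 0.0])  # in-transit per link x lead-time slot
t, sentiment = 3.0, 1.0                     # time period and goodwill multiplier

obs = np.concatenate([demand, inventory, pipeline, [t, sentiment]])
```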

Action Space

Box(0, max_order, shape=(n_reorder_links,)) — continuous order quantities for each supply link.

Note for RL practitioners: The upper bound is a conservative theoretical maximum (initial inventory + capacity × horizon). In practice, meaningful orders are much smaller. If using PPO/SAC out-of-the-box, consider wrapping with gymnasium.wrappers.RescaleAction(env, -1, 1) to normalize the action range, or use IntegerActionWrapper for discrete order quantities.

Reward Function

Each step returns the network-wide profit:

Reward = Σ (Revenue − Purchasing Cost − Holding Cost − Operating Cost − Backlog Penalty)
  • Revenue: Selling price × units sold at retail
  • Purchasing Cost: Unit cost × units ordered from upstream
  • Holding Cost: Per-unit cost for on-hand inventory and in-transit pipeline
  • Operating Cost: Factory variable cost per unit produced
  • Backlog Penalty: Per-unit penalty for unmet demand
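For a single node, the per-step profit terms combine as in this sketch (illustrative prices and quantities; the environment sums the same expression over all nodes):

```python
# Hypothetical per-step profit for one retail node.
price, unit_cost, hold_cost, op_cost, backlog_pen = 10.0, 4.0, 0.5, 1.0, 2.0
units_sold, units_ordered, on_hand, produced, unmet = 18, 20, 12, 20, 2

reward = (price * units_sold          # revenue
          - unit_cost * units_ordered # purchasing cost
          - hold_cost * on_hand       # holding cost
          - op_cost * produced        # operating cost
          - backlog_pen * unmet)      # backlog penalty
```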

Episode Termination

Episodes truncate after num_periods steps (default 30). There is no early termination.


Configuration

All parameters are configurable via gymnasium.make() kwargs or direct CoreEnv() instantiation:

Network Topologies

  • scenario='network' — Multi-echelon divergent network (3 factories, 2 distributors, 1 retailer)
  • scenario='serial' — Serial supply chain (1 factory → 1 distributor → 1 retailer)
  • scenario='custom' — User-defined topology loaded from a YAML config file (see below)

Demand Scenarios

The DemandEngine supports composable non-stationary effects:

| Parameter | Description | Default |
|---|---|---|
| type | Demand profile: 'stationary', 'trend', 'seasonal', 'shock' | 'stationary' |
| effects | Composable list: ['trend', 'seasonal'] applies both simultaneously | |
| base_mu | Base mean demand | 20 |
| external_series | NumPy array of per-period demand (replaces base_mu with real data) | None |
| use_goodwill | Enable endogenous demand–service feedback loop | False |
| noise_scale | Variance multiplier (0.0 = deterministic, 1.0 = default) | 1.0 |
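One way to picture composable effects is as multiplicative adjustments to the base mean. The functional forms and parameter names below are illustrative, not the DemandEngine's actual internals:

```python
import numpy as np

# Sketch of composing 'trend' and 'seasonal' effects on base_mu.
# Multiplier forms and parameters here are hypothetical.
def mean_demand(t, base_mu=20.0, trend_slope=0.02,
                season_amp=0.25, season_period=12):
    trend = 1.0 + trend_slope * t                                 # linear growth
    seasonal = 1.0 + season_amp * np.sin(2 * np.pi * t / season_period)
    return base_mu * trend * seasonal

mus = np.array([mean_demand(t) for t in range(30)])
```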

Example — shock demand with goodwill:

env = gym.make("GymInvMgmt/MultiEchelon-v0",
    demand_config={
        'type': 'shock',
        'base_mu': 25,
        'use_goodwill': True,
        'shock_time': 15,
        'shock_mag': 2.0,
    },
    num_periods=50,
)
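The shock parameters above can be read as a step change in mean demand. A minimal sketch (the engine's exact functional form may differ):

```python
# Hypothetical 'shock' effect: mean demand jumps by shock_mag
# from shock_time onward, matching the config values above.
def shock_mu(t, base_mu=25.0, shock_time=15, shock_mag=2.0):
    return base_mu * shock_mag if t >= shock_time else base_mu

series = [shock_mu(t) for t in range(20)]
```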

Example — real-world demand data (e.g., M5 competition):

import numpy as np
real_demand = np.loadtxt("my_sales_data.csv")  # one value per period

env = gym.make("GymInvMgmt/MultiEchelon-v0",
    demand_config={
        'type': 'stationary',
        'base_mu': float(np.mean(real_demand)),
        'external_series': real_demand,
    },
    num_periods=len(real_demand),
)

Note: When overriding demand_config via gym.make(), provide the full dictionary — partial dictionaries replace the registered defaults entirely rather than merging recursively.

Fulfillment Modes

  • backlog=True (default) — Unmet demand accumulates as backlog, penalized each period
  • backlog=False — Unmet demand is lost immediately (lost sales model)
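The two modes differ only in what happens to unmet demand. A self-contained sketch of one period at a retailer (illustrative, not the package's internal code):

```python
# Sketch of backlog vs lost-sales fulfillment for one period.
def fulfill(on_hand, demand, carried_backlog, backlog=True):
    owed = demand + (carried_backlog if backlog else 0)
    sold = min(on_hand, owed)
    unmet = owed - sold
    new_backlog = unmet if backlog else 0   # lost sales: unmet demand vanishes
    return sold, new_backlog

sold, bl = fulfill(on_hand=15, demand=20, carried_backlog=5, backlog=True)
sold_ls, bl_ls = fulfill(on_hand=15, demand=20, carried_backlog=0, backlog=False)
```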

Custom Network Topologies

Beyond the two built-in presets, you can define any topology via a YAML config file:

| Approach | When to Use | How |
|---|---|---|
| Built-in presets | Benchmarking, reproducible experiments | scenario='network' or scenario='serial' |
| YAML config file | Custom topologies without changing Python code | scenario='custom', config_path='...' |

from gym_invmgmt import make_custom_env

# Load a custom topology — diamond network with parallel factories
env = make_custom_env('gym_invmgmt/topologies/diamond.yaml', num_periods=30)
obs, info = env.reset(seed=42)

The built-in presets (_build_network_scenario(), _build_serial_scenario()) define topologies in Python. The YAML parser (_build_custom_scenario()) reads the same node/edge structure from a config file, auto-detects node roles, and validates the graph.

See gym_invmgmt/topologies/ for ready-to-use YAML files, and the full YAML schema reference for all supported parameters.


Visualization

from gym_invmgmt import CoreEnv

env = CoreEnv(scenario='network')
env.plot_network()              # Simple topology view
env.plot_network(detailed=True) # With costs, lead times, capacities

Wrappers

The package includes two agent-agnostic wrappers:

IntegerActionWrapper

Rounds continuous actions to integers for physical realism (you can't order 3.7 units):

from gym_invmgmt import IntegerActionWrapper
env = IntegerActionWrapper(env)
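The essential transformation is round-then-clip. A standalone sketch of the idea (the wrapper itself works through Gymnasium's action-wrapper mechanism):

```python
import numpy as np

# Sketch of integer rounding with clipping to the valid order range
# (illustrative; bounds come from the env's action space in practice).
def round_action(action, low=0.0, high=100.0):
    return np.clip(np.rint(action), low, high).astype(int)

a = round_action(np.array([3.7, -0.2, 101.4]))
```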

EpisodeLoggerWrapper

Saves full trajectory matrices (inventory, demand, orders, profit, backlog) as .npz files:

from gym_invmgmt import EpisodeLoggerWrapper
env = EpisodeLoggerWrapper(env, log_dir="./logs", run_name="experiment_1")

Action Scaling for Deep RL

When training with PPO, SAC, or other deep RL algorithms, normalize the action space to [-1, 1]:

from gymnasium.wrappers import RescaleAction

env = CoreEnv(scenario='network')
env = RescaleAction(env, min_action=-1.0, max_action=1.0)
# Agent now outputs actions in [-1, 1], automatically mapped to valid order quantities
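The mapping RescaleAction applies is a simple affine transform; the sketch below shows the same arithmetic with an assumed order range of [0, 100]:

```python
import numpy as np

# Affine map from the agent's [-1, 1] output onto [low, high]
# order quantities (the assumed bounds are illustrative).
def rescale(a, low=0.0, high=100.0):
    return low + (high - low) * (np.asarray(a) + 1.0) / 2.0

orders = rescale([-1.0, 0.0, 1.0])   # endpoints and midpoint
```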

Architecture

CoreEnv (Gymnasium Environment)
├── Simulation Dynamics         # Order placement, delivery, demand fulfillment
│   ├── Allocation Logic        # Raw-material, distribution, factory constraints
│   ├── Pipeline Advancement    # Lead-time delayed deliveries & inventory update
│   └── Profit Calculation      # Revenue – procurement – holding – ops – penalty
├── State & Spaces              # Observation vector, action/observation spaces
├── SupplyChainNetwork          # Topology: nodes, edges, lead times, capacities
│   ├── _build_network_scenario()   # Built-in: multi-echelon divergent graph
│   ├── _build_serial_scenario()    # Built-in: serial supply chain
│   └── _build_custom_scenario()    # Custom: loads from YAML config file
├── DemandEngine                # Non-stationary demand generation
│   ├── Composable Effects      # Trend / Seasonal / Shock (can combine)
│   ├── External Series         # Real-world data injection (M5, Favorita, etc.)
│   └── Endogenous Goodwill     # Service-dependent demand feedback
└── Wrappers
    ├── IntegerActionWrapper     # Discrete order rounding
    └── EpisodeLoggerWrapper     # Trajectory recording

Documentation

For detailed mathematical formulations and parameter references:

  • MDP Formulation — State space, action space, transition dynamics, and reward function with equations
  • Demand Engine — Composable non-stationary effects (trend, seasonal, shock) and endogenous goodwill dynamics
  • External Datasets — Using real-world demand data (M5, Favorita, Rossmann) with recommended datasets and examples
  • Network Topologies — Complete node/edge parameter tables, both built-in presets and YAML custom topologies
  • Comparison with Prior Work — How this project relates to OR-Gym and OR-RL-Benchmarks

Tutorial & Examples

Getting Started Tutorial — A comprehensive walkthrough covering:

| Section | What You'll Learn |
|---|---|
| The Problem | Why multi-echelon inventory management is hard |
| Network Exploration | Visualizing topology, understanding nodes & edges |
| Single Step Walkthrough | What happens inside env.step() |
| Constant Order Policy | Running a simple baseline and visualizing results |
| Lead Time Physics | Impulse response validation |
| Reward Breakdown | Decomposing profit into revenue, holding, and penalty |
| Demand Scenarios | Comparing stationary, trend, seasonal, shock, and combined |
| Bullwhip Effect | Observing order variance amplification across echelons |
| Goodwill Dynamics | Service-level feedback loops |
| Configuration Cookbook | Ready-to-use recipes for custom experiments |

Visual Dynamics Guide — See exactly how decisions affect every node:

| Visualization | What It Shows |
|---|---|
| Inventory Heatmap | On-hand stock at every node across time |
| Actions vs Filled | Requested orders vs capacity-constrained fulfillment |
| Pipeline Heatmap | In-transit units per link over time |
| KPI Dashboard | Demand, sales, fill rate, backlog, per-step and cumulative profit |
| Node Profit Breakdown | Per-node contribution to system profit |

Citing

If you use this environment in your research, please cite:

@misc{gym-invmgmt,
  author = {Barati, Reza},
  title = {gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods},
  year = {2026},
  howpublished = {\url{https://github.com/r2barati/gym-invmgmt}},
}

This environment builds on foundational work by ORL Benchmarks (Balaji et al., 2019), OR-Gym (Hubbs et al., 2020), and Perez et al. (2021). See CITATION.cff for full references.


License

MIT License. See LICENSE for details.



Download files

Download the file for your platform.

Source Distribution

gym_invmgmt-0.2.0.tar.gz (70.4 kB)


Built Distribution


gym_invmgmt-0.2.0-py3-none-any.whl (61.7 kB)


File details

Details for the file gym_invmgmt-0.2.0.tar.gz.

File metadata

  • Download URL: gym_invmgmt-0.2.0.tar.gz
  • Upload date:
  • Size: 70.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gym_invmgmt-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6ce4c02ac9292fcad250aa5a2b0177a6e3628228c28cabd1cdb2bc434168deb |
| MD5 | cc17f53e2489f83f9f47dbb63fc69003 |
| BLAKE2b-256 | 41ffa5835897a7c52649b9e4f678f89a30cc9193e99c0f8f378f612ccfc3df00 |


Provenance

The following attestation bundles were made for gym_invmgmt-0.2.0.tar.gz:

Publisher: publish.yml on r2barati/gym-invmgmt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gym_invmgmt-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: gym_invmgmt-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 61.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gym_invmgmt-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ba225abd8b5fb0ab5c30e1a981802a8d58dc2dc84afc1787ca4cccadc575dd0 |
| MD5 | 7b7586243c7fc1a77d1fe139063f3352 |
| BLAKE2b-256 | a1231e1e4b9ffb94399cb8956ed2f74fdfc39640a40e94dd028d2037308b76f1 |


Provenance

The following attestation bundles were made for gym_invmgmt-0.2.0-py3-none-any.whl:

Publisher: publish.yml on r2barati/gym-invmgmt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
