Multi-echelon inventory management Gymnasium environment with configurable topology, composable demand engines, and endogenous goodwill dynamics.
gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods
A Gymnasium-compatible multi-echelon inventory management environment for reinforcement learning and operations research.
This repository contains the standalone environment package only. The paper benchmark agents, trained weights, result tables, and manuscript source live in the companion benchmark repository.
Project Links
- PyPI package: gym-invmgmt
- Standalone environment package: r2barati/gym-invmgmt
- Paper/code repository: r2barati/gym-invmgmt-paper
- Trained checkpoint archive: rezabarati/gym-invmgmt-weights
Tested with:
| Framework | Version |
|---|---|
| Stable Baselines3 | ≥2.0 |
| Gymnasium | ≥0.26 |
| Ray RLlib | ≥2.0 |
| CleanRL | — |
The environment simulates a configurable supply chain network with realistic logistics — production capacities, pipeline lead times, holding costs, backlog penalties — driven by a composable demand engine supporting non-stationary patterns and endogenous customer goodwill dynamics.
Installation
pip install gym-invmgmt
To upgrade an existing installation:
pip install -U gym-invmgmt
To install the latest development version directly from GitHub:
pip install git+https://github.com/r2barati/gym-invmgmt.git
For development (editable install):
git clone https://github.com/r2barati/gym-invmgmt.git
cd gym-invmgmt
pip install -e .
Release instructions for maintainers are in docs/releasing_to_pypi.md.
Repository Structure
gym-invmgmt/
├── gym_invmgmt/ ← Source code (all Python modules)
│ ├── core_env.py ← Gymnasium environment (step, reset, reward)
│ ├── demand_engine.py ← Non-stationary demand generation
│ ├── network_topology.py ← Graph builder (presets + YAML parser)
│ ├── visualization.py ← Network plotting
│ ├── utils.py ← Shared helpers (run_episode, compute_kpis)
│ ├── topologies/ ← YAML network topology definitions
│ │ ├── assembly.yaml
│ │ ├── diamond.yaml
│ │ ├── distribution_tree.yaml
│ │ ├── divergent.yaml
│ │ ├── serial.yaml
│ │ └── w_network.yaml
│ └── wrappers/ ← Action rounding, episode logging
├── examples/ ← Runnable example scripts
│ ├── quickstart.py ← Minimal env usage
│ ├── run_heuristic.py ← Newsvendor base-stock policy
│ ├── run_or_baselines.py ← Classical OR policies comparison
│ ├── train_ppo.py ← PPO training with SB3
│ ├── benchmark_agents.py ← Multi-agent benchmark across scenarios
│ └── generate_videos.py ← Dashboard video generator
├── tests/ ← Test suite
├── docs/ ← Documentation
│ ├── guides/ ← Tutorials and walkthroughs
│ └── reference/ ← Technical reference docs
└── assets/ ← Images for documentation
Quick Start
import gymnasium as gym
import gym_invmgmt

env = gym.make("GymInvMgmt/MultiEchelon-v0")
obs, info = env.reset(seed=42)

done = False
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
Environments
| Environment ID | Topology | Nodes | Echelons | Action Dim | Obs Dim |
|---|---|---|---|---|---|
| GymInvMgmt/MultiEchelon-v0 | Divergent Network | 9 | 4 (Raw → Factory → Distributor → Retail) | 11 | 71 |
| GymInvMgmt/Serial-v0 | Serial Chain | 5 | 4 (Raw → Factory → Distributor → Retail) | 3 | 15 |
Both environments default to 30-period episodes with stationary Poisson demand (μ=20).
Environment Details
Network Topology
The multi-echelon network features factories with production capacities, distributors as intermediary holding points, and retailers facing stochastic customer demand.
The serial chain provides a simpler linear topology for focused experiments.
Observation Space
Box(-inf, inf, shape=(obs_dim,)) — a flat vector containing:
- Demand: Current realized demand at each retail link
- Inventory: On-hand inventory at each managed node (distributors + factories)
- Pipeline: In-transit quantities for each supply link, broken out by lead-time position
- Extra Features: Current time period t and demand sentiment s (goodwill multiplier)
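To make the "broken out by lead-time position" layout concrete, the sketch below models the pipeline of a single supply link as a fixed-length queue with one slot per period of lead time. The function name `advance_pipeline`, the slot ordering, and the timing convention are assumptions chosen for illustration; they are not taken from the package's internals.

```python
from collections import deque


def advance_pipeline(pipeline, new_order):
    """Advance one link's pipeline by a single period (illustrative only).

    pipeline[0] holds the shipment that arrives now; pipeline[-1] holds the
    most recently placed order. Returns the quantity delivered this period.
    """
    delivered = pipeline.popleft()  # oldest shipment reaches the node
    pipeline.append(new_order)      # the new order enters at the far end
    return delivered


# Hypothetical link with a lead time of 3 periods and an empty pipeline.
pipeline = deque([0.0, 0.0, 0.0])
arrivals = [advance_pipeline(pipeline, q) for q in [10.0, 5.0, 0.0, 0.0, 0.0]]
print(arrivals)        # [0.0, 0.0, 0.0, 10.0, 5.0]
print(list(pipeline))  # [0.0, 0.0, 0.0]
```

Under this convention, an order placed in one period is delivered three periods later, and the slots of the queue correspond to the per-position pipeline entries an agent would see in its observation.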
Action Space
Box(0, max_order, shape=(n_reorder_links,)) — continuous order quantities for each supply link.
Note for RL practitioners: The upper bound is a conservative theoretical maximum (initial inventory + capacity × horizon). In practice, meaningful orders are much smaller. If using PPO/SAC out of the box, consider wrapping with gymnasium.wrappers.RescaleAction(env, -1, 1) to normalize the action range, or use IntegerActionWrapper for discrete order quantities.
Reward Function
Each step returns the network-wide profit:
Reward = Σ (Revenue − Purchasing Cost − Holding Cost − Operating Cost − Backlog Penalty)
- Revenue: Selling price × units sold at retail
- Purchasing Cost: Unit cost × units ordered from upstream
- Holding Cost: Per-unit cost for on-hand inventory and in-transit pipeline
- Operating Cost: Factory variable cost per unit produced
- Backlog Penalty: Per-unit penalty for unmet demand
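As a worked illustration of the formula above, the sketch below evaluates one period's profit for a single hypothetical node. The function name, argument names, and numbers are invented for the example. It also assumes that one holding rate applies to both on-hand and in-transit units, which may not match how the environment prices the two.

```python
def step_profit(units_sold, units_ordered, on_hand, in_transit, units_produced,
                unmet_demand, price, unit_cost, holding_cost, operating_cost,
                backlog_penalty):
    """Profit for one node in one period, following the five terms above."""
    revenue = price * units_sold
    purchasing = unit_cost * units_ordered
    holding = holding_cost * (on_hand + in_transit)  # assumed single rate
    operating = operating_cost * units_produced
    penalty = backlog_penalty * unmet_demand
    return revenue - purchasing - holding - operating - penalty


profit = step_profit(
    units_sold=18, units_ordered=20, on_hand=5, in_transit=40,
    units_produced=0, unmet_demand=2,
    price=4.0, unit_cost=1.5, holding_cost=0.5, operating_cost=1.0,
    backlog_penalty=1.0,
)
print(profit)  # 72.0 - 30.0 - 22.5 - 0.0 - 2.0 = 17.5
```

The network-wide reward would then be the sum of such terms over all nodes and links in the period.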
Episode Termination
Episodes truncate after num_periods steps (default 30). There is no early termination.
Configuration
All parameters are configurable via gymnasium.make() kwargs or direct CoreEnv() instantiation:
Network Topologies
- scenario='network' — Multi-echelon divergent network (3 factories, 2 distributors, 1 retailer)
- scenario='serial' — Serial supply chain (1 factory → 1 distributor → 1 retailer)
- scenario='custom' — User-defined topology loaded from a YAML config file (see below)
Demand Scenarios
The DemandEngine supports composable non-stationary effects:
| Parameter | Description | Default |
|---|---|---|
| type | Demand profile: 'stationary', 'trend', 'seasonal', 'shock' | 'stationary' |
| effects | Composable list: ['trend', 'seasonal'] applies both simultaneously | — |
| base_mu | Base mean demand | 20 |
| external_series | NumPy array of per-period demand (replaces base_mu with real data) | None |
| use_goodwill | Enable endogenous demand–service feedback loop | False |
| noise_scale | Variance multiplier (0.0 = deterministic, 1.0 = default) | 1.0 |
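One way to picture how composable effects could combine is as successive multipliers on the base mean. The sketch below is only that: the functional forms (linear trend, sinusoidal season, step shock) and the parameters `trend_slope`, `season_amp`, and `season_period` are hypothetical and do not come from the package. Only `base_mu`, `shock_time`, and `shock_mag` are names that appear in this README, and their exact semantics in DemandEngine may differ.

```python
import math


def mean_demand(t, base_mu=20.0, effects=(), trend_slope=0.01,
                season_amp=0.25, season_period=12,
                shock_time=15, shock_mag=2.0):
    """Illustrative mean demand at period t with stacked effects."""
    mu = base_mu
    if "trend" in effects:
        mu *= 1.0 + trend_slope * t  # assumed linear growth
    if "seasonal" in effects:
        # assumed sinusoidal cycle around the current mean
        mu *= 1.0 + season_amp * math.sin(2.0 * math.pi * t / season_period)
    if "shock" in effects and t >= shock_time:
        mu *= shock_mag              # assumed permanent step change
    return mu


print(mean_demand(14, effects=("shock",)))  # 20.0
print(mean_demand(15, effects=("shock",)))  # 40.0
combined = [mean_demand(t, effects=("trend", "seasonal")) for t in range(30)]
```

Because each effect only rescales the running mean, any subset can be applied together, which is the behavior the effects list describes.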
Example — shock demand with goodwill:
env = gym.make(
    "GymInvMgmt/MultiEchelon-v0",
    demand_config={
        'type': 'shock',
        'base_mu': 25,
        'use_goodwill': True,
        'shock_time': 15,
        'shock_mag': 2.0,
    },
    num_periods=50,
)
Example — real-world demand data (e.g., M5 competition):
import numpy as np

real_demand = np.loadtxt("my_sales_data.csv")  # one value per period
env = gym.make(
    "GymInvMgmt/MultiEchelon-v0",
    demand_config={
        'type': 'stationary',
        'base_mu': float(np.mean(real_demand)),
        'external_series': real_demand,
    },
    num_periods=len(real_demand),
)
Note: When overriding demand_config via gym.make(), provide the full dictionary. Partial dictionaries replace the registered defaults entirely rather than merging recursively.
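Since partial dictionaries are not merged, one simple workaround is to merge the defaults yourself before calling gym.make(). The sketch below assumes the defaults listed in the parameter table; `DEMAND_DEFAULTS` and `full_demand_config` are hypothetical helpers, and the registered environments may define additional keys that are not shown here.

```python
# Defaults copied from the parameter table in this README (illustrative;
# the registered environments may carry additional keys).
DEMAND_DEFAULTS = {
    "type": "stationary",
    "base_mu": 20,
    "external_series": None,
    "use_goodwill": False,
    "noise_scale": 1.0,
}


def full_demand_config(**overrides):
    """Return a complete demand_config: defaults first, overrides on top."""
    return {**DEMAND_DEFAULTS, **overrides}


demand_config = full_demand_config(type="shock", shock_time=15, shock_mag=2.0)
print(demand_config["type"])     # shock
print(demand_config["base_mu"])  # 20
```

The resulting dictionary can then be passed as the demand_config keyword argument, so keys you did not override keep their documented defaults.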
Fulfillment Modes
- backlog=True (default) — Unmet demand accumulates as backlog, penalized each period
- backlog=False — Unmet demand is lost immediately (lost sales model)
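The difference between the two modes can be sketched for a single retail node and a single period. The function `fulfill` and its arguments are hypothetical and simplify the environment's actual bookkeeping; the point is only how unmet demand is carried forward or dropped.

```python
def fulfill(demand, on_hand, prior_backlog=0.0, backlog=True):
    """Return (units_sold, remaining_inventory, new_backlog) for one period."""
    # In backlog mode, earlier unmet demand is still owed to customers.
    required = demand + (prior_backlog if backlog else 0.0)
    units_sold = min(required, on_hand)
    unmet = required - units_sold
    # Lost-sales mode discards unmet demand instead of carrying it forward.
    new_backlog = unmet if backlog else 0.0
    return units_sold, on_hand - units_sold, new_backlog


print(fulfill(25.0, 20.0, backlog=True))   # (20.0, 0.0, 5.0)
print(fulfill(25.0, 20.0, backlog=False))  # (20.0, 0.0, 0.0)
```

In backlog mode the 5 unmet units reappear as demand in the next period and keep incurring the penalty until served; in lost-sales mode they simply vanish along with their potential revenue.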
Custom Network Topologies
Beyond the two built-in presets, you can define any topology via a YAML config file:
| Approach | When to Use | How |
|---|---|---|
| Built-in presets | Benchmarking, reproducible experiments | scenario='network' or scenario='serial' |
| YAML config file | Custom topologies without changing Python code | scenario='custom', config_path='...' |
from gym_invmgmt import make_custom_env
# Load a custom topology — diamond network with parallel factories
env = make_custom_env('gym_invmgmt/topologies/diamond.yaml', num_periods=30)
obs, info = env.reset(seed=42)
The built-in presets (_build_network_scenario(), _build_serial_scenario()) define topologies in Python. The YAML parser (_build_custom_scenario()) reads the same node/edge structure from a config file, auto-detects node roles, and validates the graph.
See gym_invmgmt/topologies/ for ready-to-use YAML files, and the full YAML schema reference for all supported parameters.
Visualization
from gym_invmgmt import CoreEnv
env = CoreEnv(scenario='network')
env.plot_network() # Simple topology view
env.plot_network(detailed=True) # With costs, lead times, capacities
Wrappers
The package includes two agent-agnostic wrappers:
IntegerActionWrapper
Rounds continuous actions to integers for physical realism (you can't order 3.7 units):
from gym_invmgmt import IntegerActionWrapper
env = IntegerActionWrapper(env)
EpisodeLoggerWrapper
Saves full trajectory matrices (inventory, demand, orders, profit, backlog) as .npz files:
from gym_invmgmt import EpisodeLoggerWrapper
env = EpisodeLoggerWrapper(env, log_dir="./logs", run_name="experiment_1")
Action Scaling for Deep RL
When training with PPO, SAC, or other deep RL algorithms, normalize the action space to [-1, 1]:
from gymnasium.wrappers import RescaleAction
from gym_invmgmt import CoreEnv

env = CoreEnv(scenario='network')
env = RescaleAction(env, min_action=-1.0, max_action=1.0)
# Agent now outputs actions in [-1, 1], automatically mapped to valid order quantities
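For intuition, rescaling of this kind is an affine map from the normalized interval onto the original bounds. The standalone sketch below reproduces that arithmetic without Gymnasium; the bound of 500 is a made-up stand-in for max_order, and the clipping step is a choice of this sketch rather than a statement about the wrapper's behavior.

```python
def rescale_action(a, low, high):
    """Map a normalized action a in [-1, 1] onto the interval [low, high]."""
    a = max(-1.0, min(1.0, a))  # clip so out-of-range outputs stay valid
    return low + (a + 1.0) * 0.5 * (high - low)


MAX_ORDER = 500.0  # hypothetical upper bound for one supply link
for a in (-1.0, 0.0, 1.0):
    print(a, rescale_action(a, 0.0, MAX_ORDER))
# -1.0 0.0
# 0.0 250.0
# 1.0 500.0
```

Note that a normalized action of 0 maps to half of the upper bound. Because that bound is a conservative maximum, a freshly initialized policy with outputs near zero may start out ordering far more than is useful, which is worth keeping in mind when choosing initial action noise or reward scaling.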
Architecture
CoreEnv (Gymnasium Environment)
├── Simulation Dynamics # Order placement, delivery, demand fulfillment
│ ├── Allocation Logic # Raw-material, distribution, factory constraints
│ ├── Pipeline Advancement # Lead-time delayed deliveries & inventory update
│ └── Profit Calculation # Revenue – procurement – holding – ops – penalty
├── State & Spaces # Observation vector, action/observation spaces
├── SupplyChainNetwork # Topology: nodes, edges, lead times, capacities
│ ├── _build_network_scenario() # Built-in: multi-echelon divergent graph
│ ├── _build_serial_scenario() # Built-in: serial supply chain
│ └── _build_custom_scenario() # Custom: loads from YAML config file
├── DemandEngine # Non-stationary demand generation
│ ├── Composable Effects # Trend / Seasonal / Shock (can combine)
│ ├── External Series # Real-world data injection (M5, Favorita, etc.)
│ └── Endogenous Goodwill # Service-dependent demand feedback
└── Wrappers
├── IntegerActionWrapper # Discrete order rounding
└── EpisodeLoggerWrapper # Trajectory recording
Documentation
For detailed mathematical formulations and parameter references:
- MDP Formulation — State space, action space, transition dynamics, and reward function with equations
- Demand Engine — Composable non-stationary effects (trend, seasonal, shock) and endogenous goodwill dynamics
- External Datasets — Using real-world demand data (M5, Favorita, Rossmann) with recommended datasets and examples
- Network Topologies — Complete node/edge parameter tables, both built-in presets and YAML custom topologies
- Comparison with Prior Work — How this project relates to OR-Gym and OR-RL-Benchmarks
Tutorial & Examples
Getting Started Tutorial — A comprehensive walkthrough covering:
| Section | What You'll Learn |
|---|---|
| The Problem | Why multi-echelon inventory management is hard |
| Network Exploration | Visualizing topology, understanding nodes & edges |
| Single Step Walkthrough | What happens inside env.step() |
| Constant Order Policy | Running a simple baseline and visualizing results |
| Lead Time Physics | Impulse response validation |
| Reward Breakdown | Decomposing profit into revenue, holding, and penalty |
| Demand Scenarios | Comparing stationary, trend, seasonal, shock, and combined |
| Bullwhip Effect | Observing order variance amplification across echelons |
| Goodwill Dynamics | Service-level feedback loops |
| Configuration Cookbook | Ready-to-use recipes for custom experiments |
Visual Dynamics Guide — See exactly how decisions affect every node:
| Visualization | What It Shows |
|---|---|
| Inventory Heatmap | On-hand stock at every node across time |
| Actions vs Filled | Requested orders vs capacity-constrained fulfillment |
| Pipeline Heatmap | In-transit units per link over time |
| KPI Dashboard | Demand, sales, fill rate, backlog, per-step and cumulative profit |
| Node Profit Breakdown | Per-node contribution to system profit |
Citing
If you use this environment in your research, please cite:
@misc{gym-invmgmt,
  author = {Barati, Reza},
  title = {gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods},
  year = {2026},
  howpublished = {\url{https://github.com/r2barati/gym-invmgmt}},
}
This environment builds on foundational work by ORL Benchmarks (Balaji et al., 2019), OR-Gym (Hubbs et al., 2020), and Perez et al. (2021). See CITATION.cff for full references.
License
MIT License. See LICENSE for details.