Gymnasium environments for simulating energy nodes with battery energy storage systems
StorageNode Environment
Gymnasium environment for simulating an energy node with battery energy storage system (BESS). Physics-based battery modeling using commercial datasheet parameters for reinforcement learning applications.
Features
- Gymnasium-compatible environment registered as storage_node_env/EnergyStorage-v0
- Physics-based battery modeling with commercial datasheet parameters
- Two energy node types: Producer (production only) and Prosumer (production + consumption)
- Modular reward system for different optimization objectives (self-consumption, energy arbitrage)
- Rule-based controllers for baseline comparison
- Flexible observation space with optional preprocessing and cyclical encoding
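Cyclical encoding usually means mapping a periodic feature such as the hour of day onto a sine/cosine pair, so that 23:00 and 00:00 end up close together. The sketch below illustrates that general idea only; it is not the package's preprocessing code, and `encode_cyclical` and `distance` are hypothetical helpers.

```python
import math


def encode_cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value (e.g. hour of day with period=24) to (sin, cos)."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


h00 = encode_cyclical(0, 24)
h12 = encode_cyclical(12, 24)
h23 = encode_cyclical(23, 24)

# 23:00 is closer to midnight than noon is, which a raw 0..23 integer would not show.
print(distance(h23, h00) < distance(h12, h00))
```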
Installation
From Source (Development Mode)
git clone https://github.com/unisi-lab305/storage-node-environment.git
cd storage-node-environment
pip install -e .
From PyPI
pip install storage-node-env
The environment is automatically registered with Gymnasium on import and can be instantiated using gym.make().
Quick Start
Method 1: Using gym.make() (Recommended)
import gymnasium as gym
import storage_node_env # Trigger environment registration
# Battery configuration
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}
# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)
# Run simulation with random actions until the episode ends or 100 steps pass
obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
env.close()
Method 2: Direct Import (Backward Compatible)
from storage_node_env.gym import EnergyStorageEnv
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}
env = EnergyStorageEnv(
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)
obs, info = env.reset()
# ... same usage as above
Note: The gym.make() approach is recommended as it follows standard Gymnasium conventions and ensures compatibility with Gymnasium ecosystem tools.
Environment Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| node_type | str | - | Yes | Type of energy node: 'producer' or 'prosumer' |
| csv_path | str | - | Yes | Path to CSV file with historical data |
| battery_config | dict[str, float] | - | Yes | Dictionary with battery parameters (see below) |
| delta_t | float | - | Yes | Timestep duration in hours (e.g., 1.0, 0.25) |
| lookback_n | int | 2 | No | Number of historical timesteps in observation buffer |
| num_actions | int | 21 | No | Number of discrete actions (must be odd) |
| use_preprocessing | bool | False | No | Enable observation preprocessing (cyclical encoding, normalization) |
| add_holiday | bool | True | No | Add Italian holiday feature (requires use_preprocessing=True) |
| reward_settings | dict \| None | None | No | Reward configuration (see Reward System section) |
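An odd num_actions guarantees a single middle index in the discrete action set. One plausible reading, which is an assumption here and not something the table states, is that the middle index is the neutral action and the other indices spread evenly between full discharge and full charge. `action_to_power_fraction` is a hypothetical helper that sketches this mapping.

```python
def action_to_power_fraction(action: int, num_actions: int = 21) -> float:
    """Map a discrete action index to a power fraction in [-1, 1].

    Assumed convention: index 0 = full discharge (-1.0), the middle index =
    neutral (0.0), the last index = full charge (+1.0).
    """
    if num_actions < 3 or num_actions % 2 == 0:
        raise ValueError('num_actions must be an odd integer >= 3')
    if not 0 <= action < num_actions:
        raise ValueError('action index out of range')
    half = (num_actions - 1) // 2
    return (action - half) / half


for index in (0, 5, 10, 15, 20):
    print(index, action_to_power_fraction(index))
```

With an even num_actions there would be no index that maps exactly to zero power, which is likely why the parameter is constrained to odd values.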
CSV Data Requirements
The CSV file must contain a datetime column and node-specific columns:
For Producer Nodes:
- datetime: Timestamp (e.g., '2024-01-15 00:00:00')
- production: Power produced in kW
- buy_price: Grid purchase price
- sell_price: Grid selling price in €/kWh
For Prosumer Nodes:
- datetime: Timestamp
- production: Power produced in kW
- consumption: Power consumed by loads in kW
- buy_price: Grid purchase price
- sell_price: Grid selling price in €/kWh
Important: The delta_t parameter must match the frequency of your CSV data (e.g., delta_t=1.0 for hourly data, delta_t=0.25 for 15-minute data).
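Before creating an environment it can help to check a CSV file against these requirements. The sketch below uses only the standard library; `REQUIRED_COLUMNS` and `infer_delta_t` are hypothetical names, the timestamp format is assumed from the example above, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Column sets taken from the lists above.
REQUIRED_COLUMNS = {
    'producer': {'datetime', 'production', 'buy_price', 'sell_price'},
    'prosumer': {'datetime', 'production', 'consumption', 'buy_price', 'sell_price'},
}


def infer_delta_t(csv_file, node_type: str) -> float:
    """Check the required columns and return the sampling step in hours."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS[node_type] - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    stamps = [datetime.strptime(row['datetime'], '%Y-%m-%d %H:%M:%S') for row in reader]
    if len(stamps) < 2:
        raise ValueError('need at least two rows to infer the timestep')
    steps = {(b - a).total_seconds() / 3600.0 for a, b in zip(stamps, stamps[1:])}
    if len(steps) != 1:
        raise ValueError(f'irregular sampling: {sorted(steps)}')
    return steps.pop()


# Made-up 15-minute sample data.
sample = io.StringIO(
    'datetime,production,consumption,buy_price,sell_price\n'
    '2024-01-15 00:00:00,0.0,0.4,0.25,0.08\n'
    '2024-01-15 00:15:00,0.0,0.5,0.25,0.08\n'
    '2024-01-15 00:30:00,0.1,0.3,0.25,0.08\n'
)
delta_t = infer_delta_t(sample, 'prosumer')
print(delta_t)
```

The returned value is what would be passed as delta_t; for a file on disk, replace the `io.StringIO` object with `open(path, newline='')`.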
Battery Configuration
The battery_config dictionary contains physical parameters for battery simulation based on commercial datasheets.
Parameters
| Parameter | Type | Required | Valid Range | Units | Description |
|---|---|---|---|---|---|
| capacity | float | Yes | > 0 | kWh | Nominal capacity (C_nom) |
| dod_max | float | Yes | 0 < x ≤ 100 | % | Maximum depth of discharge |
| power_charge_max | float | Yes | > 0 | kW | Maximum charging power |
| power_discharge_max | float | Yes | > 0 | kW | Maximum discharging power |
| efficiency_charge | float | Yes | 0 < x ≤ 1 | - | Charging efficiency (e.g., 0.95 for 95%) |
| efficiency_discharge | float | Yes | 0 < x ≤ 1 | - | Discharging efficiency (e.g., 0.95 for 95%) |
| alpha | float | No | 0 ≤ x < 1 | - | Parasitic loss coefficient (default: 0.0) |
| soc_initial | float \| None | No | C_min ≤ x ≤ C_max | kWh | Initial state of charge (default: 50% capacity) |
Physical Meaning
- Capacity: Total energy storage when fully charged
- DoD (Depth of Discharge): Usable capacity percentage (e.g., 90% DoD means 90% of nominal capacity is usable)
- Power limits: C-rate constraints from battery datasheet (separate for charge/discharge)
- Efficiency: Energy losses during charge and discharge, specified separately for each direction (the round-trip efficiency is their product, e.g., 0.95 × 0.95 ≈ 0.90)
- Alpha: Standby consumption per timestep (e.g., 0.001 = 0.1% loss per timestep)
- SoC initial: Starting energy level in kWh (if None, starts at 50% of nominal capacity)
Power Convention
- Positive power = charging (battery absorbs energy from the grid)
- Negative power = discharging (battery releases energy to the grid)
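To make the sign convention and the efficiency and alpha parameters concrete, here is a generic textbook-style state-of-charge update. It is an assumption about how such parameters are commonly combined, not the package's battery model, and `next_soc` is a hypothetical helper.

```python
def next_soc(
    soc: float,
    power: float,
    delta_t: float,
    efficiency_charge: float = 0.95,
    efficiency_discharge: float = 0.95,
    alpha: float = 0.0,
) -> float:
    """Return the state of charge (kWh) after one timestep.

    soc is in kWh, power in kW (positive = charging), delta_t in hours.
    Assumed model: standby loss scales the stored energy by (1 - alpha);
    charging stores less than is drawn, discharging drains more than is delivered.
    """
    retained = (1.0 - alpha) * soc
    if power >= 0:
        return retained + efficiency_charge * power * delta_t
    return retained + power * delta_t / efficiency_discharge


charged = next_soc(2.56, 2.5, 1.0)      # 2.5 kW absorbed for one hour
discharged = next_soc(2.56, -1.0, 1.0)  # 1.0 kW delivered for one hour
print(charged, discharged)
```

This sketch does not clamp the result to the usable SoC range or to the power limits; a full model would also enforce those constraints.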
Example Configuration
Typical values based on ZCS AZZURRO HV ZBT 5K battery:
battery_config = {
    'capacity': 5.12,              # 5.12 kWh nominal capacity
    'dod_max': 90,                 # 90% depth of discharge
    'power_charge_max': 2.5,       # 2.5 kW maximum charging power
    'power_discharge_max': 2.5,    # 2.5 kW maximum discharging power
    'efficiency_charge': 0.95,     # 95% charging efficiency
    'efficiency_discharge': 0.95,  # 95% discharging efficiency
    'alpha': 0.0,                  # No parasitic losses (optional)
    'soc_initial': 2.56            # Start at 50% SoC (optional)
}
Derived parameters (computed automatically):
- C_min = (1 - dod_max/100) × capacity → Minimum usable SoC (kWh)
- C_max = capacity → Maximum usable SoC (kWh)
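As a worked instance of these formulas, the sketch below computes the derived limits and the default initial SoC for the example configuration. `soc_limits` is a hypothetical helper written for illustration; the validation it performs mirrors the valid ranges in the parameter table.

```python
def soc_limits(battery_config: dict) -> dict:
    """Compute C_min, C_max and the initial SoC (all in kWh) from a config dict."""
    capacity = battery_config['capacity']
    dod_max = battery_config['dod_max']
    if capacity <= 0 or not 0 < dod_max <= 100:
        raise ValueError('capacity must be > 0 and dod_max must be in (0, 100]')
    c_min = (1 - dod_max / 100) * capacity
    c_max = capacity
    soc_initial = battery_config.get('soc_initial')
    if soc_initial is None:
        soc_initial = 0.5 * capacity  # Documented default: 50% of nominal capacity
    if not c_min <= soc_initial <= c_max:
        raise ValueError('soc_initial must lie between C_min and C_max')
    return {'c_min': c_min, 'c_max': c_max, 'soc_initial': soc_initial}


limits = soc_limits({'capacity': 5.12, 'dod_max': 90})
print(limits)
```

For a 5.12 kWh battery with 90% DoD this gives a minimum of about 0.512 kWh, a maximum of 5.12 kWh, and a default starting point of 2.56 kWh.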
Reward System
The environment provides a modular reward system supporting different optimization objectives through configurable reward calculators.
Available Reward Types
| Reward Type | Description | Best For | Suitable Node Types |
|---|---|---|---|
| 'self_consumption' | Maximize local energy consumption, minimize grid dependency | Prosumer nodes optimizing grid independence | ['prosumer'] |
| 'energy_arbitrage' | Maximize profit through price-based arbitrage (buy low, sell high) | Economic optimization | ['producer', 'prosumer'] |
Configuration Structure
reward_settings = {
    'type': str,                  # Required: 'self_consumption' or 'energy_arbitrage'
    'weights': dict[str, float],  # Optional: weight coefficients
    'normalize': bool             # Optional: normalize rewards (default: False)
}
Weight Parameters
| Weight Key | Default | Description |
|---|---|---|
| 'main' | 1.0 | Weight for main reward component |
| 'violation_penalty' | 0.1 | Weight for power constraint violation penalty |
| 'storage_usage_penalty' | 0.01 | Weight for battery usage/wear penalty |
Reward Composition
The total reward is a weighted linear combination:
total_reward = (weights['main'] × R_main)
- (weights['violation_penalty'] × P_violation)
- (weights['storage_usage_penalty'] × P_usage)
Where:
- R_main: Main reward component (implementation-specific)
- P_violation: Absolute power constraint violation in kW
- P_usage: Battery usage penalty (absolute SoC change in percentage points)
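The composition above can be written as a small function. This is a direct transcription of the formula with the documented default weights, not the package's reward calculator; `total_reward` and `DEFAULT_WEIGHTS` are hypothetical names, and the input values are made up.

```python
# Default weights from the Weight Parameters table.
DEFAULT_WEIGHTS = {
    'main': 1.0,
    'violation_penalty': 0.1,
    'storage_usage_penalty': 0.01,
}


def total_reward(r_main: float, p_violation: float, p_usage: float,
                 weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted linear combination: main reward minus the two penalties."""
    return (weights['main'] * r_main
            - weights['violation_penalty'] * p_violation
            - weights['storage_usage_penalty'] * p_usage)


# Made-up inputs: main reward 2.0, 0.5 kW violation, 10 percentage points of SoC change.
example = total_reward(r_main=2.0, p_violation=0.5, p_usage=10.0)
print(example)
```

With the default weights the violation contributes 0.05 and the usage penalty 0.1, so penalties only dominate when the weights are raised or the main reward is small.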
Configuration Examples
1. Default (Automatic Selection)
If reward_settings=None, the environment automatically selects:
- Prosumer nodes → 'self_consumption'
- Producer nodes → 'energy_arbitrage'
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
    # No reward_settings → uses 'self_consumption' by default
)
2. Minimal Configuration
Specify only the reward type, use default weights:
reward_settings = {
    'type': 'energy_arbitrage'
    # 'weights' will use defaults from registry
    # 'normalize' will default to False
}
3. Balanced Strategy
Moderate optimization with constraint awareness:
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.1
    },
    'normalize': False
}
4. Aggressive Optimization
High main weight, low penalties (may violate constraints):
reward_settings = {
    'type': 'energy_arbitrage',
    'weights': {
        'main': 10.0,                  # Strong economic signal
        'violation_penalty': 0.1,      # Allow some violations
        'storage_usage_penalty': 0.01  # Minimal wear penalty
    }
}
5. Conservative Strategy
High penalties for strict constraint adherence:
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 5.0,     # Strict constraint adherence
        'storage_usage_penalty': 1.0  # Discourage battery cycling
    }
}
Choosing Reward Type
| Node Type | Primary Goal | Recommended Reward |
|---|---|---|
| Prosumer | Minimize grid dependency | 'self_consumption' |
| Prosumer | Minimize costs | 'energy_arbitrage' |
| Producer | Maximize profit | 'energy_arbitrage' |
Reward Normalization
By default, rewards are raw (unnormalized) for interpretability and Stable-Baselines3 compatibility.
Option 1: SB3 VecNormalize (Recommended)
import gymnasium as gym
import storage_node_env  # Trigger environment registration
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
env = gym.make('storage_node_env/EnergyStorage-v0', ...)
env = DummyVecEnv([lambda: env])
env = VecNormalize(
    env,
    norm_obs=False,    # Disable observation normalization
    norm_reward=True,  # Enable reward normalization
    clip_reward=10.0,
    gamma=0.99
)
Option 2: Built-in Normalization
reward_settings = {
    'type': 'self_consumption',
    'normalize': True  # Enable built-in normalization
}
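For intuition about what reward normalization does, the sketch below scales each reward by a running standard deviation and clips the result. It is a simplified illustration only: it is neither the package's built-in normalizer nor SB3's VecNormalize (which scales by statistics of the discounted return), and `RunningRewardNormalizer` is a hypothetical class.

```python
import math


class RunningRewardNormalizer:
    """Scale rewards by a running standard deviation (Welford's algorithm), then clip."""

    def __init__(self, clip: float = 10.0, epsilon: float = 1e-8) -> None:
        self.clip = clip
        self.epsilon = epsilon
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # Sum of squared deviations from the running mean

    def update(self, reward: float) -> None:
        """Fold one reward into the running mean and variance."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 1.0 until at least two samples are seen.
        return math.sqrt(self.m2 / self.count) if self.count > 1 else 1.0

    def normalize(self, reward: float) -> float:
        """Update the statistics with this reward and return its scaled, clipped value."""
        self.update(reward)
        scaled = reward / (self.std + self.epsilon)
        return max(-self.clip, min(self.clip, scaled))


normalizer = RunningRewardNormalizer(clip=10.0)
scaled_rewards = [normalizer.normalize(r) for r in (1.0, 2.0, 3.0, 4.0)]
print(scaled_rewards)
```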
Rule-Based Controllers
The environment includes rule-based controllers that serve as baselines for comparing reinforcement learning agents. These controllers implement fixed decision rules.
Two usage patterns:
- Direct node evaluation (recommended for standalone RBC testing): Use controllers with energy node classes (Battery + Producer/Prosumer)
- Gymnasium environment evaluation (v0.4.0+, for RBC vs RL comparison): Use the get_controller_observation() method to evaluate controllers on Gymnasium environments
Available Controllers
| Controller | Policy | Use Case | Parameters |
|---|---|---|---|
| NaiveController | Always neutral action (no battery control) | Baseline to measure value of any control strategy | num_actions |
| PriceBasedController | Energy arbitrage based on electricity prices (charge at low prices, discharge at high prices) | Producer nodes or prosumers with time-of-use tariffs | num_actions, window_size, charge_action_pct, discharge_action_pct |
| SelfConsumptionController | Maximize local self-consumption (charge during excess production, discharge during deficit) | Prosumer nodes optimizing for grid independence | num_actions, balance_threshold |
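The self-consumption policy in the table can be pictured as a simple threshold rule on the energy balance (production minus consumption). The sketch below is an illustration of that rule, not the code of SelfConsumptionController: `self_consumption_action` is hypothetical, and it assumes that the highest action index means full charge, the lowest full discharge, and the middle index neutral.

```python
def self_consumption_action(energy_balance: float,
                            num_actions: int = 21,
                            balance_threshold: float = 0.5) -> int:
    """Pick a discrete action from the energy balance in kW.

    Assumed action layout: 0 = full discharge, middle = neutral, last = full charge.
    """
    neutral = (num_actions - 1) // 2
    if energy_balance > balance_threshold:
        return num_actions - 1  # Surplus production: charge the battery
    if energy_balance < -balance_threshold:
        return 0                # Deficit: discharge to cover the load
    return neutral              # Imbalance too small to act on


print(self_consumption_action(2.0))   # surplus
print(self_consumption_action(-2.0))  # deficit
print(self_consumption_action(0.2))   # within the threshold
```

A real controller would likely size the action to the imbalance and respect the SoC bounds; the always-neutral branch on its own corresponds to the NaiveController policy.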
Usage Example
This example drives a controller directly with the Node classes (Producer/Prosumer) rather than through the Gymnasium environment:
from storage_node_env.core import Prosumer, Battery
from storage_node_env.gym.controllers import SelfConsumptionController
# Create battery and node
battery = Battery(
    capacity=30.0,
    dod_max=90,
    power_charge_max=10.0,
    power_discharge_max=10.0,
    efficiency_charge=0.95,
    efficiency_discharge=0.95
)
node = Prosumer(
    csv_path='dataset/1h/prosumer_test_data.csv',
    delta_t=1.0,
    num_actions=21
)
node.set_storage(battery)
node.reset()
# Create controller
controller = SelfConsumptionController(num_actions=21, balance_threshold=0.5)
# Evaluation loop
total_cost = 0.0
for t in range(len(node.data) - 2):
    # Get current data
    current_row = node.data.iloc[node.time_step]
    # Build observation dictionary for controller
    observation = {
        'production': current_row['production'],
        'consumption': current_row['consumption'],
        'buy_price': current_row['buy_price'],
        'sell_price': current_row['sell_price'],
        'energy_balance': current_row['production'] - current_row['consumption'],
        'final_soc': battery.soc_percent,
        'upper_bound': battery.get_bounds_percent(node.delta_t)[0],
        'lower_bound': battery.get_bounds_percent(node.delta_t)[1]
    }
    # Get action from controller
    action = controller.choose_action(observation, {})
    # Step node and accumulate the cost of this timestep
    node_results = node.step(action)
    total_cost += node_results['net_cost']
    # Advance time
    node.advance_time()
print(f'Total cost: {total_cost:.4f} €')
Evaluating Controllers on Gymnasium Environment (v0.4.0+)
NEW: For comparing rule-based controllers against RL agents on the same environment:
from typing import cast
import gymnasium as gym
from storage_node_env.gym import EnergyStorageEnv
from storage_node_env.gym.controllers import SelfConsumptionController
# Create environment
# battery_config is the dictionary defined in Quick Start
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)
# Access unwrapped environment for custom methods
gym_env = cast(EnergyStorageEnv, env.unwrapped)
# Create controller
controller = SelfConsumptionController(num_actions=21)
# Evaluation loop
obs, info = env.reset(seed=42)
total_cost = 0.0
while True:
    # Get controller observation from unwrapped environment
    controller_obs = gym_env.get_controller_observation()
    action = controller.choose_action(controller_obs, {})
    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += info['net_cost']
    if terminated or truncated:
        break
print(f'Total cost: {total_cost:.4f} €')
env.close()
Benefits:
- ✅ RBC and RL agents see identical data
- ✅ Works with Gym wrappers (VecEnv, Monitor)
- ✅ Type-safe API with get_controller_observation()
Instantiation Examples
from storage_node_env.gym.controllers import (
    NaiveController,
    PriceBasedController,
    SelfConsumptionController
)
# 1. Naive controller (baseline)
naive = NaiveController(num_actions=21)
# 2. Price-based controller (energy arbitrage)
price_based = PriceBasedController(
    num_actions=21,
    window_size=168,           # 1 week rolling window
    charge_action_pct=75.0,    # 50% charge power
    discharge_action_pct=25.0  # 50% discharge power
)
price_based.reset()  # Reset before each episode
# 3. Self-consumption controller
self_consumption = SelfConsumptionController(
    num_actions=21,
    balance_threshold=0.5  # Minimum 0.5 kW imbalance to act
)
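The price-based parameters suggest a rolling window of recent prices and fixed charge/discharge actions expressed as a percentage of the action range. The sketch below shows one way such a rule could work. It is an assumption-laden illustration, not PriceBasedController itself: `RollingPriceRule` is hypothetical, comparing the current price with the rolling mean is a stand-in for whatever statistic the real controller uses, and the prices are made up.

```python
from collections import deque


class RollingPriceRule:
    """Charge when the price is below the rolling mean, discharge when above."""

    def __init__(self, num_actions: int = 21, window_size: int = 168,
                 charge_action_pct: float = 75.0,
                 discharge_action_pct: float = 25.0) -> None:
        self.prices = deque(maxlen=window_size)  # Oldest prices drop out automatically
        top = num_actions - 1
        # Assumed meaning of *_pct: position within the action range (50% = neutral).
        self.charge_action = round(charge_action_pct / 100 * top)
        self.discharge_action = round(discharge_action_pct / 100 * top)
        self.neutral_action = top // 2

    def reset(self) -> None:
        """Forget the price history, e.g. at the start of an episode."""
        self.prices.clear()

    def choose_action(self, price: float) -> int:
        """Record the price and return the action index for this timestep."""
        self.prices.append(price)
        mean = sum(self.prices) / len(self.prices)
        if price < mean:
            return self.charge_action
        if price > mean:
            return self.discharge_action
        return self.neutral_action


rule = RollingPriceRule()
actions = [rule.choose_action(p) for p in (0.20, 0.10, 0.30)]
print(actions)
```

Under this reading, charge_action_pct=75.0 with 21 actions selects index 15, halfway between neutral and full charge, which matches the "50% charge power" comment in the instantiation example.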
Utility Functions
from storage_node_env.gym.controllers import list_controllers, print_controllers
# List available controllers
controllers_info = list_controllers()
# Returns: {'NaiveController': 'description...', 'PriceBasedController': ...}
# Print formatted information
print_controllers()
Complete Examples
Example 1: Prosumer with Preprocessing
import gymnasium as gym
import storage_node_env
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}
reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.1,
        'storage_usage_penalty': 0.01
    }
}
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    lookback_n=2,
    use_preprocessing=True,  # Enable cyclical encoding
    add_holiday=True,        # Add holiday feature
    reward_settings=reward_settings
)
obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_cost={info["net_cost"]:.4f} €')
    if terminated or truncated:
        break
env.close()
Example 2: Producer with Energy Arbitrage
import gymnasium as gym
import storage_node_env
battery_config = {
    'capacity': 30.0,
    'dod_max': 90,
    'power_charge_max': 10.0,
    'power_discharge_max': 10.0,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}
reward_settings = {
    'type': 'energy_arbitrage',
    'weights': {
        'main': 100.0,  # Amplify economic signal
        'violation_penalty': 10.0,
        'storage_usage_penalty': 1.0
    }
}
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='producer',
    csv_path='dataset/1h/producer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)
obs, info = env.reset()
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_profit={info["net_profit"]:.4f} €')
    if terminated or truncated:
        break
env.close()
Example 3: Training with Stable-Baselines3
import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}
# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_preprocessing=True
)
# Wrap in vectorized environment and normalize rewards
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=False, norm_reward=True, clip_reward=10.0)
# Train PPO agent
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
# Save model
model.save('ppo_prosumer')
Project Structure
storage_node_env/
├── core/ # Core simulation components
│ ├── base/ # Abstract base classes
│ ├── storage/ # Battery implementation
│ └── nodes/ # Energy node implementations (Producer, Prosumer)
├── gym/ # Gymnasium integration
│ ├── energy_storage_env.py # Main environment class
│ ├── utils.py # Observation building utilities
│ ├── preprocessing/ # Feature encoding and preprocessing
│ ├── rewards/ # Modular reward system
│ └── controllers/ # Rule-based baseline controllers
└── __init__.py # Package initialization and version info
Documentation
- REWARD_SYSTEM.md: Detailed reward system documentation
- CONTROLLERS.md: Detailed controller documentation
Repository
- GitHub: https://github.com/unisi-lab305/storage-node-environment
- License: MIT
Citation
If you use this environment in your research, please cite:
@software{storage_node_env,
title = {Storage Node Environment: Gymnasium Environment for Battery Energy Storage Systems},
author = {Leonardo Guiducci},
email = {leonardo.guiducci@unisi.it},
year = {2025},
url = {https://github.com/unisi-lab305/storage-node-environment}
}
Contributing
Contributions are welcome! Please see CLAUDE.md for development guidelines and coding standards.
License
This project is licensed under the MIT License - see the LICENSE file for details.