Gymnasium environments for simulating energy nodes with battery energy storage systems

StorageNode Environment

Gymnasium environment for simulating an energy node with a battery energy storage system (BESS). It provides physics-based battery modeling from commercial datasheet parameters, intended for reinforcement learning applications.

Python 3.12 License: MIT CI Tests

Features

  • Gymnasium-compatible environment registered as storage_node_env/EnergyStorage-v0
  • Physics-based battery modeling with commercial datasheet parameters
  • Two energy node types: Producer (production only) and Prosumer (production + consumption)
  • Modular reward system for different optimization objectives (self-consumption, energy arbitrage)
  • Rule-based controllers for baseline comparison
  • Flexible observation space with optional preprocessing and cyclical encoding

Installation

From Source (Development Mode)

git clone https://github.com/unisi-lab305/storage-node-environment.git
cd storage-node-environment
pip install -e .

From PyPI

pip install storage-node-env

The environment is automatically registered with Gymnasium on import and can be instantiated using gym.make().

Quick Start

Method 1: Using gym.make() (Recommended)

import gymnasium as gym
import storage_node_env  # Trigger environment registration

# Battery configuration
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Run simulation
obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        break

env.close()

Method 2: Direct Import (Backward Compatible)

from storage_node_env.gym import EnergyStorageEnv

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

env = EnergyStorageEnv(
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

obs, info = env.reset()
# ... same usage as above

Note: The gym.make() approach is recommended as it follows standard Gymnasium conventions and ensures compatibility with Gymnasium ecosystem tools.

Environment Parameters

Parameter | Type | Default | Required | Description
node_type | str | - | Yes | Type of energy node: 'producer' or 'prosumer'
csv_path | str | - | Yes | Path to CSV file with historical data
battery_config | dict[str, float] | - | Yes | Dictionary with battery parameters (see below)
delta_t | float | - | Yes | Timestep duration in hours (e.g., 1.0, 0.25)
lookback_n | int | 2 | No | Number of historical timesteps in observation buffer
num_actions | int | 21 | No | Number of discrete actions (must be odd)
use_preprocessing | bool | False | No | Enable observation preprocessing (cyclical encoding, normalization)
add_holiday | bool | True | No | Add Italian holiday feature (requires use_preprocessing=True)
reward_settings | dict or None | None | No | Reward configuration (see Reward System section)
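
The cyclical encoding enabled by use_preprocessing is not detailed in this README. As a rough illustration of the general technique only, the sketch below maps a periodic value such as hour-of-day onto a sine/cosine pair so that 23:00 and 00:00 end up close together. The function name encode_cyclical and the choice of hour-of-day as the encoded feature are assumptions, not part of the package API.

```python
import math

def encode_cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value (e.g. hour in [0, 24)) onto the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)

# Hours 23 and 0 are adjacent in time, so their encodings should be close.
s23, c23 = encode_cyclical(23, 24)
s0, c0 = encode_cyclical(0, 24)
distance = math.hypot(s23 - s0, c23 - c0)
```

With a raw integer feature the same two hours would be 23 units apart, which is why a sine/cosine pair is commonly preferred for time-of-day inputs.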

CSV Data Requirements

The CSV file must contain a datetime column and node-specific columns:

For Producer Nodes:

  • datetime: Timestamp (e.g., '2024-01-15 00:00:00')
  • production: Power produced in kW
  • buy_price: Grid purchase price in €/kWh
  • sell_price: Grid selling price in €/kWh

For Prosumer Nodes:

  • datetime: Timestamp
  • production: Power produced in kW
  • consumption: Power consumed by loads in kW
  • buy_price: Grid purchase price in €/kWh
  • sell_price: Grid selling price in €/kWh

Important: The delta_t parameter must match the frequency of your CSV data (e.g., delta_t=1.0 for hourly data, delta_t=0.25 for 15-minute data).
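
Before creating an environment, it can help to check a CSV against the requirements above. The following is a minimal sketch using only the standard library. The helper check_csv is hypothetical (not part of the package), and the timestamp format is assumed to be the one shown in the example above.

```python
import csv
import io
from datetime import datetime, timedelta

REQUIRED_COLUMNS = {
    "producer": ["datetime", "production", "buy_price", "sell_price"],
    "prosumer": ["datetime", "production", "consumption", "buy_price", "sell_price"],
}

def check_csv(text: str, node_type: str, delta_t: float) -> list[str]:
    """Return a list of problems found; an empty list means the data looks consistent."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    header = reader.fieldnames or []
    for col in REQUIRED_COLUMNS[node_type]:
        if col not in header:
            problems.append(f"missing column: {col}")
    if "datetime" in header:
        # Assumed format: 'YYYY-MM-DD HH:MM:SS'
        stamps = [datetime.strptime(r["datetime"], "%Y-%m-%d %H:%M:%S") for r in rows]
        expected = timedelta(hours=delta_t)
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev != expected:
                problems.append(f"gap {cur - prev} at {cur} does not match delta_t={delta_t}")
    return problems

sample = (
    "datetime,production,consumption,buy_price,sell_price\n"
    "2024-01-15 00:00:00,0.0,0.4,0.25,0.08\n"
    "2024-01-15 01:00:00,0.0,0.3,0.25,0.08\n"
)
hourly_ok = check_csv(sample, "prosumer", delta_t=1.0)          # no problems expected
quarter_hour_bad = check_csv(sample, "prosumer", delta_t=0.25)  # spacing mismatch expected
```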

Battery Configuration

The battery_config dictionary contains physical parameters for battery simulation based on commercial datasheets.

Parameters

Parameter | Type | Required | Valid Range | Units | Description
capacity | float | Yes | > 0 | kWh | Nominal capacity (C_nom)
dod_max | float | Yes | 0 < x ≤ 100 | % | Maximum depth of discharge
power_charge_max | float | Yes | > 0 | kW | Maximum charging power
power_discharge_max | float | Yes | > 0 | kW | Maximum discharging power
efficiency_charge | float | Yes | 0 < x ≤ 1 | - | Charging efficiency (e.g., 0.95 for 95%)
efficiency_discharge | float | Yes | 0 < x ≤ 1 | - | Discharging efficiency (e.g., 0.95 for 95%)
alpha | float | No | 0 ≤ x < 1 | - | Parasitic loss coefficient (default: 0.0)
soc_initial | float or None | No | C_min ≤ x ≤ C_max | kWh | Initial state of charge (default: 50% capacity)
allow_arbitrage | bool | No | True / False | - | If False, charging is capped at current PV production each timestep, so the battery cannot charge from the grid. Compatible with all reward types and controllers. (default: True)
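
The valid ranges listed above can be checked before the dictionary is passed to the environment. The sketch below is one possible pre-flight check. validate_battery_config is a hypothetical helper, not a function provided by the package, and the environment may perform its own validation with different messages.

```python
def validate_battery_config(cfg: dict) -> list[str]:
    """Check a battery_config dict against the documented valid ranges."""
    errors = []
    required = ["capacity", "dod_max", "power_charge_max",
                "power_discharge_max", "efficiency_charge", "efficiency_discharge"]
    for key in required:
        if key not in cfg:
            errors.append(f"missing required key: {key}")
    if errors:
        return errors  # range checks below need all required keys
    for key in ("capacity", "power_charge_max", "power_discharge_max"):
        if not cfg[key] > 0:
            errors.append(f"{key} must be > 0")
    if not 0 < cfg["dod_max"] <= 100:
        errors.append("dod_max must satisfy 0 < x <= 100")
    for key in ("efficiency_charge", "efficiency_discharge"):
        if not 0 < cfg[key] <= 1:
            errors.append(f"{key} must satisfy 0 < x <= 1")
    if not 0 <= cfg.get("alpha", 0.0) < 1:
        errors.append("alpha must satisfy 0 <= x < 1")
    soc = cfg.get("soc_initial")
    if soc is not None:
        # C_min = (1 - dod_max/100) * capacity, C_max = capacity
        c_min = (1 - cfg["dod_max"] / 100) * cfg["capacity"]
        if not c_min <= soc <= cfg["capacity"]:
            errors.append("soc_initial must lie within [C_min, C_max]")
    return errors

good = {
    "capacity": 5.12, "dod_max": 90,
    "power_charge_max": 2.5, "power_discharge_max": 2.5,
    "efficiency_charge": 0.95, "efficiency_discharge": 0.95,
}
problems = validate_battery_config(good)  # empty list for a valid config
```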

Physical Meaning

  • Capacity: Total energy storage when fully charged
  • DoD (Depth of Discharge): Usable capacity percentage (e.g., 90% DoD means 90% of nominal capacity is usable)
  • Power limits: C-rate constraints from battery datasheet (separate for charge/discharge)
  • Efficiency: One-way energy losses during charging and discharging, specified separately for each direction (round-trip efficiency is the product of the two)
  • Alpha: Standby consumption per timestep (e.g., 0.001 = 0.1% loss per timestep)
  • SoC initial: Starting energy level in kWh (if None, starts at 50% of nominal capacity)
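
To make the efficiency and alpha figures concrete, the arithmetic below works through one example. It assumes the standby loss removes a fraction alpha of the stored energy at every timestep. That multiplicative form is an assumption for illustration, since the exact update rule is not stated here.

```python
efficiency_charge = 0.95
efficiency_discharge = 0.95

# Energy drawn to charge, stored, then delivered back on discharge.
round_trip_efficiency = efficiency_charge * efficiency_discharge

energy_in_kwh = 1.0
energy_stored_kwh = energy_in_kwh * efficiency_charge
energy_out_kwh = energy_stored_kwh * efficiency_discharge

# Standby loss: assumed to remove a fraction alpha of stored energy per timestep.
alpha = 0.001
timesteps = 24  # one day at delta_t = 1.0
fraction_retained = (1.0 - alpha) ** timesteps
```

With these values the round-trip efficiency is about 0.9025, and an idle battery retains roughly 97.6% of its stored energy after 24 hourly timesteps.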

Power Convention

  • Positive power = charging (battery absorbs energy from the grid)
  • Negative power = discharging (battery releases energy to the grid)
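
The sketch below shows one way a discrete action index could map to a signed power setpoint under this convention. The layout (index 0 for full discharge, the middle index for neutral, the last index for full charge) is an assumption made for illustration, and action_to_power is a hypothetical helper. The package's actual mapping may differ.

```python
def action_to_power(action: int, num_actions: int,
                    power_charge_max: float, power_discharge_max: float) -> float:
    """Map a discrete action index to a signed power setpoint in kW.

    Hypothetical layout: index 0 is full discharge, the middle index is
    neutral (0 kW), and the last index is full charge.
    """
    if num_actions < 3 or num_actions % 2 == 0:
        raise ValueError("num_actions must be odd (and at least 3) so a neutral action exists")
    if not 0 <= action < num_actions:
        raise ValueError("action out of range")
    neutral = num_actions // 2
    fraction = (action - neutral) / neutral  # in [-1, 1]
    limit = power_charge_max if fraction >= 0 else power_discharge_max
    return fraction * limit  # positive = charging, negative = discharging

neutral_power = action_to_power(10, 21, 2.5, 2.5)   # middle index, no battery action
half_charge = action_to_power(15, 21, 2.5, 2.5)     # halfway between neutral and full charge
```

An odd action count is what guarantees a single middle index with exactly zero power.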

Example Configuration

Typical values based on the ZCS AZZURRO HV ZBT 5K battery:

battery_config = {
    'capacity': 5.12,                    # 5.12 kWh nominal capacity
    'dod_max': 90,                       # 90% depth of discharge
    'power_charge_max': 2.5,             # 2.5 kW maximum charging power
    'power_discharge_max': 2.5,          # 2.5 kW maximum discharging power
    'efficiency_charge': 0.95,           # 95% charging efficiency
    'efficiency_discharge': 0.95,        # 95% discharging efficiency
    'alpha': 0.0,                        # No parasitic losses (optional)
    'soc_initial': 2.56                  # Start at 50% SoC (optional)
}

Derived parameters (computed automatically):

  • C_min = (1 - dod_max/100) × capacity → Minimum usable SoC (kWh)
  • C_max = capacity → Maximum usable SoC (kWh)
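
For the example configuration, the derived values can be worked out directly from these formulas. The variable names below are illustrative only.

```python
battery_config = {"capacity": 5.12, "dod_max": 90}

capacity = battery_config["capacity"]
c_min = (1 - battery_config["dod_max"] / 100) * capacity  # minimum usable SoC, kWh
c_max = capacity                                           # maximum usable SoC, kWh
usable_energy = c_max - c_min                              # kWh between the two limits
default_soc_initial = 0.5 * capacity                       # 50% of nominal capacity, kWh
```

This gives C_min ≈ 0.512 kWh and C_max = 5.12 kWh, so about 4.608 kWh is usable, and the default starting point of 2.56 kWh lies inside that window.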

Reward System

The environment provides a modular reward system supporting different optimization objectives through configurable reward calculators.

Available Reward Types

Reward Type | Description | Best For | Suitable Node Types
'self_consumption' | Maximize local energy consumption, minimize grid dependency | Prosumer nodes optimizing grid independence | ['prosumer']
'economic' | Maximize profit / minimize cost based on net economic outcome | Economic optimization | ['producer', 'prosumer']

Configuration Structure

reward_settings = {
    'type': str,                 # Required: 'self_consumption' or 'economic'
    'weights': dict[str, float], # Optional: weight coefficients
    'normalize': bool            # Optional: normalize rewards (default: False)
}

Weight Parameters

Weight Key | Default | Description
'main' | 1.0 | Weight for main reward component
'violation_penalty' | 0.1 | Weight for power constraint violation penalty
'storage_usage_penalty' | 0.01 | Weight for battery usage/wear penalty

Reward Composition

The total reward is a weighted linear combination:

total_reward = (weights['main'] × R_main)
               - (weights['violation_penalty'] × P_violation)
               - (weights['storage_usage_penalty'] × P_usage)

Where:

  • R_main: Main reward component (implementation-specific)
  • P_violation: Absolute power constraint violation in kW
  • P_usage: Battery usage penalty (absolute SoC change in percentage points)

Configuration Examples

1. Default (Automatic Selection)

If reward_settings=None, the environment automatically selects:

  • Prosumer nodes → 'self_consumption'
  • Producer nodes → 'economic'

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
    # No reward_settings → uses 'self_consumption' by default
)

2. Minimal Configuration

Specify only the reward type, use default weights:

reward_settings = {
    'type': 'economic'
    # 'weights' will use defaults from registry
    # 'normalize' will default to False
}

3. Balanced Strategy

Moderate optimization with constraint awareness:

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.1
    },
    'normalize': False
}

4. Aggressive Optimization

High main weight, low penalties (may violate constraints):

reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 10.0,              # Strong economic signal
        'violation_penalty': 0.1,   # Allow some violations
        'storage_usage_penalty': 0.01  # Minimal wear penalty
    }
}

5. Conservative Strategy

High penalties for strict constraint adherence:

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 5.0,    # Strict constraint adherence
        'storage_usage_penalty': 1.0  # Discourage battery cycling
    }
}

Choosing Reward Type

Node Type | Primary Goal | Recommended Reward
Prosumer | Minimize grid dependency | 'self_consumption'
Prosumer | Minimize costs | 'economic'
Producer | Maximize profit | 'economic'

Reward Normalization

By default, rewards are raw (unnormalized) for interpretability and Stable-Baselines3 compatibility.

Option 1: SB3 VecNormalize (Recommended)

import gymnasium as gym
import storage_node_env  # Trigger environment registration
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

env = gym.make('storage_node_env/EnergyStorage-v0', ...)
env = DummyVecEnv([lambda: env])
env = VecNormalize(
    env,
    norm_obs=False,      # Disable observation normalization
    norm_reward=True,    # Enable reward normalization
    clip_reward=10.0,
    gamma=0.99
)

Option 2: Built-in Normalization

reward_settings = {
    'type': 'self_consumption',
    'normalize': True  # Enable built-in normalization
}

Rule-Based Controllers

The environment includes rule-based controllers that serve as baselines for comparing reinforcement learning agents. These controllers implement fixed decision rules.

Two usage patterns:

  1. Direct node evaluation (recommended for standalone RBC testing): Use controllers with energy node classes (Battery + Producer/Prosumer)
  2. Gymnasium environment evaluation (v0.4.0+, for RBC vs RL comparison): Use get_controller_observation() method to evaluate controllers on Gymnasium environments

Available Controllers

Controller | Policy | Use Case | Parameters
NaiveController | Always neutral action (no battery control) | Baseline to measure value of any control strategy | num_actions
PriceBasedController | Energy arbitrage based on electricity prices (charge at low prices, discharge at high prices) | Producer nodes or prosumers with time-of-use tariffs | num_actions, window_size, charge_action_pct, discharge_action_pct
SelfConsumptionController | Maximize local self-consumption (charge during excess production, discharge during deficit) | Prosumer nodes optimizing for grid independence | num_actions, balance_threshold
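
To give a feel for what a fixed decision rule looks like, here is a deliberately simplified sketch of a self-consumption style rule. It is not the SelfConsumptionController implementation. The function name, the all-or-nothing choice of action, and the index layout (0 for full discharge, middle for neutral, last for full charge) are all assumptions.

```python
def self_consumption_action(energy_balance: float, num_actions: int = 21,
                            balance_threshold: float = 0.5) -> int:
    """Pick a discrete action from the production/consumption balance (kW).

    energy_balance = production - consumption.
    Assumed index layout: 0 = full discharge, middle = neutral, last = full charge.
    """
    neutral = num_actions // 2
    if energy_balance > balance_threshold:
        return num_actions - 1   # surplus production: charge
    if energy_balance < -balance_threshold:
        return 0                 # deficit: discharge to cover the load
    return neutral               # imbalance too small to act on

surplus_action = self_consumption_action(1.2)    # production exceeds consumption
deficit_action = self_consumption_action(-0.9)   # consumption exceeds production
idle_action = self_consumption_action(0.2)       # within the threshold band
```

A real controller would more plausibly scale the action to the size of the imbalance and respect the battery's SoC bounds. The sketch only shows the threshold logic.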

Usage Example

In the direct pattern, controllers are used with Node classes (Producer/Prosumer) rather than with the Gymnasium environment:

from storage_node_env.core import Prosumer, Battery
from storage_node_env.gym.controllers import SelfConsumptionController

# Create battery and node
battery = Battery(
    capacity=30.0,
    dod_max=90,
    power_charge_max=10.0,
    power_discharge_max=10.0,
    efficiency_charge=0.95,
    efficiency_discharge=0.95
)

node = Prosumer(
    csv_path='dataset/1h/prosumer_test_data.csv',
    delta_t=1.0,
    num_actions=21
)
node.set_storage(battery)
node.reset()

# Create controller
controller = SelfConsumptionController(num_actions=21, balance_threshold=0.5)

# Evaluation loop
total_cost = 0.0
for t in range(len(node.data) - 2):
    # Get current data
    current_row = node.data.iloc[node.time_step]

    # Build observation dictionary for controller
    observation = {
        'production': current_row['production'],
        'consumption': current_row['consumption'],
        'buy_price': current_row['buy_price'],
        'sell_price': current_row['sell_price'],
        'energy_balance': current_row['production'] - current_row['consumption'],
        'final_soc': battery.soc_percent,
        'upper_bound': battery.get_bounds_percent(node.delta_t)[0],
        'lower_bound': battery.get_bounds_percent(node.delta_t)[1]
    }

    # Get action from controller
    action = controller.choose_action(observation, {})

    # Step node
    node_results = node.step(action)
    total_cost += node_results['net_cost']

    # Advance time
    node.advance_time()

print(f'Total cost: {total_cost:.4f} €')

Evaluating Controllers on Gymnasium Environment (v0.4.0+)

NEW: For comparing rule-based controllers against RL agents on the same environment:

from typing import cast
import gymnasium as gym
from storage_node_env.gym import EnergyStorageEnv
from storage_node_env.gym.controllers import SelfConsumptionController

# Create environment (battery_config defined as in Quick Start)
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Access unwrapped environment for custom methods
gym_env = cast(EnergyStorageEnv, env.unwrapped)

# Create controller
controller = SelfConsumptionController(num_actions=21)

# Evaluation loop
obs, info = env.reset(seed=42)
total_cost = 0.0

while True:
    # Get controller observation from unwrapped environment
    controller_obs = gym_env.get_controller_observation()
    action = controller.choose_action(controller_obs, {})

    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += info['net_cost']

    if terminated or truncated:
        break

print(f'Total cost: {total_cost:.4f} €')
env.close()

Benefits:

  • ✅ RBC and RL agents see identical data
  • ✅ Works with Gym wrappers (VecEnv, Monitor)
  • ✅ Type-safe API with get_controller_observation()

Instantiation Examples

from storage_node_env.gym.controllers import (
    NaiveController,
    PriceBasedController,
    SelfConsumptionController
)

# 1. Naive controller (baseline)
naive = NaiveController(num_actions=21)

# 2. Price-based controller (energy arbitrage)
price_based = PriceBasedController(
    num_actions=21,
    window_size=168,           # 1 week rolling window
    charge_action_pct=75.0,    # 50% charge power
    discharge_action_pct=25.0  # 50% discharge power
)
price_based.reset()  # Reset before each episode

# 3. Self-consumption controller
self_consumption = SelfConsumptionController(
    num_actions=21,
    balance_threshold=0.5  # Minimum 0.5 kW imbalance to act
)
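
The sketch below illustrates how a rolling price window and action percentages could interact. It is a hypothetical stand-in, not the PriceBasedController source. The class name RollingPriceRule, the comparison against the rolling mean, and the interpretation of the percentages as positions in the action range (0% for full discharge, 50% for neutral, 100% for full charge) are assumptions.

```python
from collections import deque

class RollingPriceRule:
    """Charge when the price is below the rolling mean, discharge when above."""

    def __init__(self, num_actions: int = 21, window_size: int = 168,
                 charge_action_pct: float = 75.0, discharge_action_pct: float = 25.0):
        self.prices = deque(maxlen=window_size)  # oldest prices drop out automatically
        # Assumed: percentages are positions within the discrete action range.
        self.charge_action = round(charge_action_pct / 100 * (num_actions - 1))
        self.discharge_action = round(discharge_action_pct / 100 * (num_actions - 1))
        self.neutral = num_actions // 2

    def reset(self) -> None:
        """Clear price history before each episode."""
        self.prices.clear()

    def choose_action(self, price: float) -> int:
        self.prices.append(price)
        mean = sum(self.prices) / len(self.prices)
        if price < mean:
            return self.charge_action
        if price > mean:
            return self.discharge_action
        return self.neutral

rule = RollingPriceRule()
first = rule.choose_action(0.20)   # only one price seen, equals the mean
cheap = rule.choose_action(0.10)   # below the rolling mean
costly = rule.choose_action(0.40)  # above the rolling mean
```

Under this interpretation, 75% of a 21-action range is index 15, halfway between neutral (10) and full charge (20), which would be consistent with the "50% charge power" comment above.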

Utility Functions

from storage_node_env.gym.controllers import list_controllers, print_controllers

# List available controllers
controllers_info = list_controllers()
# Returns: {'NaiveController': 'description...', 'PriceBasedController': ...}

# Print formatted information
print_controllers()

Complete Examples

Example 1: Prosumer with Preprocessing

import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.1,
        'storage_usage_penalty': 0.01
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    lookback_n=2,
    use_preprocessing=True,    # Enable cyclical encoding
    add_holiday=True,          # Add holiday feature
    reward_settings=reward_settings
)

obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_cost={info["net_cost"]:.4f} €')

    if terminated or truncated:
        break

env.close()

Example 2: Producer with Energy Arbitrage

import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 30.0,
    'dod_max': 90,
    'power_charge_max': 10.0,
    'power_discharge_max': 10.0,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 100.0,             # Amplify economic signal
        'violation_penalty': 10.0,
        'storage_usage_penalty': 1.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='producer',
    csv_path='dataset/1h/producer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)

obs, info = env.reset()
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_profit={info["net_profit"]:.4f} €')

    if terminated or truncated:
        break

env.close()

Example 3: Training with Stable-Baselines3

import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_preprocessing=True
)

# Wrap in vectorized environment and normalize rewards
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=False, norm_reward=True, clip_reward=10.0)

# Train PPO agent
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)

# Save model
model.save('ppo_prosumer')

Project Structure

storage_node_env/
├── core/                    # Core simulation components
│   ├── base/                # Abstract base classes
│   ├── storage/             # Battery implementation
│   └── nodes/               # Energy node implementations (Producer, Prosumer)
├── gym/                       # Gymnasium integration
│   ├── energy_storage_env.py  # Main environment class
│   ├── utils.py               # Observation building utilities
│   ├── preprocessing/         # Feature encoding and preprocessing
│   ├── rewards/               # Modular reward system
│   └── controllers/           # Rule-based baseline controllers
└── __init__.py                # Package initialization and version info

Documentation

Repository: https://github.com/unisi-lab305/storage-node-environment

Citation

If you use this environment in your research, please cite:

@software{storage_node_env,
  title = {Storage Node Environment: Gymnasium Environment for Battery Energy Storage Systems},
  author = {Leonardo Guiducci},
  email = {leonardo.guiducci@unisi.it},
  year = {2025},
  url = {https://github.com/unisi-lab305/storage-node-environment}
}

Contributing

Contributions are welcome! Please see CLAUDE.md for development guidelines and coding standards.

License

This project is licensed under the MIT License - see the LICENSE file for details.
