Gymnasium environments for simulating energy nodes with battery energy storage systems

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
- Python :: 3.12
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

StorageNode Environment

A Gymnasium environment for simulating an energy node with a battery energy storage system (BESS). The battery model is physics-based, parameterized from commercial datasheets, and intended for reinforcement learning applications.

Features

Gymnasium-compatible environment registered as storage_node_env/EnergyStorage-v0
Physics-based battery modeling with commercial datasheet parameters
Two energy node types: Producer (production only) and Prosumer (production + consumption)
Modular reward system for different optimization objectives (self-consumption, energy arbitrage)
Rule-based controllers for baseline comparison
Flexible observation space with optional preprocessing and cyclical encoding

Installation

From Source (Development Mode)

git clone https://github.com/unisi-lab305/storage-node-environment.git
cd storage-node-environment
pip install -e .

From PyPI

pip install storage-node-env

The environment is automatically registered with Gymnasium on import and can be instantiated using gym.make().

Quick Start

Method 1: Using gym.make() (Recommended)

import gymnasium as gym
import storage_node_env  # Trigger environment registration

# Battery configuration
battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Run simulation
obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        break

env.close()

Method 2: Direct Import (Backward Compatible)

from storage_node_env.gym import EnergyStorageEnv

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

env = EnergyStorageEnv(
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

obs, info = env.reset()
# ... same usage as above

Note: The gym.make() approach is recommended as it follows standard Gymnasium conventions and ensures compatibility with Gymnasium ecosystem tools.

Environment Parameters

Parameter	Type	Default	Required	Description
`node_type`	`str`	-	Yes	Type of energy node: `'producer'` or `'prosumer'`
`csv_path`	`str`	-	Yes	Path to CSV file with historical data
`battery_config`	`dict[str, float]`	-	Yes	Dictionary with battery parameters (see below)
`delta_t`	`float`	-	Yes	Timestep duration in hours (e.g., 1.0, 0.25)
`lookback_n`	`int`	`2`	No	Number of historical timesteps in observation buffer
`num_actions`	`int`	`21`	No	Number of discrete actions (must be odd)
`use_preprocessing`	`bool`	`False`	No	Enable observation preprocessing (cyclical encoding, normalization)
`add_holiday`	`bool`	`True`	No	Add Italian holiday feature (requires `use_preprocessing=True`)
`reward_settings`	`dict \| None`	`None`	No	Reward configuration (see Reward System section)
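
The requirement that num_actions be odd suggests a symmetric action grid with a neutral middle action. The sketch below illustrates that idea under an assumption the source does not confirm: action indices map linearly onto a signed power fraction in [-1, 1], with index 0 as full discharge and the last index as full charge. The helper name action_to_power_fraction is hypothetical and is not part of the package.

```python
def action_to_power_fraction(action: int, num_actions: int = 21) -> float:
    """Map a discrete action index to a signed power fraction in [-1, 1].

    Assumed convention (hypothetical): 0 -> full discharge,
    middle index -> idle, num_actions - 1 -> full charge.
    """
    if num_actions < 3 or num_actions % 2 == 0:
        raise ValueError("num_actions must be an odd integer >= 3")
    if not 0 <= action < num_actions:
        raise ValueError("action index out of range")
    neutral = (num_actions - 1) // 2  # exists only because num_actions is odd
    return (action - neutral) / neutral


print(action_to_power_fraction(10))  # middle action: 0.0 (idle)
print(action_to_power_fraction(20))  # last action: 1.0 (full charge)
print(action_to_power_fraction(0))   # first action: -1.0 (full discharge)
```

With an even number of actions there would be no index that maps exactly to zero power, which is one plausible reason for the constraint.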

CSV Data Requirements

The CSV file must contain a datetime column and node-specific columns:

For Producer Nodes:

datetime: Timestamp (e.g., '2024-01-15 00:00:00')
production: Power produced in kW
buy_price: Grid purchase price in €/kWh
sell_price: Grid selling price in €/kWh

For Prosumer Nodes:

datetime: Timestamp
production: Power produced in kW
consumption: Power consumed by loads in kW
buy_price: Grid purchase price in €/kWh
sell_price: Grid selling price in €/kWh

Important: The delta_t parameter must match the frequency of your CSV data (e.g., delta_t=1.0 for hourly data, delta_t=0.25 for 15-minute data).
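
A mismatch between delta_t and the CSV frequency is easy to miss, so it can help to check a file before building the environment. The following is a minimal standalone sketch using only the standard library. The function name check_csv is hypothetical, and the timestamp format is assumed to match the example above ('2024-01-15 00:00:00'); the package may accept other formats.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = {
    "producer": {"datetime", "production", "buy_price", "sell_price"},
    "prosumer": {"datetime", "production", "consumption", "buy_price", "sell_price"},
}


def check_csv(text: str, node_type: str, delta_t: float) -> None:
    """Raise ValueError if columns are missing or row spacing differs from delta_t (hours)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        raise ValueError("CSV has no data rows")
    missing = REQUIRED_COLUMNS[node_type] - set(rows[0].keys())
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Assumed timestamp format, taken from the example in the column list.
    stamps = [datetime.strptime(r["datetime"], "%Y-%m-%d %H:%M:%S") for r in rows]
    for earlier, later in zip(stamps, stamps[1:]):
        step_hours = (later - earlier).total_seconds() / 3600.0
        if abs(step_hours - delta_t) > 1e-9:
            raise ValueError(f"row spacing {step_hours} h does not match delta_t={delta_t}")


sample = (
    "datetime,production,buy_price,sell_price\n"
    "2024-01-15 00:00:00,0.0,0.25,0.10\n"
    "2024-01-15 01:00:00,0.0,0.24,0.10\n"
)
check_csv(sample, "producer", delta_t=1.0)  # hourly rows, passes silently
```

To check a file on disk, read it into a string first (for example with pathlib.Path(...).read_text()) and pass the result as text.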

Battery Configuration

The battery_config dictionary contains physical parameters for battery simulation based on commercial datasheets.

Parameters

Parameter	Type	Required	Valid Range	Units	Description
`capacity`	`float`	Yes	> 0	kWh	Nominal capacity (C_nom)
`dod_max`	`float`	Yes	0 < x ≤ 100	%	Maximum depth of discharge
`power_charge_max`	`float`	Yes	> 0	kW	Maximum charging power
`power_discharge_max`	`float`	Yes	> 0	kW	Maximum discharging power
`efficiency_charge`	`float`	Yes	0 < x ≤ 1	-	Charging efficiency (e.g., 0.95 for 95%)
`efficiency_discharge`	`float`	Yes	0 < x ≤ 1	-	Discharging efficiency (e.g., 0.95 for 95%)
`alpha`	`float`	No	0 ≤ x < 1	-	Parasitic loss coefficient (default: 0.0)
`soc_initial`	`float \| None`	No	C_min ≤ x ≤ C_max	kWh	Initial state of charge (default: 50% capacity)
`allow_arbitrage`	`bool`	No	`True` / `False`	-	If `False`, charging is capped at current PV production each timestep — battery cannot charge from the grid. Compatible with all reward types and controllers. (default: `True`)
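
The valid ranges in the table can be checked before a configuration is passed to the environment. This is a minimal sketch, not the package's own validation: the function name validate_battery_config is hypothetical, and the soc_initial check uses the C_min and C_max definitions given under "Derived parameters".

```python
REQUIRED_KEYS = (
    "capacity", "dod_max", "power_charge_max",
    "power_discharge_max", "efficiency_charge", "efficiency_discharge",
)


def validate_battery_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means every range in the table holds."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in cfg]
    if problems:
        return problems
    for key in ("capacity", "power_charge_max", "power_discharge_max"):
        if not cfg[key] > 0:
            problems.append(f"{key} must be > 0")
    if not 0 < cfg["dod_max"] <= 100:
        problems.append("dod_max must satisfy 0 < x <= 100")
    for key in ("efficiency_charge", "efficiency_discharge"):
        if not 0 < cfg[key] <= 1:
            problems.append(f"{key} must satisfy 0 < x <= 1")
    if not 0 <= cfg.get("alpha", 0.0) < 1:
        problems.append("alpha must satisfy 0 <= x < 1")
    soc = cfg.get("soc_initial")
    if soc is not None:
        # C_min = (1 - dod_max/100) * capacity, C_max = capacity
        c_min = (1 - cfg["dod_max"] / 100) * cfg["capacity"]
        if not c_min <= soc <= cfg["capacity"]:
            problems.append("soc_initial must lie within [C_min, C_max]")
    return problems


config = {
    "capacity": 5.12, "dod_max": 90,
    "power_charge_max": 2.5, "power_discharge_max": 2.5,
    "efficiency_charge": 0.95, "efficiency_discharge": 0.95,
}
print(validate_battery_config(config))  # []
```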

Physical Meaning

Capacity: Total energy storage when fully charged
DoD (Depth of Discharge): Usable capacity percentage (e.g., 90% DoD means 90% of nominal capacity is usable)
Power limits: C-rate constraints from battery datasheet (separate for charge/discharge)
Efficiency: One-way energy losses during charging and discharging, specified separately for each direction (the round-trip efficiency is the product of the two)
Alpha: Standby consumption per timestep (e.g., 0.001 = 0.1% loss per timestep)
SoC initial: Starting energy level in kWh (if None, starts at 50% of nominal capacity)

Power Convention

Positive power = charging (battery absorbs energy from the grid)
Negative power = discharging (battery releases energy to the grid)
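
To make the parameters and the sign convention concrete, here is one common way to combine them into a single-step state-of-charge update. This is an illustrative model under stated assumptions, not the package's actual battery code: charging energy is multiplied by efficiency_charge, discharged energy is divided by efficiency_discharge, alpha removes a fraction of the stored energy each step, and the result is clipped to [C_min, C_max]. The function name soc_step is hypothetical.

```python
def soc_step(soc_kwh: float, power_kw: float, delta_t: float, cfg: dict) -> float:
    """Advance the state of charge by one timestep (hypothetical model).

    power_kw > 0 charges the battery, power_kw < 0 discharges it.
    """
    c_max = cfg["capacity"]
    c_min = (1 - cfg["dod_max"] / 100) * c_max
    # Clip the requested power to the datasheet limits.
    power_kw = max(-cfg["power_discharge_max"], min(cfg["power_charge_max"], power_kw))
    # Parasitic loss: a fraction alpha of the stored energy is lost per step.
    soc = (1 - cfg.get("alpha", 0.0)) * soc_kwh
    if power_kw >= 0:
        soc += cfg["efficiency_charge"] * power_kw * delta_t    # less energy stored than drawn
    else:
        soc += power_kw * delta_t / cfg["efficiency_discharge"]  # more energy drained than delivered
    # Keep the result inside the usable window.
    return max(c_min, min(c_max, soc))


cfg = {
    "capacity": 5.12, "dod_max": 90,
    "power_charge_max": 2.5, "power_discharge_max": 2.5,
    "efficiency_charge": 0.95, "efficiency_discharge": 0.95,
}
print(soc_step(2.56, 2.0, 1.0, cfg))   # charging at 2 kW for one hour
print(soc_step(2.56, -1.9, 1.0, cfg))  # discharging 1.9 kW for one hour
```

In this sketch, charging at 2 kW for an hour adds 1.9 kWh to the battery, while delivering 1.9 kW for an hour drains 2 kWh from it.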

Example Configuration

Typical values based on ZCS AZZURRO HV ZBT 5K battery:

battery_config = {
    'capacity': 5.12,                    # 5.12 kWh nominal capacity
    'dod_max': 90,                       # 90% depth of discharge
    'power_charge_max': 2.5,             # 2.5 kW maximum charging power
    'power_discharge_max': 2.5,          # 2.5 kW maximum discharging power
    'efficiency_charge': 0.95,           # 95% charging efficiency
    'efficiency_discharge': 0.95,        # 95% discharging efficiency
    'alpha': 0.0,                        # No parasitic losses (optional)
    'soc_initial': 2.56                  # Start at 50% SoC (optional)
}

Derived parameters (computed automatically):

C_min = (1 - dod_max/100) × capacity → Minimum usable SoC (kWh)
C_max = capacity → Maximum usable SoC (kWh)
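
As a worked instance of the two formulas above, the snippet below evaluates them for the example configuration (5.12 kWh, 90% DoD). The helper name usable_window is hypothetical.

```python
def usable_window(capacity_kwh: float, dod_max_pct: float) -> tuple[float, float]:
    """Return (C_min, C_max) in kWh from the nominal capacity and maximum DoD."""
    c_min = (1 - dod_max_pct / 100) * capacity_kwh
    c_max = capacity_kwh
    return c_min, c_max


c_min, c_max = usable_window(5.12, 90)
print(f"C_min = {c_min:.3f} kWh, C_max = {c_max:.2f} kWh, usable = {c_max - c_min:.3f} kWh")
# C_min = 0.512 kWh, C_max = 5.12 kWh, usable = 4.608 kWh
```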

Reward System

The environment provides a modular reward system supporting different optimization objectives through configurable reward calculators.

Available Reward Types

Reward Type	Description	Best For	Suitable Node Types
`'self_consumption'`	Maximize local energy consumption, minimize grid dependency	Prosumer nodes optimizing grid independence	`['prosumer']`
`'economic'`	Maximize profit / minimize cost based on net economic outcome	Economic optimization	`['producer', 'prosumer']`

Configuration Structure

reward_settings = {
    'type': str,                 # Required: 'self_consumption' or 'economic'
    'weights': dict[str, float], # Optional: weight coefficients
    'normalize': bool            # Optional: normalize rewards (default: False)
}

Weight Parameters

Weight Key	Default	Description
`'main'`	`1.0`	Weight for main reward component
`'violation_penalty'`	`0.1`	Weight for power constraint violation penalty
`'storage_usage_penalty'`	`0.01`	Weight for battery usage/wear penalty

Reward Composition

The total reward is a weighted linear combination:

total_reward = (weights['main'] × R_main)
               - (weights['violation_penalty'] × P_violation)
               - (weights['storage_usage_penalty'] × P_usage)

Where:

R_main: Main reward component (implementation-specific)
P_violation: Absolute power constraint violation in kW
P_usage: Battery usage penalty (absolute SoC change in percentage points)
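
The composition can be written as a small function. This is a sketch of the formula only: how R_main is computed is implementation-specific and is passed in here as a plain number. The per-key fallback to the default weights is an assumption, and the names total_reward and DEFAULT_WEIGHTS are hypothetical.

```python
DEFAULT_WEIGHTS = {"main": 1.0, "violation_penalty": 0.1, "storage_usage_penalty": 0.01}


def total_reward(r_main: float, p_violation: float, p_usage: float, weights=None) -> float:
    """Weighted linear combination of the main reward and the two penalties."""
    # Assumed behavior: any weight not supplied falls back to its default.
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    return (w["main"] * r_main
            - w["violation_penalty"] * p_violation
            - w["storage_usage_penalty"] * p_usage)


# Default weights: 1.0 * 2.0 - 0.1 * 0.5 - 0.01 * 10.0, roughly 1.85
print(total_reward(r_main=2.0, p_violation=0.5, p_usage=10.0))
```

Because P_violation is in kW and P_usage is in percentage points, the two penalty weights are not directly comparable; a 10-point SoC swing with the default weight costs as much as a 1 kW violation.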

Configuration Examples

1. Default (Automatic Selection)

If reward_settings=None, the environment automatically selects:

Prosumer nodes → 'self_consumption'
Producer nodes → 'economic'

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
    # No reward_settings → uses 'self_consumption' by default
)

2. Minimal Configuration

Specify only the reward type, use default weights:

reward_settings = {
    'type': 'economic'
    # 'weights' will use defaults from registry
    # 'normalize' will default to False
}

3. Balanced Strategy

Moderate optimization with constraint awareness:

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.5,
        'storage_usage_penalty': 0.1
    },
    'normalize': False
}

4. Aggressive Optimization

High main weight, low penalties (may violate constraints):

reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 10.0,              # Strong economic signal
        'violation_penalty': 0.1,   # Allow some violations
        'storage_usage_penalty': 0.01  # Minimal wear penalty
    }
}

5. Conservative Strategy

High penalties for strict constraint adherence:

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 5.0,    # Strict constraint adherence
        'storage_usage_penalty': 1.0  # Discourage battery cycling
    }
}

Choosing Reward Type

Node Type	Primary Goal	Recommended Reward
Prosumer	Minimize grid dependency	`'self_consumption'`
Prosumer	Minimize costs	`'economic'`
Producer	Maximize profit	`'economic'`

Reward Normalization

By default, rewards are raw (unnormalized) for interpretability and Stable-Baselines3 compatibility.

Option 1: SB3 VecNormalize (Recommended)

import gymnasium as gym
import storage_node_env  # Trigger environment registration
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

def make_env():
    # Pass the same keyword arguments as in Quick Start
    return gym.make('storage_node_env/EnergyStorage-v0', ...)

env = DummyVecEnv([make_env])
env = VecNormalize(
    env,
    norm_obs=False,      # Disable observation normalization
    norm_reward=True,    # Enable reward normalization
    clip_reward=10.0,
    gamma=0.99
)

Option 2: Built-in Normalization

reward_settings = {
    'type': 'self_consumption',
    'normalize': True  # Enable built-in normalization
}

Rule-Based Controllers

The environment includes rule-based controllers that serve as baselines for comparing reinforcement learning agents. These controllers implement fixed decision rules.

Two usage patterns:

Direct node evaluation (recommended for standalone RBC testing): Use controllers with energy node classes (Battery + Producer/Prosumer)
Gymnasium environment evaluation (v0.4.0+, for RBC vs RL comparison): Use get_controller_observation() method to evaluate controllers on Gymnasium environments

Available Controllers

Controller	Policy	Use Case	Parameters
`NaiveController`	Always neutral action (no battery control)	Baseline to measure value of any control strategy	`num_actions`
`PriceBasedController`	Energy arbitrage based on electricity prices (charge at low prices, discharge at high prices)	Producer nodes or prosumers with time-of-use tariffs	`num_actions`, `window_size`, `charge_action_pct`, `discharge_action_pct`
`SelfConsumptionController`	Maximize local self-consumption (charge during excess production, discharge during deficit)	Prosumer nodes optimizing for grid independence	`num_actions`, `balance_threshold`
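
The self-consumption policy in the table can be pictured as a three-way threshold rule. The sketch below illustrates the idea and is not the package's SelfConsumptionController: it assumes that index 0 means full discharge, the middle index means idle, and the last index means full charge, and it always acts at full power. The function name self_consumption_action is hypothetical.

```python
def self_consumption_action(energy_balance_kw: float,
                            num_actions: int = 21,
                            balance_threshold: float = 0.5) -> int:
    """Choose a discrete action from the production-minus-consumption balance (kW)."""
    neutral = (num_actions - 1) // 2
    if energy_balance_kw > balance_threshold:
        return num_actions - 1  # surplus production: charge
    if energy_balance_kw < -balance_threshold:
        return 0                # deficit: discharge
    return neutral              # imbalance below threshold: leave the battery idle


print(self_consumption_action(1.2))   # 20 (charge)
print(self_consumption_action(-0.8))  # 0 (discharge)
print(self_consumption_action(0.3))   # 10 (idle)
```

Returning the neutral index for every input reproduces the behavior described for NaiveController.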

Usage Example

This example drives a controller directly with the node classes (Producer/Prosumer), without the Gymnasium environment:

from storage_node_env.core import Prosumer, Battery
from storage_node_env.gym.controllers import SelfConsumptionController

# Create battery and node
battery = Battery(
    capacity=30.0,
    dod_max=90,
    power_charge_max=10.0,
    power_discharge_max=10.0,
    efficiency_charge=0.95,
    efficiency_discharge=0.95
)

node = Prosumer(
    csv_path='dataset/1h/prosumer_test_data.csv',
    delta_t=1.0,
    num_actions=21
)
node.set_storage(battery)
node.reset()

# Create controller
controller = SelfConsumptionController(num_actions=21, balance_threshold=0.5)

# Evaluation loop
total_cost = 0.0
for t in range(len(node.data) - 2):
    # Get current data
    current_row = node.data.iloc[node.time_step]

    # Build observation dictionary for controller
    observation = {
        'production': current_row['production'],
        'consumption': current_row['consumption'],
        'buy_price': current_row['buy_price'],
        'sell_price': current_row['sell_price'],
        'energy_balance': current_row['production'] - current_row['consumption'],
        'final_soc': battery.soc_percent,
        'upper_bound': battery.get_bounds_percent(node.delta_t)[0],
        'lower_bound': battery.get_bounds_percent(node.delta_t)[1]
    }

    # Get action from controller
    action = controller.choose_action(observation, {})

    # Step node
    node_results = node.step(action)
    total_cost += node_results['net_cost']

    # Advance time
    node.advance_time()

print(f'Total cost: {total_cost:.4f} €')

Evaluating Controllers on Gymnasium Environment (v0.4.0+)

To compare rule-based controllers against RL agents on the same environment, read the controller observation from the unwrapped environment:

from typing import cast
import gymnasium as gym
from storage_node_env.gym import EnergyStorageEnv
from storage_node_env.gym.controllers import SelfConsumptionController

# Create environment (battery_config as defined in Quick Start)
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0
)

# Access unwrapped environment for custom methods
gym_env = cast(EnergyStorageEnv, env.unwrapped)

# Create controller
controller = SelfConsumptionController(num_actions=21)

# Evaluation loop
obs, info = env.reset(seed=42)
total_cost = 0.0

while True:
    # Get controller observation from unwrapped environment
    controller_obs = gym_env.get_controller_observation()
    action = controller.choose_action(controller_obs, {})

    obs, reward, terminated, truncated, info = env.step(action)
    total_cost += info['net_cost']

    if terminated or truncated:
        break

print(f'Total cost: {total_cost:.4f} €')
env.close()

Benefits:

✅ RBC and RL agents see identical data
✅ Works with Gym wrappers (VecEnv, Monitor)
✅ Type-safe API with get_controller_observation()

Instantiation Examples

from storage_node_env.gym.controllers import (
    NaiveController,
    PriceBasedController,
    SelfConsumptionController
)

# 1. Naive controller (baseline)
naive = NaiveController(num_actions=21)

# 2. Price-based controller (energy arbitrage)
price_based = PriceBasedController(
    num_actions=21,
    window_size=168,           # 1 week rolling window
    charge_action_pct=75.0,    # 50% charge power
    discharge_action_pct=25.0  # 50% discharge power
)
price_based.reset()  # Reset before each episode

# 3. Self-consumption controller
self_consumption = SelfConsumptionController(
    num_actions=21,
    balance_threshold=0.5  # Minimum 0.5 kW imbalance to act
)

Utility Functions

from storage_node_env.gym.controllers import list_controllers, print_controllers

# List available controllers
controllers_info = list_controllers()
# Returns: {'NaiveController': 'description...', 'PriceBasedController': ...}

# Print formatted information
print_controllers()

Complete Examples

Example 1: Prosumer with Preprocessing

import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'self_consumption',
    'weights': {
        'main': 1.0,
        'violation_penalty': 0.1,
        'storage_usage_penalty': 0.01
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    lookback_n=2,
    use_preprocessing=True,    # Enable cyclical encoding
    add_holiday=True,          # Add holiday feature
    reward_settings=reward_settings
)

obs, info = env.reset(seed=42)
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_cost={info["net_cost"]:.4f} €')

    if terminated or truncated:
        break

env.close()
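
Example 1 enables cyclical encoding through use_preprocessing=True. The package's exact feature set is not described here, so the following only illustrates the general technique: a periodic quantity such as the hour of day is mapped to a sine/cosine pair, so that 23:00 and 00:00 end up close together instead of 23 units apart. The function names are hypothetical.

```python
import math


def encode_cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value to a point (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


late, midnight, noon = (encode_cyclical(hour, 24) for hour in (23, 0, 12))
print(distance(late, midnight) < distance(late, noon))  # True
```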

Example 2: Producer with Energy Arbitrage

import gymnasium as gym
import storage_node_env

battery_config = {
    'capacity': 30.0,
    'dod_max': 90,
    'power_charge_max': 10.0,
    'power_discharge_max': 10.0,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

reward_settings = {
    'type': 'economic',
    'weights': {
        'main': 100.0,             # Amplify economic signal
        'violation_penalty': 10.0,
        'storage_usage_penalty': 1.0
    }
}

env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='producer',
    csv_path='dataset/1h/producer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    reward_settings=reward_settings
)

obs, info = env.reset()
for step in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    print(f'Step {step+1}: reward={reward:.4f}, net_profit={info["net_profit"]:.4f} €')

    if terminated or truncated:
        break

env.close()

Example 3: Training with Stable-Baselines3

import gymnasium as gym
import storage_node_env
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

battery_config = {
    'capacity': 5.12,
    'dod_max': 90,
    'power_charge_max': 2.5,
    'power_discharge_max': 2.5,
    'efficiency_charge': 0.95,
    'efficiency_discharge': 0.95
}

# Create environment
env = gym.make(
    'storage_node_env/EnergyStorage-v0',
    node_type='prosumer',
    csv_path='dataset/1h/prosumer_test_data.csv',
    battery_config=battery_config,
    delta_t=1.0,
    use_preprocessing=True
)

# Wrap in vectorized environment and normalize rewards
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=False, norm_reward=True, clip_reward=10.0)

# Train PPO agent
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)

# Save model
model.save('ppo_prosumer')

Project Structure

storage_node_env/
├── core/                    # Core simulation components
│   ├── base/                # Abstract base classes
│   ├── storage/             # Battery implementation
│   └── nodes/               # Energy node implementations (Producer, Prosumer)
├── gym/                       # Gymnasium integration
│   ├── energy_storage_env.py  # Main environment class
│   ├── utils.py               # Observation building utilities
│   ├── preprocessing/         # Feature encoding and preprocessing
│   ├── rewards/               # Modular reward system
│   └── controllers/           # Rule-based baseline controllers
└── __init__.py                # Package initialization and version info

Documentation

REWARD_SYSTEM.md: Detailed reward system documentation
CONTROLLERS.md: Detailed controller documentation

Repository

GitHub: https://github.com/unisi-lab305/storage-node-environment
License: MIT

Citation

If you use this environment in your research, please cite:

@software{storage_node_env,
  title = {Storage Node Environment: Gymnasium Environment for Battery Energy Storage Systems},
  author = {Leonardo Guiducci},
  email = {leonardo.guiducci@unisi.it},
  year = {2025},
  url = {https://github.com/unisi-lab305/storage-node-environment}
}

Contributing

Contributions are welcome! Please see CLAUDE.md for development guidelines and coding standards.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

Version	Release date
0.14.0	Apr 30, 2026
0.13.0	Apr 30, 2026
0.11.2	Apr 28, 2026
0.11.0 (this version)	Apr 27, 2026
0.10.0	Apr 22, 2026
0.8.2	Nov 21, 2025
0.8.1	Nov 21, 2025
0.8.0	Nov 21, 2025

Download files

Source Distribution

storage_node_env-0.11.0.tar.gz (83.7 kB)

Uploaded Apr 27, 2026 Source

Built Distribution

storage_node_env-0.11.0-py3-none-any.whl (101.3 kB)

Uploaded Apr 27, 2026 Python 3

File details

Details for the file storage_node_env-0.11.0.tar.gz.

File metadata

Download URL: storage_node_env-0.11.0.tar.gz
Upload date: Apr 27, 2026
Size: 83.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for storage_node_env-0.11.0.tar.gz
Algorithm	Hash digest
SHA256	`43d7ce03f09a3f5e03f9e0471873ace71a2466f54e89d875570a7c10a9927d2b`
MD5	`b4d5974426f09ae8f7794554686b903c`
BLAKE2b-256	`d363d5af199e743a5b06b418e7e7a04870820f604a1b15bfe6b86e8fdafd2c0c`

File details

Details for the file storage_node_env-0.11.0-py3-none-any.whl.

File metadata

Download URL: storage_node_env-0.11.0-py3-none-any.whl
Upload date: Apr 27, 2026
Size: 101.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for storage_node_env-0.11.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9de87cb32783415aa9648746678a9f6fd3cf2175000198a0601a3365d90a4f7f`
MD5	`6186d3a9e25c013f5646cc93fbd931bf`
BLAKE2b-256	`4306258dd062d767aac4308e428e68d68f8340c933f17533f88a60d046c31315`
