Data Generation Kit
Project description
Overview
This library, developed by the GridFM team, generates power flow data for training machine learning and foundation models.
Comparison with other power flow (PF) datasets/libraries
| Feature | GraphNeuralSolver [1] | OPFData [2] | OPFLearn [3] | PowerFlowNet [4] | TypedGNN [5] | PF△ [6] | gridfm-datakit [7] |
|---|---|---|---|---|---|---|---|
| Generator Profile | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ |
| N-1 | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| > 1000 Buses | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| N-k, k > 1 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Load Scenarios from Real World Data | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Multi-processing and scalable to very large (1M+) datasets | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
Installation
⭐ Star the repository on GitHub to support the project!
Clone the repository and set up a Python virtual environment:
git clone https://github.com/gridfm/gridfm-datakit.git
cd "gridfm-datakit"
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip # Upgrade pip to ensure compatibility with pyproject.toml
pip3 install .
Getting Started
Option 1: Run data generation using the interactive interface
To use the interactive interface, either open scripts/interactive_interface.ipynb or copy the following into a Jupyter notebook and follow the instructions:
from gridfm_datakit.interactive_utils import interactive_interface
interactive_interface()
Option 2: Run data generation using the command-line interface
Run the data generation routine from the command line:
gridfm_datakit path/to/config.yaml
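To generate several datasets in one go (for example, one per grid), you can script the documented `gridfm_datakit path/to/config.yaml` call. The sketch below is a hypothetical helper and is not part of gridfm-datakit. It assumes only that the `gridfm_datakit` command is on the `PATH` after installation. With `dry_run=True` it returns the commands without executing them.

```python
import shlex
import subprocess


def build_command(config_path):
    """Build the documented CLI call: gridfm_datakit path/to/config.yaml."""
    return ["gridfm_datakit", str(config_path)]


def run_configs(config_paths, dry_run=True):
    """Hypothetical helper: run the generator once per config file, in order.

    With dry_run=True nothing is executed; the shell-quoted commands are returned.
    """
    commands = [build_command(p) for p in config_paths]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a run fails.
            subprocess.run(cmd, check=True)
    return [shlex.join(cmd) for cmd in commands]
```

Runs are sequential here on purpose. Each run already parallelizes internally through `settings.num_processes`, so launching several runs at once would likely oversubscribe the CPU.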
Configuration Overview
Refer to the Network, Load Scenarios, and Topology Perturbations sections for a description of the configuration parameters.
Sample configuration files are provided in scripts/config, e.g., default.yaml:
network:
  name: "case24_ieee_rts" # Name of the power grid network (without extension)
  source: "pglib" # Data source for the grid; options: pglib, pandapower, file
  network_dir: "scripts/grids" # If source is "file", the directory containing the network file (relative to the project root)
load:
  generator: "agg_load_profile" # Name of the load generator; options: agg_load_profile, powergraph
  agg_profile: "default" # Name of the aggregated load profile
  scenarios: 200 # Number of different load scenarios to generate
  # WARNING: the following parameters are only used if generator is "agg_load_profile"
  # if using generator "powergraph", these parameters are ignored
  sigma: 0.05 # Maximum local noise
  change_reactive_power: true # If true, changes the reactive power of loads; if false, keeps the values from the case file
  global_range: 0.4 # Range of the global scaling factor; used to set the lower bound of the scaling factor
  max_scaling_factor: 4.0 # Maximum upper bound of the global scaling factor
  step_size: 0.025 # Step size used when searching for the upper bound of the global scaling factor
  start_scaling_factor: 0.8 # Initial value of the global scaling factor
topology_perturbation:
  type: "random" # Type of topology generator; options: n_minus_k, random, none
  # WARNING: the following parameters are only used if type is not "none"
  k: 1 # Maximum number of components to drop in each perturbation
  n_topology_variants: 5 # Number of unique perturbed topologies per scenario
  elements: ["line", "trafo", "gen", "sgen"] # Elements to perturb; options: line, trafo, gen, sgen
settings:
  num_processes: 10 # Number of parallel processes to use
  data_dir: "./data_out" # Directory for the generated data, relative to the project root
  large_chunk_size: 50 # Number of load scenarios processed before saving
  no_stats: false # If true, disables the statistics computation
  overwrite: true # If true, overwrites existing files; if false, appends to them (bus_params.csv, edge_params.csv, scenarios_{load.generator}.csv and scenarios_{load.generator}.html are overwritten either way)
  mode: "pf" # Mode of the script; options: contingency, pf
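The config comments describe the `agg_load_profile` and topology parameters only briefly. The following minimal sketch shows one plausible reading of them. It is not the library's implementation, and every function name in it is hypothetical.

- `find_upper_bound` raises the global scaling factor from `start_scaling_factor` in steps of `step_size`, stopping at `max_scaling_factor`. A caller-supplied `converges` predicate stands in for solving power flow.
- The lower bound `upper * (1 - global_range)` is an assumption. So are the uniform draw of the global factor, which the real generator presumably takes from the `agg_profile` curve, and the uniform local noise in `[1 - sigma, 1 + sigma]`.
- `sample_topologies` draws `n_topology_variants` unique sets of at most `k` elements to drop, in the spirit of `type: "random"`.

```python
import random


def find_upper_bound(converges, start=0.8, step=0.025, max_factor=4.0):
    """Hypothetical sketch: raise the global scaling factor from `start` by `step`
    while `converges(factor)` holds, up to `max_factor`.
    Returns the last factor that converged, or None if the first one fails."""
    n_steps = int(round((max_factor - start) / step))
    last_ok = None
    for i in range(n_steps + 1):
        factor = start + i * step  # index-based to avoid accumulating float error
        if not converges(factor):
            break
        last_ok = factor
    return last_ok


def sample_load_scenarios(base_loads, n_scenarios, upper, global_range, sigma, seed=0):
    """Hypothetical sketch: scale every base load by one shared global factor
    and an independent local noise factor per load."""
    lower = upper * (1 - global_range)  # ASSUMPTION: multiplicative lower bound
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n_scenarios):
        # ASSUMPTION: uniform draw; the real generator likely follows agg_profile.
        g = rng.uniform(lower, upper)
        scenarios.append([p * g * rng.uniform(1 - sigma, 1 + sigma) for p in base_loads])
    return scenarios


def sample_topologies(components, k, n_variants, seed=0, max_attempts=1000):
    """Hypothetical sketch: draw up to `n_variants` unique sets of 1..k
    components, e.g. ("line", 3), to drop from the base topology."""
    rng = random.Random(seed)
    variants = set()
    for _ in range(max_attempts):
        if len(variants) >= n_variants:
            break
        size = rng.randint(1, min(k, len(components)))
        variants.add(frozenset(rng.sample(components, size)))
    # Sort for a deterministic, readable result.
    return sorted(sorted(v) for v in variants)
```

Using an index-based step instead of repeated `factor += step` keeps the searched factors exactly on the `start + i * step` grid. It also guarantees that `max_scaling_factor` itself is tried.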
Output Files
The data generation process produces several output files in the specified data directory:
- tqdm.log: Progress bar log.
- error.log: Log of the errors raised during data generation.
- args.log: Copy of the config file used.
- pf_node.csv: Data related to the nodes (buses) in the network, such as voltage levels and power injections.
- pf_edge.csv: Branch admittance matrix for each power flow case.
- branch_idx_removed.csv: List of the indices of the branches (lines and transformers) that got removed when perturbing the topologies.
- edge_params.csv: Branch admittance matrix and branch rate limits for the unperturbed topology.
- bus_params.csv: Parameters for the buses (voltage limits and the base voltage).
- scenario_{args.load.generator}.csv: Element-level load profile produced by the load scenario generator.
- scenario_{args.load.generator}.html: Plots of the element-level load profile.
- scenario_{args.load.generator}.log: If generator is "agg_load_profile", stores the upper and lower bounds for the global scaling factor.
- stats.csv: Stats about the generated data.
- stats_plot.html: Plots of the stats about the generated data.
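After a run, a quick sanity check is to confirm that the files listed above exist. This hypothetical helper is not part of gridfm-datakit and rests on several assumptions:

- The files are written directly under `settings.data_dir`. Your version may nest them in a subdirectory.
- `scenario_{generator}.log` only appears for `agg_load_profile`.
- `stats.csv` and `stats_plot.html` are skipped when `no_stats` is true.

File names follow the list above. The `overwrite` comment in the sample config spells the scenario files `scenarios_{load.generator}`, so adjust the prefix if your output differs.

```python
from pathlib import Path

# File names taken from the output list; ASSUMPTION: all live directly in data_dir.
BASE_FILES = [
    "tqdm.log", "error.log", "args.log",
    "pf_node.csv", "pf_edge.csv", "branch_idx_removed.csv",
    "edge_params.csv", "bus_params.csv",
]
STATS_FILES = ["stats.csv", "stats_plot.html"]


def expected_outputs(generator, no_stats=False):
    """Return the output file names expected for a given load.generator value."""
    names = BASE_FILES + [f"scenario_{generator}.csv", f"scenario_{generator}.html"]
    if generator == "agg_load_profile":
        # The scaling-factor bounds log is only listed for agg_load_profile.
        names.append(f"scenario_{generator}.log")
    if not no_stats:
        # ASSUMPTION: stats files are skipped when settings.no_stats is true.
        names += STATS_FILES
    return names


def missing_outputs(data_dir, generator, no_stats=False):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [n for n in expected_outputs(generator, no_stats) if not (root / n).exists()]
```

For example, `missing_outputs("./data_out", "agg_load_profile")` returns an empty list when every expected file is present.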