
Data Generation Kit


gridfm-datakit


This library, developed by the GridFM team, generates power flow data for training machine learning and foundation models.


Comparison with other PF datasets/libraries

Libraries compared: GraphNeuralSolver [1], OPFData [2], OPFLearn [3], PowerFlowNet [4], TypedGNN [5], PF△ [6], PGLearn [7], gridfm-datakit [8]

Features compared:

  • Generator Profile
  • N-1
  • > 1000 Buses
  • N-k, k > 1
  • Load Scenarios from Real World Data
  • Net Param Perturbation
  • Multi-processing and scalable to very large (1M+) datasets

Installation

  1. ⭐ Star the repository on GitHub to support the project!

  2. Run:

    python -m pip install --upgrade pip  # Upgrade pip
    pip install gridfm-datakit
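
To confirm that the install succeeded, a minimal sketch using only the Python standard library is shown here. The helper name installed_version is hypothetical and not part of gridfm-datakit; it only queries the package metadata that pip records.

```python
from importlib import metadata


def installed_version(dist_name="gridfm-datakit"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("gridfm-datakit is not installed in this environment")
    else:
        print(f"gridfm-datakit {version} is installed")
```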
    

Getting Started

Option 1: Run data gen using interactive interface

To use the interactive interface, either open scripts/interactive_interface.ipynb or copy the following into a Jupyter notebook and follow the instructions:

from gridfm_datakit.interactive import interactive_interface
interactive_interface()

Option 2: Using the command line interface

Run the data generation routine from the command line:

gridfm_datakit path/to/config.yaml
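
If the generation step needs to be launched from a Python script (for example as part of a larger pipeline), the same command can be wrapped with the standard library. This is only a sketch: build_command and run_datakit are hypothetical helper names, and the only thing assumed about the tool is the command form shown above.

```python
import os
import shutil
import subprocess


def build_command(config_path):
    """Return the argument list for the documented CLI call."""
    return ["gridfm_datakit", os.fspath(config_path)]


def run_datakit(config_path):
    """Run the CLI and return its exit code, or None if it is not on PATH."""
    cmd = build_command(config_path)
    if shutil.which(cmd[0]) is None:
        return None  # the package is probably not installed in this environment
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    code = run_datakit("path/to/config.yaml")
    print("exit code:", code)
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.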

Configuration Overview

Refer to the sections Network, Load Scenarios, and Topology perturbations for a description of the configuration parameters.

Sample configuration files are provided in scripts/config, e.g. default.yaml:

network:
  name: "case24_ieee_rts" # Name of the power grid network (without extension)
  source: "pglib" # Data source for the grid; options: pglib, pandapower, file
  network_dir: "scripts/grids" # if using source "file", this is the directory containing the network file (relative to the project root)


load:
  generator: "agg_load_profile" # Name of the load generator; options: agg_load_profile, powergraph
  agg_profile: "default" # Name of the aggregated load profile
  scenarios: 200 # Number of different load scenarios to generate
  # WARNING: the following parameters are only used if generator is "agg_load_profile"
  # if using generator "powergraph", these parameters are ignored
  sigma: 0.05 # max local noise
  change_reactive_power: true # If true, changes the reactive power of loads; if false, keeps the values from the case file
  global_range: 0.4 # Range of the global scaling factor; used to set the lower bound of the scaling factor
  max_scaling_factor: 4.0 # Max upper bound of the global scaling factor
  step_size: 0.025 # Step size when finding the upper bound of the global scaling factor
  start_scaling_factor: 0.8 # Initial value of the global scaling factor

topology_perturbation:
  type: "random" # Type of topology generator; options: n_minus_k, random, none
  # WARNING: the following parameters are only used if type is not "none"
  k: 1 # Maximum number of components to drop in each perturbation
  n_topology_variants: 5 # Number of unique perturbed topologies per scenario
  elements: ["line", "trafo", "gen", "sgen"] # Elements to perturb; options: line, trafo, gen, sgen

generation_perturbation:
  type: "cost_permutation" # Type of generation perturbation; options: cost_permutation, cost_perturbation, none
  # WARNING: the following parameters are only used if type is "cost_perturbation"
  sigma: 1.0 # Size of the range used for sampling the scaling factor

settings:
  num_processes: 10 # Number of parallel processes to use
  data_dir: "./data_out" # Directory to save generated data relative to the project root
  large_chunk_size: 50 # Number of load scenarios processed before saving
  no_stats: false # If true, disables statistical calculations
  overwrite: true # If true, overwrites existing files; if false, appends to files (note that bus_params.csv, edge_params.csv, scenarios_{load.generator}.csv and scenarios_{load.generator}.html will still be overwritten)
  mode: "pf" # Mode of the script; options: contingency, pf
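
Several keys in the sample configuration accept only a fixed set of options, so it can help to check a config before starting a long run. The sketch below is not part of gridfm-datakit; check_config and ALLOWED are hypothetical names, and the allowed values are copied from the comments in default.yaml. It works on a plain dictionary, which could for instance be obtained by loading the YAML file with PyYAML's yaml.safe_load.

```python
# Allowed values, copied from the comments in the sample default.yaml.
ALLOWED = {
    ("network", "source"): {"pglib", "pandapower", "file"},
    ("load", "generator"): {"agg_load_profile", "powergraph"},
    ("topology_perturbation", "type"): {"n_minus_k", "random", "none"},
    ("generation_perturbation", "type"): {
        "cost_permutation",
        "cost_perturbation",
        "none",
    },
    ("settings", "mode"): {"contingency", "pf"},
}


def check_config(config):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = config.get(section, {}).get(key)
        if value not in allowed:
            problems.append(f"{section}.{key}={value!r} not in {sorted(allowed)}")
    return problems


# A subset of the sample configuration, written as a dictionary.
example = {
    "network": {"name": "case24_ieee_rts", "source": "pglib"},
    "load": {"generator": "agg_load_profile", "scenarios": 200},
    "topology_perturbation": {"type": "random", "k": 1, "n_topology_variants": 5},
    "generation_perturbation": {"type": "cost_permutation"},
    "settings": {"mode": "pf"},
}

print(check_config(example))  # prints []
```

The check covers only the enumerated options; it does not validate numeric ranges or file paths.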

Output Files

The data generation process produces several output files in the specified data directory:

  • tqdm.log: Progress bar log.
  • error.log: Log of the errors raised during data generation.
  • args.log: Copy of the config file used.
  • pf_node.csv: Data related to the nodes (buses) in the network, such as voltage levels and power injections.
  • pf_edge.csv: Branch admittance matrix for each pf case.
  • branch_idx_removed.csv: List of the indices of the branches (lines and transformers) that got removed when perturbing the topologies.
  • edge_params.csv: Branch admittance matrix and branch rate limits for the unperturbed topology.
  • bus_params.csv: Parameters for the buses (voltage limits and the base voltage).
  • scenario_{args.load.generator}.csv: Element-level load profiles produced by the load scenario generator.
  • scenario_{args.load.generator}.html: Plots of the element-level load profile.
  • scenario_{args.load.generator}.log: If generator is "agg_load_profile", stores the upper and lower bounds for the global scaling factor.
  • stats.csv: Stats about the generated data.
  • stats_plot.html: Plots of the stats about the generated data.
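
After a run, a quick way to see which of the files listed above are present is sketched below. The helper names expected_outputs and missing_outputs are hypothetical, the file names are taken verbatim from the list, and some files may legitimately be absent (for example the stats files when no_stats is true).

```python
from pathlib import Path


def expected_outputs(load_generator):
    """Build the list of output file names described in the list above."""
    names = [
        "tqdm.log",
        "error.log",
        "args.log",
        "pf_node.csv",
        "pf_edge.csv",
        "branch_idx_removed.csv",
        "edge_params.csv",
        "bus_params.csv",
        f"scenario_{load_generator}.csv",
        f"scenario_{load_generator}.html",
        "stats.csv",
        "stats_plot.html",
    ]
    # The scenario log is only described for the "agg_load_profile" generator.
    if load_generator == "agg_load_profile":
        names.append(f"scenario_{load_generator}.log")
    return names


def missing_outputs(data_dir, load_generator):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [n for n in expected_outputs(load_generator) if not (root / n).is_file()]


if __name__ == "__main__":
    # "./data_out" is the data_dir value from the sample configuration.
    for name in missing_outputs("./data_out", "agg_load_profile"):
        print("missing:", name)
```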
