Data Generation Kit

gridfm-datakit

This library, developed by the GridFM team, generates power flow data for training machine learning and foundation models.

Installation

⭐ Star the repository on GitHub to support the project!
Make sure you have Python 3.10, 3.11, or 3.12 installed.
⚠️ Windows users: Python 3.12 is not supported. Use Python 3.10.11 or 3.11.9.

Install gridfm-datakit

python -m pip install --upgrade pip  # Upgrade pip
pip install gridfm-datakit

Install Julia with PowerModels.jl and Ipopt
```
gridfm_datakit setup_pm
```
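
The Python version requirement stated above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `is_supported` is hypothetical and not part of gridfm-datakit.

```python
import platform
import sys

# Minor versions listed as supported in the installation notes.
SUPPORTED_MINORS = {(3, 10), (3, 11), (3, 12)}


def is_supported(version=None, system=None):
    """Return True if the interpreter matches the stated requirements.

    `version` is a tuple such as (3, 11, 9); `system` is a name as returned
    by platform.system(). Both default to the running interpreter.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    system = system or platform.system()
    if major_minor not in SUPPORTED_MINORS:
        return False
    # Python 3.12 is documented as unsupported on Windows.
    if system == "Windows" and major_minor == (3, 12):
        return False
    return True
```

Calling `is_supported()` with no arguments checks the current interpreter and operating system.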

For Developers

To install the latest development version from GitHub, follow these steps instead of installing gridfm-datakit from PyPI as described above.

git clone https://github.com/gridfm/gridfm-datakit.git
cd "gridfm-datakit"
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip  # Upgrade pip to ensure compatibility with pyproject.toml
pip3 install -e '.[test,dev]'

Getting Started

Option 1: Run data generation using the interactive interface

To use the interactive interface, either open scripts/interactive_interface.ipynb or copy the following into a Jupyter notebook and follow the instructions:

from gridfm_datakit.interactive import interactive_interface
interactive_interface()

Option 2: Using the command line interface

Generate Data

Run the data generation routine from the command line:

gridfm-datakit generate path/to/config.yaml

Validate Data

Validate generated power flow data for integrity and physical consistency:

gridfm-datakit validate /path/to/data/ [--n-scenarios N] [--sn-mva 100]

Compute Statistics

Generate statistics plots from generated data:

gridfm-datakit stats /path/to/data/ [--sn-mva 100]

Plot Feature Distributions

Create violin plots for bus feature distributions:

gridfm-datakit plots /path/to/data/ [--output-dir DIR] [--sn-mva 100]
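
When the four subcommands above are driven from a script, it can help to assemble the argument lists in one place. The sketch below is an illustration only: `build_command` and `run` are hypothetical helpers, and the only subcommands and flags they use are the ones shown in the usage lines above.

```python
import shlex
import subprocess

SUBCOMMANDS = {"generate", "validate", "stats", "plots"}


def build_command(subcommand, target, n_scenarios=None, sn_mva=None, output_dir=None):
    """Build the argument list for one gridfm-datakit subcommand."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    cmd = ["gridfm-datakit", subcommand, str(target)]
    # --n-scenarios appears only in the usage line for 'validate'.
    if n_scenarios is not None:
        if subcommand != "validate":
            raise ValueError("--n-scenarios is only shown for 'validate'")
        cmd += ["--n-scenarios", str(n_scenarios)]
    # --output-dir appears only in the usage line for 'plots'.
    if output_dir is not None:
        if subcommand != "plots":
            raise ValueError("--output-dir is only shown for 'plots'")
        cmd += ["--output-dir", str(output_dir)]
    # --sn-mva appears for 'validate', 'stats', and 'plots'.
    if sn_mva is not None:
        if subcommand == "generate":
            raise ValueError("--sn-mva is not shown for 'generate'")
        cmd += ["--sn-mva", str(sn_mva)]
    return cmd


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

For example, `run(build_command("validate", "/path/to/data/", n_scenarios=100))` prints the command without executing it; pass `dry_run=False` to actually invoke the CLI.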

Configuration Overview

Refer to the sections Network, Load Scenarios, and Topology perturbations of the documentation for a description of the configuration parameters.

Sample configuration files are provided in scripts/config, e.g. default.yaml:

network:
  name: "case24_ieee_rts" # Name of the power grid network (without extension)
  source: "pglib" # Data source for the grid; options: pglib, file
  # WARNING: the following parameter is only used if source is "file"
  network_dir: "scripts/grids" # if using source "file", this is the directory containing the network file (relative to the project root)

load:
  generator: "agg_load_profile" # Name of the load generator; options: agg_load_profile, powergraph
  agg_profile: "default" # Name of the aggregated load profile
  scenarios: 10000 # Number of different load scenarios to generate
  # WARNING: the following parameters are only used if generator is "agg_load_profile"
  # if using generator "powergraph", these parameters are ignored
  sigma: 0.2 # max local noise
  change_reactive_power: true # If true, changes the reactive power of loads. If false, keeps the values from the case file
  global_range: 0.4 # Range of the global scaling factor. Used to set the lower bound of the scaling factor
  max_scaling_factor: 4.0 # Max upper bound of the global scaling factor
  step_size: 0.1 # Step size when finding the upper bound of the global scaling factor
  start_scaling_factor: 1.0 # Initial value of the global scaling factor

topology_perturbation:
  type: "random" # Type of topology generator; options: n_minus_k, random, none
  # WARNING: the following parameters are only used if type is not "none"
  k: 1 # Maximum number of components to drop in each perturbation
  n_topology_variants: 20 # Number of unique perturbed topologies per scenario
  elements: [branch, gen] # elements to perturb. options: branch, gen

generation_perturbation:
  type: "cost_permutation" # Type of generation perturbation; options: cost_permutation, cost_perturbation, none
  # WARNING: the following parameter is only used if type is "cost_perturbation"
  sigma: 1.0 # Size of range used for sampling scaling factor

admittance_perturbation:
  type: "random_perturbation" # Type of admittance perturbation; options: random_perturbation, none
  # WARNING: the following parameter is only used if type is "random_perturbation"
  sigma: 0.2 # Size of range used for sampling scaling factor

settings:
  num_processes: 16 # Number of parallel processes to use
  data_dir: "./data_out" # Directory to save generated data relative to the project root
  large_chunk_size: 1000 # Number of load scenarios processed before saving
  overwrite: true # If true, overwrites existing files; if false, appends to files
  mode: "pf" # Mode of the script; options: pf, opf. pf: power flow data where one or more operating limits – the inequality constraints defined in OPF, e.g., voltage magnitude or branch limits – may be violated. opf: data points for training OPF solvers, with cost-optimal dispatches that satisfy all operating limits (OPF-feasible)
  include_dc_res: true # If true, also stores the results of DC power flow (in addition to the results of AC power flow). Does not work with mode "opf"
  enable_solver_logs: true # If true, write OPF/PF logs to {data_dir}/solver_log; PF fast and DCPF fast do not log.
  pf_fast: true # Whether to use fast PF solver by default (compute_ac_pf from PowerModels.jl); if false, uses Ipopt-based PF. Some networks, e.g. case10000_goc, do not work with pf_fast: true. pf_fast is faster and more accurate than the Ipopt-based PF.
  dcpf_fast: true # Whether to use fast DCPF solver by default (compute_dc_pf from PowerModels.jl)
  max_iter: 200 # Max iterations for Ipopt-based solvers
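
The option lists and the mode restriction noted in the comments above can be checked before launching a long run. The following is a rough sketch, assuming the configuration has already been loaded into a nested dictionary (for instance with PyYAML's `yaml.safe_load`); `check_config` is a hypothetical helper, not a gridfm-datakit function, and it only encodes constraints stated in the sample file.

```python
# Allowed values, copied from the "options:" comments in the sample config.
VALID_OPTIONS = {
    ("network", "source"): {"pglib", "file"},
    ("load", "generator"): {"agg_load_profile", "powergraph"},
    ("topology_perturbation", "type"): {"n_minus_k", "random", "none"},
    ("generation_perturbation", "type"): {
        "cost_permutation",
        "cost_perturbation",
        "none",
    },
    ("admittance_perturbation", "type"): {"random_perturbation", "none"},
    ("settings", "mode"): {"pf", "opf"},
}


def check_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for (section, key), allowed in VALID_OPTIONS.items():
        value = cfg.get(section, {}).get(key)
        if value not in allowed:
            problems.append(f"{section}.{key}={value!r} not in {sorted(allowed)}")
    settings = cfg.get("settings", {})
    # The sample comments state that include_dc_res does not work with mode "opf".
    if settings.get("mode") == "opf" and settings.get("include_dc_res"):
        problems.append("settings.include_dc_res is not supported with mode 'opf'")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the library itself will accept the file.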

Output Files

The data generation process writes the following artifacts under: {settings.data_dir}/{network.name}/raw

tqdm.log: Progress bar log.
error.log: Error messages captured during generation.
args.log: YAML dump of the configuration used for this run.
scenarios_{generator}.parquet: Load scenarios (per-element time series) produced by the selected load generator.
scenarios_{generator}.html: Plot of the generated load scenarios.
scenarios_{generator}.log: Generator-specific notes (e.g., bounds for the global scaling factor when using agg_load_profile).
bus_data.parquet: Bus-level features for each processed scenario (columns BUS_COLUMNS and, if settings.include_dc_res=True, also DC_BUS_COLUMNS).
gen_data.parquet: Generator features per scenario (columns GEN_COLUMNS).
branch_data.parquet: Branch features per scenario (columns BRANCH_COLUMNS).
y_bus_data.parquet: Nonzero Y-bus entries per scenario with columns [scenario, index1, index2, G, B].
runtime_data.parquet: Runtime data for each scenario (AC and DC solver execution times).
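
The output location and file names listed above follow from three configuration values: settings.data_dir, network.name, and load.generator. The sketch below shows how the expected paths could be derived and checked after a run. The helper names are hypothetical, and a missing file is not necessarily an error, since this README does not state that every file is written on every run.

```python
from pathlib import Path

# File names taken from the list above; {generator} is load.generator.
FIXED_ARTIFACTS = [
    "tqdm.log",
    "error.log",
    "args.log",
    "bus_data.parquet",
    "gen_data.parquet",
    "branch_data.parquet",
    "y_bus_data.parquet",
    "runtime_data.parquet",
]
GENERATOR_ARTIFACTS = [
    "scenarios_{generator}.parquet",
    "scenarios_{generator}.html",
    "scenarios_{generator}.log",
]


def raw_dir(cfg):
    """Return {settings.data_dir}/{network.name}/raw as a Path."""
    return Path(cfg["settings"]["data_dir"]) / cfg["network"]["name"] / "raw"


def expected_artifacts(cfg):
    """Return the full paths of all artifacts named in the list above."""
    generator = cfg["load"]["generator"]
    names = FIXED_ARTIFACTS + [
        template.format(generator=generator) for template in GENERATOR_ARTIFACTS
    ]
    return [raw_dir(cfg) / name for name in names]


def missing_artifacts(cfg):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_artifacts(cfg) if not path.exists()]
```

With the sample configuration, `raw_dir` resolves to data_out/case24_ieee_rts/raw relative to the directory the script is run from.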

Release history

1.0.4: Mar 10, 2026
1.0.3: Mar 4, 2026
1.0.2: Jan 16, 2026
1.0.1: Jan 7, 2026
1.0: Dec 28, 2025
0.0.9a0 (pre-release, this version): Dec 2, 2025
0.0.8: Sep 9, 2025
0.0.7: Sep 8, 2025
0.0.6: Jul 10, 2025
0.0.5: Jun 25, 2025
0.0.4: Jun 25, 2025
0.0.3: Jun 25, 2025
0.0.2: Jun 24, 2025
0.0.2b0 (pre-release): Jun 24, 2025
0.0.1: Jun 24, 2025

