
An app based on the pandas and scipy packages that loads DataFrame-like tables from either remote or local repositories and generates synthetic samples from them.


SynthDataGen

Usage

This app is designed to be used in the following way:

  1. Load the data either from ESIOS or from a local CSV file containing a pandas.DataFrame-like table.
  2. Apply the needed adjustments by
    • applying a per-year multiplication factor to all the rows, and/or
    • upsampling or downsampling the data with different techniques.
  3. Generate samples for each row of the DataFrame from one of the available probability distributions.
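For the local CSV case, loading and filtering can be approximated with plain pandas. The sketch below is not the package's LocalDFLoader: load_local_df, filter_rows and their parameters are hypothetical, the first remaining CSV column is assumed to hold the timestamps, and only two of the filters described in the workflow (starting year and discarding February 29) are reproduced.

```python
import io

import pandas as pd


def load_local_df(csv_source, indicator, skip_first_column=False, datetime_format=None):
    """Hypothetical LocalDF-style loader (not the package's LocalDFLoader)."""
    df = pd.read_csv(csv_source)
    if skip_first_column:
        # Drop a leading column, e.g. a row counter written by DataFrame.to_csv()
        df = df.iloc[:, 1:]
    # Assumption: the (remaining) first column holds the timestamps
    df = df.set_index(df.columns[0])
    df.index = pd.to_datetime(df.index, format=datetime_format)
    # Keep only the requested variable (indicator)
    return df[[indicator]]


def filter_rows(df, initial_year, include_29_february=False):
    """Keep rows from initial_year onwards and optionally drop 29 February."""
    mask = df.index.year >= initial_year
    if not include_29_february:
        mask &= ~((df.index.month == 2) & (df.index.day == 29))
    return df[mask]


csv_text = (
    "timestamp,price,volume\n"
    "2019-02-28 00:00,1.0,5\n"
    "2020-02-29 00:00,2.0,6\n"
    "2020-03-01 00:00,3.0,7\n"
)
df = load_local_df(io.StringIO(csv_text), "price", datetime_format="%Y-%m-%d %H:%M")
df = filter_rows(df, initial_year=2020)
print(df)  # only the 2020-03-01 row survives
```

The 2019 row is dropped by the starting-year filter and the 2020-02-29 row by the leap-day filter.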

Workflow and usage

  1. Depending on the loader used (ESIOSLoader, LocalDFLoader, etc.), its attributes are expected in the corresponding nested dictionary of the input parameters file:

    • "ESIOS": the fields for the access token, the particular indicator and the granularity for the data to be requested.
    • "LocalDF": the directory and name of the CSV file containing the DataFrame to be loaded, the variable name to get (indicator), whether to skip the first column or not, and the datetime format of the index column.
  2. The Loader.getDataFromSource(...) method receives several parameters, which are used to filter the loaded DataFrame. The resulting DataFrame contains data

    • from an initial year onwards,
    • starting at an initial datetime (default value: 'now'),
    • covering a given number of hours ahead.
    • In addition, whether to discard February 29 must be specified.
  3. The adjustment-by-year method (FactorByYear.run(...)) receives a dictionary <year, adjustmentValue> = <int, int|float>. It is used for inflation or similar adjustments of a DataFrame. Values are percentages: 10 means a 10% increase, -32.0 a 32% decrease, and 347.89 a 347.89% increase.

  4. If the data is to be upsampled or downsampled afterwards, specify the target granularity when calling the ChangeResolution.upsample(...) and ChangeResolution.downsample(...) methods.

    • The granularity (here called frequency) is an integer followed by a unit ('D': daily, 'H': hourly, 'T': minutely, 'S': secondly). E.g. "2T" means an entry every 2 minutes.
    • The interpolation method (for upsampling) and the aggregation function (for downsampling) must also be specified.
  5. Finally, to sample the current data with the Sampling.getSamples(...) method, provide

    • the number of samples to generate, and
    • the probability distribution to use.
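Steps 3 and 4 can be sketched with plain pandas, assuming the DataFrame has a DatetimeIndex. factor_by_year, upsample and downsample are hypothetical helpers, not the package's FactorByYear or ChangeResolution API, and years missing from the dictionary are assumed to stay unchanged. Since pandas 2.2 deprecates the 'H', 'T' and 'S' aliases in favour of 'h', 'min' and 's', the sketch uses "30min" and "60min".

```python
import pandas as pd


def factor_by_year(df, adjustments):
    """Scale every row by (1 + pct / 100), with pct looked up by the row's year."""
    # Assumption: years missing from `adjustments` are left unchanged
    factors = pd.Series(
        [1 + adjustments.get(year, 0) / 100 for year in df.index.year], index=df.index
    )
    return df.mul(factors, axis=0)


def upsample(df, frequency, method="linear", **kwargs):
    """Insert timestamps at a finer frequency and fill them by interpolation."""
    return df.resample(frequency).asfreq().interpolate(method=method, **kwargs)


def downsample(df, frequency, aggregation="mean"):
    """Group rows into coarser bins and aggregate each bin."""
    return df.resample(frequency).agg(aggregation)


idx = pd.date_range("2021-12-31 23:00", periods=2, freq="60min")
df = pd.DataFrame({"value": [100.0, 200.0]}, index=idx)
df = factor_by_year(df, {2021: 50, 2022: -50})  # values: 150.0, 100.0
up = upsample(df, "30min")                       # values: 150.0, 125.0, 100.0
down = downsample(up, "60min")                   # values: 137.5, 100.0
```

Passing method="polynomial", order=2 to this upsample helper forwards both arguments to DataFrame.interpolate, which relies on scipy for that method.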

Examples

A similar, extended example is included in the ./notebooks/fullExample.ipynb Jupyter notebook.

import synthDataGen.controller as controller

# Loader settings are read from the "LocalDF" section of the input parameters file
loader = controller.LocalDFLoader("./synthDataGen/settings/inputParams.json")
# The initial datetime defaults to 'now', so it is omitted here
df = loader.getDataFromSource(initialYear=2007, hoursAhead=10, include29February=False)

# DataFrame adjustments
from synthDataGen.adjustments import FactorByYear, ChangeResolution

# +10% for every year from 2007 to 2022
df = FactorByYear.run(df, adjustmentsDict={year: 10 for year in range(2007, 2023)})

# Up/down sampling
df = ChangeResolution.upsample(df, frequency="15T", method="polynomial", order=2)
df = ChangeResolution.downsample(df, frequency="2H", aggregationFunc="mean")

# Samples generation
from synthDataGen.utils import Sampling

# 5000 samples for each row, drawn from a truncated normal distribution
samples = Sampling.getSamples(df, 5000, "truncnorm")
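The last call draws samples for each row. A minimal sketch of per-row sampling from a truncated normal with scipy.stats.truncnorm is shown below; it assumes each row's value is the mean and uses hypothetical choices for the spread (rel_std) and the bounds (lower, upper), none of which are documented here for Sampling.getSamples(...).

```python
import numpy as np
import pandas as pd
from scipy import stats


def sample_rows_truncnorm(df, n_samples, rel_std=0.1, lower=0.0, upper=np.inf, seed=None):
    """Draw n_samples values per row from a normal distribution centred on the
    row value and truncated to [lower, upper].

    rel_std, lower and upper are hypothetical parameters, not the package's API.
    """
    rng = np.random.default_rng(seed)
    loc = df.iloc[:, 0].to_numpy(dtype=float)[:, None]  # shape (rows, 1)
    # Standard deviation proportional to the row value (floored to avoid scale=0)
    scale = np.maximum(np.abs(loc) * rel_std, 1e-12)
    # truncnorm expects the bounds in units of standard deviations from loc
    a = (lower - loc) / scale
    b = (upper - loc) / scale
    draws = stats.truncnorm.rvs(
        a, b, loc=loc, scale=scale, size=(len(loc), n_samples), random_state=rng
    )
    return pd.DataFrame(draws, index=df.index)


idx = pd.date_range("2022-01-01", periods=2, freq="60min")
df = pd.DataFrame({"value": [10.0, 50.0]}, index=idx)
samples = sample_rows_truncnorm(df, 1000, seed=0)  # 1000 draws per input row
```

Broadcasting the (rows, 1) parameters against size=(rows, n_samples) lets a single rvs call produce every row's samples at once.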

Acknowledgements

© Copyright 2024, Germán Navarro $^\dagger$, Santiago Fernández Prieto $^{\ddagger,1}$, David Aller Giraldez $^\ddagger$, Ricardo Enríquez Miranda $^{\ddagger,2}$, Javier Hernanz Zájara $^{\ddagger,2}$

$^\dagger$ Barcelona Supercomputing Center
$^\ddagger$ Repsol

$^1$ Repsol-BSC Research Center

$^2$ Repsol Quantum Advisory Team

Developed within the framework of the CUCO project. Financed by the CDTI with the support of the Spanish Ministry of Science and Innovation under the Recovery, Transformation and Resilience Plan (RTRP).


Download files


Source Distribution

synthdatagen-0.4.tar.gz (27.3 kB)

Built Distribution

synthDataGen-0.4-py3-none-any.whl (25.9 kB)

File details

Details for the file synthdatagen-0.4.tar.gz.

File metadata

  • Download URL: synthdatagen-0.4.tar.gz
  • Upload date:
  • Size: 27.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for synthdatagen-0.4.tar.gz
Algorithm    Hash digest
SHA256       261df566f85162bdaf6f0f7ac0ae87bd7a2cd55720272d10e12948b2ecf05643
MD5          bf8bc206de05de32fe6f611b2be24c54
BLAKE2b-256  95e0417e221ff53f2260e38a0e0ac239737574a287ea24d604e64e324165f194


File details

Details for the file synthDataGen-0.4-py3-none-any.whl.

File metadata

  • Download URL: synthDataGen-0.4-py3-none-any.whl
  • Upload date:
  • Size: 25.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for synthDataGen-0.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       3c46cefcc9488ed86f5276cb60bf3f1ecc617235f78a68326ee2a3208452ba87
MD5          749ea493757f6eee1b1b71c1ecbcb1c3
BLAKE2b-256  c918c3fc234f6ee8128c4b4c0d5bf21ba0631e23ebc03446dedd1c095905c24b

