Fake deployment and data generator

metrics-gen

dummy metrics generator

Getting Started

Metrics generator is built upon three main components:

  • Deployment: The indexes of the table, for example:
    • symbol in a stock market
    • (data_center, device_id) for devices in data centers
  • Static Data: Static data regarding the deployment, for example:
    • model_number for a device
    • score for a model
  • Metrics: Continuous metrics to generate about the deployment, for example:
    • cpu_utilization of a device
    • price of a stock

The first step in setting up the generator is creating a deployment. Using that deployment, you can then generate static data or a continuous stream of metrics.

Create a deployment from configuration

To create a deployment from configuration, provide a YAML file with the following structure:

deployment:
    <level_name>:
      faker: <faker_type>
      num_items: <num_items in the level>

Here level_name is the name of the index, faker_type is the name of the faker generator used to produce its keys, and num_items is how many keys to create for this index.
Each level creates num_items new entries for every entry in the levels before it.

Example: Given the following configuration yaml file:

deployment:
    device:
      faker: msisdn
      num_items: 2
    core:
      faker: msisdn
      num_items: 3

and running the following command:

from metrics_gen.deployment_generator import deployment_generator

# `configuration` holds the deployment configuration shown above.
dep_gen = deployment_generator()
deployment = dep_gen.generate_deployment(configuration=configuration)

Will generate the following example deployment:

   device         core
0  4120271911677  6950611701382
1  4120271911677  2255426557707
2  4120271911677  7717168891372
3  2260158002886  3213635322383
4  2260158002886  4007792940086
5  2260158002886  3720953132595

Notice that each extra level multiplies the number of items created by its num_items: 2 devices with 3 cores each give 2 * 3 = 6 rows.
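The multiplication of levels can be sketched without the library. This is a minimal illustration only: expand_levels is a hypothetical helper, and random 13-digit strings stand in for the faker-generated keys.

```python
import random


def expand_levels(levels, seed=0):
    """Return one dict per leaf row; every row of a level gets num_items children.

    Hypothetical helper for illustration, not part of metrics-gen.
    """
    rng = random.Random(seed)
    rows = [{}]
    for name, num_items in levels.items():
        rows = [
            # A random 13-digit string stands in for a faker-generated key.
            {**row, name: "".join(rng.choice("0123456789") for _ in range(13))}
            for row in rows
            for _ in range(num_items)
        ]
    return rows


rows = expand_levels({"device": 2, "core": 3})
print(len(rows))  # number of rows is the product of num_items over all levels
```

Each device key is repeated once per core, which matches the shape of the example deployment above.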

Create Static Data

To create a static data generator you need to supply a deployment dataframe and a configuration yaml.

The static data generator can generate features from two kinds of feature configuration, range and choice, which are specified in the YAML.

static:
    <feature_name>:
        kind: range
        min_range: <min_feature_range>, defaults to 0
        max_range: <max_feature_range>
        as_integer: <True for integers, False for floats>, defaults to False
    <feature_name>:
        kind: choice
        choices: <list of possible choices>

Each provided feature will generate a new feature column in the generated dataframe.

Example: Given the following yaml:

static:
    models: 
      kind: range
      min_range: 10
      max_range: 15
      as_integer: True
    country: 
      kind: choice
      choices: [A, B, C, D, E, F, G]

And the previous deployment:

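from metrics_gen.static_data_generator import Static_data_generator

# `deployment` is the dataframe generated in the previous section;
# `static_configuration` holds the static configuration shown above.
static_data_generator = Static_data_generator(
    deployment, static_configuration
)

generated_df = static_data_generator.generate_static_data()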

Will generate the following dataframe:

   device         core           models  country
0  4120271911677  6950611701382  13      A
1  4120271911677  2255426557707  14      C
2  4120271911677  7717168891372  14      G
3  2260158002886  3213635322383  14      G
4  2260158002886  4007792940086  11      G
5  2260158002886  3720953132595  14      D
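
How the two feature kinds might be sampled can be sketched with the standard library alone. This is an interpretation of the configuration fields, not the library's code: sample_static_feature is a hypothetical helper, and treating both range bounds as inclusive is an assumption.

```python
import random


def sample_static_feature(spec, rng):
    """Draw one value for a feature spec (hypothetical helper, not library code)."""
    if spec["kind"] == "range":
        low = spec.get("min_range", 0)  # min_range defaults to 0
        high = spec["max_range"]
        if spec.get("as_integer", False):
            return rng.randint(low, high)  # assumption: both bounds inclusive
        return rng.uniform(low, high)
    if spec["kind"] == "choice":
        return rng.choice(spec["choices"])
    raise ValueError(f"unknown kind: {spec['kind']}")


static = {
    "models": {"kind": "range", "min_range": 10, "max_range": 15, "as_integer": True},
    "country": {"kind": "choice", "choices": list("ABCDEFG")},
}
rng = random.Random(0)
# One dict of feature values per deployment row (six rows in the example).
rows = [
    {name: sample_static_feature(spec, rng) for name, spec in static.items()}
    for _ in range(6)
]
```

Every row receives an independent draw for each configured feature, which is why the same model number or country can appear on several rows.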

Create Continuous Metrics

To create a continuous metrics stream you need to provide a deployment dataframe and a metrics configuration YAML with the following structure:

errors:
    rate_in_ticks: < ~ticks between errors>
    length_in_ticks: < ~length of error mode>
timestamps:
    interval: <time between samples in seconds>
    stochastic_interval: <create random intervals (around interval)>
metrics:
  <metric name>:
    accuracy: <decimals to produce>
    distribution: normal
    distribution_params:
        mu: <mean>
        noise: <noise>
        sigma: <std>
    is_threshold_below: <True to produce max when in error mode, False for min>
    past_based_value: <True to add the latest metric to the last result (like in daily stock market), False to replace normally>
    produce_max: <True for candles-like presentation>
    produce_min: <True for candles-like presentation>
    validation:
        distribution: # per-sample validation
            max: <max value for individual sample>
            min: <min value for individual sample>
            validate: <True to activate validation>
        metric: # metric level validations
            max: <max value for overall-metric sample (only applicable to past-based-values)>
            min: <min value for overall-metric sample (only applicable to past-based-values)>
            validate: <True to activate validation>

Each configured metric adds a metric column, together with a matching <metric name>_is_error column, to the generated samples for your deployment.
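
One way to read the per-metric fields is sketched below. It is an interpretation, not the library's implementation: sample_metric is a hypothetical helper, the noise parameter is ignored, and the sketch assumes that error mode emits the metric-level max or min, that metric-level validation clamps normal samples, and that accuracy is applied by rounding.

```python
import random


def sample_metric(cfg, in_error, rng):
    """Draw one value for a metric (hypothetical helper, not library code)."""
    limits = cfg["validation"]["metric"]
    if in_error:
        # Assumption: error mode pins the value to the metric-level max or min.
        return limits["max"] if cfg["is_threshold_below"] else limits["min"]
    params = cfg["distribution_params"]
    value = rng.gauss(params["mu"], params["sigma"])
    if limits["validate"]:
        # Assumption: validation clamps the sample into [min, max].
        value = min(max(value, limits["min"]), limits["max"])
    return round(value, cfg["accuracy"])


cpu_utilization = {
    "accuracy": 2,
    "distribution_params": {"mu": 70, "noise": 0, "sigma": 10},
    "is_threshold_below": True,
    "validation": {"metric": {"max": 100, "min": 0, "validate": True}},
}
rng = random.Random(0)
normal_value = sample_metric(cpu_utilization, in_error=False, rng=rng)
error_value = sample_metric(cpu_utilization, in_error=True, rng=rng)
```

Under these assumptions a metric with is_threshold_below set to true reports its maximum while in error mode, and one with it set to false reports its minimum.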

Example: Given the following yaml:

errors: {length_in_ticks: 10, rate_in_ticks: 5}
timestamps: {interval: 5s, stochastic_interval: true}
metrics:
  cpu_utilization:
    accuracy: 2
    distribution: normal
    distribution_params: {mu: 70, noise: 0, sigma: 10}
    is_threshold_below: true
    past_based_value: false
    produce_max: false
    produce_min: false
    validation:
      distribution: {max: 1, min: -1, validate: false}
      metric: {max: 100, min: 0, validate: true}
  throughput:
    accuracy: 2
    distribution: normal
    distribution_params: {mu: 250, noise: 0, sigma: 20}
    is_threshold_below: false
    past_based_value: false
    produce_max: false
    produce_min: false
    validation:
      distribution: {max: 1, min: -1, validate: false}
      metric: {max: 300, min: 0, validate: true}

And the previous deployment:

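from metrics_gen.metrics_generator import Generator_df

# `metrics_configuration` holds the metrics configuration shown above.
metrics_generator = Generator_df(metrics_configuration, user_hierarchy=deployment)
generator = metrics_generator.generate(as_df=True)

# Each next() call returns one new sample.
df = next(generator)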

Will generate the following dataframe:

timestamp                   core           device         cpu_utilization     cpu_utilization_is_error  throughput          throughput_is_error  is_error
2022-01-31 19:20:21.007087  2113309831673  4469221325973  100.0               True                      0.0                 True                 True
2022-01-31 19:20:21.007087  2115933686087  4469221325973  100.0               True                      235.0679405785135   False                False
2022-01-31 19:20:21.007087  0175482390171  4469221325973  70.26657388732976   False                     208.34378630077305  False                False
2022-01-31 19:20:21.007087  1626403145660  4038890878426  59.932750968399404  False                     217.4335871243806   False                False
2022-01-31 19:20:21.007087  7247058922310  4038890878426  83.98361382584898   False                     265.3476318369042   False                False
2022-01-31 19:20:21.007087  7030239128061  4038890878426  100.0               False                     225.16604191632058  False                False

To generate more samples, call next(generator) again; each call returns a new sample.
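
The pull-based pattern and the timestamps settings can be illustrated with a stand-in generator. This is a sketch only: timestamp_stream is a hypothetical helper, and drawing each stochastic step uniformly within 50% of the interval is an assumption, not the library's documented behaviour.

```python
import itertools
import random
from datetime import datetime, timedelta


def timestamp_stream(start, interval_seconds, stochastic, seed=0):
    """Yield an endless series of timestamps (hypothetical helper, not library code)."""
    rng = random.Random(seed)
    current = start
    while True:
        yield current
        # Assumption: a stochastic step is drawn uniformly within +/-50% of the interval.
        step = interval_seconds * rng.uniform(0.5, 1.5) if stochastic else interval_seconds
        current += timedelta(seconds=step)


stream = timestamp_stream(datetime(2022, 1, 31, 19, 20, 21), 5, stochastic=True)
first = next(stream)                       # one sample at a time
batch = list(itertools.islice(stream, 3))  # or several samples in one go
```

Because the stream is a Python generator, nothing is computed until a sample is requested, so the same object can be polled indefinitely.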
