Timeseries data generation and preparation for batch jobs at scale

Project description

Sensorizer

Sensorizer is a Python library built to simulate a flow of sensor data to disk (Avro) or to event hubs. It is meant to be the starting point of a data pipeline.

The library ships with a companion Docker container, so you can have a source of sensor data running in roughly five minutes. See the Docker deployment section if your sink is either an Avro file or an Azure Event Hub; if you need an additional sink, have a look at the issues section.

The main characteristic is that it tries to simulate traffic with realistic timings: it releases up to 400K readings per second, one by one. You can then send them to a streaming sink (Azure Event Hub implemented) or to disk (Avro implemented).
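The pacing idea can be sketched in a few lines. This is not the library's actual code, just an illustration of emitting readings one by one while throttling to a target rate (`paced_readings` and its parameters are made up for this example):

```python
import time
from typing import Iterator, List


def paced_readings(readings: List, rate_per_s: float) -> Iterator:
    """Yield readings one by one, sleeping so emission approximates rate_per_s."""
    interval = 1.0 / rate_per_s
    for reading in readings:
        start = time.perf_counter()
        yield reading
        # Sleep off whatever time remains in this reading's slot.
        elapsed = time.perf_counter() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)


# Emit 5 dummy readings at roughly 1000 readings per second.
out = list(paced_readings([1, 2, 3, 4, 5], rate_per_s=1000))
```

A consumer can treat the generator like any stream source and forward each item to its sink.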

Docker deployment

The deployment is container based; simply pull the container:

docker pull jgc31416/sensorizer:latest

Then pass the configuration as environment variables, choosing them according to the sink you want. This is an example for the Avro file sink using an environment file (see the /docs folder):

docker run --env-file=avro_sink.cfg jgc31416/sensorizer:latest

The generated files will be written inside the container.

Avro file sink

You might want to map the output folder of the dump file into your container host.

export NUMBER_OF_SENSORS="10000"
export NUMBER_OF_HOURS="1"
export SINK_TYPE="file"                            # store sensor readings to a file
export RUNNING_MODE="batch"                        # send the readings one by one or in batch mode
export EVENT_DUMP_FILENAME="/tmp/event_dump.avro"  # Where to save the data

Event Hub sink

export NUMBER_OF_SENSORS="10000"
export NUMBER_OF_HOURS="1"
export SINK_TYPE="event_hub"                       # send sensor readings to an Event Hub
export RUNNING_MODE="batch"                        # send the readings one by one or in batch mode
export EVENT_HUB_ADDRESS="amqps://<EventHubNamespace>.servicebus.windows.net/<EventHub>"
export EVENT_HUB_SAS_POLICY="<PolicyName>"
export EVENT_HUB_SAS_KEY="<SAS_KEY>"
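Inside the container, configuration of this kind is typically picked up from the environment. A minimal sketch of reading these variables in Python (the variable names come from the docs above; the defaults and the `config` dict are illustrative, not the library's actual internals):

```python
import os

# Collect the sink configuration from environment variables,
# falling back to the documented example values.
config = {
    "number_of_sensors": int(os.environ.get("NUMBER_OF_SENSORS", "10000")),
    "number_of_hours": int(os.environ.get("NUMBER_OF_HOURS", "1")),
    "sink_type": os.environ.get("SINK_TYPE", "file"),
    "running_mode": os.environ.get("RUNNING_MODE", "batch"),
}
```

With `docker run --env-file=...` each line of the env file becomes one of these variables.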

Distribution of the sensor readings

The distribution of the sensor readings is the following:

  • Frequencies: 15% at 1.0 s, 65% at 60.0 s, 20% at 3600.0 s (percentages are over the number of sensors; the value is seconds per reading)
  • Base reading values: 50% at 1, 40% at 500, 10% at 1000
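The distribution above can be reproduced with weighted sampling. A hedged sketch (not the library's code) assigning each simulated sensor an emission period and a base value:

```python
import random

random.seed(0)  # deterministic for the example

N_SENSORS = 10_000

# Seconds per reading, weighted 15% / 65% / 20% as documented.
periods = random.choices([1.0, 60.0, 3600.0], weights=[15, 65, 20], k=N_SENSORS)

# Base reading values, weighted 50% / 40% / 10%.
base_values = random.choices([1, 500, 1000], weights=[50, 40, 10], k=N_SENSORS)
```

Over 10,000 sensors the empirical proportions land close to the configured weights.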

Sensor format

from dataclasses import dataclass


@dataclass
class TimeserieRecord:
    """
    Class for time series
    """

    ts: float  # epoch
    data_type: str  # string(3)
    plant: str  # string(3)
    quality: float
    schema: str  # string(6)
    tag: str  # UUID
    value: float
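For illustration, here is how a record of this shape can be constructed. The dataclass is repeated so the snippet runs standalone, and every field value below is made up (only the field names and types come from the definition above):

```python
from dataclasses import dataclass
import time
import uuid


@dataclass
class TimeserieRecord:
    """Class for time series."""

    ts: float        # epoch
    data_type: str   # string(3)
    plant: str       # string(3)
    quality: float
    schema: str      # string(6)
    tag: str         # UUID
    value: float


# Hypothetical example values, sized to the documented field widths.
record = TimeserieRecord(
    ts=time.time(),
    data_type="TMP",        # string(3)
    plant="PL1",            # string(3)
    quality=1.0,
    schema="v1.0.0",        # string(6)
    tag=str(uuid.uuid4()),  # UUID as string
    value=42.5,
)
```

Records like this are what the sinks (Avro file or Event Hub) receive.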

Getting started with the library development

Clone the project from GitHub and enjoy.

Prerequisites

This software has been tested on Linux; it might work on other OSs, but that is definitely not guaranteed.

- Ubuntu latest stable / Debian Stretch / Fedora 25+
- Python 3.7 (the code uses dataclasses and typing)
- Docker (if you want a container deployment)

Installing

Python requirements:

pip install -r requirements.txt

Running the tests

As simple as:

pytest sensorizer/tests/

Built With

  • Python 3.7
  • Docker

Contributing

Simply put: one branch per feature, merged to master. So:

  • Fork the repo.
  • Make a feature branch and develop.
  • Test :)
  • Create a pull request for your new feature.

Project details


Download files


Source Distribution

sensorizer-0.0.3.tar.gz (23.9 kB)

Built Distribution


sensorizer-0.0.3-py2.py3-none-any.whl (18.9 kB)

File details

Details for the file sensorizer-0.0.3.tar.gz.

File metadata

  • Download URL: sensorizer-0.0.3.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.7

File hashes

Hashes for sensorizer-0.0.3.tar.gz:

  • SHA256: 254fc5db8678628d3ad45446b04b62cc6865563b39b24f46046bcf1340485c11
  • MD5: 3d561c29db1f6644c6721a85f7fbfe8a
  • BLAKE2b-256: 57d9399d72eb0ba9cd1ef042ccd0a595cd0e00a595a121840391e1538b122423
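A downloaded archive can be checked against the published digests. A small self-contained sketch (the `sha256_of` helper is made up for this example; it uses only the standard-library `hashlib`):

```python
import hashlib


def sha256_of(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned hex string with the SHA256 value listed above; a mismatch means the file is corrupt or tampered with.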


File details

Details for the file sensorizer-0.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: sensorizer-0.0.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 18.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.7.7

File hashes

Hashes for sensorizer-0.0.3-py2.py3-none-any.whl:

  • SHA256: 9b1d2aa7f0316602b04bdbefcadc721e66cc4a0a52e0089b9bbf8f15c24276b8
  • MD5: 99c4a9609f05f52cd005ae99a41a956a
  • BLAKE2b-256: 5f215eedbba960b8c3eca62ba8b9170858af4633ce218c3f4f64cd5204b1cea5
