
Random Data Generator


Data Generator



About

Random Data Generator.

Creates datasets of random data with the datatypes int, float, str, date (more precisely, Python's datetime.datetime) and timestamp (as float).

Data can be exported to .csv, .xlsx or .json files.

Data is generated from CLI arguments or from a TOML specification file.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install the software:

  • Python 3.8+ with pip

Installing

  • run [sudo] pip[3] install Data-Generator

OR:

  • clone this repo
  • switch to project directory root
  • run [sudo] python3 setup.py install

Usage

  • data parameters can be provided via:

    • command line

    • TOML file

  • currently, these Python datatypes are supported: int, str, float, datetime.datetime

  • generated data can be exported as .csv, .xlsx or .json files

    • using .csv file format has no significant memory impact, since each row is written to the file as it is generated

    • using .xlsx file format has no significant memory impact, since memory is flushed after each row of data is written. For details, see xlsxwriter's documentation

    • using .json file format has a memory impact, so be careful with large datasets. This follows from the implementation of Python's json module (see the note in its documentation): the data must first be generated completely in memory and only then written to the file
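The difference between the streaming and the in-memory formats can be illustrated with a minimal sketch. It uses only the standard csv and json modules; the helper names random_rows, write_csv and write_json are hypothetical and are not part of this package.

```python
import csv
import io
import json
import random


def random_rows(n):
    """Yield n rows lazily; a row exists only while it is being written."""
    for i in range(n):
        yield {"id": i, "value": random.randint(0, 1000)}


def write_csv(rows, stream):
    """Write rows one at a time, so memory use does not grow with row count."""
    writer = csv.DictWriter(stream, fieldnames=["id", "value"])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def write_json(rows, stream):
    """json.dump serializes one complete object, so all rows are collected first."""
    data = list(rows)  # the whole dataset is held in memory here
    json.dump(data, stream)


# In-memory streams stand in for real output files (illustration only).
csv_out = io.StringIO()
write_csv(random_rows(5), csv_out)

json_out = io.StringIO()
write_json(random_rows(5), json_out)
```

With a generator as input, the CSV writer never holds more than one row, while the JSON writer has to materialize the full list before anything is written.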

OS differences

  • there should be no problems running this utility on a standard Linux distribution or on Windows 10

  • the only difference is:

    • on Linux, use the python3 command

    • on Windows 10, use the python command

CLI syntax

General CLI commands

  • to display help for main parser in console, run python[3] -m data_generator -h
  • to display help for data parser (when entering specifications via CLI), run python[3] -m data_generator data -h
  • to display help for toml parser (when entering specifications via TOML file), run python[3] -m data_generator toml -h

Specify output file format

  • use optional parameter -sa or --save_as

  • this parameter belongs to main parser and has to be used before data subparser arguments

  • do not use this parameter together with toml subparser - all parameters are provided via .toml configuration file

  • if this parameter is not specified, default output file format is .csv

  • parameter's values:

    • csv: csv

    • json: json

    • xlsx: xlsx

  • example: python[3] -m data_generator -sa json data ...

Specify output destination

  • use optional parameter -f or --folder

  • this parameter belongs to main parser and has to be used before data subparser arguments

  • do not use this parameter together with toml subparser - all parameters are provided via .toml configuration file

  • example: python[3] -m data_generator -f my_output_folder ...
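Why main-parser options have to come before the subparser name can be seen in a small argparse sketch. This is an assumed layout reconstructed from the options described above, not the package's actual source; build_parser and the argument names specs, rows and files are hypothetical.

```python
import argparse


def build_parser():
    """Main parser with global options, plus 'data' and 'toml' subparsers."""
    parser = argparse.ArgumentParser(prog="data_generator")
    # Options owned by the main parser: they are only recognized
    # before the subcommand name appears on the command line.
    parser.add_argument("-sa", "--save_as",
                        choices=["csv", "json", "xlsx"], default="csv")
    parser.add_argument("-f", "--folder", default=None)

    subparsers = parser.add_subparsers(dest="command", required=True)

    # 'data' subparser: one or more column specs followed by a row count.
    data = subparsers.add_parser("data")
    data.add_argument("specs", nargs="+")
    data.add_argument("rows", type=int)

    # 'toml' subparser: one or more configuration file paths.
    toml = subparsers.add_parser("toml")
    toml.add_argument("files", nargs="+")
    return parser


args = build_parser().parse_args(
    ["-sa", "json", "-f", "my_output_folder", "data", "column1:int:0:10", "5"]
)
print(args.save_as, args.folder, args.command, args.specs, args.rows)
```

In this sketch, everything after the word data is handed to the data subparser, which does not know -sa or -f. That is the usual reason such options must precede the subcommand.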

Data parser

  • to specify integers:

    • <column_name>:int:<lower_bound>:<upper_bound> - lower_bound can be negative
  • to specify floats:

    • <column_name>:float:<lower_bound>:<upper_bound> - lower_bound can be negative. You must provide decimal digit, even if it is zero, like so: xxx.0
  • to specify str:

    • <column_name>:str:<lower_bound>:<upper_bound> - lower_bound cannot be negative.
  • to specify date:

    • <column_name>:date:<format_template>

      • under the hood, the generator works with Python's native datetime module. That means that all format codes listed in the datetime documentation should be supported.

      • as of now, _ and - are permitted as separators

      • for example, format template can look like this: %Y%m%d_%H%M%S. This will display generated random date in format "yyyymmdd_hhmmss".

      • minimum year is 1, maximum year is 9999. See documentation.

  • to specify timestamp:

    • <column_name>:timestamp:

    • the generator creates a datetime.datetime object for a random date, with a minimum year of 1970, and returns the corresponding POSIX timestamp as a float. For details, see the documentation
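The date and timestamp columns described above can be approximated with the standard datetime module. The following is a minimal sketch, not the package's implementation; random_datetime is a hypothetical helper, and using UTC-aware datetimes is an assumption made here so that timestamp() does not depend on the local time zone.

```python
import random
from datetime import datetime, timedelta, timezone


def random_datetime(min_year, max_year):
    """Return a random UTC datetime between Jan 1 of min_year and Dec 31 of max_year."""
    start = datetime(min_year, 1, 1, tzinfo=timezone.utc)
    end = datetime(max_year, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=random.randint(0, span_seconds))


# A 'date' column value: a random datetime rendered with a format template.
formatted = random_datetime(1970, 2100).strftime("%Y%m%d_%H%M%S")

# A 'timestamp' column value: minimum year 1970, returned as a POSIX float.
stamp = random_datetime(1970, 9999).timestamp()

print(formatted, stamp)
```

The year range for the formatted value is kept at four digits here because padding of %Y for years below 1000 can differ between platforms.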

Formatting checks

After a CLI command is entered, a basic check verifies that the argument values for the data parser conform to the syntax described above. The check is not exhaustive, but it should catch major typos such as a missing : or .0.
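A check of this kind could look like the following sketch, which validates a column specification with regular expressions. The patterns are assumptions derived from the syntax described above, not the package's real validation code; SPEC_PATTERNS and is_valid_spec are hypothetical names. Allowing a negative upper bound for int and float is also an assumption.

```python
import re

# One pattern per datatype. Column names are limited to letters, digits
# and underscores, matching the rule that "-" is not allowed in headers.
SPEC_PATTERNS = {
    "int": re.compile(r"^\w+:int:-?\d+:-?\d+$"),
    "float": re.compile(r"^\w+:float:-?\d+\.\d+:-?\d+\.\d+$"),
    "str": re.compile(r"^\w+:str:\d+:\d+$"),
    "date": re.compile(r"^\w+:date:[%\w-]+$"),
    "timestamp": re.compile(r"^\w+:timestamp:$"),
}


def is_valid_spec(spec):
    """Return True if spec matches the pattern for its declared datatype."""
    parts = spec.split(":")
    if len(parts) < 2:
        return False
    pattern = SPEC_PATTERNS.get(parts[1])
    return bool(pattern and pattern.match(spec))


for example in ["column1:str:0:50", "column5:float:0:1000", "column1int:0:5"]:
    print(example, is_valid_spec(example))
```

Like the check described above, this catches only syntax errors; it does not verify, for example, that the lower bound is not greater than the upper bound.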

CLI examples for Data parser:
  • python3 -m data_generator data column1:str:0:50 column2:str:101:101 column3:int:10:10 column4:int:0:1000 column5:float:0.0:1000.0 1000

    • this will generate a .csv file with 1000 rows and five columns of random data. The first column is a str with variable length between 0 and 50 chars. The second column is a str with a fixed length of 101 chars. The third column is an int that always has the SAME VALUE of 10. The fourth column is an int between 0 and 1000. The fifth column is a float between 0.0 and 1000.0.

    • 1000 - indicates how many rows will be generated

    • generated .csv file is saved into default output folder. This can be changed using -f or --folder parameter

  • python3 -m data_generator -f my_output_folder/subfolder data header_with_underscore:str:10:10 100

    • this will generate one "column" of random str data with a fixed length of 10 chars and 100 rows into the target folder of your choice. If the folder does not exist, it will be created

    • notice that you can use the _ separator in header names. Other separators like - are not permitted.

  • python3 -m data_generator data data_with_negative_int:int:-1000:1000 data_with_negative_float:float:-100000.0:0.0 10000

    • this will generate 10000 rows of data with an integer in the interval <-1000, 1000> and a float in the interval <-100000.0, 0.0>
  • python3 -m data_generator data random_dates_without_separators:date:%Y%m%d%H%M%S random_dates_with_separators:date:%Y-%m-%d_%H-%M-%S 10

    • generates two columns of random dates with and without using the allowed separators
  • python3 -m data_generator -sa json data data_with_negative_int:int:-1000:1000 data_with_negative_float:float:-100000.0:0.0 10000

    • this will generate data as .json file

TOML parser

  • when you want to generate a data file with lots of fields, or even multiple files with different specs, it may be useful to specify the properties of the fields permanently.

  • in this case, you can use configuration files, which use TOML syntax. Two example files can be found in the root of this project. Just copy & paste and add as many fields as you like.

  • files can be saved anywhere, just have the path ready

CLI examples for TOML parser
  • python[3] -m data_generator toml data_config_example01.toml data_config_example02.toml

    • this will generate two output files according to the specifications in these two .toml files.
  • python[3] -m data_generator toml /custom/path/to/data_config_example01.toml

    • this will generate one output file from a specification file in a custom location
