Random Data Generator

Data Generator

About
Getting Started
Usage
Changelog

About

Random Data Generator.

Creates datasets of random data with the datatypes int, float, str, date (more precisely, Python's datetime.datetime) and timestamp (as float).

Data can be exported to .csv, .xlsx or .json files.

Datasets are specified either through CLI arguments or through a TOML file.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to have installed before installing the software:

Python 3.8+ with pip

Installing

install from PyPI with [sudo] pip[3] install Data-Generator

OR:

clone this repo
switch to the project root directory
run [sudo] python3 setup.py install

Usage

data parameters can be provided via:
- command line
- TOML file
currently, these Python datatypes are supported: int, str, float, datetime.datetime
generated data can be exported as .csv, .xlsx or .json files:
- the .csv format has no significant memory impact, since each row is written to the file as it is generated
- the .xlsx format has no significant memory impact, since memory is flushed after each row of data is written. For details, see xlsxwriter's documentation
- the .json format does have a memory impact, so be careful with large datasets. This follows from the implementation of Python's json module (see the note in its documentation): the data must first be generated completely in memory and only then written to the file
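
The difference between the two export styles can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names (random_rows, write_csv_streaming, write_json_buffered) and the two example columns are assumptions made for this sketch.

```python
import csv
import io
import json
import random


def random_rows(n_rows, seed=0):
    """Yield one row at a time, so the full dataset never has to sit in memory."""
    rng = random.Random(seed)
    for _ in range(n_rows):
        yield {"id": rng.randint(0, 1000), "score": rng.uniform(0.0, 1000.0)}


def write_csv_streaming(rows, handle):
    """Write each row to the file handle as soon as it is generated."""
    writer = csv.DictWriter(handle, fieldnames=["id", "score"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def write_json_buffered(rows, handle):
    """Collect every row in a list first, then serialise the whole list at once."""
    data = list(rows)  # the complete dataset is held in memory here
    json.dump(data, handle)
    return len(data)


# In-memory buffers stand in for real output files (hypothetical usage).
csv_buffer = io.StringIO()
json_buffer = io.StringIO()
csv_count = write_csv_streaming(random_rows(5), csv_buffer)
json_count = write_json_buffered(random_rows(5), json_buffer)
```

In the streaming variant only one row exists at a time, while the buffered variant keeps the whole list alive until json.dump returns.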

OS differences

there should be no problems running this utility on a standard Linux distribution or on Windows 10
the only difference is:
- on Linux, use the python3 command
- on Windows 10, use the python command
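
When the generator is launched from another Python script, the python/python3 difference can be sidestepped by reusing the interpreter that is already running. The sketch below only builds the argument list; build_command is a hypothetical helper, not part of the package.

```python
import sys


def build_command(*generator_args):
    """Return an argument list that runs data_generator with the current interpreter."""
    # sys.executable is the path of the running Python, whatever it is called on this OS.
    return [sys.executable, "-m", "data_generator", *generator_args]


command = build_command("data", "column1:int:0:10", "5")
# To actually run it: subprocess.run(command, check=True)
```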

CLI syntax

General CLI commands

to display help for the main parser in the console, run python[3] -m data_generator -h
to display help for the data parser (when entering specifications via CLI), run python[3] -m data_generator data -h
to display help for the toml parser (when entering specifications via a TOML file), run python[3] -m data_generator toml -h

Specify output file format

use the optional parameter -sa or --save_as
this parameter belongs to the main parser and has to be used before the data subparser arguments
do not use this parameter together with the toml subparser: in that case all parameters are provided via the .toml configuration file
if this parameter is not specified, the default output file format is .csv
parameter values:
- csv: csv
- json: json
- xlsx: xlsx
example: python[3] -m data_generator -sa json data ...

Specify output destination

use the optional parameter -f or --folder
this parameter belongs to the main parser and has to be used before the data subparser arguments
do not use this parameter together with the toml subparser: in that case all parameters are provided via the .toml configuration file
example: python[3] -m data_generator -f my_output_folder ...
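
Why -sa and -f have to come before the subparser name can be illustrated with a small argparse layout. This is a sketch under assumptions: the option names and the data/toml subcommands come from the text above, but the argument names (specs, rows, files) and the defaults are guesses, not the project's real parser.

```python
import argparse


def build_parser():
    """Main parser with two subparsers, loosely mirroring the CLI described above."""
    parser = argparse.ArgumentParser(prog="data_generator")
    # Options of the main parser: they are only recognised before the subcommand.
    parser.add_argument("-sa", "--save_as", choices=["csv", "json", "xlsx"], default="csv")
    parser.add_argument("-f", "--folder", default=None)

    subparsers = parser.add_subparsers(dest="command")
    data = subparsers.add_parser("data")
    data.add_argument("specs", nargs="+")  # one or more column specifications
    data.add_argument("rows", type=int)    # last positional: number of rows
    toml = subparsers.add_parser("toml")
    toml.add_argument("files", nargs="+")  # one or more .toml files
    return parser


args = build_parser().parse_args(
    ["-sa", "json", "-f", "my_output_folder", "data", "column1:int:0:10", "5"]
)
toml_args = build_parser().parse_args(["toml", "data_config_example01.toml"])
```

In this layout everything after data is handed to the data subparser, which does not define -sa or -f, so placing them there would be rejected.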

Data parser

to specify integers:
- <column_name>:int:<lower_bound>:<upper_bound> - lower_bound can be negative
to specify floats:
- <column_name>:float:<lower_bound>:<upper_bound> - lower_bound can be negative. You must provide a decimal digit, even if it is zero, like so: xxx.0
to specify str:
- <column_name>:str:<lower_bound>:<upper_bound> - lower_bound cannot be negative. The bounds give the minimum and maximum string length.
to specify date:
- <column_name>:date:<format_template>
  - under the hood, the generator works with Python's native datetime module. That means that all format codes listed in the datetime documentation should be supported.
  - as of now, _ and - are permitted as separators
  - for example, a format template can look like this: %Y%m%d_%H%M%S. This will display the generated random date in the format "yyyymmdd_hhmmss".
  - minimum year is 1, maximum year is 9999. See the datetime documentation.
to specify timestamp:
- <column_name>:timestamp:
- the generator creates a datetime.datetime object for a random date with a minimum year of 1970 and returns the corresponding POSIX timestamp as a float. For details, see the datetime documentation
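
How a column specification could be turned into a random value is sketched below. Everything here is an assumption made for illustration: generate_value and random_datetime are hypothetical helpers, days are capped at 28 to avoid invalid dates, and timestamps are computed in UTC, which may differ from what the generator does internally.

```python
import random
import string
from datetime import datetime, timezone


def random_datetime(rng, min_year, max_year):
    """Return a random UTC datetime; days stop at 28 so every month is valid."""
    return datetime(
        rng.randint(min_year, max_year), rng.randint(1, 12), rng.randint(1, 28),
        rng.randint(0, 23), rng.randint(0, 59), rng.randint(0, 59),
        tzinfo=timezone.utc,
    )


def generate_value(spec, rng):
    """Parse one '<column_name>:<type>:...' spec and return (column_name, value)."""
    name, kind, *rest = spec.split(":", 2)
    params = rest[0] if rest else ""
    if kind == "int":
        low, high = (int(p) for p in params.split(":"))
        return name, rng.randint(low, high)
    if kind == "float":
        low, high = (float(p) for p in params.split(":"))
        return name, rng.uniform(low, high)
    if kind == "str":
        low, high = (int(p) for p in params.split(":"))
        length = rng.randint(low, high)  # bounds are string lengths
        return name, "".join(rng.choices(string.ascii_letters, k=length))
    if kind == "date":
        return name, random_datetime(rng, 1, 9999).strftime(params)
    if kind == "timestamp":
        return name, random_datetime(rng, 1970, 9999).timestamp()
    raise ValueError(f"unsupported datatype in spec: {spec!r}")


rng = random.Random(42)
fixed_int = generate_value("column3:int:10:10", rng)
fixed_str = generate_value("column2:str:101:101", rng)
negative_float = generate_value("data_with_negative_float:float:-100000.0:0.0", rng)
random_date = generate_value("random_dates:date:%Y%m%d_%H%M%S", rng)
random_ts = generate_value("created:timestamp:", rng)
```

Equal lower and upper bounds produce a constant value (for int) or a fixed length (for str), which matches the CLI examples in this document.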

Formatting checks

After a CLI command is entered, a basic check verifies that the argument values for the data parser conform to the syntax described above. The check is not exhaustive, but it should catch major typos such as a forgotten : or a missing .0.
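
A check of this kind could be written with regular expressions, as in the sketch below. The patterns are assumptions derived from the syntax rules in this document (underscores allowed in column names, mandatory decimal part for floats, non-negative bounds for str); they are not the patterns the project actually uses.

```python
import re

# One hypothetical pattern per datatype; \w covers letters, digits and underscore.
SPEC_PATTERNS = {
    "int": re.compile(r"^\w+:int:-?\d+:-?\d+$"),
    "float": re.compile(r"^\w+:float:-?\d+\.\d+:-?\d+\.\d+$"),
    "str": re.compile(r"^\w+:str:\d+:\d+$"),
    "date": re.compile(r"^\w+:date:[%\w-]+$"),
    "timestamp": re.compile(r"^\w+:timestamp:?$"),
}


def is_valid_spec(spec):
    """Return True if the spec matches one of the known column syntaxes."""
    return any(pattern.match(spec) for pattern in SPEC_PATTERNS.values())


valid = is_valid_spec("column5:float:0.0:1000.0")
missing_decimal = is_valid_spec("column5:float:0:1000")
missing_colon = is_valid_spec("column1int:0:50")
```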

CLI examples for Data parser:

python3 -m data_generator data column1:str:0:50 column2:str:101:101 column3:int:10:10 column4:int:0:1000 column5:float:0.0:1000.0 1000
- this will generate a .csv file with 1000 rows and five columns of random data. The first column is a str of variable length between 0 and 50 chars. The second column is a str with a fixed length of 101 chars. The third column is an int that always has the SAME VALUE of 10. The fourth column is an int between 0 and 1000. The fifth column is a float between 0.0 and 1000.0.
- 1000 - indicates how many rows will be generated
- the generated .csv file is saved into the default output folder. This can be changed using the -f or --folder parameter
python3 -m data_generator -f my_output_folder/subfolder data header_with_underscore:str:10:10 100
- this will generate one "column" of random str data with a fixed length of 10 chars and 100 rows in the target folder of your choice. If the folder does not exist, it will be created
- notice that you can use the _ separator in header names. Other separators like - are not permitted.
python3 -m data_generator data data_with_negative_int:int:-1000:1000 data_with_negative_float:float:-100000.0:0.0 10000
- this will generate 10000 rows of data with an integer in the interval <-1000, 1000> and a float in the interval <-100000.0, 0.0>
python3 -m data_generator data random_dates_without_separators:date:%Y%m%d%H%M%S random_dates_with_separators:date:%Y-%m-%d_%H-%M-%S 10
- generates two columns of random dates with and without using the allowed separators
python3 -m data_generator -sa json data data_with_negative_int:int:-1000:1000 data_with_negative_float:float:-100000.0:0.0 10000
- this will generate data as .json file

TOML parser

when you want to generate a datafile with lots of fields, or even multiple files with different specs, it may be useful to specify the properties of the fields permanently.
in this case, you can use configuration files, which use TOML syntax. Two example files can be found in the root of this project. Just copy and paste them and add as many fields as you like.
files can be saved anywhere, just have the path ready

CLI examples for TOML parser

python[3] -m data_generator toml data_config_example01.toml data_config_example02.toml
- this will generate two output files according to the specifications in these two .toml files.
python[3] -m data_generator toml /custom/path/to/data_config_example01.toml
- this will generate one output file from a specification file in a custom location

Project details

Release history

- 1.0.1 (Jun 5, 2020), this version
- 1.0.0 (Jun 4, 2020)
- 0.5.3 (May 29, 2020)
- 0.5.2 (May 29, 2020)
- 0.5.1 (May 27, 2020)
- 0.5.0 (May 26, 2020)
- 0.4.2.2 (May 25, 2020)
- 0.4.1.1 (May 25, 2020)
- 0.4.0.0 (May 24, 2020)
- 0.3.0.0 (May 23, 2020)
- 0.2.1.0 (May 23, 2020)
- 0.2.0.1 (May 22, 2020)
- 0.2.0.0 (May 22, 2020)
- 0.1.0.1 (May 22, 2020)

Download files

- Source Distribution: Data Generator-1.0.1.tar.gz (10.2 kB), uploaded Jun 5, 2020
- Built Distribution: Data_Generator-1.0.1-py3-none-any.whl (11.4 kB), uploaded Jun 5, 2020, Python 3

Hashes for Data Generator-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`3b1a92b09a69458bf0c3aa3d6e91cb74e85f2e6c639a039ab37361fddcb8f9ec`
MD5	`e067ceaac0721c2ce39e0eb5166db453`
BLAKE2b-256	`4b613eefa1c8dbe56934d34978c79caca71f7260bb646b6dbb5073d700e1ea35`

Hashes for Data_Generator-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`b0981101ab8b80ce9649032255aa25e8ebe5ec52f0c190027d847f7c322ce877`
MD5	`f4437552ba71d08054a4a666d4ab2dba`
BLAKE2b-256	`b93782729a75caeac6b9ce75ce51dcdd4b4cd7efa914cee6667ebabdfd229e87`