A generator for synthetic sales data
Project description
synthetic_sample
synthetic_sample is a data generation application that produces synthetic sales transactions over a time series, including associated shipment and product data.
Usage
Sample data is generated by running synthetic_sample_generator.py as follows:
python3 synthetic_sample_generator.py --json_filepath JSON_FILEPATH --output_directory OUTPUT_DIRECTORY --create_records
where
- json_filepath is the filepath to the input JSON (see Request Requirements below)
- output_directory is the directory to save output data to, in CSV format
- create_records is a flag that indicates that raw record data should also be saved to the output directory. Running without this flag results in only aggregate output data
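If the generator needs to be launched from another Python script, the command above can be assembled programmatically. The sketch below is only an illustration: build_command is a hypothetical helper (not part of synthetic_sample), the script is assumed to sit in the current working directory, and request.json and output are placeholder paths. It uses only the three arguments documented above.

```python
import shlex
import sys


def build_command(json_filepath, output_directory, create_records=False):
    """Build the argument list for synthetic_sample_generator.py (hypothetical helper)."""
    command = [
        sys.executable,  # the running interpreter, used in place of a bare "python3"
        "synthetic_sample_generator.py",  # assumption: script is in the working directory
        "--json_filepath", str(json_filepath),
        "--output_directory", str(output_directory),
    ]
    if create_records:
        # Without this flag only aggregate output data is written.
        command.append("--create_records")
    return command


# Placeholder paths; replace them with a real request file and output folder.
cmd = build_command("request.json", "output", create_records=True)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would start the generator and raise an error if it exits with a non-zero status.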
Request Requirements
The required input format is a JSON with the following fields:
- Required:
  - start_date: date in the first period to include, e.g. if 2020/02/15 is provided, the full week of that date will be included
  - end_date: date in the last period to include, e.g. if 2020/02/15 is provided, the full week of that date will be included
  - annual_growth_factor: year over year growth factor, 10% growth corresponds to a value of 1.1
  - period_type: indicates what type of curve to generate, supports "month" or "week"
  - at least one of
    - total_sales: total number of sales for the period
    - total_packages: total number of packages shipped for the period
    - total_quantity: total number of items sold for the period
    - annual_sales: annualized number of sales for the period
    - annual_packages: annualized number of packages shipped for the period
    - annual_quantity: annualized number of items sold for the period
  - curve_definition: Definition of the curve to create, either as a list of dictionaries with each feature or as a string indicating the name of the default curve to use.
    - If a list of dictionaries is provided, they must adhere to the following structure
      - Required Keys:
        - anchor_type: Type of annual anchor used to define the feature
          - Possible values: "holiday", "week_of_year", "month_of_year", "day_of_year"
        - anchor_point: Annual point to define the feature
          - Possible values: (string) - holiday name, (int) - week or day of year
        - anchor_value: Cumulative percent of total sales (0.0-1.0) completed by the end of the period of the anchor_point
      - Optional Keys:
        - relative_start: Number of periods before the anchor_point to define a relative cumulative percent value
        - start_value: Cumulative percent of total sales (0.0-1.0) completed by the end of the period indicated by relative_start
        - relative_end: Number of periods before the anchor_point to define a relative cumulative percent value
        - end_value: Cumulative percent of total sales (0.0-1.0) completed by the end of the period indicated by relative_end
    - If a string is provided, it must correspond to a default in synthetic_sample/defaults/curves/{period_type}/{curve_definition}.json
      - Initial set of available curves are
        - modern_brand
        - modern_distributor
        - traditional_brand
        - traditional_distributor
- Optional:
  - default_type: string indicating the type of defaults to use, these can be found as JSON in synthetic_sample/defaults/lib/
  - product_distribution: dictionary of product labels (i.e. SKUs) and their relative weights
  - week_distribution: dictionary of weeks of the month (where 1 is the first week and -1 is the last) and their relative weights
  - weekday_distribution: dictionary of weekdays (where 0 is Monday and 6 is Sunday) and their relative weights
  - seasonal_distribution: dictionary of seasons ("Q1"..."Q4") and their relative weights
  - modifiers: list of any modifiers to apply.
    - "covid": Applies a 33% boost to all periods between 2020/3/26 and 2021/9/1
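The field rules above can be checked before a request is handed to the generator. The following is a minimal sketch, not the package's own validation: find_request_problems and the constant names are hypothetical, and treating curve_definition as a required field is one reading of the list above.

```python
# Field names are taken from the list above; the grouping into constants is illustrative.
REQUIRED_FIELDS = (
    "start_date", "end_date", "annual_growth_factor",
    "period_type", "curve_definition",
)
VOLUME_FIELDS = (
    "total_sales", "total_packages", "total_quantity",
    "annual_sales", "annual_packages", "annual_quantity",
)
PERIOD_TYPES = ("month", "week")
CURVE_REQUIRED_KEYS = ("anchor_type", "anchor_point", "anchor_value")


def find_request_problems(request):
    """Return a list of problems found in a request dict; an empty list means none."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in request:
            problems.append(f"missing required field: {field}")
    # At least one total_* or annual_* volume field must be present.
    if not any(field in request for field in VOLUME_FIELDS):
        problems.append("need at least one of: " + ", ".join(VOLUME_FIELDS))
    period_type = request.get("period_type")
    if period_type is not None and period_type not in PERIOD_TYPES:
        problems.append(f"period_type must be 'month' or 'week', got {period_type!r}")
    # A list-style curve needs the three required keys on every feature;
    # a string is taken to name a default curve and is not checked here.
    curve = request.get("curve_definition")
    if isinstance(curve, list):
        for index, feature in enumerate(curve):
            for key in CURVE_REQUIRED_KEYS:
                if key not in feature:
                    problems.append(f"curve_definition[{index}] is missing {key}")
    return problems


request = {
    "start_date": "2018-06-01",
    "end_date": "2020-12-31",
    "annual_growth_factor": 1.15,
    "period_type": "month",
    "total_sales": 1000000,
    "curve_definition": "modern_brand",
}
print(find_request_problems(request))
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue in a request at once.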
Example:
The below request will generate data for each month starting 2018-06 and ending 2020-12.
{
  "start_date": "2018-06-01",
  "end_date": "2020-12-31",
  "total_sales": 1000000,
  "total_packages": 1500000,
  "total_quantity": 6000000,
  "annual_growth_factor": 1.15,
  "product_distribution": {
    "AAA-01": 1,
    "AAA-02": 2.5,
    "AAA-11": 5.6,
    "BBB-10": 0.5,
    "BBB-20": 1
  },
  "week_distribution": {
    "1": 0.1,
    "-1": 0.5
  },
  "weekday_distribution": {
    "0": 0.0,
    "1": 0.0,
    "2": 0.0,
    "3": 0.0,
    "4": 0.0,
    "5": 2.0,
    "6": 1.0
  },
  "seasonal_distribution": {
    "Q1": 1,
    "Q2": 1,
    "Q3": 1,
    "Q4": 1
  },
  "period_type": "month",
  "curve_definition": [
    {"anchor_type": "month_of_year", "anchor_point": 1, "anchor_value": 0.0424},
    {"anchor_type": "month_of_year", "anchor_point": 2, "anchor_value": 0.103},
    {"anchor_type": "month_of_year", "anchor_point": 3, "anchor_value": 0.203},
    {"anchor_type": "month_of_year", "anchor_point": 4, "anchor_value": 0.3152},
    {"anchor_type": "month_of_year", "anchor_point": 5, "anchor_value": 0.4139},
    {"anchor_type": "month_of_year", "anchor_point": 6, "anchor_value": 0.4776},
    {"anchor_type": "month_of_year", "anchor_point": 7, "anchor_value": 0.5321},
    {"anchor_type": "month_of_year", "anchor_point": 8, "anchor_value": 0.5897},
    {"anchor_type": "month_of_year", "anchor_point": 9, "anchor_value": 0.6715},
    {"anchor_type": "month_of_year", "anchor_point": 10, "anchor_value": 0.7836},
    {"anchor_type": "month_of_year", "anchor_point": 11, "anchor_value": 0.9018},
    {"anchor_type": "month_of_year", "anchor_point": 12, "anchor_value": 1.0}
  ]
}
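To make the numbers in the example request easier to read, the sketch below converts relative weights into fractions and cumulative anchor_value entries into per-month fractions. Both helpers (normalize_weights, period_shares) are hypothetical and only illustrate the arithmetic implied by the field descriptions; how synthetic_sample applies these values internally is not shown here and may differ.

```python
def normalize_weights(weights):
    """Scale relative weights so that they sum to 1.0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {label: value / total for label, value in weights.items()}


def period_shares(cumulative_values):
    """Convert cumulative fractions into the fraction that falls in each period."""
    shares = []
    previous = 0.0
    for value in cumulative_values:
        if value < previous:
            raise ValueError("cumulative values must not decrease")
        shares.append(value - previous)  # share of this period alone
        previous = value
    return shares


# Values copied from the example request.
weekday_weights = {"0": 0.0, "1": 0.0, "2": 0.0, "3": 0.0, "4": 0.0, "5": 2.0, "6": 1.0}
anchor_values = [0.0424, 0.103, 0.203, 0.3152, 0.4139, 0.4776,
                 0.5321, 0.5897, 0.6715, 0.7836, 0.9018, 1.0]

print(normalize_weights(weekday_weights))
print([round(share, 4) for share in period_shares(anchor_values)])
```

With the example values, Saturday (weekday 5) receives two thirds of the weekly weight and Sunday one third, and February accounts for 0.103 - 0.0424 = 0.0606 of annual sales.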
File details
Details for the file synthetic_sample-1.0.1.tar.gz.
File metadata
- Download URL: synthetic_sample-1.0.1.tar.gz
- Upload date:
- Size: 19.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a49806645185a9f27fe80aa89e6d9c8402ed58bacd4561759de296688d0475a6 |
| MD5 | 56da8d5a6e46b9c0efacb742a6ab9d9a |
| BLAKE2b-256 | 01a9efa52ed9434fd3c74cc328ba59510d7caaddedd3400b2b9185ceb02ee75b |
File details
Details for the file synthetic_sample-1.0.1-py3-none-any.whl.
File metadata
- Download URL: synthetic_sample-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f531ddba30d0b4fccb43afe41b6eff15cb4e7c5c719ebbac0ec6eb810043ba02 |
| MD5 | 1417c66f73a29d1f5f62c71ec1f07d06 |
| BLAKE2b-256 | 7ef1dfe73e03424c7b92f8d1c61d079972f721ddf3626ec0992a23d1a522242b |