A generator for synthetic sales data

synthetic_sample

synthetic_sample is a data generation application that produces synthetic sales transactions over a time series, including associated shipment and product data.

Usage

Sample data is generated by running synthetic_sample_generator.py as follows:

python3 synthetic_sample_generator.py --json_filepath JSON_FILEPATH --output_directory OUTPUT_DIRECTORY --create_records

where

  • json_filepath is the filepath to the input JSON (see Request Requirements below)
  • output_directory is the directory to save output data to, in CSV format
  • create_records is a flag that indicates that raw record data should also be saved to the output directory. Running without this flag results in only aggregate output data
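
For scripted runs, the same invocation can be assembled in Python. The following is a minimal sketch, not part of synthetic_sample: build_command is a hypothetical helper, the file and directory names are placeholders, and it assumes the script is in the current working directory and accepts the flags exactly as described above.

```python
import shlex


def build_command(json_filepath, output_directory, create_records=False):
    """Assemble the argument list for synthetic_sample_generator.py.

    Hypothetical helper for illustration; it only mirrors the documented flags.
    """
    command = [
        "python3",
        "synthetic_sample_generator.py",
        "--json_filepath", json_filepath,
        "--output_directory", output_directory,
    ]
    # Without this flag, only aggregate output data is written.
    if create_records:
        command.append("--create_records")
    return command


# Placeholder paths; replace with a real request file and output directory.
command = build_command("request.json", "output", create_records=True)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to execute the generator.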

Request Requirements

The required input format is a JSON with the following fields:

  • Required:
    • start_date: date in the first period to include, e.g. with a period_type of "week", if 2020/02/15 is provided, the full week of that date will be included
    • end_date: date in the last period to include, e.g. with a period_type of "week", if 2020/02/15 is provided, the full week of that date will be included
    • annual_growth_factor: year-over-year growth factor; 10% growth corresponds to a value of 1.1
    • period_type: indicates what type of curve to generate, supports "month" or "week"
    • at least one of
      • total_sales: total number of sales for the period
      • total_packages: total number of packages shipped for the period
      • total_quantity: total number of items sold for the period
      • annual_sales: annualized number of sales for the period
      • annual_packages: annualized number of packages shipped for the period
      • annual_quantity: annualized number of items sold for the period
    • curve_definition: Definition of the curve to create, either as a list of dictionaries with each feature or as a string indicating the name of the default curve to use.
      • If a list of dictionaries is provided, they must adhere to the following structure
        • Required Keys:
          • anchor_type: Type of annual anchor used to define the feature
            • Possible Values: "holiday", "week_of_year", "month_of_year", "day_of_year"
          • anchor_point: Annual point to define the feature
            • Possible values: (string) - holiday name, (int) - week, month, or day of year
          • anchor_value: Cumulative percent of total sales (0.0-1.0) completed by the end of the period of the anchor_point
        • Optional Keys:
          • relative_start: Number of periods before the anchor_point to define a relative cumulative percent value
          • start_value: Cumulative percent of total sales (0.0-1.0) completed by the end of the period indicated by relative_start
          • relative_end: Number of periods after the anchor_point to define a relative cumulative percent value
          • end_value: Cumulative percent of total sales (0.0-1.0) completed by the end of the period indicated by relative_end
      • If a string is provided, it must correspond to a default in synthetic_sample/defaults/curves/{period_type}/{curve_definition}.json
        • Initial set of available curves are
          • modern_brand
          • modern_distributor
          • traditional_brand
          • traditional_distributor
  • Optional:
    • default_type: string indicating the type of defaults to use; the available defaults are stored as JSON in synthetic_sample/defaults/lib/
    • product_distribution: dictionary of product labels (i.e. SKUs) and their relative weights
    • week_distribution: dictionary of weeks of the month (where 1 is the first week and -1 is the last) and their relative weights
    • weekday_distribution: dictionary of weekdays (where 0 is Monday and 6 is Sunday) and their relative weights
    • seasonal_distribution: dictionary of seasons ("Q1"..."Q4") and their relative weights
    • modifiers: list of any modifiers to apply.
      • "covid": Applies a 33% boost to all periods between 2020/3/26 and 2021/9/1
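
The field rules above can be checked before a request is handed to the generator. The sketch below is an illustration only: validate_request and its constants are hypothetical names, not part of synthetic_sample, and the checks cover just the rules stated in this list (required fields, at least one volume field, supported period_type values, and the structure of a list-style curve_definition).

```python
REQUIRED_FIELDS = ("start_date", "end_date", "annual_growth_factor",
                   "period_type", "curve_definition")
VOLUME_FIELDS = ("total_sales", "total_packages", "total_quantity",
                 "annual_sales", "annual_packages", "annual_quantity")
PERIOD_TYPES = ("month", "week")
ANCHOR_TYPES = ("holiday", "week_of_year", "month_of_year", "day_of_year")
ANCHOR_KEYS = ("anchor_type", "anchor_point", "anchor_value")


def validate_request(request):
    """Return a list of problems found in a request dict (empty if none).

    Hypothetical helper; it only encodes the rules listed in this README.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in request:
            problems.append(f"missing required field: {field}")
    if not any(field in request for field in VOLUME_FIELDS):
        problems.append("at least one total_* or annual_* field is required")
    if "period_type" in request and request["period_type"] not in PERIOD_TYPES:
        problems.append(f"unsupported period_type: {request['period_type']!r}")

    curve = request.get("curve_definition")
    if isinstance(curve, list):
        for i, feature in enumerate(curve):
            if not isinstance(feature, dict):
                problems.append(f"curve_definition[{i}] is not a dictionary")
                continue
            for key in ANCHOR_KEYS:
                if key not in feature:
                    problems.append(f"curve_definition[{i}] is missing {key}")
            if "anchor_type" in feature and feature["anchor_type"] not in ANCHOR_TYPES:
                problems.append(f"curve_definition[{i}] has an unknown anchor_type")
            value = feature.get("anchor_value")
            if isinstance(value, (int, float)) and not 0.0 <= value <= 1.0:
                problems.append(f"curve_definition[{i}] anchor_value is outside 0.0-1.0")
    elif curve is not None and not isinstance(curve, str):
        problems.append("curve_definition must be a list of dictionaries or a string")
    return problems


request = {
    "start_date": "2018-06-01",
    "end_date": "2020-12-31",
    "annual_growth_factor": 1.15,
    "period_type": "month",
    "total_sales": 1000000,
    "curve_definition": "modern_brand",
}
print(validate_request(request))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a request file in a single pass.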

Example:

The request below generates data for each month from 2018-06 through 2020-12.

{
  "start_date": "2018-06-01",
  "end_date": "2020-12-31",
  "total_sales": 1000000,
  "total_packages": 1500000,
  "total_quantity": 6000000,
  "annual_growth_factor": 1.15,
  "product_distribution": {
    "AAA-01" : 1,
    "AAA-02" : 2.5,
    "AAA-11" : 5.6,
    "BBB-10" : 0.5,
    "BBB-20" : 1
  },
  "week_distribution": {
    "1": 0.1,
    "-1": 0.5
  },
  "weekday_distribution": {
    "0": 0.0,
    "1": 0.0,
    "2": 0.0,
    "3": 0.0,
    "4": 0.0,
    "5": 2.0,
    "6": 1.0
  },
  "seasonal_distribution": {
    "Q1": 1,
    "Q2": 1,
    "Q3": 1,
    "Q4": 1
  },
  "period_type": "month",
  "curve_definition": [
    {
      "anchor_type": "month_of_year",
      "anchor_point": 1,
      "anchor_value": 0.0424
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 2,
      "anchor_value": 0.103
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 3,
      "anchor_value": 0.203
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 4,
      "anchor_value": 0.3152
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 5,
      "anchor_value": 0.4139
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 6,
      "anchor_value": 0.4776
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 7,
      "anchor_value": 0.5321
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 8,
      "anchor_value": 0.5897
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 9,
      "anchor_value": 0.6715
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 10,
      "anchor_value": 0.7836
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 11,
      "anchor_value": 0.9018
    },
    {
      "anchor_type": "month_of_year",
      "anchor_point": 12,
      "anchor_value": 1.0
    }
  ]
}
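
Because each anchor_value is cumulative, the share of annual sales falling in a given month is the difference between consecutive anchors. The sketch below works this out for the curve in the example request and also normalizes the product_distribution weights. It is an illustration under stated assumptions: period_shares and normalize_weights are hypothetical helpers, and treating relative weights as proportions of their sum is an assumed reading, not confirmed behaviour of synthetic_sample.

```python
def period_shares(cumulative_values):
    """Convert cumulative fractions into the fraction falling in each period."""
    shares = []
    previous = 0.0
    for value in cumulative_values:
        shares.append(value - previous)
        previous = value
    return shares


def normalize_weights(weights):
    """Scale relative weights so they sum to 1 (assumed interpretation)."""
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# anchor_value entries for months 1-12, copied from the example request.
anchor_values = [0.0424, 0.103, 0.203, 0.3152, 0.4139, 0.4776,
                 0.5321, 0.5897, 0.6715, 0.7836, 0.9018, 1.0]

# product_distribution weights, copied from the example request.
product_distribution = {
    "AAA-01": 1, "AAA-02": 2.5, "AAA-11": 5.6, "BBB-10": 0.5, "BBB-20": 1,
}

for month, share in enumerate(period_shares(anchor_values), start=1):
    print(f"month {month:2d}: {share:.4f}")

for sku, proportion in normalize_weights(product_distribution).items():
    print(f"{sku}: {proportion:.3f}")
```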
