
A synthetic data generator CLI for a fictional Jaffle Shop


🥪 Jaffle Shop Generator 🏭

The Jaffle Shop Generator, or jafgen, is a simple command-line tool for generating synthetic datasets suitable for analytics engineering practice or demonstrations. The data is generated in CSV format and is designed to be loaded into a relational database. It follows a simple schema, with tables for:

  • Customers
  • Orders from those Customers
  • Products
  • Order Items of those Products
  • Supplies needed for those Products
  • Store Locations

It uses some straightforward math to create seasonality and trends in the data: for instance, weekends are less busy than weekdays, customers have certain preferences, and new store locations open over time. We plan to add more data types and complexity as the codebase evolves.
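
To make the idea of seasonality and trend concrete, here is a minimal sketch of a toy demand model. It is not jafgen's actual simulation code: the function name `expected_orders` and every constant below are hypothetical, chosen only to illustrate how a weekend dip and a gradual upward trend can be combined.

```python
from datetime import date, timedelta

# Hypothetical parameters, chosen only for illustration (not taken from jafgen).
BASE_ORDERS_PER_DAY = 100
WEEKEND_FACTOR = 0.6   # weekends are less busy than weekdays
ANNUAL_GROWTH = 0.10   # gentle upward trend over time


def expected_orders(day: date, start: date) -> float:
    """Return the expected number of orders for a day under the toy model."""
    years_elapsed = (day - start).days / 365
    trend = (1 + ANNUAL_GROWTH) ** years_elapsed
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    seasonality = WEEKEND_FACTOR if day.weekday() >= 5 else 1.0
    return BASE_ORDERS_PER_DAY * trend * seasonality


if __name__ == "__main__":
    start = date(2016, 1, 1)
    # Print one week of expected order volumes.
    for offset in range(7):
        day = start + timedelta(days=offset)
        print(day.isoformat(), day.strftime("%a"), round(expected_orders(day, start), 1))
```

Multiplying a trend term by a day-of-week factor keeps the two effects independent, so either one can be tuned without touching the other.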

Installation

If you have pipx installed, it is an ideal way to run jafgen. You can generate data without installing anything in the local workspace using the following:

pipx run jafgen [options]

You can also install jafgen into your project or workspace, ideally in a virtual environment.

pip install jafgen

Use

jafgen takes one argument:

  • [int] Years to generate data for. The default is 1 year.

The following options are available:

  • --pre sets a prefix for the generated files in the format [prefix]_[file_name].csv. It defaults to raw.

Generate a simulation spanning 3 years, from 2016 to 2019, with a prefix of cool:

jafgen 3 --pre cool
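
Once the files exist, a quick way to check the result is to list the CSVs that carry your prefix and look at their header rows. The sketch below relies only on the documented [prefix]_[file_name].csv naming format. The helper names, the table name customers, and the directory you pass in are assumptions for illustration, since this README does not say where the files are written or what they are called.

```python
import csv
from pathlib import Path


def expected_file_name(table: str, prefix: str = "raw") -> str:
    """Build a name in the documented [prefix]_[file_name].csv format."""
    return f"{prefix}_{table}.csv"


def read_headers(data_dir: Path, prefix: str = "raw") -> dict[str, list[str]]:
    """Map each CSV file carrying the prefix to its header row."""
    headers = {}
    for path in sorted(data_dir.glob(f"{prefix}_*.csv")):
        with path.open(newline="") as handle:
            # An empty file yields an empty header list instead of raising.
            headers[path.name] = next(csv.reader(handle), [])
    return headers
```

For the example above, calling `read_headers(some_dir, prefix="cool")` on whichever directory holds the generated files returns one entry per matching CSV, which is a handy sanity check before loading the data into a database.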

Purpose

Finding a good dataset to practice, learn, or teach analytics engineering with can be difficult. Most open datasets are great for machine learning: they offer single wide tables that you can manipulate and analyze. Full, real relational databases, on the other hand, are generally kept private by the companies that own them. Not only that, but they're a bit too real: getting one to a state that a beginner or intermediate practitioner can understand takes an advanced amount of analytics engineering transformation.

Approach

Coming soon.

Contribution

We welcome contributions to the project! Getting started is relatively simple. Clone the repo, spin up a virtual environment, and install the dependencies:

gh repo clone dbt-labs/jaffle-shop-generator
# You ARE using `uv`, right? If not, check it out! https://astral.sh/uv
uv venv
# Install the package requirements
uv pip install -r requirements.txt
# Install the dev tooling (ruff and pytest)
uv pip install -r dev-requirements.txt
# Install the package in editable mode
uv pip install -e .

Working out from the jafgen command, you can see the main entrypoint in jaffle_shop_generator/cli.py. This calls the simulation found in jafgen/simulation.py. The simulation is where most of the magic happens.
