A synthetic pandas query generation tool

Pandas Query Generator 🐼

Pandas Query Generator (pqg) generates synthetic pandas queries for training machine learning models that estimate query execution costs or predict cardinality.

Installation

You can install the query generator using pip, the Python package manager:

pip install pqg

Usage

Below is the standard output of pqg --help, which elaborates on the various command-line arguments the tool accepts:

usage: pqg [--max-groupby-columns] [--max-merges] [--max-projection-columns] [--max-selection-conditions] [--multi-line] --num-queries [--output-file] --schema [--sorted] [--verbose]

Pandas Query Generator CLI

options:
  -h --help Show this help message and exit
  --max-groupby-columns Maximum number of columns in group by operations (default: 0)
  --max-merges Maximum number of table merges allowed (default: 2)
  --max-projection-columns Maximum number of columns to project (default: 0)
  --max-selection-conditions Maximum number of conditions in selection operations (default: 0)
  --multi-line Format queries on multiple lines (default: False)
  --num-queries num_queries The number of queries to generate
  --output-file The name of the file to write the results to (default: queries.txt)
  --schema schema Path to the relational schema JSON file
  --sorted Whether or not to sort the queries by complexity (default: False)
  --verbose Print extra generation information and statistics (default: False)

The two required parameters, as shown, are --num-queries and --schema. The --num-queries parameter sets how many queries the program generates.
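To make the four --max-* limits concrete, here is a hand-written pandas sketch of the kinds of operations they bound: selection, projection, merge, and group by. It is not output from pqg, and the queries the tool actually emits may be shaped differently. The two toy tables and their rows are made up for illustration, with column names borrowed from the customer/order sample schema in this README.

```python
import pandas as pd

# Hypothetical toy tables: the rows are invented, only the column names
# follow the sample schema.
customer = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "status": ["active", "inactive", "active"],
})
order = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
    "amount": [25.0, 300.0, 75.5, 410.0],
    "status": ["completed", "pending", "completed", "cancelled"],
})

# Selection with two conditions (the count --max-selection-conditions limits).
selected = order[(order["amount"] >= 50.0) & (order["status"] == "completed")]

# Projection onto two columns (the count --max-projection-columns limits).
projected = customer[["id", "status"]]

# One merge along order.customer_id -> customer.id (the count --max-merges limits).
# Both tables have a "status" column, so suffixes keep them apart.
merged = order.merge(
    customer,
    left_on="customer_id",
    right_on="id",
    suffixes=("_order", "_customer"),
)

# Group by a single column (the count --max-groupby-columns limits).
totals = merged.groupby("status_customer")["amount"].sum()
```

Raising a limit lets a query use more of the corresponding construct, for example a third selection condition or a second group-by column.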

The --schema parameter is the path to a JSON file that describes the data we're generating queries for: its entities, their properties, and the foreign keys that link them.

A sample schema looks like this:

{
  "entities": {
    "customer": {
      "primary_key": "id",
      "properties": {
        "id": {
          "type": "int",
          "min": 1,
          "max": 1000
        },
        "name": {
          "type": "string",
          "starting_character": ["A", "B", "C"]
        },
        "status": {
          "type": "enum",
          "values": ["active", "inactive"]
        }
      },
      "foreign_keys": {}
    },
    "order": {
      "primary_key": "order_id",
      "properties": {
        "order_id": {
          "type": "int",
          "min": 1,
          "max": 5000
        },
        "customer_id": {
          "type": "int",
          "min": 1,
          "max": 1000
        },
        "amount": {
          "type": "float",
          "min": 10.0,
          "max": 1000.0
        },
        "status": {
          "type": "enum",
          "values": ["pending", "completed", "cancelled"]
        }
      },
      "foreign_keys": {
        "customer_id": ["id", "customer"]
      }
    }
  }
}

This file can be found in examples/customer/schema.json. To generate a few queries from this schema, run pqg --num-queries 100 --schema examples/customer/schema.json --verbose.
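Before passing a hand-written schema to the tool, it can help to check it for dangling references. The following is a minimal sketch using only the standard library; it is not part of pqg. It assumes that primary_key is a single property name, as in the sample, and that each foreign key maps a local column to a [referenced column, referenced entity] pair, which is how the sample entry "customer_id": ["id", "customer"] reads. The check_schema helper is a hypothetical name.

```python
import json

# A trimmed copy of the sample schema, inlined so the sketch runs on its own.
SCHEMA_JSON = """
{
  "entities": {
    "customer": {
      "primary_key": "id",
      "properties": {
        "id": {"type": "int", "min": 1, "max": 1000},
        "status": {"type": "enum", "values": ["active", "inactive"]}
      },
      "foreign_keys": {}
    },
    "order": {
      "primary_key": "order_id",
      "properties": {
        "order_id": {"type": "int", "min": 1, "max": 5000},
        "customer_id": {"type": "int", "min": 1, "max": 1000}
      },
      "foreign_keys": {"customer_id": ["id", "customer"]}
    }
  }
}
"""


def check_schema(schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    entities = schema.get("entities", {})
    for name, entity in entities.items():
        properties = entity.get("properties", {})
        # Assumption: the primary key is a single property name.
        if entity.get("primary_key") not in properties:
            problems.append(f"{name}: primary key is not a declared property")
        for column, target in entity.get("foreign_keys", {}).items():
            # Assumption: target is [referenced column, referenced entity].
            ref_column, ref_entity = target
            if column not in properties:
                problems.append(f"{name}.{column}: foreign key column is not declared")
            if ref_entity not in entities:
                problems.append(f"{name}.{column}: unknown entity '{ref_entity}'")
            elif ref_column not in entities[ref_entity].get("properties", {}):
                problems.append(f"{name}.{column}: unknown column '{ref_entity}.{ref_column}'")
    return problems


problems = check_schema(json.loads(SCHEMA_JSON))
print(problems or "schema looks consistent")
```

To check a schema file on disk, read it with json.load and pass the result to check_schema in place of the inlined string.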

Prior Art

This version of the Pandas Query Generator is based on the thorough research work of previous students of COMP 400 at McGill University, namely Ege Satir, Hongxin Huo, and Dailun Li.
