A synthetic pandas query generation tool

Pandas Query Generator 🐼

Pandas Query Generator (pqg) generates synthetic pandas queries for training machine learning models that estimate query execution cost or predict cardinality.

Installation

You can install the query generator using pip, the Python package manager:

pip install pqg

Usage

Below is the output of pqg --help, which lists the command-line arguments the tool accepts:

usage: pqg [--max-groupby-columns] [--max-merges] [--max-projection-columns] [--max-selection-conditions] [--multi-line] --num-queries [--output-file] --schema [--sorted] [--verbose]

Pandas Query Generator CLI

options:
  -h --help Show this help message and exit
  --max-groupby-columns Maximum number of columns in group by operations (default: 0)
  --max-merges Maximum number of table merges allowed (default: 2)
  --max-projection-columns Maximum number of columns to project (default: 0)
  --max-selection-conditions Maximum number of conditions in selection operations (default: 0)
  --multi-line Format queries on multiple lines (default: False)
  --num-queries num_queries The number of queries to generate
  --output-file The name of the file to write the results to (default: queries.txt)
  --schema schema Path to the relational schema JSON file
  --sorted Whether or not to sort the queries by complexity (default: False)
  --verbose Print extra generation information and statistics (default: False)
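
To make the terms in the options above concrete, here is a hand-written sketch of the four operation kinds they refer to (selection, projection, merge, and group by) expressed in plain pandas. This is not output from pqg; the two small DataFrames and their values are made up for illustration, and the queries pqg actually emits may be structured differently.

```python
import pandas as pd

# Tiny made-up tables, loosely modelled on a customer/order relationship.
customer = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "status": ["active", "inactive", "active"],
})
order = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
    "amount": [25.0, 40.0, 15.5, 100.0],
    "status": ["completed", "pending", "completed", "cancelled"],
})

# Selection: keep only the rows that satisfy a condition.
active = customer[customer["status"] == "active"]

# Projection: keep only a subset of the columns.
projected = active[["id", "name"]]

# Merge: join two tables on a key relationship.
merged = projected.merge(order, left_on="id", right_on="customer_id")

# Group by: aggregate the merged rows per group.
totals = merged.groupby("name")["amount"].sum()

print(totals)
```

Each --max-* option caps how many of the corresponding elements (group by columns, merges, projected columns, selection conditions) a generated query may contain.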

The two required options, as the usage line shows, are --num-queries and --schema. The --num-queries option sets how many queries the program generates.

The --schema option takes the path to a JSON file that describes the data the queries are generated for: its entities, their properties, and the keys that relate them.

A sample schema looks like this:

{
  "entities": {
    "customer": {
      "primary_key": "id",
      "properties": {
        "id": {
          "type": "int",
          "min": 1,
          "max": 1000
        },
        "name": {
          "type": "string",
          "starting_character": ["A", "B", "C"]
        },
        "status": {
          "type": "enum",
          "values": ["active", "inactive"]
        }
      },
      "foreign_keys": {}
    },
    "order": {
      "primary_key": "order_id",
      "properties": {
        "order_id": {
          "type": "int",
          "min": 1,
          "max": 5000
        },
        "customer_id": {
          "type": "int",
          "min": 1,
          "max": 1000
        },
        "amount": {
          "type": "float",
          "min": 10.0,
          "max": 1000.0
        },
        "status": {
          "type": "enum",
          "values": ["pending", "completed", "cancelled"]
        }
      },
      "foreign_keys": {
        "customer_id": ["id", "customer"]
      }
    }
  }
}
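
Before handing a schema like the one above to pqg, it can help to check that its keys are consistent. The following is a minimal sketch, not part of pqg: check_foreign_keys is a hypothetical helper, and it assumes (based only on the sample above) that each foreign key entry maps a local column to a [referenced column, referenced entity] pair. The embedded schema is a trimmed copy of the sample.

```python
import json

# Trimmed copy of the sample schema, kept inline so the sketch is self-contained.
SAMPLE = """
{
  "entities": {
    "customer": {
      "primary_key": "id",
      "properties": {"id": {"type": "int", "min": 1, "max": 1000}},
      "foreign_keys": {}
    },
    "order": {
      "primary_key": "order_id",
      "properties": {
        "order_id": {"type": "int", "min": 1, "max": 5000},
        "customer_id": {"type": "int", "min": 1, "max": 1000}
      },
      "foreign_keys": {"customer_id": ["id", "customer"]}
    }
  }
}
"""


def check_foreign_keys(schema):
    """Return a list of human-readable problems found in the schema.

    Hypothetical helper. Assumes each foreign key has the form
    local_column: [referenced_column, referenced_entity].
    """
    problems = []
    entities = schema["entities"]
    for entity_name, entity in entities.items():
        properties = entity.get("properties", {})
        if entity.get("primary_key") not in properties:
            problems.append(f"{entity_name}: primary key is not a declared property")
        for column, reference in entity.get("foreign_keys", {}).items():
            target_column, target_entity = reference
            if column not in properties:
                problems.append(f"{entity_name}.{column}: not a declared property")
            target = entities.get(target_entity)
            if target is None:
                problems.append(f"{entity_name}.{column}: unknown entity {target_entity!r}")
            elif target_column not in target.get("properties", {}):
                problems.append(
                    f"{entity_name}.{column}: {target_entity} has no property {target_column!r}"
                )
    return problems


schema = json.loads(SAMPLE)
print(check_foreign_keys(schema))  # prints []
```

An empty list means every primary key and foreign key refers to a property that the schema declares.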

This file can be found at examples/customer/schema.json. To generate queries from this schema, run pqg --num-queries 100 --schema examples/customer/schema.json --verbose.
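
If you want to drive the generator from a script rather than a shell, one option is to assemble the same command in Python. This is a sketch under assumptions: build_pqg_command and run_pqg are hypothetical helpers, not part of pqg, and they use only flags that appear in the help output above. Running the command requires pqg to be installed and on your PATH.

```python
import shlex
import subprocess


def build_pqg_command(schema_path, num_queries, verbose=False, output_file=None):
    """Assemble a pqg invocation as an argument list (hypothetical helper)."""
    command = ["pqg", "--num-queries", str(num_queries), "--schema", schema_path]
    if output_file is not None:
        command += ["--output-file", output_file]
    if verbose:
        command.append("--verbose")
    return command


def run_pqg(command):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(command, check=True)


command = build_pqg_command("examples/customer/schema.json", 100, verbose=True)

# Show the equivalent shell command without executing it.
print(shlex.join(command))

# To actually generate the queries, call:
# run_pqg(command)
```

Passing the arguments as a list avoids shell quoting problems when the schema path contains spaces.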

Prior Art

This version of the Pandas Query Generator is based on the thorough research work of previous COMP 400 students at McGill University, namely Ege Satir, Hongxin Huo, and Dailun Li.
