A synthetic pandas query generation tool
Pandas Query Generator 🐼
Pandas Query Generator (pqg) is a tool designed to help users generate synthetic pandas queries for training machine learning models that estimate query execution costs or predict cardinality.
Installation
You can install the query generator using pip, the Python package manager:
pip install pqg
Usage
Below is the standard output of pqg --help, which describes the command-line arguments the tool accepts:
usage: pqg [--max-groupby-columns] [--max-merges] [--max-projection-columns] [--max-selection-conditions] [--multi-line] --num-queries [--output-file] --schema [--sorted] [--verbose]
Pandas Query Generator CLI
options:
-h --help Show this help message and exit
--max-groupby-columns Maximum number of columns in group by operations (default: 0)
--max-merges Maximum number of table merges allowed (default: 2)
--max-projection-columns Maximum number of columns to project (default: 0)
--max-selection-conditions Maximum number of conditions in selection operations (default: 0)
--multi-line Format queries on multiple lines (default: False)
--num-queries num_queries The number of queries to generate
--output-file The name of the file to write the results to (default: queries.txt)
--schema schema Path to the relational schema JSON file
--sorted Whether or not to sort the queries by complexity (default: False)
--verbose Print extra generation information and statistics (default: False)
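When pqg is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of pqg itself: build_pqg_command is a hypothetical helper, the flag names are copied from the help text above, and the way values are passed (a flag followed by a separate value) is an assumption based on the listed defaults.

```python
# Flag names copied from the help text. That these take a value written as a
# separate argument is an assumption based on their non-boolean defaults.
VALUE_FLAGS = {
    "--max-groupby-columns",
    "--max-merges",
    "--max-projection-columns",
    "--max-selection-conditions",
    "--output-file",
}
# Flags whose default is False are assumed to be plain on/off switches.
SWITCH_FLAGS = {"--multi-line", "--sorted", "--verbose"}


def build_pqg_command(num_queries, schema, **options):
    """Return an argv list for pqg (hypothetical helper, not shipped with pqg).

    Keyword names use underscores, e.g. max_merges=2 or verbose=True.
    """
    # The two required parameters always come first.
    command = ["pqg", "--num-queries", str(num_queries), "--schema", str(schema)]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if flag in VALUE_FLAGS:
            command += [flag, str(value)]
        elif flag in SWITCH_FLAGS:
            if value:  # only emit a switch when it is turned on
                command.append(flag)
        else:
            raise ValueError(f"unknown pqg option: {flag}")
    return command


command = build_pqg_command(
    100, "examples/customer/schema.json", max_merges=2, verbose=True
)
print(" ".join(command))

# To actually run it (requires pqg to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Rejecting unknown option names up front keeps a typo in a keyword from silently turning into a flag the CLI does not accept.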
The required parameters, as the usage line shows, are --num-queries and --schema. The --num-queries parameter sets the number of queries to generate.
The --schema parameter is the path to a JSON file that describes meta-information about the data we're generating queries for: the entities, their typed properties, and their primary and foreign keys.
A sample schema looks like this:
{
  "entities": {
    "customer": {
      "primary_key": "id",
      "properties": {
        "id": {
          "type": "int",
          "min": 1,
          "max": 1000
        },
        "name": {
          "type": "string",
          "starting_character": ["A", "B", "C"]
        },
        "status": {
          "type": "enum",
          "values": ["active", "inactive"]
        }
      },
      "foreign_keys": {}
    },
    "order": {
      "primary_key": "order_id",
      "properties": {
        "order_id": {
          "type": "int",
          "min": 1,
          "max": 5000
        },
        "customer_id": {
          "type": "int",
          "min": 1,
          "max": 1000
        },
        "amount": {
          "type": "float",
          "min": 10.0,
          "max": 1000.0
        },
        "status": {
          "type": "enum",
          "values": ["pending", "completed", "cancelled"]
        }
      },
      "foreign_keys": {
        "customer_id": ["id", "customer"]
      }
    }
  }
}
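Before handing a hand-written schema to the tool, it may be worth checking that its keys line up. The sketch below is not pqg's own validation: check_schema is a hypothetical helper, and it assumes, from the sample above, that each foreign key maps a column to a [target column, target entity] pair and that every primary key is a declared property.

```python
import json


def check_schema(schema):
    """Return a list of problems found in a schema dict shaped like the sample."""
    problems = []
    entities = schema.get("entities", {})
    for name, entity in entities.items():
        properties = entity.get("properties", {})
        # Assumption: the primary key must name one of the entity's properties.
        if entity.get("primary_key") not in properties:
            problems.append(f"{name}: primary key is not a declared property")
        foreign_keys = entity.get("foreign_keys", {})
        for column, (target_column, target_entity) in foreign_keys.items():
            # The referencing column should exist on this entity.
            if column not in properties:
                problems.append(f"{name}.{column}: foreign key column is not declared")
            # The referenced entity and column should exist too.
            target = entities.get(target_entity)
            if target is None or target_column not in target.get("properties", {}):
                problems.append(
                    f"{name}.{column}: unknown target {target_entity}.{target_column}"
                )
    return problems


# A trimmed copy of the sample schema, embedded so the sketch runs on its own.
sample = json.loads("""
{
  "entities": {
    "customer": {
      "primary_key": "id",
      "properties": {"id": {"type": "int", "min": 1, "max": 1000}},
      "foreign_keys": {}
    },
    "order": {
      "primary_key": "order_id",
      "properties": {
        "order_id": {"type": "int", "min": 1, "max": 5000},
        "customer_id": {"type": "int", "min": 1, "max": 1000}
      },
      "foreign_keys": {"customer_id": ["id", "customer"]}
    }
  }
}
""")

problems = check_schema(sample)
print(problems or "no problems found")
```

To check a file on disk instead, the dict can be loaded with json.load on the opened schema file and passed to the same function.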
The sample schema above can be found in examples/customer/schema.json. To generate a few queries from it, run pqg --num-queries 100 --schema examples/customer/schema.json --verbose.
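For orientation, the operation kinds bounded by the --max-* options (selection, projection, merge, group by) look roughly like this in hand-written pandas. This is an illustration only, not output produced by pqg: the tables are tiny invented stand-ins shaped like the sample schema, and the exact form of the generated queries may differ.

```python
import pandas as pd

# Tiny hand-made tables shaped like the sample schema (values are invented).
customer = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "status": ["active", "inactive", "active"],
})
order = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
    "amount": [25.0, 100.0, 40.0, 500.0],
    "status": ["completed", "pending", "completed", "cancelled"],
})

# Selection: keep rows that satisfy one or more conditions.
selected = order[(order["amount"] >= 40.0) & (order["status"] != "cancelled")]

# Projection: keep a subset of the columns.
projected = selected[["customer_id", "amount"]]

# Merge: join two tables on a foreign key / primary key pair.
merged = projected.merge(customer, left_on="customer_id", right_on="id")

# Group by: aggregate over one or more columns (here the customer's status).
result = merged.groupby("status")["amount"].sum()
print(result)
```

Each step corresponds to one option family in the help text, so raising a limit such as --max-merges presumably allows more of that kind of step per generated query.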
Prior Art
This version of the Pandas Query Generator builds on the thorough research of previous students of COMP 400 at McGill University, namely Ege Satir, Hongxin Huo and Dailun Li.