PredQL: A framework providing a predictive query language for task generation in Relational Deep Learning

PredQL

PredQL (Predictive Query Language) is a Python framework for writing compact, expressive predictive queries over relational data, especially for Relational Deep Learning.

It abstracts away temporal joins and complex aggregations, so queries stay short and readable.

🧠 Features

  • 🎯 ANTLR-based parser

    • Lexer and parser for PredQL syntax.
  • 🌳 Structured parse-tree visitor

    • Converts parsed queries into normalized dictionaries with source positions.
  • 🔍 Semantic validation

    • Schema-aware query validation with error reporting.
  • 🔀 Two converters

    • 📌 SConverter for static prediction queries.
    • ⏰ TConverter for temporal prediction queries with timestamp windows.
  • ⚙️ Dual output mode

    • execute=False returns generated SQL.
    • execute=True executes SQL and returns a Table object.

⚙️ Installation

Install PredQL via pip:

pip install predql

🚀 Quickstart

1. Build your database as a RelBench Database object, or use the simplified PredQL version

# simplified Database and Table classes shipped with PredQL
from predql.base import Database, Table

2. Static query with SConverter

from predql.converter import SConverter

# db is the Database object built in step 1
converter = SConverter(db)

predql_query = """
    PREDICT COUNT_DISTINCT(votes.* 
        WHERE votes.votetypeid == 2)
    FOR EACH posts.* WHERE posts.PostTypeId == 1
                       AND posts.OwnerUserId IS NOT NULL
                       AND posts.OwnerUserId != -1;
"""

# SQL only
sql_query = converter.convert(predql_query, execute=False)

# execute and get Table(fk, label)
table = converter.convert(predql_query, execute=True)

3. Temporal query with TConverter

import pandas as pd
from predql.converter import TConverter

timestamps = pd.Series(...)  # timestamps at which predictions are made
converter = TConverter(db, timestamps)

# prediction timestamps can also be updated later without recreating the converter
converter.set_timestamps(new_timestamps)

predql_query = """
    PREDICT COUNT_DISTINCT(votes.* 
        WHERE votes.votetypeid == 2, 0, 91, DAYS)
    FOR EACH posts.* WHERE posts.PostTypeId == 1
                       AND posts.OwnerUserId IS NOT NULL
                       AND posts.OwnerUserId != -1;
"""

# SQL only
sql_query = converter.convert(predql_query, execute=False)

# execute and get Table(fk, timestamp, label)
table = converter.convert(predql_query, execute=True)

📐 Query Language

📌 Static query shape

PREDICT <aggregation | expression | table.column> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key>
[WHERE <static_condition | static_nested_expression>];

⏰ Temporal query shape

PREDICT <aggregation | temporal_expression> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key> [WHERE <static_condition | static_nested_expression>]
[ASSUMING <temporal_condition | temporal_nested_expression>]
[WHERE <temporal_condition | temporal_nested_expression>];

🧮 Aggregations

Function         Meaning                    Condition-Compatible
AVG              average
MAX              maximum
MIN              minimum
SUM              sum
COUNT            non-null count
COUNT_DISTINCT   distinct count
FIRST            earliest value by time
LAST             latest value by time
LIST_DISTINCT    list of distinct values
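
To make the meanings above concrete, here is a plain-Python sketch of what each aggregation computes over a few rows. It does not call PredQL: the `rows` data and the `aggregate` helper are hypothetical, and dropping nulls for every function (as SQL does) is an assumption rather than documented PredQL behavior.

```python
from statistics import mean

# Hypothetical rows of (timestamp, value); None stands for a null value.
rows = [(3, 10), (1, 7), (2, None), (4, 10)]

# Each aggregation receives the non-null values ordered by time.
AGGREGATIONS = {
    "AVG": mean,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
    "COUNT": len,                                       # non-null count
    "COUNT_DISTINCT": lambda v: len(set(v)),            # distinct count
    "FIRST": lambda v: v[0],                            # earliest value by time
    "LAST": lambda v: v[-1],                            # latest value by time
    "LIST_DISTINCT": lambda v: list(dict.fromkeys(v)),  # distinct values, first-seen order
}


def aggregate(name, rows):
    """Illustrative semantics only; not PredQL's implementation."""
    ordered = sorted(rows, key=lambda row: row[0])       # order by timestamp
    values = [v for _, v in ordered if v is not None]    # drop nulls (assumption)
    return AGGREGATIONS[name](values)


for name in AGGREGATIONS:
    print(name, aggregate(name, rows))
```

Ordering by timestamp before aggregating is what separates FIRST and LAST from MIN and MAX: the former depend on when a value occurred, the latter only on its magnitude.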

🧭 Temporal window rules

  • Window format: <start>, <end>, <measure_unit>.
  • Supported units: YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS.
  • Window semantics are half-open: (start, end].
  • PREDICT/WHERE: start and end must be non-negative.
  • ASSUMING: start and end must be non-positive.
  • start must be strictly less than end.
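
The rules above can be sketched as a small standalone checker. This is not PredQL's validator: `check_window` and `in_window` are hypothetical helpers, and treating `start` and `end` as offsets from the prediction timestamp is an assumption made for illustration (only day offsets are handled in `in_window`).

```python
from datetime import datetime, timedelta

UNITS = {"YEARS", "MONTHS", "WEEKS", "DAYS", "HOURS", "MINUTES", "SECONDS"}


def check_window(start, end, unit, clause):
    """Return a list of rule violations; an empty list means the window is valid."""
    errors = []
    if unit not in UNITS:
        errors.append(f"unsupported unit: {unit}")
    if start >= end:
        errors.append("start must be strictly less than end")
    if clause in ("PREDICT", "WHERE") and (start < 0 or end < 0):
        errors.append(f"{clause}: start and end must be non-negative")
    if clause == "ASSUMING" and (start > 0 or end > 0):
        errors.append("ASSUMING: start and end must be non-positive")
    return errors


def in_window(event_time, prediction_time, start, end):
    """Half-open membership (start, end], with offsets given in days.

    Assumes offsets are measured from the prediction timestamp.
    """
    lower = prediction_time + timedelta(days=start)
    upper = prediction_time + timedelta(days=end)
    return lower < event_time <= upper


t = datetime(2021, 1, 1)
print(check_window(0, 91, "DAYS", "PREDICT"))       # valid window from the Quickstart
print(check_window(-30, 0, "DAYS", "ASSUMING"))     # valid look-back window
print(check_window(5, 5, "DAYS", "WHERE"))          # start is not strictly less than end
print(in_window(t, t, 0, 91))                       # lower bound is excluded
print(in_window(t + timedelta(days=91), t, 0, 91))  # upper bound is included
```

Under this reading, a window of `0, 91, DAYS` excludes events that fall exactly on the prediction timestamp and includes events exactly 91 days later.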

🏗️ Architecture

PredQL Query String
    ↓
[Lexer] -> Tokens
    ↓
[Parser] -> Parse Tree
    ↓
[Visitor] -> Structured Dictionary
    ↓
[Validator] -> Semantic Checks
    ↓
[Converter] -> SQL Query
    ↓ (optional execute=True)
[DuckDB] -> Result Table

🔧 Development

Install uv

  • macOS & Linux
wget -qO- https://astral.sh/uv/install.sh | sh
  • Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install dependencies

uv sync --all-extras

Regenerate parser files

If you modify lexer or parser grammar files (*.g4), regenerate ANTLR outputs from the repo root:

./regenerate_parser.sh

Run tests

pytest

Run linter

ruff check .
