PredQL: A framework providing a predictive query language for task generation in Relational Deep Learning
PredQL
PredQL (Predictive Query Language) is a Python framework for writing compact, expressive predictive queries over relational data, especially for Relational Deep Learning.
It abstracts away temporal joins and complex aggregations, so a prediction task that would otherwise need a long SQL statement fits in a few lines.
🧠 Features
- 🎯 ANTLR-based parser: lexer and parser for PredQL syntax.
- 🌳 Structured parse-tree visitor: converts parsed queries into normalized dictionaries with source positions.
- 🔍 Semantic validation: schema-aware query validation with error reporting.
- 🔀 Two converters:
  - 📌 `SConverter` for static prediction queries.
  - ⏰ `TConverter` for temporal prediction queries with timestamp windows.
- ⚙️ Dual output mode: `execute=False` returns the generated SQL; `execute=True` executes the SQL and returns a `Table` object.
⚙️ Installation
Install PredQL via pip:
pip install predql
🚀 Quickstart
1. Build your database as a RelBench Database object, or use the simplified PredQL version
# path to classes
from predql.base import Database, Table
2. Static query with SConverter
from predql.converter import SConverter
converter = SConverter(db)
predql_query = """
PREDICT COUNT_DISTINCT(votes.*
WHERE votes.votetypeid == 2)
FOR EACH posts.* WHERE posts.PostTypeId == 1
AND posts.OwnerUserId IS NOT NULL
AND posts.OwnerUserId != -1;
"""
# SQL only
sql_query = converter.convert(predql_query, execute=False)
# execute and get Table(fk, label)
table = converter.convert(predql_query, execute=True)
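To make the intent of the static query above concrete, the following sketch computes the same kind of label by hand with Python's built-in `sqlite3` module. It is not the SQL that `SConverter` generates. The toy rows, the key columns (`posts.Id`, `votes.Id`, `votes.PostId`), and the choice to keep qualifying posts that have no matching votes (with label 0) are all assumptions made for illustration.

```python
import sqlite3

# Toy in-memory data; the key columns (posts.Id, votes.Id, votes.PostId) are assumed.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER, OwnerUserId INTEGER);
CREATE TABLE votes (Id INTEGER PRIMARY KEY, PostId INTEGER, votetypeid INTEGER);
INSERT INTO posts VALUES (1, 1, 10), (2, 1, NULL), (3, 2, 11), (4, 1, -1), (5, 1, 12);
INSERT INTO votes VALUES (1, 1, 2), (2, 1, 2), (3, 1, 3), (4, 3, 2), (5, 2, 2);
""")

# Hand-written SQL for the query's intent: one row per qualifying post,
# labelled with the number of distinct votes whose votetypeid is 2.
hand_written_sql = """
SELECT p.Id AS fk, COUNT(DISTINCT v.Id) AS label
FROM posts AS p
LEFT JOIN votes AS v ON v.PostId = p.Id AND v.votetypeid = 2
WHERE p.PostTypeId = 1 AND p.OwnerUserId IS NOT NULL AND p.OwnerUserId != -1
GROUP BY p.Id
ORDER BY p.Id
"""
labels = con.execute(hand_written_sql).fetchall()
print(labels)
```

Only posts 1 and 5 pass the `FOR EACH ... WHERE` filter in this toy data; the other posts are dropped because of their post type or owner.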
3. Temporal query with TConverter
import pandas as pd
from predql.converter import TConverter
timestamps = pd.Series(...) # define timestamps for which prediction must be made
converter = TConverter(db, timestamps)
# prediction timestamps can also be updated later without recreating the converter
converter.set_timestamps(new_timestamps)
predql_query = """
PREDICT COUNT_DISTINCT(votes.*
WHERE votes.votetypeid == 2, 0, 91, DAYS)
FOR EACH posts.* WHERE posts.PostTypeId == 1
AND posts.OwnerUserId IS NOT NULL
AND posts.OwnerUserId != -1;
"""
# SQL only
sql_query = converter.convert(predql_query, execute=False)
# execute and get Table(fk, timestamp, label)
table = converter.convert(predql_query, execute=True)
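The window `0, 91, DAYS` in the query above selects votes in the half-open interval (t, t + 91 days] relative to each prediction timestamp t. The plain-Python sketch below shows that labelling rule on toy data. It is not what `TConverter` generates: the tuple layout, the vote time column, and the helper name `count_distinct_in_window` are hypothetical, and the `FOR EACH` filter is left out for brevity.

```python
from datetime import datetime, timedelta

# Toy votes: (vote_id, post_id, votetypeid, creation_time); the layout is assumed.
votes = [
    (1, 1, 2, datetime(2021, 1, 1)),   # exactly at the first timestamp
    (2, 1, 2, datetime(2021, 1, 15)),
    (3, 1, 2, datetime(2021, 4, 2)),   # exactly 91 days after 2021-01-01
    (4, 1, 2, datetime(2021, 4, 3)),
    (5, 1, 3, datetime(2021, 2, 1)),   # different votetypeid, never counted
]


def count_distinct_in_window(post_id, t, start_days, end_days):
    """Count distinct matching votes with t + start < time <= t + end."""
    lo = t + timedelta(days=start_days)
    hi = t + timedelta(days=end_days)
    return len({
        vote_id
        for vote_id, pid, votetypeid, created in votes
        if pid == post_id and votetypeid == 2 and lo < created <= hi
    })


prediction_times = [datetime(2021, 1, 1), datetime(2021, 4, 2)]
rows = [(1, t, count_distinct_in_window(1, t, 0, 91)) for t in prediction_times]
for fk, t, label in rows:
    print(fk, t.date(), label)
```

A vote that falls exactly on the prediction timestamp is excluded (open start), while a vote exactly 91 days later is included (closed end).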
📐 Query Language
📌 Static query shape
PREDICT <aggregation | expression | table.column> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key>
[WHERE <static_condition | static_nested_expression>];
⏰ Temporal query shape
PREDICT <aggregation | temporal_expression> [RANK TOP K | CLASSIFY]
FOR EACH <entity_table>.<primary_key> [WHERE <static_condition | static_nested_expression>]
[ASSUMING <temporal_condition | temporal_nested_expression>]
[WHERE <temporal_condition | temporal_nested_expression>];
🧮 Aggregations
| Function | Meaning | Condition-Compatible |
|---|---|---|
| AVG | average | ✅ |
| MAX | maximum | ✅ |
| MIN | minimum | ✅ |
| SUM | sum | ✅ |
| COUNT | non-null count | ✅ |
| COUNT_DISTINCT | distinct count | ✅ |
| FIRST | earliest value by time | ✅ |
| LAST | latest value by time | ✅ |
| LIST_DISTINCT | list of distinct values | ❌ |
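As a rough reading aid, the meanings in the table can be written as plain-Python functions over toy `(time, value)` pairs. This is a sketch, not PredQL's implementation: skipping nulls for every function except `FIRST` and `LAST`, and returning `LIST_DISTINCT` in sorted order, are assumptions, since the table only states that `COUNT` is a non-null count.

```python
from statistics import mean

# Toy (time, value) pairs; None stands in for NULL.
rows = [(3, 5.0), (1, 2.0), (2, None), (4, 5.0)]


def non_null(rows):
    """Values with NULLs dropped (assumed behaviour for most aggregations)."""
    return [v for _, v in rows if v is not None]


def by_time(rows):
    """Values ordered by their time component."""
    return [v for _, v in sorted(rows, key=lambda r: r[0])]


AGGREGATIONS = {
    "AVG": lambda rows: mean(non_null(rows)),
    "MAX": lambda rows: max(non_null(rows)),
    "MIN": lambda rows: min(non_null(rows)),
    "SUM": lambda rows: sum(non_null(rows)),
    "COUNT": lambda rows: len(non_null(rows)),
    "COUNT_DISTINCT": lambda rows: len(set(non_null(rows))),
    "FIRST": lambda rows: by_time(rows)[0],
    "LAST": lambda rows: by_time(rows)[-1],
    "LIST_DISTINCT": lambda rows: sorted(set(non_null(rows))),
}

results = {name: fn(rows) for name, fn in AGGREGATIONS.items()}
for name, value in results.items():
    print(f"{name:15} {value}")
```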
🧭 Temporal window rules
- Window format: `<start>, <end>, <measure_unit>`.
- Supported units: `YEARS`, `MONTHS`, `WEEKS`, `DAYS`, `HOURS`, `MINUTES`, `SECONDS`.
- Window semantics are half-open: `(start, end]`.
- `PREDICT`/`WHERE`: `start` and `end` must be non-negative.
- `ASSUMING`: `start` and `end` must be non-positive.
- `start` must be strictly less than `end`.
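The window rules above can be checked mechanically. The sketch below encodes them in a small stand-alone function; `check_window` and its error messages are hypothetical and are not part of the PredQL validator's API.

```python
UNITS = {"YEARS", "MONTHS", "WEEKS", "DAYS", "HOURS", "MINUTES", "SECONDS"}


def check_window(start, end, unit, clause):
    """Return a list of rule violations for a window used in the given clause."""
    errors = []
    if unit not in UNITS:
        errors.append(f"unsupported unit: {unit}")
    if start >= end:
        errors.append("start must be strictly less than end")
    if clause in ("PREDICT", "WHERE") and (start < 0 or end < 0):
        errors.append(f"{clause}: start and end must be non-negative")
    if clause == "ASSUMING" and (start > 0 or end > 0):
        errors.append("ASSUMING: start and end must be non-positive")
    return errors


print(check_window(0, 91, "DAYS", "PREDICT"))      # no violations
print(check_window(-30, 0, "DAYS", "ASSUMING"))    # no violations
print(check_window(91, 0, "DAYS", "PREDICT"))      # start is not less than end
print(check_window(-7, 7, "FORTNIGHTS", "WHERE"))  # bad unit and negative start
```

Returning a list rather than raising on the first problem lets a caller report every violation at once, which fits the error-reporting style described in the features list.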
🏗️ Architecture
PredQL Query String
↓
[Lexer] -> Tokens
↓
[Parser] -> Parse Tree
↓
[Visitor] -> Structured Dictionary
↓
[Validator] -> Semantic Checks
↓
[Converter] -> SQL Query
↓ (optional execute=True)
[DuckDB] -> Result Table
🔧 Development
Install uv
- macOS & Linux
wget -qO- https://astral.sh/uv/install.sh | sh
- Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Install dependencies
uv sync --all-extras
Regenerate parser files
If you modify lexer or parser grammar files (*.g4), regenerate ANTLR outputs from the repo root:
./regenerate_parser.sh
Run tests
pytest
Run linter
ruff check .