# analytics-query-analyzer

Analyze analytics SQL to extract referenced tables/columns and time bounds.

## Support

BigQuery and Redshift are currently supported.
## Install

Base install:

```sh
pip install analytics-query-analyzer
```

BigQuery extras (only needed for `build_schema`; quoted so shells like zsh don't expand the brackets):

```sh
pip install "analytics-query-analyzer[bigquery]"
```

Redshift extras (only needed for `build_schema`):

```sh
pip install "analytics-query-analyzer[redshift]"
```
## Usage

### analyze

Extract table/column references from a query. The schema format follows sqlglot conventions, with nested fields represented as nested dicts.
```python
from analytics_query_analyzer import analyze
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
                "payment": {
                    "amount": "int64",
                    "method": "string",
                },
            },
        },
    },
}

sql = """
select
    id,
    user_id,
    payment.amount
from
    shop.orders
"""

references = analyze(dialects.BigQuery, sql, schema, "production")
print(references)
# [
#     {
#         "database": "production",
#         "schema": "shop",
#         "table": "orders",
#         "column": "id",
#     },
#     {
#         "database": "production",
#         "schema": "shop",
#         "table": "orders",
#         "column": "payment.amount",
#     },
#     {
#         "database": "production",
#         "schema": "shop",
#         "table": "orders",
#         "column": "user_id",
#     },
# ]
```
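The flat list of reference dicts is easy to post-process. As a minimal sketch (using the output shown above as literal data, so it runs without the library installed), here is how you might group referenced columns by their fully qualified table name:

```python
from collections import defaultdict

# References as returned by analyze() in the example above.
references = [
    {"database": "production", "schema": "shop", "table": "orders", "column": "id"},
    {"database": "production", "schema": "shop", "table": "orders", "column": "payment.amount"},
    {"database": "production", "schema": "shop", "table": "orders", "column": "user_id"},
]

# Collect column names under "database.schema.table" keys.
columns_by_table = defaultdict(list)
for ref in references:
    table = f'{ref["database"]}.{ref["schema"]}.{ref["table"]}'
    columns_by_table[table].append(ref["column"])

print(dict(columns_by_table))
# {'production.shop.orders': ['id', 'payment.amount', 'user_id']}
```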
### analyze with time bounds

Extract time bounds from filters by passing `with_timebounds=True`.
```python
from analytics_query_analyzer import analyze
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
            },
        },
    },
}

sql = """
select
    *
from
    shop.orders
where
    ordered_at >= "2025-01-01"
    and ordered_at < "2026-01-01"
"""

timebounds = analyze(
    dialects.BigQuery,
    sql,
    schema,
    "production",
    with_timebounds=True,
)
print(timebounds)
# [
#     {
#         "database": "production",
#         "schema": "shop",
#         "table": "orders",
#         "column": "id",
#         "lower": None,
#         "upper": None,
#     },
#     {
#         "database": "production",
#         "schema": "shop",
#         "table": "orders",
#         "column": "ordered_at",
#         "lower": "2025-01-01",
#         "upper": "2026-01-01",
#     },
#     {
#         "database": "production",
#         "schema": "shop",
#         "table": "orders",
#         "column": "user_id",
#         "lower": None,
#         "upper": None,
#     },
# ]
```
To make `current_date()` deterministic, pass a provider:

```python
timebounds = analyze(
    dialects.BigQuery,
    "select * from shop.orders where ordered_at >= current_date()",
    schema,
    "production",
    with_timebounds=True,
    current_date_provider=lambda: "2026-01-01",
)
```
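One natural use for the extracted bounds is partition pruning. A hypothetical sketch (plain stdlib, not part of this package) that expands a half-open `[lower, upper)` bound into the monthly partition labels it covers:

```python
from datetime import date

def monthly_partitions(lower: str, upper: str) -> list[str]:
    """Return YYYY-MM labels for every month touched by [lower, upper)."""
    start = date.fromisoformat(lower)
    end = date.fromisoformat(upper)
    months = []
    year, month = start.year, start.month
    # Include the end month only if the upper bound reaches into it.
    while (year, month) < (end.year, end.month) or (
        (year, month) == (end.year, end.month) and end.day > 1
    ):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return months

print(monthly_partitions("2025-01-01", "2025-04-01"))
# ['2025-01', '2025-02', '2025-03']
```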
### build_schema

Fetch a schema dictionary from BigQuery or Redshift.

```python
from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.BigQuery, "my_project", "my_schema", "my_table")
print(schema)
```
- Authentication uses Application Default Credentials (ADC).
- When `table` is omitted, it scans all tables in the dataset.
- When both `dataset` and `table` are omitted, it scans all datasets in the project.
- The returned `schema` can be passed directly to `analyze`.
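Because the returned schema is a plain dict, it can be cached between runs so repeated analyses skip the warehouse round trip. A minimal sketch using JSON (a literal dict stands in for the `build_schema` result here, so the snippet runs without credentials):

```python
import json
import tempfile
from pathlib import Path

# In practice this dict would come from build_schema().
schema = {"production": {"shop": {"orders": {"id": "int64", "ordered_at": "datetime"}}}}

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "schema_cache.json"
    # First run: persist the fetched schema.
    cache.write_text(json.dumps(schema, indent=2))
    # Later runs: load the cached copy and pass it straight to analyze().
    loaded = json.loads(cache.read_text())

print(loaded == schema)
```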
Fetching from Redshift is also supported:

```python
from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.Redshift, "my_database", "public", "orders")
print(schema)
```
Redshift authentication supports two modes:

- Set `REDSHIFT_HOST`, `REDSHIFT_USER`, and `REDSHIFT_PASSWORD` (optional `REDSHIFT_PORT`).
- Set `REDSHIFT_CLUSTER_IDENTIFIER`, `REDSHIFT_REGION`, and `REDSHIFT_DB_USER` to use IAM (you can also set `REDSHIFT_HOST`/`REDSHIFT_PORT`).
- Use `AWS_PROFILE` if you want to select a non-default AWS profile.
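For the password-based mode, a quick sketch of setting the variables from the list above before calling `build_schema` (the host, user, and password values are placeholders, not real endpoints):

```python
import os

# Password-based mode: host, user, and password (REDSHIFT_PORT is optional).
os.environ["REDSHIFT_HOST"] = "my-cluster.abc123.us-east-1.redshift.amazonaws.com"
os.environ["REDSHIFT_USER"] = "analyst"
os.environ["REDSHIFT_PASSWORD"] = "example-password"  # placeholder

# With these set, build_schema(dialects.Redshift, ...) can connect.
print(os.environ["REDSHIFT_HOST"])
```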