
analytics-query-analyzer

Analyze analytics SQL to extract referenced tables/columns and time bounds.

Supported dialects

BigQuery and Redshift are currently supported.

Install

Base install:

pip install analytics-query-analyzer

BigQuery extras (only needed for build_schema):

pip install "analytics-query-analyzer[bigquery]"

Redshift extras (only needed for build_schema):

pip install "analytics-query-analyzer[redshift]"
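
Both extras can also be installed in one go (standard pip syntax for multiple extras; quoted so the brackets survive shells like zsh):

pip install "analytics-query-analyzer[bigquery,redshift]"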

Usage

analyze

Extract table/column references from a query.

Schema format follows sqlglot conventions, with nested fields represented as nested dicts.

from analytics_query_analyzer import analyze
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
                "payment": {
                    "amount": "int64",
                    "method": "string",
                },
            },
        },
    },
}

sql = """
select
    id,
    user_id,
    payment.amount
from
    shop.orders
"""

references = analyze(dialects.BigQuery, sql, schema, "production")
print(references)
# [
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "id",
#   },
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "payment.amount",
#   },
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "user_id",
#   },
# ]
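
Since the result is just a list of dicts, it is easy to post-process. For example, a small sketch (not part of the library) that groups the referenced columns by fully qualified table:

from collections import defaultdict

# Group the column references returned above by database.schema.table.
columns_by_table = defaultdict(list)
for ref in references:
    table = f'{ref["database"]}.{ref["schema"]}.{ref["table"]}'
    columns_by_table[table].append(ref["column"])

print(dict(columns_by_table))
# {'production.shop.orders': ['id', 'payment.amount', 'user_id']}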

analyze with time bounds

Extract time bounds from filters by enabling with_timebounds.

from analytics_query_analyzer import analyze
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
            },
        },
    },
}

sql = """
select
    *
from
    shop.orders
where
    ordered_at >= "2025-01-01"
    and ordered_at < "2026-01-01"
"""

timebounds = analyze(
    dialects.BigQuery,
    sql,
    schema,
    "production",
    with_timebounds=True,
)
print(timebounds)
# [
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "id",
#     "lower": None,
#     "upper": None,
#   },
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "ordered_at",
#     "lower": "2025-01-01",
#     "upper": "2026-01-01",
#   },
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "user_id",
#     "lower": None,
#     "upper": None,
#   },
# ]

To make current_date() deterministic, pass a provider:

timebounds = analyze(
    dialects.BigQuery,
    "select * from shop.orders where ordered_at >= current_date()",
    schema,
    "production",
    with_timebounds=True,
    current_date_provider=lambda: "2026-01-01",
)
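
The time-bound entries can be filtered like any other list of dicts. For example, a sketch that keeps only the columns that actually carry a bound:

# Sketch: keep only the entries where analyze found a lower or upper bound.
bounded = [
    ref
    for ref in timebounds
    if ref["lower"] is not None or ref["upper"] is not None
]
# For the first query above, this leaves only the ordered_at entry,
# with lower "2025-01-01" and upper "2026-01-01".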

build_schema

Fetch a schema dictionary from BigQuery or Redshift.

from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.BigQuery, "my_project", "my_schema", "my_table")
print(schema)

  • For BigQuery, authentication uses Application Default Credentials (ADC).
  • When table is omitted, all tables in the dataset are scanned (see the sketch below).
  • When both dataset and table are omitted, all datasets in the project are scanned.
  • The returned schema can be passed directly to analyze.
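
Assuming the trailing arguments can simply be dropped as the list above describes, a minimal sketch of the broader scans:

# Scan every table in a single dataset (table omitted) -- a sketch based on
# the behaviour described above, not a verbatim library example.
dataset_schema = build_schema(dialects.BigQuery, "my_project", "my_schema")

# Scan every dataset in the project (dataset and table omitted).
project_schema = build_schema(dialects.BigQuery, "my_project")

# Either result can be passed straight to analyze as its schema argument.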

Fetching from Redshift is also supported:

from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.Redshift, "my_database", "public", "orders")
print(schema)

Redshift authentication supports two modes (a setup sketch follows below):

  • Password-based: set REDSHIFT_HOST, REDSHIFT_USER, and REDSHIFT_PASSWORD (REDSHIFT_PORT is optional).
  • IAM-based: set REDSHIFT_CLUSTER_IDENTIFIER, REDSHIFT_REGION, and REDSHIFT_DB_USER (REDSHIFT_HOST/REDSHIFT_PORT may also be set).

Use AWS_PROFILE if you want to select a non-default AWS profile.
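
As an illustration only, the password-based mode could be configured from Python before calling build_schema (the values below are placeholders):

import os

from analytics_query_analyzer import build_schema
from sqlglot import dialects

# Placeholder credentials -- use your real host and a secret store in practice.
os.environ["REDSHIFT_HOST"] = "my-cluster.example.us-east-1.redshift.amazonaws.com"
os.environ["REDSHIFT_USER"] = "analytics_ro"
os.environ["REDSHIFT_PASSWORD"] = "********"
# REDSHIFT_PORT is optional.

schema = build_schema(dialects.Redshift, "my_database", "public", "orders")
print(schema)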
