Skip to main content

Analytics Query Analyzer

Project description

analytics-query-analyzer

Analyze analytics SQL to extract referenced tables/columns and time bounds.

Support

BigQuery and Redshift are currently supported.

Install

Base install:

pip install analytics-query-analyzer

BigQuery extras (only needed for build_schema):

pip install analytics-query-analyzer[bigquery]

Redshift extras (only needed for build_schema):

pip install analytics-query-analyzer[redshift]

Usage

analyze

Extract table/column references from a query.

Schema format follows sqlglot conventions, with nested fields represented as nested dicts.

from analytics_query_analyzer import analyze
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
                "payment": {
                    "amount": "int64",
                    "method": "string",
                },
            },
        },
    },
}

sql = """
select
    id,
    user_id,
    payment.amount
from
    shop.orders
"""

references = analyze(dialects.BigQuery, sql, schema, "production")
print(references)
# {"production.shop.orders": {"id", "user_id", "payment.amount"}}

analyze_timespan

Extract time bounds from filters.

from analytics_query_analyzer import analyze_timespan
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
            },
        },
    },
}

sql = """
select
    *
from
    shop.orders
where
    ordered_at >= "2025-01-01"
    and ordered_at < "2026-01-01"
"""

timespans = analyze_timespan(dialects.BigQuery, sql, schema, "production")
print(timespans)
# {
#   "production.shop.orders.ordered_at": {
#     "lower": "2025-01-01",
#     "upper": "2026-01-01"
#   }
# }

To make current_date() deterministic, pass a provider:

timespans = analyze_timespan(
    dialects.BigQuery,
    "select * from shop.orders where ordered_at >= current_date()",
    schema,
    "production",
    current_date_provider=lambda: "2026-01-01",
)

build_schema

Fetch a schema dictionary from BigQuery or Redshift.

from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.BigQuery, "my_project", "my_schema", "my_table")
print(schema)
  • Authentication uses Application Default Credentials (ADC).
  • When table is omitted, it scans all tables in the dataset.
  • When both dataset and table are omitted, it scans all datasets in the project.
  • The returned schema can be passed directly to analyze and analyze_timespan.

Fetching from Redshift is also supported:

from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.Redshift, "my_database", "public", "orders")
print(schema)

Redshift authentication supports two modes:

  • Set REDSHIFT_HOST, REDSHIFT_USER, and REDSHIFT_PASSWORD (optional REDSHIFT_PORT).
  • Set REDSHIFT_CLUSTER_IDENTIFIER, REDSHIFT_REGION, and REDSHIFT_DB_USER to use IAM (you can also set REDSHIFT_HOST/REDSHIFT_PORT).
  • Use AWS_PROFILE if you want to select a non-default AWS profile.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

analytics_query_analyzer-0.3.0.tar.gz (17.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

analytics_query_analyzer-0.3.0-py3-none-any.whl (12.2 kB view details)

Uploaded Python 3

File details

Details for the file analytics_query_analyzer-0.3.0.tar.gz.

File metadata

  • Download URL: analytics_query_analyzer-0.3.0.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for analytics_query_analyzer-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6f46604dedb1317e7b4d77c55b00006ff7d4dce5185a7761c2369c9244fcd161
MD5 1bf6c8f79ea5ba5bbf289aa263d623bb
BLAKE2b-256 f9ba8e094b3a92de96cd9d6d48dc01b59db54cc316699cc2e73ffc51a532d4fe

See more details on using hashes here.

Provenance

The following attestation bundles were made for analytics_query_analyzer-0.3.0.tar.gz:

Publisher: publish-pypi.yml on logicoffee/aqa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file analytics_query_analyzer-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for analytics_query_analyzer-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c7b5de98e752b970ae65c09aa3b4d9fde88e74440752fb191998b26856aba79a
MD5 2309fbd5230b55294c2f7d86b84edabf
BLAKE2b-256 02725a0fe8c4ab151a923d3cd89c39770e3ef56bf19ce65655b66acb13bb74d7

See more details on using hashes here.

Provenance

The following attestation bundles were made for analytics_query_analyzer-0.3.0-py3-none-any.whl:

Publisher: publish-pypi.yml on logicoffee/aqa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page