
analytics-query-analyzer

Analyze analytics SQL to extract referenced tables/columns and time bounds.

Support

BigQuery and Redshift are currently supported.

Install

Base install:

pip install analytics-query-analyzer

BigQuery extras (only needed for build_schema):

pip install analytics-query-analyzer[bigquery]

Redshift extras (only needed for build_schema):

pip install analytics-query-analyzer[redshift]

Usage

analyze

Extract table/column references from a query.

Schema format follows sqlglot conventions, with nested fields represented as nested dicts.

from analytics_query_analyzer import analyze
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
                "payment": {
                    "amount": "int64",
                    "method": "string",
                },
            },
        },
    },
}

sql = """
select
    id,
    user_id,
    payment.amount
from
    shop.orders
"""

references = analyze(dialects.BigQuery, sql, schema, "production")
print(references)
# [
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "id",
#   },
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "payment.amount",
#   },
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "user_id",
#   },
# ]
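The flat reference list is easy to post-process. As a sketch (the helper below is illustrative, not part of the library), here is one way to group the referenced columns by fully qualified table name, starting from the output shown above:

```python
from collections import defaultdict

# Example output of analyze(), copied from above.
references = [
    {"database": "production", "schema": "shop", "table": "orders", "column": "id"},
    {"database": "production", "schema": "shop", "table": "orders", "column": "payment.amount"},
    {"database": "production", "schema": "shop", "table": "orders", "column": "user_id"},
]

def group_by_table(refs):
    """Map 'database.schema.table' -> sorted list of referenced columns."""
    grouped = defaultdict(list)
    for ref in refs:
        key = f"{ref['database']}.{ref['schema']}.{ref['table']}"
        grouped[key].append(ref["column"])
    return {table: sorted(cols) for table, cols in grouped.items()}

print(group_by_table(references))
# {'production.shop.orders': ['id', 'payment.amount', 'user_id']}
```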

analyze_timespan

Extract the time bounds implied by WHERE-clause filters on time columns.

from analytics_query_analyzer import analyze_timespan
from sqlglot import dialects

schema = {
    "production": {
        "shop": {
            "orders": {
                "id": "int64",
                "ordered_at": "datetime",
                "user_id": "int64",
            },
        },
    },
}

sql = """
select
    *
from
    shop.orders
where
    ordered_at >= "2025-01-01"
    and ordered_at < "2026-01-01"
"""

timespans = analyze_timespan(dialects.BigQuery, sql, schema, "production")
print(timespans)
# [
#   {
#     "database": "production",
#     "schema": "shop",
#     "table": "orders",
#     "column": "ordered_at",
#     "lower": "2025-01-01",
#     "upper": "2026-01-01",
#   }
# ]
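Since lower and upper come back as plain date strings, standard-library datetime is enough to reason about the scanned range. A small sketch (not part of the library) using the output above:

```python
from datetime import date

# Example output of analyze_timespan(), copied from above.
timespans = [
    {
        "database": "production", "schema": "shop", "table": "orders",
        "column": "ordered_at", "lower": "2025-01-01", "upper": "2026-01-01",
    }
]

for span in timespans:
    # Half-open range [lower, upper), so the difference is the number of scanned days.
    days = (date.fromisoformat(span["upper"]) - date.fromisoformat(span["lower"])).days
    print(f"{span['table']}.{span['column']}: scans {days} days")
# orders.ordered_at: scans 365 days
```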

To make current_date() deterministic, pass a provider:

timespans = analyze_timespan(
    dialects.BigQuery,
    "select * from shop.orders where ordered_at >= current_date()",
    schema,
    "production",
    current_date_provider=lambda: "2026-01-01",
)

build_schema

Fetch a schema dictionary from BigQuery or Redshift.

from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.BigQuery, "my_project", "my_schema", "my_table")
print(schema)

  • Authentication uses Application Default Credentials (ADC).
  • When table is omitted, it scans all tables in the dataset.
  • When both dataset and table are omitted, it scans all datasets in the project.
  • The returned schema can be passed directly to analyze and analyze_timespan.

Fetching from Redshift is also supported:

from analytics_query_analyzer import build_schema
from sqlglot import dialects

schema = build_schema(dialects.Redshift, "my_database", "public", "orders")
print(schema)

Redshift authentication supports two modes:

  • Password: set REDSHIFT_HOST, REDSHIFT_USER, and REDSHIFT_PASSWORD (REDSHIFT_PORT is optional).
  • IAM: set REDSHIFT_CLUSTER_IDENTIFIER, REDSHIFT_REGION, and REDSHIFT_DB_USER (you can also set REDSHIFT_HOST/REDSHIFT_PORT).

In addition, set AWS_PROFILE if you want to select a non-default AWS profile.
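For example, the two modes could be configured like this (all hosts, names, and regions below are placeholder values, not defaults shipped with the package):

```shell
# Mode 1: password authentication
export REDSHIFT_HOST=my-cluster.example.redshift.amazonaws.com
export REDSHIFT_USER=analytics_ro
export REDSHIFT_PASSWORD=...
# optionally: export REDSHIFT_PORT=5439

# Mode 2: IAM authentication
export REDSHIFT_CLUSTER_IDENTIFIER=my-cluster
export REDSHIFT_REGION=ap-northeast-1
export REDSHIFT_DB_USER=analytics_ro
export AWS_PROFILE=analytics   # only if you need a non-default AWS profile
```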
