Skip to main content

Utilities for SQLMesh

Project description

SQLMesh Utils

This repository contains things that are not included in SQLMesh Core for various reasons but may be useful in very specific cases.

For example, the custom materializations included here will typically do things like relax constraints / guarantees of the SQLMesh Core model kinds to work around environment-specific limitations.

Usage

Install the sqlmesh-utils library into your SQLMesh Python environment:

$ pip install sqlmesh-utils

Custom Materializations

After installing the sqlmesh-utils library above, you can reference the custom materializations via the CUSTOM model kind like so:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'custom_materialization_name',
        materialization_properties (
            config_key = 'config_value'
        )
    )
);

Before using any of these materializations, you should take the time to understand the tradeoffs. There is generally a reason they are not in upstream SQLMesh.

Non-Idempotent Incremental By Time Range

This behaves similar to INCREMENTAL_BY_TIME_RANGE model kind in SQLMesh, but with one important difference - it loads and restates data using a MERGE statement instead of INSERT OVERWRITE or DELETE+INSERT.

The reason you might want to use it is to prevent dirty reads during data restatement on engines that do not support atomically replacing a partition of data, such as Trino on Iceberg / Delta Lake.

Due to the use of a MERGE statement, this materialization type supports upserts only. That is, if records are deleted from the source data, these deletions will not be reflected in the target table. So the downside of using this materialization is that it's possible to end up with ghost records in your target table after a restatement. For this reason, we call it "non-idempotent".

[!NOTE] Note that some engines can propagate deletes in a MERGE statement using syntax like WHEN NOT MATCHED [IN SOURCE] THEN DELETE. However, this is not part of ANSI SQL so engines like Trino and Postgres do not implement it.

Usage

This mostly follows the same usage as INCREMENTAL_BY_TIME_RANGE:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'non_idempotent_incremental_by_time_range',
        materialization_properties (
            time_column = event_timestamp,
            primary_key = (event_id, event_source)
        )
    )
);

SELECT event_id, event_source, event_data, event_timestamp
FROM upstream.table

The properties are as follows:

time_column

This is the column in the dataset that contains the timestamp. It follows the same syntax as upstream INCREMENTAL_BY_TIME_RANGE and also the same rules with regards to respecting the project time_column_format property and being automatically added to the model partition_by field list.

primary_key

This is the column or combination of columns that uniquely identifies a record.

The columns listed here are used in the ON clause of the SQL Merge to join the source and target datasets.

Note that the time_column is not automatically injected into this list (to allow timestamps on records to be updated), so if the time_column does actually form part of the primary key in your dataset then it needs to be added here.

partition_by_time_column

By default, the time_column will get added to the list of fields in the model partitioned_by property, causing it to be included in the table partition key. This may be undesirable in some circumstances.

To opt out of this behaviour, you can set partition_by_time_column = false like so:

MODEL (
    name my_db.my_model,
    kind CUSTOM (
        materialization 'non_idempotent_incremental_by_time_range',
        materialization_properties (
            ...,
            partition_by_time_column = false
        )
    )
);

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlmesh_utils-0.1.0.tar.gz (10.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlmesh_utils-0.1.0-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file sqlmesh_utils-0.1.0.tar.gz.

File metadata

  • Download URL: sqlmesh_utils-0.1.0.tar.gz
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlmesh_utils-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8e1943b5e98ec03e0d03b0002cb0403034c2b1d35b9babf613f2bcce1b13c443
MD5 71b1d9f38d5372de605ef85415fa8759
BLAKE2b-256 b24e4c13868f0c00be27fdb995c9521a2097866ac8963278bb2155d729d73cae

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlmesh_utils-0.1.0.tar.gz:

Publisher: publish.workflow.yaml on TobikoData/sqlmesh-utils

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlmesh_utils-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sqlmesh_utils-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for sqlmesh_utils-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ef838de7ffc22f786877ccbc5626ae7ee9342aa1d09710cb0f1723e50e31cb2b
MD5 38cba4d51ccd2db0d91bb0710626b3cb
BLAKE2b-256 61be62fb89ec767b5406b313c3935ac8b5218a20da68e1b7b9be6b299b3288ed

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlmesh_utils-0.1.0-py3-none-any.whl:

Publisher: publish.workflow.yaml on TobikoData/sqlmesh-utils

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page