Standardizing models

bollhav ⚽ 🌊

A model definition framework for data pipeline targets, with support for multiple target implementations.


Installation

pip install bollhav

Model creation example

from bollhav import Model, ModelConfig, WriteMode, Database, PostgresColumn, PostgresType, TZInterval
import polars as pl

config = ModelConfig(
    name="orders",
    source_entity="raw.orders",
    table="orders",
    schema="public",
    database=Database.POSTGRES,
    columns=[
        PostgresColumn(name="id", data_type=PostgresType.BIGINT, primary_key=True, nullable=False),
        PostgresColumn(name="created_at", data_type=PostgresType.TIMESTAMPTZ, nullable=False),
        PostgresColumn(name="email", data_type=PostgresType.TEXT, nullable=True, sensitive=True),
    ],
    write_mode=WriteMode.APPEND,
    cron="0 3 * * *",
    partitioned_by="created_at",
)

def execute(interval: TZInterval) -> pl.DataFrame:
    # Read only the rows that fall inside the half-open interval [since, until).
    query = (
        f"SELECT * FROM {config.source_entity} "
        f"WHERE created_at >= '{interval.since}' AND created_at < '{interval.until}'"
    )
    return pl.read_database(query, connection=...)

model = Model(model_config=config, execute=execute)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Unique identifier for the model |
| source_entity | str | required | Source table or view to read from |
| table | str | "" | Destination table name |
| schema | str | "" | Destination schema name |
| database | Database | None | Target database. Required if columns is set |
| columns | list[PostgresColumn \| ParquetColumn] | None | Column definitions. Required if database is set |
| model_type | ModelType | TABLE | TABLE or VIEW |
| write_mode | WriteMode | APPEND | How to write data. VIEW requires ModelType.VIEW |
| tags | set[str] | None | Labels for filtering |
| cron | str | None | Cron expression. Automatically infers batch_size |
| enabled | bool | True | Whether the model is active |
| debug | bool | False | Enables debug mode |
| description | str | None | Human-readable description |
| source_dsn | str | None | DSN for the source connection |
| source_query | str | None | Optional query to use instead of source_entity |
| partitioned_by | str | None | Column name to partition by. Must exist in columns |
| begin | datetime | None | Backfill start. Must be UTC-aware |
| end | datetime | None | Backfill end. Must be UTC-aware |
| retries | int | None | Retry count on failure |
| lookback | int | None | Lookback window in batch units |
| tz_aware | bool | True | Enforces UTC on begin/end |
| **kwargs | | | Extra metadata. Callable values are resolved with non-callable kwargs as arguments |

Computed attributes

| Attribute | Description |
| --- | --- |
| batch_size | Inferred from cron if set, otherwise None |
| sensitive | True if any column has sensitive=True |
| unique_columns | Columns with unique=True; required for UPDATE_INSERT |
| partitioned_by_index | True if partitioned_by is set |

Write modes

from bollhav import WriteMode

WriteMode.APPEND
WriteMode.OVERWRITE_INSERT  # requires partitioned_by
WriteMode.TRUNCATE_INSERT
WriteMode.UPDATE_INSERT     # requires at least one column with unique=True
WriteMode.VIEW              # requires ModelType.VIEW
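
The requirements noted in the comments above can be checked before any data is written. The following is a minimal sketch of such a check, not bollhav's own validation: SketchWriteMode and check_write_mode are hypothetical stand-ins defined here so that the example runs on its own, and the library may validate differently or raise a different exception type.

```python
from enum import Enum
from typing import Optional, Sequence


class SketchWriteMode(Enum):
    """Hypothetical stand-in for bollhav's WriteMode (assumption, not the real enum)."""

    APPEND = "append"
    OVERWRITE_INSERT = "overwrite_insert"
    TRUNCATE_INSERT = "truncate_insert"
    UPDATE_INSERT = "update_insert"
    VIEW = "view"


def check_write_mode(
    write_mode: SketchWriteMode,
    *,
    partitioned_by: Optional[str] = None,
    unique_columns: Sequence[str] = (),
    is_view: bool = False,
) -> None:
    """Raise ValueError when a write mode's documented requirement is not met."""
    if write_mode is SketchWriteMode.OVERWRITE_INSERT and not partitioned_by:
        raise ValueError("OVERWRITE_INSERT requires partitioned_by")
    if write_mode is SketchWriteMode.UPDATE_INSERT and not unique_columns:
        raise ValueError("UPDATE_INSERT requires at least one unique column")
    if write_mode is SketchWriteMode.VIEW and not is_view:
        raise ValueError("VIEW requires ModelType.VIEW")


# Modes without extra requirements, or with their requirement met, pass silently.
check_write_mode(SketchWriteMode.APPEND)
check_write_mode(SketchWriteMode.OVERWRITE_INSERT, partitioned_by="created_at")
```

Failing early like this keeps a misconfigured model from reaching the database at all.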

UTC enforcement

When tz_aware=True (default), begin and end must be UTC-aware. Naive or non-UTC datetimes raise ValueError.

from datetime import datetime, timezone

model = Model(
    ...,
    begin=datetime(2025, 1, 1, tzinfo=timezone.utc),
    end=datetime(2025, 2, 1, tzinfo=timezone.utc),
)
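
For readers who want to see what "UTC-aware" means in practice, here is a minimal sketch of such a check using only the standard library. ensure_utc is a hypothetical helper written for illustration; bollhav's own check and error messages may differ.

```python
from datetime import datetime, timedelta, timezone


def ensure_utc(value: datetime, field: str) -> datetime:
    """Return value unchanged if it is UTC-aware, otherwise raise ValueError."""
    offset = value.utcoffset()  # None for naive datetimes
    if offset is None:
        raise ValueError(f"{field} must be timezone-aware, got a naive datetime")
    if offset != timedelta(0):
        raise ValueError(f"{field} must be in UTC, got offset {offset}")
    return value


# A UTC-aware datetime passes through unchanged.
begin = ensure_utc(datetime(2025, 1, 1, tzinfo=timezone.utc), "begin")
```

A naive datetime(2025, 1, 1), or one carrying a non-zero offset such as timezone(timedelta(hours=1)), is rejected by this sketch with ValueError.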

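Extra metadata

Extra keyword arguments passed as **kwargs are exposed on model.extra, with callable values already resolved:

model.extra  # {"static": "production", "env": "env=production"}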
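
The model.extra value above is consistent with the rule in the Parameters section that callable **kwargs are resolved with the non-callable kwargs as arguments. The sketch below shows one way that rule could work. It is an assumption about the behaviour, not bollhav's code: resolve_extra is a hypothetical function, and passing the non-callable values as keyword arguments is a guess at the calling convention.

```python
from typing import Any, Dict


def resolve_extra(**kwargs: Any) -> Dict[str, Any]:
    """Call every callable kwarg with the non-callable kwargs as keyword arguments."""
    static = {key: value for key, value in kwargs.items() if not callable(value)}
    resolved = dict(static)
    for key, value in kwargs.items():
        if callable(value):
            # Assumption: callables receive the static values by keyword.
            resolved[key] = value(**static)
    return resolved


extra = resolve_extra(static="production", env=lambda static: f"env={static}")
print(extra)  # {'static': 'production', 'env': 'env=production'}
```

With this calling convention, a callable has to accept every non-callable kwarg, so in practice it would take **_ to ignore the ones it does not need.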


Batch intervals

Model.get_batch_intervals splits a TZInterval into sub-intervals driven by the model's cron expression. Useful for chunked backfills.

from datetime import datetime, timezone
from bollhav.intervals import TZInterval

interval = TZInterval(
    since=datetime(2025, 1, 1, tzinfo=timezone.utc),
    until=datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc),
)

batches = model.get_batch_intervals(interval)
# With cron="0 * * * *":
# [TZInterval(00:00, 01:00), TZInterval(01:00, 02:00), TZInterval(02:00, 03:00)]

Pass cron_override to use a different cron expression without changing the model config:

batches = model.get_batch_intervals(interval, cron_override="*/15 * * * *")
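
To make the splitting concrete, the sketch below chunks an interval with a fixed step instead of a cron expression. This is a deliberate simplification: it matches the hourly and 15-minute examples above only because those crons fire at a constant spacing. split_interval is a hypothetical helper, and clamping the last batch to the end of the interval is an assumption about how edges are handled.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def split_interval(
    since: datetime, until: datetime, step: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Split [since, until) into consecutive half-open batches of at most step."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    batches = []
    start = since
    while start < until:
        end = min(start + step, until)  # clamp the final batch to the interval end
        batches.append((start, end))
        start = end
    return batches


hourly = split_interval(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc),
    timedelta(hours=1),
)
print(len(hourly))  # 3
```

Each batch ends exactly where the next one begins, so a backfill that processes the batches in order covers the interval once with no gaps or overlaps.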

Tag filtering

Tags are populated automatically at init time. By default, the model's name, its schema, and the literal tag "all" are added.

model = ModelConfig(name="orders", source_entity="raw.orders", schema="public")
model.tags  # {"orders", "public", "all"}

Control which tags are auto-added:

ModelConfig(..., name_add_to_tags=False, schema_add_to_tags=False, model_gets_all_tag=False)
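
The effect of the three flags can be pictured with a small standalone function. This is a sketch of the described behaviour rather than the library's code: build_tags is hypothetical, and skipping an empty name or schema is an assumption.

```python
from typing import Iterable, Optional, Set


def build_tags(
    name: str,
    schema: str = "",
    tags: Optional[Iterable[str]] = None,
    *,
    name_add_to_tags: bool = True,
    schema_add_to_tags: bool = True,
    model_gets_all_tag: bool = True,
) -> Set[str]:
    """Combine user-supplied tags with the automatically added ones."""
    result = set(tags or ())
    if name_add_to_tags and name:
        result.add(name)
    if schema_add_to_tags and schema:  # assumption: an empty schema adds no tag
        result.add(schema)
    if model_gets_all_tag:
        result.add("all")
    return result


default_tags = build_tags("orders", "public")
```

With the defaults, default_tags holds the three tags from the example above; with all three flags set to False, only the user-supplied tags remain.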

Use match_models to discover and filter model instances from a folder by tag expression:

from bollhav.match_models import match_models

models = match_models(folder="src/models", tags="[orders|payments]")
models = match_models(folder="src/models", tags="[public&reporting]")
models = match_models(folder="src/models", tags="[public&(orders|payments)]")

Tag expression syntax

| Syntax | Meaning |
| --- | --- |
| [tag] | model has tag |
| [a\|b] | model has a OR b |
| [a&b] | model has a AND b |
| [a&(b\|c)] | model has a AND (b OR c) |
| [g1],[g2] | matches g1 OR g2 (comma = outer OR) |

Square brackets are required around every group. Only one level of parentheses is supported.
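
The syntax above is small enough to evaluate with a few helper functions. The sketch below is one possible evaluator, not the one inside match_models: tags_match and its helpers are hypothetical names, and letting & bind tighter than | when both appear without parentheses is an assumption the syntax table does not settle.

```python
import re
from typing import List, Set


def _split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def _eval_atom(text: str, tags: Set[str]) -> bool:
    text = text.strip()
    if text.startswith("(") and text.endswith(")"):
        return _eval_or(text[1:-1], tags)  # parenthesised sub-expression
    return text in tags


def _eval_and(text: str, tags: Set[str]) -> bool:
    return all(_eval_atom(part, tags) for part in _split_top_level(text, "&"))


def _eval_or(text: str, tags: Set[str]) -> bool:
    # Assumption: "|" binds looser than "&".
    return any(_eval_and(part, tags) for part in _split_top_level(text, "|"))


def tags_match(expression: str, tags: Set[str]) -> bool:
    """Return True if tags satisfies any bracketed group in expression."""
    groups = re.findall(r"\[([^\[\]]+)\]", expression)
    if not groups:
        raise ValueError("expected at least one [group]")
    return any(_eval_or(group, tags) for group in groups)


model_tags = {"orders", "public", "all"}
print(tags_match("[public&(orders|payments)]", model_tags))  # True
print(tags_match("[public&reporting]", model_tags))  # False
```

Each bracketed group is evaluated on its own and the results are OR-ed together, which mirrors the comma rule in the table.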


Testing

Tests use pytest. Run the full suite:

pytest tests/

This version

1.6.2
