Skip to main content

Define typed Python entities, generate transformations, run anywhere. A dbt alternative built on Pydantic + Ibis.

Project description

Fyrnheim

Activities-first data transformation framework.

Built on Pydantic + Ibis. Define typed sources, detect business events from state changes, resolve identities across systems, and project entity models -- all in Python.

Install

pip install fyrnheim[duckdb]

Quick Start

1. Create a project:

fyr init myproject && cd myproject

2. Define your pipeline in entities/customers.py:

from fyrnheim import (
    StateSource, ActivityDefinition, RowAppeared, FieldChanged,
    IdentityGraph, IdentitySource, EntityModel, StateField,
)

# Source -- a slowly-changing state table
crm = StateSource(name="crm_contacts", project="p", dataset="raw", table="contacts", id_field="id")

# Activities -- named business events from state changes
signup = ActivityDefinition(name="signup", source="crm_contacts", trigger=RowAppeared())
became_paying = ActivityDefinition(
    name="became_paying", source="crm_contacts",
    trigger=FieldChanged(field="plan", to_values=["pro", "enterprise"]),
)

# Identity -- resolve across sources
identity = IdentityGraph(
    name="customer_identity", canonical_id="customer_id",
    sources=[IdentitySource(source="crm_contacts", id_field="id", match_key_field="email")],
)

# Entity -- derived current state
customers = EntityModel(
    name="customers", identity_graph="customer_identity",
    state_fields=[
        StateField(name="email", source="crm_contacts", field="email", strategy="latest"),
        StateField(name="plan", source="crm_contacts", field="plan", strategy="latest"),
    ],
)

3. Run tests:

pytest tests/

Core Concepts

Sources

StateSource -- a slowly-changing table (CRM contacts, subscription records). The diff engine automatically detects row appearances, disappearances, and field changes between snapshots.

StateSource(name="crm_contacts", project="p", dataset="d", table="contacts", id_field="contact_id")

EventSource -- an append-only event stream (page views, transactions).

EventSource(
    name="billing_events", project="p", dataset="d", table="transactions",
    entity_id_field="customer_id", timestamp_field="created_at", event_type_field="event_type",
)

Activity Definitions

Named business events detected from raw data changes. Each activity ties to a source and a trigger:

Trigger Detects
RowAppeared() New row in a state source
RowDisappeared() Row removed from a state source
FieldChanged(field, to_values) Field value changed (optionally to specific values)
EventOccurred(event_types) Specific event types in an event source
signup = ActivityDefinition(name="signup", source="crm_contacts", trigger=RowAppeared())
became_paying = ActivityDefinition(
    name="became_paying", source="crm_contacts",
    trigger=FieldChanged(field="plan", to_values=["pro", "enterprise"]),
)

Identity Graph

Cross-source identity resolution. Link records from different systems by a shared match key:

IdentityGraph(
    name="customer_identity",
    canonical_id="customer_id",
    sources=[
        IdentitySource(source="crm_contacts", id_field="contact_id", match_key_field="email_hash"),
        IdentitySource(source="billing_events", id_field="customer_id", match_key_field="email_hash"),
    ],
)

Entity Model

Derived current-state projection from resolved identities. Each field picks a source, a column, and a merge strategy (latest, first):

EntityModel(
    name="customers",
    identity_graph="customer_identity",
    state_fields=[
        StateField(name="email", source="crm_contacts", field="email", strategy="latest"),
        StateField(name="first_seen", source="crm_contacts", field="created_at", strategy="first"),
    ],
    computed_fields=[ComputedColumn(name="is_paying", expression="plan != 'free'")],
)

Analytics Model

Time-grain metric aggregation over the activity stream:

StreamAnalyticsModel(
    name="daily_metrics",
    identity_graph="customer_identity",
    date_grain="daily",
    metrics=[
        StreamMetric(name="new_signups", expression="count()", event_filter="signup", metric_type="count"),
        StreamMetric(name="total_customers", expression="count()", metric_type="snapshot"),
    ],
)

CLI

fyr init [project_name]           # Scaffold a new project
fyr run                           # Run the pipeline
fyr run --max-parallel-io 8       # Override worker count for I/O fan-out
fyr bench                         # Run the pipeline and print per-phase timings
fyr bench --json                  # Same, but emit PipelineTimings as JSON on stdout
fyr --version                     # Show version
fyr --help                        # Show available commands

fyr bench reports wall-clock time per phase, per source, per identity graph, and per analytics entity / metrics model (split into projection vs. write), making it easy to spot where a pipeline spends its time.

Source loads and entity/metrics writes fan out across a bounded thread pool (default 4 workers). Tune with the max_parallel_io key in fyrnheim.yaml or the --max-parallel-io CLI flag on fyr run / fyr bench. Set to 1 for strictly serial behavior.

Why Fyrnheim?

dbt Fyrnheim
Language SQL + Jinja Python
Type safety Runtime errors Pydantic validation at definition time
Local dev Requires warehouse connection DuckDB on local parquet files
Backend portability Dialect-specific SQL Ibis compiles to 15+ backends
Testing Custom schema tests pytest
Identity resolution Manual SQL joins Built-in identity graph

Status

  • Alpha -- API may change before 1.0
  • DuckDB backend -- fully supported
  • BigQuery backend -- supported
  • ClickHouse output -- supported as output sink
  • Postgres backend -- supported
  • Python 3.11+ required

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fyrnheim-0.7.0.tar.gz (66.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fyrnheim-0.7.0-py3-none-any.whl (88.9 kB view details)

Uploaded Python 3

File details

Details for the file fyrnheim-0.7.0.tar.gz.

File metadata

  • Download URL: fyrnheim-0.7.0.tar.gz
  • Upload date:
  • Size: 66.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fyrnheim-0.7.0.tar.gz
Algorithm Hash digest
SHA256 253bbf16558315e7f46c0c30f7821bcfc86b8ac8e2a07b8e3c5d6e3af2820b11
MD5 bb88d8cc254c94183f631bdd5501a616
BLAKE2b-256 a30f8fee78b6764387964d29e38339754db252b03ff1eed5a3253db6a3b7f5ce

See more details on using hashes here.

Provenance

The following attestation bundles were made for fyrnheim-0.7.0.tar.gz:

Publisher: publish.yml on deepskydatahq/fyrnheim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fyrnheim-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: fyrnheim-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 88.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fyrnheim-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5a52132ed4e37f3660b5fc6c24c3fd42614bb1b9ab86140a10437a374ffdf660
MD5 a823f43ade37c79f1855257e006936d1
BLAKE2b-256 4c65d4778131da2cc6874c70e186d2a280d7198777a87e87b02e8bf4994060b7

See more details on using hashes here.

Provenance

The following attestation bundles were made for fyrnheim-0.7.0-py3-none-any.whl:

Publisher: publish.yml on deepskydatahq/fyrnheim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page