Skip to main content

Define typed Python entities, generate transformations, run anywhere. A dbt alternative built on Pydantic + Ibis.

Project description

Fyrnheim

Activities-first data transformation framework.

Built on Pydantic + Ibis. Define typed sources, detect business events from state changes, resolve identities across systems, and project entity models -- all in Python.

Install

pip install fyrnheim[duckdb]

Quick Start

1. Create a project:

fyr init myproject && cd myproject

2. Define your pipeline in entities/customers.py:

from fyrnheim import (
    StateSource, ActivityDefinition, RowAppeared, FieldChanged,
    IdentityGraph, IdentitySource, EntityModel, StateField,
)

# Source -- a slowly-changing state table
crm = StateSource(name="crm_contacts", project="p", dataset="raw", table="contacts", id_field="id")

# Activities -- named business events from state changes
signup = ActivityDefinition(name="signup", source="crm_contacts", trigger=RowAppeared())
became_paying = ActivityDefinition(
    name="became_paying", source="crm_contacts",
    trigger=FieldChanged(field="plan", to_values=["pro", "enterprise"]),
)

# Identity -- resolve across sources
identity = IdentityGraph(
    name="customer_identity", canonical_id="customer_id",
    sources=[IdentitySource(source="crm_contacts", id_field="id", match_key_field="email")],
)

# Entity -- derived current state
customers = EntityModel(
    name="customers", identity_graph="customer_identity",
    state_fields=[
        StateField(name="email", source="crm_contacts", field="email", strategy="latest"),
        StateField(name="plan", source="crm_contacts", field="plan", strategy="latest"),
    ],
)

3. Run tests:

pytest tests/

Core Concepts

Sources

StateSource -- a slowly-changing table (CRM contacts, subscription records). The diff engine automatically detects row appearances, disappearances, and field changes between snapshots.

StateSource(name="crm_contacts", project="p", dataset="d", table="contacts", id_field="contact_id")

EventSource -- an append-only event stream (page views, transactions).

EventSource(
    name="billing_events", project="p", dataset="d", table="transactions",
    entity_id_field="customer_id", timestamp_field="created_at", event_type_field="event_type",
)

Activity Definitions

Named business events detected from raw data changes. Each activity ties to a source and a trigger:

Trigger Detects
RowAppeared() New row in a state source
RowDisappeared() Row removed from a state source
FieldChanged(field, to_values) Field value changed (optionally to specific values)
EventOccurred(event_types) Specific event types in an event source
signup = ActivityDefinition(name="signup", source="crm_contacts", trigger=RowAppeared())
became_paying = ActivityDefinition(
    name="became_paying", source="crm_contacts",
    trigger=FieldChanged(field="plan", to_values=["pro", "enterprise"]),
)

Identity Graph

Cross-source identity resolution. Link records from different systems by a shared match key:

IdentityGraph(
    name="customer_identity",
    canonical_id="customer_id",
    sources=[
        IdentitySource(source="crm_contacts", id_field="contact_id", match_key_field="email_hash"),
        IdentitySource(source="billing_events", id_field="customer_id", match_key_field="email_hash"),
    ],
)

Entity Model

Derived current-state projection from resolved identities. Each field picks a source, a column, and a merge strategy (latest, first):

EntityModel(
    name="customers",
    identity_graph="customer_identity",
    state_fields=[
        StateField(name="email", source="crm_contacts", field="email", strategy="latest"),
        StateField(name="first_seen", source="crm_contacts", field="created_at", strategy="first"),
    ],
    computed_fields=[ComputedColumn(name="is_paying", expression="plan != 'free'")],
)

Analytics Model

Time-grain metric aggregation over the activity stream:

StreamAnalyticsModel(
    name="daily_metrics",
    identity_graph="customer_identity",
    date_grain="daily",
    metrics=[
        StreamMetric(name="new_signups", expression="count()", event_filter="signup", metric_type="count"),
        StreamMetric(name="total_customers", expression="count()", metric_type="snapshot"),
    ],
)

CLI

fyr init [project_name]           # Scaffold a new project
fyr run                           # Run the pipeline
fyr run --max-parallel-io 8       # Override worker count for I/O fan-out
fyr bench                         # Run the pipeline and print per-phase timings
fyr bench --json                  # Same, but emit PipelineTimings as JSON on stdout
fyr --version                     # Show version
fyr --help                        # Show available commands

fyr bench reports wall-clock time per phase, per source, per identity graph, and per analytics entity / metrics model (split into projection vs. write), making it easy to spot where a pipeline spends its time.

Source loads and entity/metrics writes fan out across a bounded thread pool (default 4 workers). Tune with the max_parallel_io key in fyrnheim.yaml or the --max-parallel-io CLI flag on fyr run / fyr bench. Set to 1 for strictly serial behavior.

Why Fyrnheim?

dbt Fyrnheim
Language SQL + Jinja Python
Type safety Runtime errors Pydantic validation at definition time
Local dev Requires warehouse connection DuckDB on local parquet files
Backend portability Dialect-specific SQL Ibis compiles to 15+ backends
Testing Custom schema tests pytest
Identity resolution Manual SQL joins Built-in identity graph

Status

  • Alpha -- API may change before 1.0
  • DuckDB backend -- fully supported
  • BigQuery backend -- supported
  • ClickHouse output -- supported as output sink
  • Postgres backend -- supported
  • Python 3.11+ required

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fyrnheim-0.8.0.tar.gz (70.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fyrnheim-0.8.0-py3-none-any.whl (92.4 kB view details)

Uploaded Python 3

File details

Details for the file fyrnheim-0.8.0.tar.gz.

File metadata

  • Download URL: fyrnheim-0.8.0.tar.gz
  • Upload date:
  • Size: 70.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fyrnheim-0.8.0.tar.gz
Algorithm Hash digest
SHA256 dada90f4bffec2dc9337a63cc0427123de69c8a0f4852887ef09d72d887d8267
MD5 c260d463d294d1b0f9171a2afb89156d
BLAKE2b-256 5620a9483d982f5797758e5b567d931fd716161adbce8127fc0d83546f0e9df4

See more details on using hashes here.

Provenance

The following attestation bundles were made for fyrnheim-0.8.0.tar.gz:

Publisher: publish.yml on deepskydatahq/fyrnheim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fyrnheim-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: fyrnheim-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 92.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fyrnheim-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3e9ab9757a3a505389c0054463042ee1b3aa1b5776b87add62555486e025fc7a
MD5 066184729d4bdc743f84ea6ea43b662b
BLAKE2b-256 eb6468ae52f6715417c6f979a9f5aabdfbdf0f3f61af56c2b7c1b2fd8ffe886e

See more details on using hashes here.

Provenance

The following attestation bundles were made for fyrnheim-0.8.0-py3-none-any.whl:

Publisher: publish.yml on deepskydatahq/fyrnheim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page