Fyrnheim

Define typed Python entities, generate transformations, run anywhere.

A dbt alternative built on Pydantic + Ibis.

Fyrnheim lets data teams define business entities as typed Pydantic models and automatically generates Ibis transformation code from those definitions. The same entity runs on DuckDB for instant local development and deploys to BigQuery, ClickHouse, or Postgres in production with zero changes. No SQL, no Jinja, no vendor lock-in.

Install

pip install fyrnheim[duckdb]

Quick Start

1. Create a project:

fyr init myproject && cd myproject
Created myproject/
  created  entities/
  created  data/
  created  generated/
  created  fyrnheim.yaml
  created  entities/customers.py
  created  data/customers.parquet

2. Look at the sample entity in entities/customers.py:

entity = Entity(
    name="customers",
    source=TableSource(..., duckdb_path="customers.parquet"),
    layers=LayersConfig(
        prep=PrepLayer(model_name="prep_customers", computed_columns=[
            ComputedColumn(name="email_hash", expression=hash_email("email")),
            ComputedColumn(name="amount_dollars", expression="t.amount_cents / 100.0"),
        ]),
        dimension=DimensionLayer(model_name="dim_customers", computed_columns=[
            ComputedColumn(name="is_paying", expression="t.plan != 'free'"),
        ]),
    ),
    quality=QualityConfig(checks=[NotNull("email"), Unique("email_hash")]),
)

3. Generate transformation code:

fyr generate
Generating transforms from entities
  customers            generated/customers_transforms.py   written

Generated: 1 written, 0 unchanged

4. Run the pipeline:

fyr run
Discovering entities... 1 found
Running on duckdb

  customers        prep -> dim            12 rows    0.1s  ok

Done: 1 success, 0 errors (0.2s)

Add your own entities to entities/ and data to data/. See examples/ for more.
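
For example, a second entity dropped into entities/ might look like the following. This is a minimal sketch reusing only the constructs shown above; the orders table and its columns are placeholders for illustration, not files created by fyr init:

# entities/orders.py
entity = Entity(
    name="orders",
    source=TableSource(
        project="myproject", dataset="raw", table="orders",
        duckdb_path="orders.parquet",
    ),
    layers=LayersConfig(
        prep=PrepLayer(model_name="prep_orders", computed_columns=[
            ComputedColumn(name="amount_dollars", expression="t.amount_cents / 100.0"),
        ]),
    ),
    quality=QualityConfig(checks=[NotNull("order_id")]),
)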

Core Concepts

Entities

An entity is a Pydantic model describing a business object -- customers, orders, products. It declares its source, transformation layers, and quality rules in one place.

entity = Entity(
    name="customers",
    description="...",
    source=TableSource(...),
    layers=LayersConfig(prep=..., dimension=...),
    quality=QualityConfig(checks=[...]),
)

Layers

Composable transformation stages that an entity flows through:

Layer           Purpose
PrepLayer       Clean raw data: type casts, renames, computed columns
DimensionLayer  Add business logic columns (is_paying, account_type)
SnapshotLayer   Track changes over time (daily snapshots, SCD)
ActivityConfig  Detect events from state changes (row_appears, status_becomes, field_changes)
AnalyticsLayer  Date-grain metric aggregation (snapshot and event metrics)

layers=LayersConfig(
    prep=PrepLayer(
        model_name="prep_customers",
        computed_columns=[ComputedColumn(name="amount_dollars", expression="t.amount_cents / 100.0")],
    ),
    dimension=DimensionLayer(
        model_name="dim_customers",
        computed_columns=[ComputedColumn(name="is_paying", expression="t.plan != 'free'")],
    ),
    activity=ActivityConfig(
        model_name="activity_customers",
        entity_id_field="customer_id",
        types=[ActivityType(name="signed_up", trigger="row_appears", timestamp_field="created_at")],
    ),
    analytics=AnalyticsLayer(
        model_name="analytics_customers",
        date_expression="t.created_at.date()",
        metrics=[AnalyticsMetric(name="new_customers", expression="t.count()", metric_type="event")],
    ),
)

Source Types

Fyrnheim supports multiple source types for different data patterns:

TableSource -- read from a single warehouse table or local parquet file:

source=TableSource(
    project="myproject", dataset="raw", table="customers",
    duckdb_path="data/customers.parquet",  # local dev
)

UnionSource -- combine multiple sources into a common schema. Each sub-source can remap columns with field_mappings and inject constants with literal_columns:

source=UnionSource(
    sources=[
        TableSource(
            project="myproject", dataset="raw", table="youtube_videos",
            duckdb_path="youtube_videos/*.parquet",
            field_mappings={"video_id": "product_id"},
            literal_columns={"product_type": "video", "source_platform": "youtube"},
        ),
        TableSource(
            project="myproject", dataset="raw", table="linkedin_posts",
            duckdb_path="linkedin_posts/*.parquet",
            field_mappings={"post_id": "product_id", "text": "title"},
            literal_columns={"product_type": "post", "source_platform": "linkedin"},
        ),
    ],
)

DerivedSource -- build identity graphs by joining multiple entities on a shared key. Sources are combined with a cascading FULL OUTER JOIN, and conflicting field values are resolved by source priority:

source=DerivedSource(
    identity_graph="person_graph",
    identity_graph_config=IdentityGraphConfig(
        match_key="email_hash",
        sources=[
            IdentityGraphSource(name="crm", entity="crm_contacts", match_key_field="email_hash",
                                fields={"email": "email", "name": "full_name"}),
            IdentityGraphSource(name="billing", entity="transactions", match_key_field="customer_email_hash",
                                fields={"email": "customer_email", "name": "customer_name"}),
        ],
        priority=["crm", "billing"],  # CRM wins when both have a value
    ),
)

Auto-generated columns: is_{source} flags, {source}_id, first_seen_{source} dates.

AggregationSource -- aggregate from another entity with GROUP BY and Ibis expressions:

source=AggregationSource(
    source_entity="person",
    group_by_column="account_id",
    filter_expression="t.account_id.notnull()",
    aggregations=[
        ComputedColumn(name="num_persons", expression="t.person_id.nunique()"),
        ComputedColumn(name="first_seen", expression="t.created_at.min()"),
    ],
)

EventAggregationSource -- aggregate raw event streams (reads from a table, groups by a key):

source=EventAggregationSource(
    project="myproject", dataset="raw", table="page_views",
    duckdb_path="page_views/*.parquet",
    group_by_column="user_id",
)

SourceMapping

Decouple entity field names from source column names. Define a contract of required_fields on the entity, then map source columns to those fields:

entity = Entity(
    name="transactions",
    description="Customer transactions",
    required_fields=[
        Field(name="transaction_id", type="STRING"),
        Field(name="amount_cents", type="INT64"),
    ],
    source=TableSource(project="p", dataset="d", table="orders", duckdb_path="orders/*.parquet"),
    layers=LayersConfig(prep=PrepLayer(model_name="prep_transactions")),
)

source_mapping = SourceMapping(
    entity=entity,
    source=entity.source,
    field_mappings={"transaction_id": "id", "amount_cents": "subtotal"},
)

Validates that all required fields have mappings at definition time.

Multi-Entity Dependency Resolution

When entities depend on each other (DerivedSource, AggregationSource), fyr run automatically resolves the execution order using topological sort. Dependencies run first:

transactions, subscriptions   (TableSource -- no dependencies)
         |           |
         v           v
        person               (DerivedSource -- identity graph joins transactions + subscriptions)
           |
           v
        account              (AggregationSource -- groups person by account_id)

No manual ordering needed. Define your entities and Fyrnheim figures out the DAG.
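
Conceptually this is a topological sort of the entity dependency graph. An illustrative sketch using Python's standard graphlib -- the dependency dict below mirrors the diagram above and is not Fyrnheim's internal code:

from graphlib import TopologicalSorter

# Each entity maps to the set of entities it reads from.
deps = {
    "transactions": set(),
    "subscriptions": set(),
    "person": {"transactions", "subscriptions"},  # DerivedSource identity graph
    "account": {"person"},                        # AggregationSource over person
}

# static_order() yields each entity only after all of its dependencies.
print(list(TopologicalSorter(deps).static_order()))
# ['transactions', 'subscriptions', 'person', 'account']  (order among independent entities may vary)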

Primitives

Reusable Python functions that replace SQL snippets. Hashing, date operations, categorization -- import and compose them instead of copy-pasting SQL.

from fyrnheim.primitives import hash_email, date_trunc_month

ComputedColumn(name="email_hash", expression=hash_email("email"))
ComputedColumn(name="signup_month", expression=date_trunc_month("created_at"))

Components

Multi-column patterns that generate related fields from a single config. LifecycleFlags produces is_active, is_churned, is_at_risk from a status column. TimeBasedMetrics computes tenure and recency.

from fyrnheim import LifecycleFlags

flags = LifecycleFlags(
    status_column="status",
    active_states=["active"],
    churned_states=["cancelled"],
)

Quality Checks

Declarative data quality rules that run after transformations. Built-in checks include NotNull, Unique, InRange, InSet, MatchesPattern, and ForeignKey.

quality=QualityConfig(
    primary_key="email_hash",
    checks=[
        NotNull("email"),
        Unique("email_hash"),
        InRange("amount_cents", min=0),
    ],
)

Project Configuration

Configure your project with fyrnheim.yaml at the project root:

entities_dir: entities
data_dir: data
output_dir: generated
backend: duckdb
backend_config:
  db_path: my_project.duckdb

# Push results to a separate output backend after fyr run
output_backend: clickhouse
output_config:
  host: localhost
  port: "8123"
  database: default
  user: default
  password: ""

All settings can be overridden via CLI flags. fyr run --backend bigquery runs on BigQuery regardless of what fyrnheim.yaml says.

Production Deployment

A typical production pattern:

  1. Extract raw data with DLT (or any EL tool) into parquet files or a warehouse
  2. Transform with Fyrnheim: fyr run --backend bigquery (or duckdb for local)
  3. Push results to an output backend for serving (ClickHouse, Postgres, etc.)

Configure the output backend in fyrnheim.yaml:

backend: duckdb           # transform backend
output_backend: clickhouse # push dim/analytics tables here after run
output_config:
  host: ch.example.com
  port: "8123"
  database: analytics

This separation lets you develop locally on DuckDB while pushing production results to a fast query engine.

Why Fyrnheim?

                     dbt                            Fyrnheim
Language             SQL + Jinja                    Python
Type safety          Runtime errors                 Pydantic validation at definition time
Local dev            Requires warehouse connection  DuckDB on local parquet files
Backend portability  Dialect-specific SQL           Ibis compiles to 15+ backends
Testing              Custom schema tests            pytest + quality checks
Boilerplate          Jinja macros, YAML configs     Python functions, Pydantic models
Identity resolution  Manual SQL joins               Built-in identity graph (DerivedSource)
Multi-source union   Manual UNION ALL               UnionSource with field mapping

Fyrnheim is not an orchestrator, not an extraction tool, and not a BI layer. It handles the transformation step: raw data in, clean business entities out.

Status

  • Alpha -- API may change before 1.0
  • DuckDB backend -- fully supported
  • BigQuery backend -- supported
  • ClickHouse output -- supported as output sink
  • Python 3.11+ required

License

MIT
