Metadata-driven execution engine for Fabric/Spark data pipelines.

weevr

Configuration-driven data shaping for Spark in Microsoft Fabric.

weevr lets you declare data transformation intent in YAML and execute it on PySpark. Define what should happen to your data — sources, transforms, joins, validations, write behavior — and let a stable engine handle how it runs. No code generation, no abstraction leaks. Just Spark DataFrame operations driven by configuration.

Installation

pip install weevr

Quick Start

Define a thread — the smallest unit of work — using a .thread file:

# dim_customer.thread
config_version: "1.0"
sources:
  raw_customers:
    type: delta
    path: "${lakehouse_path}/raw/customers"
steps:
  - filter:
      expr: "is_active = true"
  - select:
      columns: [customer_id, name, email, region]
target:
  path: "${lakehouse_path}/curated/dim_customer"
write:
  mode: overwrite

Run it from a Fabric Notebook or any PySpark environment:

from weevr import Context

# `spark` is the active SparkSession, which Fabric notebooks predefine.
ctx = Context(spark, "my-project.weevr")
result = ctx.run("dim_customer.thread")
print(result.status)  # "success"
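
The `${lakehouse_path}` placeholders in the thread file are filled in from parameters before execution. The sketch below illustrates that idea with Python's standard `string.Template`. It is not weevr's implementation: the `resolve_variables` helper and the example parameter value are hypothetical, and the config is written as a Python dict to avoid a YAML dependency.

```python
from string import Template
from typing import Any


def resolve_variables(node: Any, params: dict[str, str]) -> Any:
    """Recursively replace ${name} placeholders in a parsed config tree."""
    if isinstance(node, str):
        # substitute() raises KeyError when a placeholder has no value,
        # so a missing parameter surfaces before any data is touched.
        return Template(node).substitute(params)
    if isinstance(node, dict):
        return {key: resolve_variables(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_variables(item, params) for item in node]
    return node  # numbers, booleans, and None pass through unchanged


# Hand-written equivalent of the path-bearing parts of dim_customer.thread.
thread_config = {
    "sources": {
        "raw_customers": {"type": "delta", "path": "${lakehouse_path}/raw/customers"}
    },
    "target": {"path": "${lakehouse_path}/curated/dim_customer"},
    "write": {"mode": "overwrite"},
}

# Hypothetical parameter value; a real project would supply its own.
params = {"lakehouse_path": "/lakehouse/default/Tables"}

resolved = resolve_variables(thread_config, params)
print(resolved["target"]["path"])
```

Keeping paths parameterized this way is what lets the same thread file run unchanged across environments.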

Features

Declarative transforms — Filter, join, dedup, sort, rename, cast, derive, select, drop, aggregate, window, pivot, and union — all expressed in YAML
Flexible write modes — Overwrite, append, and merge (upsert) with configurable match, update, and delete behavior
Validation and data quality — Pre-write rules with severity routing (info, warn, error, fatal) and automatic row quarantine; post-write assertions for row counts, null checks, uniqueness, and custom expressions
DAG orchestration — Automatic dependency resolution, parallel thread execution within weaves, sequential weave ordering, configurable failure behavior, and auto-cache management
Configuration inheritance — Define patterns once at loom or weave level, cascade to threads with child-wins semantics
Variable injection — Environment-agnostic configs with parameter files and runtime overrides
Incremental processing — Watermark-based incremental loads, CDC merge routing with hard/soft delete support, and Delta Change Data Feed integration
Observability — OTel-compatible execution spans, structured JSON logging, row count reconciliation, and execution trace trees
Null-safe defaults — Opinionated join semantics and key handling that prevent common Spark pitfalls
Python API — Context class with run() for execution, load() for config inspection, and verification modes for dry-run validation
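
To make the severity routing under validation and data quality more concrete, here is a plain-Python sketch. It assumes, for illustration only, that `info` and `warn` failures are merely recorded, `error` failures send the row to quarantine, and a `fatal` failure aborts the run. weevr's actual semantics are defined in its documentation and operate on Spark DataFrames rather than Python lists. `Rule` and `route_rows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str  # one of: info, warn, error, fatal
    check: Callable[[Row], bool]  # returns True when the row passes


def route_rows(
    rows: list[Row], rules: list[Rule]
) -> tuple[list[Row], list[Row], list[str]]:
    """Split rows into (clean, quarantined) and collect info/warn messages."""
    clean: list[Row] = []
    quarantined: list[Row] = []
    messages: list[str] = []
    for row in rows:
        failed = [rule for rule in rules if not rule.check(row)]
        if any(rule.severity == "fatal" for rule in failed):
            # Assumed behavior: a fatal failure stops the whole run.
            raise ValueError(f"fatal rule failed for row {row}")
        for rule in failed:
            if rule.severity in ("info", "warn"):
                messages.append(f"{rule.severity}: {rule.name} failed for {row}")
        if any(rule.severity == "error" for rule in failed):
            quarantined.append(row)  # assumed: error-level failures are set aside
        else:
            clean.append(row)
    return clean, quarantined, messages


rules = [
    Rule("email_present", "error", lambda r: r.get("email") is not None),
    Rule("region_present", "warn", lambda r: r.get("region") is not None),
]
rows = [
    {"customer_id": 1, "email": "a@example.com", "region": "EU"},
    {"customer_id": 2, "email": None, "region": "US"},
    {"customer_id": 3, "email": "c@example.com", "region": None},
]

clean, quarantined, messages = route_rows(rows, rules)
print(len(clean), len(quarantined), len(messages))
```

In this sketch only the clean rows would continue to the write step, while quarantined rows stay available for inspection.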

Core Abstractions

weevr organizes work through four concepts:

Thread — The smallest executable unit. Reads from one or more sources, applies transforms, validates, and writes to a single target.
Weave — A collection of threads forming a dependency DAG. Represents a subject area or processing stage. Independent threads run in parallel.
Loom — A deployable unit packaging one or more weaves with defined execution order. The primary unit of versioning and release.
Project — A logical grouping of looms. Defines the boundary for shared configuration, parameter files, helpers, and UDFs.
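
As a rough illustration of how a weave's dependency DAG can be turned into batches of threads that are safe to run in parallel, the sketch below uses Python's standard `graphlib`. The thread names and dependencies are invented, and weevr's own scheduler may work differently.

```python
from graphlib import TopologicalSorter

# Hypothetical weave: each thread maps to the set of threads it depends on.
weave = {
    "dim_customer": set(),
    "dim_product": set(),
    "fact_orders": {"dim_customer", "dim_product"},
    "agg_daily_sales": {"fact_orders"},
}


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group threads into batches; threads within a batch do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep the output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependents become ready
    return batches


batches = parallel_batches(weave)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Here the two dimension threads form the first batch, and each later batch waits for the one before it.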

Core Principles

Declarative intent, imperative execution — Configuration is read at runtime and drives execution directly. No code generation from YAML.
Spark-native, Fabric-aligned — All execution uses Spark DataFrame APIs inside Fabric's runtime. No external systems or runtimes.
Deterministic and idempotent — Same configuration and inputs produce consistent behavior. Safe to rerun.
Opinionated defaults, configurable overrides — Safe defaults for null handling, join behavior, and failure semantics. Override when you need to.
Configuration reuse through inheritance — Define patterns once at higher levels and inherit down. Reduces effort, enforces standards.
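
Child-wins inheritance can be pictured as a layered merge in which the most specific level overrides the levels above it. This is a minimal sketch, assuming nested mappings are merged key by key; the source does not say whether weevr merges deeply or replaces whole sections. The config keys and the `cascade` helper are hypothetical.

```python
from typing import Any


def cascade(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Merge two config levels; on conflict the child's value wins."""
    merged = dict(parent)  # shallow copy so the parent level is not mutated
    for key, child_value in child.items():
        parent_value = merged.get(key)
        if isinstance(parent_value, dict) and isinstance(child_value, dict):
            merged[key] = cascade(parent_value, child_value)  # merge nested sections
        else:
            merged[key] = child_value  # scalars and lists are replaced outright
    return merged


# Hypothetical settings declared at each level.
loom = {"write": {"mode": "append"}, "on_failure": "abort"}
weave = {"write": {"mode": "merge", "match_keys": ["customer_id"]}}
thread = {"write": {"mode": "overwrite"}}

effective = cascade(cascade(loom, weave), thread)
print(effective)
```

The thread's `mode` wins, while `match_keys` and `on_failure` are inherited from the levels that declared them.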

Non-goals

weevr is intentionally not:

A low-code or no-code platform
A visual workflow designer
A replacement for Spark or re-implementation of data processing primitives
An abstraction layer that hides the underlying execution engine

The goal is to reduce orchestration friction and enforce repeatable patterns — not to obscure how data is processed.

Target Audience

Analysts who know SQL but not Spark
Data engineers who want config-driven consistency
Teams building medallion architectures in Fabric
Anyone seeking repeatable, governed data transformation patterns

Deployment Model

weevr's engine is a general-purpose library distributed via PyPI. It contains no project-specific configuration.

Integration projects are separate repositories containing YAML configs, Fabric Notebooks, and any project-specific UDFs or helpers. This separation lets the engine evolve independently while teams own their configuration.

Compatibility

Component	Version
Python	3.11
PySpark	3.5
Delta Lake	3.2
Microsoft Fabric Runtime	1.3

What's Next

Extensibility — Reusable stitch patterns, project-level UDF and helper registries
Developer tooling — Test framework, CLI validation, dry-run modes
Advanced merge patterns — Insert-only mode, complex update strategies

Documentation

Full documentation is available at ardent-data.github.io/weevr.

Contributing

See CONTRIBUTING.md for development setup, workflow expectations, and pull request conventions. Contributions are welcome.

License

Apache License 2.0. See LICENSE for details.

Release history

Version	Released
1.17.0	May 1, 2026
1.16.4	Apr 21, 2026
1.16.3	Apr 11, 2026
1.16.2	Apr 8, 2026
1.16.1	Apr 8, 2026
1.16.0	Apr 7, 2026
1.15.0	Apr 7, 2026
1.14.0	Apr 7, 2026
1.13.0	Apr 4, 2026
1.12.0	Apr 3, 2026
1.11.1	Apr 2, 2026
1.11.0	Apr 2, 2026
1.10.0	Mar 28, 2026
1.9.0	Mar 26, 2026
1.8.0	Mar 24, 2026
1.7.1	Mar 15, 2026
1.7.0	Mar 15, 2026
1.6.0	Mar 14, 2026
1.5.0	Mar 13, 2026
1.4.0	Mar 13, 2026
1.3.0	Mar 10, 2026
1.2.1	Mar 5, 2026
1.2.0	Mar 3, 2026
1.1.0	Mar 2, 2026
1.0.6	Feb 28, 2026
1.0.5	Feb 28, 2026 (this version)
1.0.4	Feb 27, 2026
1.0.3	Feb 26, 2026

Download files

Source Distribution

weevr-1.0.5.tar.gz (75.9 kB)

Uploaded Feb 28, 2026 Source

Built Distribution

weevr-1.0.5-py3-none-any.whl (101.6 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file weevr-1.0.5.tar.gz.

File metadata

Download URL: weevr-1.0.5.tar.gz
Upload date: Feb 28, 2026
Size: 75.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for weevr-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`188980a453e55249c977d15bd1decfc0c88dd4a9e072baa95a95be565bd69a48`
MD5	`fae7f9ea372649a0825dd703fb2186a7`
BLAKE2b-256	`a886a46966382a273dc1d483c9b966cc5af3d834eb0f036f2b5a20e6cc9f001e`

File details

Details for the file weevr-1.0.5-py3-none-any.whl.

File metadata

Download URL: weevr-1.0.5-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 101.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for weevr-1.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ed67febfaa75f879e5226471409285e210391bdeaf3fd4c455a87aef906651c`
MD5	`f2d599bbb9f6214c792cfb0cd529edca`
BLAKE2b-256	`5609e96a7a32a8991f2d066411af706eb441f4a63d71b40f65cfc3bc323d9ce8`
