EthereaLogic Databricks Suite — Intelligent Data Transformation Engine

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

EthereaLogic

These details have not been verified by PyPI

Project description

AetheriaForge

Intelligent Data Transformation. Coherence-Scored. Evidence-Backed.

EthereaLogic Databricks Suite — AetheriaForge

Built by Anthony Johnson | EthereaLogic LLC

If this tool is useful to your team, consider starring the repo — it helps others in the Databricks community find it.

Every Medallion transformation introduces information loss. Most pipelines ignore it. AetheriaForge measures it by transforming source records through schema contracts, scoring the result for coherence, applying optional exact-match entity resolution and latest-wins temporal reconciliation, and recording append-only evidence. Nothing is assumed to have passed unless the artifact says so.

Executive Summary

Leadership question	Answer
What business risk does this address?	Enterprises transforming data through Bronze to Silver to Gold layers have no mathematical model governing how much information loss is acceptable at each stage, no governed entity resolution across source systems, and no auditable evidence trail for transformation decisions.
What does this application prove?	A Databricks-deployable transformation engine that scores every operation for coherence, resolves entities across multiple sources, reconciles temporal conflicts, and surfaces queryable evidence artifacts in a read-only operator dashboard.
Why does it matter?	Moving data between layers is not the hard problem. Proving that the transformation preserved what it should, resolved what it needed to, and caught what it missed — with evidence — is the problem this solves.

Key Exhibits

Exhibit 1: Forge Contract Registry

The operator dashboard loads all registered forge contracts and shows every dataset with its version, source, target, and coherence engine. In this example, eight datasets are registered, all bound to Bronze-to-Silver transformation contracts with Shannon entropy coherence scoring.

Forge Registry tab showing 8 datasets registered with contract versions, source and target locations, and coherence engine

Exhibit 2: Transformation Status with Coherence Scores

The Transformation Status tab surfaces evidence artifacts across all forge operations. Every row is one artifact: filename, dataset, verdict, coherence score, record counts, and timestamp. The example shows 1,008 artifacts with PASS, WARN, and FAIL verdicts; the coherence scores make information loss visible without opening a single file.

Transformation Status tab showing 1008 evidence artifacts with PASS, WARN, and FAIL verdicts, coherence scores, and record counts

Exhibit 3: Full Evidence Artifact Inspection

The Evidence Explorer loads a single evidence artifact by filename and renders the complete JSON payload inline. The artifact shown is a forge operation with a FAIL verdict: coherence score 0.0, with the full transformation context including dataset identity, record counts, and gate results.

Evidence Explorer showing full JSON payload for a forge FAIL artifact including coherence score and transformation context

Exhibit 4: Visual Analytics Across All Forge Operations

The Analytics tab scans the evidence directory and renders four charts: verdict distribution, coherence score distribution, daily activity volume, and coherence trend over time. Operators see transformation health at a glance across all registered datasets.

Analytics tab showing verdict distribution, coherence score distribution, daily activity volume, and coherence trend charts

The Business Problem

Enterprises operating mature Lakehouse architectures face three transformation gaps that existing tools do not address:

Coherence loss is unmeasured. Every transformation from Bronze to Silver to Gold discards, reshapes, or aggregates data. Without a coherence score, there is no way to know whether the output meets the target layer's quality threshold before writing it.
Entity resolution is ad-hoc. Multiple source systems use different identifiers for the same entities. Lookup tables and manual mappings do not scale, version, or produce evidence of match decisions.
Temporal reconciliation is invisible. CDC streams, SCD Type 2 dimensions, and batch loads create overlapping records with conflicting timestamps. Merge decisions happen silently with no audit trail.

What This Repository Contains

Surface	Purpose
`src/aetheriaforge/ingest/`	Enterprise file ingestion — CSV, Parquet, JSON, Excel, XML, ORC, Avro, and more
`src/aetheriaforge/forge/`	Coherence-scored transformation engine (Shannon entropy v1.x)
`src/aetheriaforge/resolution/`	Cross-source entity resolution using policy-driven exact matching in v1.x
`src/aetheriaforge/temporal/`	Temporal reconciliation using deterministic `latest_wins` conflict handling in v1.x
`src/aetheriaforge/schema/`	Schema-contract transformation metadata and schema enforcement with recorded contract versions
`src/aetheriaforge/evidence/`	Append-only transformation artifact writing shared across all modules
`src/aetheriaforge/orchestration/`	Workflow sequencing — runs all forge operations in order
`src/aetheriaforge/config/`	Forge contract and policy configuration
`src/aetheriaforge/integration/`	Optional DriftSentinel event emission and drift follow-up ingestion
`app/`	Databricks App (Gradio) — four-tab read-only operator dashboard
`notebooks/`	Onboarding, execution, and evidence-review notebooks for Databricks
`resources/`	Databricks Asset Bundle pipeline, job, and app resource definitions
`templates/`	Forge contract, resolution policy, and schema contract templates
`specs/`	Canonical SDLC documents governing the product
`tests/`	Pytest suite covering domain logic, packaging, and governance

Every directory above contains a README.md describing its contents, including each submodule under src/aetheriaforge/.

What This Repository Proves

Verified outcome	Evidence from this repository
Coherence scoring measures information loss at every transformation	Forge engine computes Shannon entropy before and after each transformation, producing a 0.0–1.0 coherence score per operation with configurable thresholds
Entity resolution produces governed match evidence	Resolution module matches records across source systems using configured exact key rules, policy-driven ambiguity handling, and append-only evidence
Temporal reconciliation resolves conflicts with an audit trail	Temporal reconciler applies deterministic `latest_wins` selection, records duplicate-timestamp conflicts, and writes evidence for merge decisions
Schema enforcement is contract-driven and versioned	Schema enforcer validates output against YAML contracts, records the applied contract version in evidence, and honors contract enforcement settings
Enterprise file ingestion handles heterogeneous sources	Ingest module supports 10 formats with auto-detection, producing evidence metadata for every read operation
Evidence artifacts are queryable without writing scripts	Operator dashboard surfaces all artifacts with coherence scores, verdicts, timestamps, and provenance metadata across four tabs
Databricks deployment workflow is defined for a configured workspace	Asset Bundle resources and deployment docs cover validate, deploy, and app status checks; replay requires Databricks credentials and Unity Catalog
DriftSentinel integration is standalone-safe	Event emission and drift ingestion are optional — NullEventChannel is the default, and no runtime dependency on DriftSentinel exists
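
As a rough illustration of the coherence idea in the table above, the sketch below computes Shannon entropy for a column before and after a transformation and reports the ratio, clamped to 0.0–1.0. The function names and the ratio formula are assumptions made for illustration; the package's own scoring, which this README says accounts for declared schema lineage, may differ.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Return the Shannon entropy (in bits) of the value distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def coherence_ratio(source: Iterable[Hashable], forged: Iterable[Hashable]) -> float:
    """Hypothetical score: forged entropy over source entropy, clamped to [0, 1]."""
    h_source = shannon_entropy(source)
    if h_source == 0.0:
        # A constant (or empty) source column carries no information to lose.
        return 1.0
    return max(0.0, min(1.0, shannon_entropy(forged) / h_source))


source_col = ["a", "b", "c", "d"]  # four equally likely values: 2 bits
forged_col = ["a", "a", "b", "b"]  # two equally likely values: 1 bit
score = coherence_ratio(source_col, forged_col)
print(f"coherence ratio: {score:.2f}")
```

In this toy case the transformation collapses four distinct values into two, so half of the measured information survives.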

Decision / KPI Contract

Business decision: is the forged output trustworthy enough for the target Medallion layer?

KPI	Meaning
`coherence_score`	Information preservation ratio for the transformation (0.0–1.0), scored against the declared schema lineage for contract-backed runs
`resolution_confidence`	Match confidence for entity resolution decisions
`temporal_conflicts`	Count of temporal merge conflicts detected
`schema_conformance`	Percentage of records conforming to the target schema contract
`transformation_verdict`	PASS / WARN / FAIL for each forge operation

Control rule: no transformation is assumed to have passed unless a PASS artifact exists in the evidence directory. FAIL artifacts carry measured values and thresholds so the platform team can triage without opening raw files.
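
The control rule can be sketched as two small functions: one maps a score to a verdict, the other treats a dataset as passed only when a PASS artifact is actually present. The threshold values, the JSON field names (`dataset`, `verdict`, `coherence_score`), and the file layout are assumptions for illustration, not the package's documented artifact schema.

```python
import json
import tempfile
from pathlib import Path


def classify(score: float, pass_at: float = 0.9, warn_at: float = 0.7) -> str:
    """Map a coherence score to a verdict using hypothetical thresholds."""
    if score >= pass_at:
        return "PASS"
    if score >= warn_at:
        return "WARN"
    return "FAIL"


def has_pass_artifact(evidence_dir: Path, dataset: str) -> bool:
    """True only if some artifact for the dataset records a PASS verdict."""
    for path in sorted(evidence_dir.glob("*.json")):
        artifact = json.loads(path.read_text(encoding="utf-8"))
        if artifact.get("dataset") == dataset and artifact.get("verdict") == "PASS":
            return True
    return False  # absence of evidence is never treated as a pass


with tempfile.TemporaryDirectory() as tmp:
    evidence = Path(tmp)
    record = {"dataset": "orders", "coherence_score": 0.62, "verdict": classify(0.62)}
    (evidence / "orders_001.json").write_text(json.dumps(record), encoding="utf-8")
    orders_passed = has_pass_artifact(evidence, "orders")        # only a FAIL exists
    customers_passed = has_pass_artifact(evidence, "customers")  # no artifact at all
```

Both lookups come back negative: a FAIL artifact and a missing artifact are equally insufficient to treat the output as trustworthy.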

Why This Pattern

Gap 1. Transformation quality must be measured, not assumed. Shannon entropy coherence scoring gives every operation a mathematical signal — not a boolean check, but a ratio that tells you how much declared source information the transformation preserved relative to the target contract.
Gap 2. Entity resolution and temporal reconciliation must produce evidence. Match decisions and merge outcomes are written to the same append-only evidence directory as forge artifacts. A single dashboard surfaces all of them without special-casing any module.
Gap 3. The operator dashboard must be read-only. The Gradio app exposes no write surfaces. Evidence is queried, never edited through the UI. This keeps the audit trail clean and the deployment governance simple.
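
One way to picture an append-only evidence directory is a writer that only ever creates new files. The sketch below uses Python's exclusive-create mode for that; the filename pattern and payload keys are hypothetical and do not describe the package's actual writer.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def write_evidence(evidence_dir: Path, payload: dict) -> Path:
    """Write one artifact to a new file; never overwrite an existing one."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = evidence_dir / f"{payload['module']}_{stamp}_{uuid.uuid4().hex[:8]}.json"
    # Mode "x" raises FileExistsError instead of replacing an existing artifact.
    with path.open("x", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2, sort_keys=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = write_evidence(Path(tmp), {"module": "forge", "verdict": "PASS"})
    second = write_evidence(Path(tmp), {"module": "temporal", "conflicts": 0})
    artifact_count = len(list(Path(tmp).glob("*.json")))
    try:
        first.open("x")  # a second exclusive create on the same path must fail
        overwrite_blocked = False
    except FileExistsError:
        overwrite_blocked = True
```

Because every module writes through the same kind of function, a reader such as the dashboard only needs to list the directory; it never has to reconcile edits.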

How It Works

Register datasets and forge contracts. Each dataset is registered with a YAML contract specifying source location, target schema, optional resolution rules, optional temporal merge policy, and coherence thresholds. File-backed datasets use a landing path; table-backed datasets use the catalog/schema/table triplet.
Run the forge pipeline. The orchestration layer loads the source surface, performs schema-contract transformation when no forged DataFrame is supplied, applies optional schema enforcement, optional exact-match entity resolution, and optional latest_wins temporal reconciliation, then scores the result for coherence. For schema-backed runs, coherence is measured against the contract's declared source lineage so column renames, multi-source derivations, and intentional projection are scored against the intended target shape rather than raw by-name overlap. Each module writes append-only evidence artifacts to the shared evidence directory.
Inspect transformation evidence. The Forge Registry tab shows all registered datasets with contract versions and locations. The Transformation Status tab surfaces artifacts with coherence scores, verdicts, and provenance. The Evidence Explorer loads any single artifact by filename and renders the full JSON payload. The Analytics tab renders verdict distribution, coherence score distribution, daily activity volume, and coherence trend over time.
Integrate with DriftSentinel (optional). When bundled, AetheriaForge emits transformation events that DriftSentinel can consume for smarter publication gating, and receives drift payloads that produce evidence-backed follow-up actions. When standalone, the NullEventChannel silently drops events with zero overhead.
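
The two optional steps in the pipeline, exact-match entity resolution and `latest_wins` temporal reconciliation, can be sketched in plain Python. This is an illustrative approximation over lists of dicts, not the package's pandas-based implementation; the function names, the ambiguity handling, and the tie-breaking rule are assumptions.

```python
from typing import Any

Record = dict[str, Any]


def resolve_exact(left: list[Record], right: list[Record], key: str) -> tuple[list[Record], list[Record]]:
    """Pair left records with right records sharing the same key value.

    Returns (matched, ambiguous). A left record with more than one candidate
    is reported as ambiguous rather than silently matched.
    """
    index: dict[Any, list[Record]] = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    matched, ambiguous = [], []
    for rec in left:
        candidates = index.get(rec[key], [])
        if len(candidates) == 1:
            matched.append({**candidates[0], **rec})
        elif len(candidates) > 1:
            ambiguous.append(rec)
    return matched, ambiguous


def latest_wins(records: list[Record], key: str, ts: str) -> tuple[list[Record], int]:
    """Keep the newest record per key; count keys tied on the newest timestamp."""
    winners: dict[Any, Record] = {}
    tied: set[Any] = set()
    for rec in records:
        current = winners.get(rec[key])
        if current is None or rec[ts] > current[ts]:
            winners[rec[key]] = rec
            tied.discard(rec[key])
        elif rec[ts] == current[ts]:
            tied.add(rec[key])  # duplicate timestamp: the first-seen record is kept
    return list(winners.values()), len(tied)


crm = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Bo"}]
billing = [{"customer_id": 1, "plan": "pro"}]
matched, ambiguous = resolve_exact(crm, billing, "customer_id")

# ISO-8601 date strings compare correctly as plain strings.
updates = [
    {"customer_id": 1, "plan": "basic", "updated_at": "2026-04-01"},
    {"customer_id": 1, "plan": "pro", "updated_at": "2026-04-09"},
    {"customer_id": 2, "plan": "basic", "updated_at": "2026-04-02"},
    {"customer_id": 2, "plan": "team", "updated_at": "2026-04-02"},
]
current, conflicts = latest_wins(updates, "customer_id", "updated_at")
```

Customer 2 has two updates with the same timestamp, so the sketch keeps the first one seen and counts one conflict, which is the kind of decision that would be recorded as evidence.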

Databricks Fit

Databricks Asset Bundles for source-controlled deployment of pipeline, job, and app resource definitions — validated and deployed from the repo with a single make target.
Databricks Apps (Gradio) for a governed, read-only operator dashboard with no custom web infrastructure required.
Unity Catalog for governed table publication and the evidence volume backing the operator dashboard.
Databricks Lakeflow / Jobs for scheduled forge pipeline execution across registered datasets.
AetheriaForge is contract-driven rather than domain-hardcoded. Registered datasets can vary by schema and source system, but supported execution depends on the declared file or table format, schema contract, and the forge logic implemented in this repository.

Execution Profile

v1.x execution is pandas-based. File-backed datasets and Spark/Unity Catalog tables are materialized into pandas DataFrames on the driver before transformation, resolution, temporal reconciliation, and scoring run. Real workloads therefore need to fit driver memory or be pre-filtered and partitioned upstream; this repository does not claim cluster-distributed Spark-native execution.
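
Pre-filtering upstream can be as simple as streaming a file and keeping only the rows that need to be materialized. The sketch below shows that idea with the standard library; the helper name and the sample columns are made up for illustration and are not part of AetheriaForge.

```python
import csv
import io
from typing import Callable, Iterator


def prefilter_rows(handle: io.TextIOBase, keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the rows worth materializing, one at a time."""
    for row in csv.DictReader(handle):
        if keep(row):
            yield row


# An in-memory file stands in for a large landing-zone CSV.
raw = io.StringIO("trip_id,borough,fare\n1,Queens,12.5\n2,Bronx,8.0\n3,Queens,30.0\n")
queens_only = list(prefilter_rows(raw, lambda row: row["borough"] == "Queens"))
```

Only the filtered rows are ever held in memory at once, so the subset handed to a driver-side DataFrame can be much smaller than the source file.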

Quickstart

Install via pip

The fastest way to get the AetheriaForge package into your environment:

pip install etherealogic-aetheriaforge

This installs the full AetheriaForge package — contract-driven transformation, coherence scoring, exact-match entity resolution, latest_wins temporal reconciliation, schema enforcement, evidence writing, orchestration, and bundled contract and policy templates.

For enterprise file ingestion (Excel, XML, ORC, Avro, Parquet):

pip install "etherealogic-aetheriaforge[ingest]"
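
To confirm which version ended up in the environment, a short standard-library check is enough. The helper name below is ours; only the distribution name comes from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("etherealogic-aetheriaforge")
print(found if found else "etherealogic-aetheriaforge is not installed")
```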

Clone and develop locally

To run the full test suite or contribute:

git clone https://github.com/Org-EthereaLogic/AetheriaForge.git
cd AetheriaForge

make sync   # installs runtime + dev dependencies via uv
make test   # runs the pytest suite

Databricks Bootstrap (Recommended)

The fastest path from zero to a running AetheriaForge deployment:

# Authenticate once (OAuth U2M)
databricks auth login --host <workspace-url>

# Bootstrap: verify auth, deploy bundle, create volume, upload templates,
# start app, and trigger the forge job — all in one command.
make bootstrap CATALOG=my_catalog PROFILE=<profile>

# With the NYC taxi sample dataset:
make bootstrap CATALOG=my_catalog PROFILE=<profile> SAMPLE=nyctaxi

# With a non-default schema or runtime volume:
make bootstrap CATALOG=my_catalog PROFILE=<profile> SCHEMA=ops VOLUME=af_runtime

Requires `databricks-sdk`; install it with `uv sync --group databricks`.

The bootstrap script uses Databricks unified auth and the SDK to verify catalog access, deploy the Asset Bundle (which creates the runtime volume automatically), upload contract templates to the volume, start the Databricks App, and optionally trigger the forge job. It prints the app URL, evidence path, and target table when done.

Manual Databricks Deployment

Use this path for incremental deploys or step-by-step control. Pass SCHEMA and VOLUME when you are not using the bundle defaults (`default` and `aetheriaforge_runtime`):

# Prove the catalog exists for your profile.
make bundle-catalog-check CATALOG=my_catalog PROFILE=<profile>

# Validate bundle wiring against that catalog.
make bundle-validate CATALOG=my_catalog PROFILE=<profile>

# Deploy bundle resources and start the Databricks App.
make app-deploy CATALOG=my_catalog PROFILE=<profile>

`bundle validate` confirms that the bundle configuration, authentication, and resource references resolve. `databricks apps get` is where you confirm that the app reports SUCCEEDED and RUNNING.

Notebook Import

Import the notebooks/ directory into your Databricks workspace to run the forge pipeline from the deployed bundle or standalone from GitHub. Notebooks prefer the workspace source tree when bundle-synced under /Workspace/... and otherwise install AetheriaForge from GitHub. Real dataset execution requires:

a registered forge contract with source format and either a landing path or table name
a schema contract defining the target layer's expected columns and types
optional resolution and temporal policies for cross-source datasets
for Databricks volume-backed files, use /Volumes/... notebook paths; avoid /dbfs/Volumes/...
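
The requirements above lend themselves to a quick pre-flight check before a notebook run. The dataclass below is a hypothetical in-memory view of a forge contract; its field names and the "exactly one source" rule are assumptions, so consult `templates/forge_contract.yml` for the real keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ForgeContract:
    """Hypothetical in-memory view of a forge contract (field names assumed)."""
    dataset: str
    source_format: str
    schema_contract: str
    landing_path: Optional[str] = None
    table_name: Optional[str] = None  # expected as catalog.schema.table

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the contract looks usable."""
        issues = []
        if bool(self.landing_path) == bool(self.table_name):
            issues.append("set exactly one of landing_path or table_name")
        if self.landing_path and self.landing_path.startswith("/dbfs/Volumes/"):
            issues.append("use /Volumes/... rather than /dbfs/Volumes/...")
        if self.table_name and len(self.table_name.split(".")) != 3:
            issues.append("table_name must be catalog.schema.table")
        return issues


good = ForgeContract("trips", "parquet", "schema_contract.yml",
                     landing_path="/Volumes/main/default/raw/trips")
bad = ForgeContract("trips", "parquet", "schema_contract.yml",
                    landing_path="/dbfs/Volumes/main/default/raw/trips")
print(good.problems(), bad.problems())
```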

AI-Assisted Setup

If you use an AI coding agent (Claude Code, Cursor, GitHub Copilot Workspace, or similar), paste the prompt below directly into your agent session. The agent will clone the repository, install dependencies, run the full test suite, and walk you through the Databricks deployment, asking only for your catalog and CLI profile names.

Before you start, have these ready:

Python 3.11+
uv package manager — install with pip install uv
Databricks CLI configured with a valid profile — run databricks auth login if needed
A Databricks workspace with Unity Catalog enabled
The name of your Unity Catalog catalog

Copy and paste this prompt into your AI coding agent:

I want to set up AetheriaForge — a Databricks-deployable intelligent data
transformation engine that coherence-scores every Medallion layer transformation,
resolves entities across source systems, reconciles temporal conflicts, and
surfaces queryable evidence in a four-tab operator dashboard.

Repository: https://github.com/Org-EthereaLogic/AetheriaForge

Please complete these steps in order. Stop at any failure and report it before continuing.

1. Clone the repository:
   git clone https://github.com/Org-EthereaLogic/AetheriaForge.git
   cd AetheriaForge

2. Install dependencies (requires uv):
   make sync
   If uv is not installed: pip install uv

3. Run the full test suite. The full pytest suite must pass before proceeding:
   make test

4. Read these files to understand the configuration model before deployment:
   - README.md
   - templates/forge_contract.yml
   - templates/resolution_policy.yml
   - templates/schema_contract.yml

5. Ask me for my Databricks setup details:
   - My Unity Catalog catalog name
   - My Databricks CLI profile name
   Then confirm the catalog is reachable:
   make bundle-catalog-check CATALOG=<my_catalog> PROFILE=<my_profile>

6. Validate the Asset Bundle against my workspace:
   make bundle-validate CATALOG=<my_catalog> PROFILE=<my_profile>

7. If validation passes, deploy the bundle and start the Databricks App:
   make app-deploy CATALOG=<my_catalog> PROFILE=<my_profile>

8. Verify the deployment succeeded:
   databricks apps get aetheriaforge -p <my_profile> -o json
   Confirm the status shows SUCCEEDED and RUNNING, then report the app URL.

After every step, report what happened. Do not skip a step or proceed past any
error without explaining it and asking me how to continue.

Scope Boundary

AetheriaForge validates the coherence-scored transformation model using registered datasets in local and Databricks environments. It supports contract-driven execution for heterogeneous tabular schemas across common enterprise file formats (csv, tsv, parquet, json/jsonl, excel, xml, orc, avro, fixed-width) and Spark/Unity Catalog tables. It does not yet constitute production-scale proof across every schema shape, every data size, or every multi-workspace deployment pattern. The Databricks deployment path still requires a workspace with Unity Catalog enabled.
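
For the file formats listed above, extension-based detection might look like the sketch below. The extension map and function name are assumptions for illustration; the ingest module's actual auto-detection logic is not shown in this README. Fixed-width files have no conventional extension, so the sketch returns None and leaves the format to be declared in the contract.

```python
from pathlib import Path
from typing import Optional

# Assumed extension map; the format names mirror the list in the paragraph above.
EXTENSION_FORMATS = {
    ".csv": "csv", ".tsv": "tsv", ".parquet": "parquet",
    ".json": "json", ".jsonl": "jsonl", ".xlsx": "excel", ".xls": "excel",
    ".xml": "xml", ".orc": "orc", ".avro": "avro",
}


def detect_format(path: str) -> Optional[str]:
    """Guess a format from the file extension; None means declare it explicitly."""
    return EXTENSION_FORMATS.get(Path(path).suffix.lower())


detected = detect_format("/Volumes/main/default/raw/trips.PARQUET")
undetected = detect_format("ledger.dat")
```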

Engineering Signals

GitHub Actions workflow: ci.yml

Additional Documentation

Part of the EthereaLogic Databricks Suite

AetheriaForge is the second product in the EthereaLogic Databricks Suite — a portfolio of Databricks-deployable applications addressing the full lifecycle of data reliability in enterprise Lakehouse platforms.

Product	Core Job	Primary Layer
DriftSentinel	Detect drift, block bad publishes	Bronze (detection) + Silver/Gold (gating)
AetheriaForge	Transform, reconcile, forge clean data	Silver (transformation engine)
EthereaLogic Suite (bundled)	Full governed Medallion pipeline	Bronze to Silver to Gold

When both products are deployed, AetheriaForge emits transformation events that DriftSentinel consumes for smarter publication gating, and DriftSentinel can feed drift payloads back for evidence-backed follow-up actions. Each product operates independently when the other is absent.
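
The standalone-safe behaviour follows the null object pattern: the pipeline always emits to a channel, and the default channel discards what it receives. Only the name `NullEventChannel` comes from this README; the `emit` method, the `EventChannel` protocol, and the other classes below are hypothetical stand-ins.

```python
from typing import Protocol


class EventChannel(Protocol):
    def emit(self, event: dict) -> None: ...


class NullEventChannel:
    """Accepts events and discards them, so callers never need a None check."""

    def emit(self, event: dict) -> None:
        return None


class ListEventChannel:
    """Stand-in for a real consumer; collects events in memory."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def emit(self, event: dict) -> None:
        self.events.append(event)


def run_forge(dataset: str, channel: EventChannel = NullEventChannel()) -> str:
    """Pretend forge run that reports its verdict through the channel."""
    verdict = "PASS"
    channel.emit({"dataset": dataset, "verdict": verdict})
    return verdict


standalone = run_forge("orders")          # events are dropped
collector = ListEventChannel()
bundled = run_forge("orders", collector)  # events reach the consumer
```

The calling code is identical in both modes, which is what lets each product run without the other installed.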

MIT License. See LICENSE for details.

Release history

0.1.6 (this version): Apr 21, 2026
0.1.5: Apr 10, 2026
0.1.4: Apr 9, 2026
0.1.3: Apr 9, 2026
0.1.2: Apr 8, 2026
0.1.0: Apr 4, 2026

Download files

Source Distribution

etherealogic_aetheriaforge-0.1.6.tar.gz (35.8 MB)

Uploaded Apr 21, 2026 Source

Built Distribution

etherealogic_aetheriaforge-0.1.6-py3-none-any.whl (59.7 kB)

Uploaded Apr 21, 2026 Python 3

File details

Details for the file etherealogic_aetheriaforge-0.1.6.tar.gz.

File metadata

Download URL: etherealogic_aetheriaforge-0.1.6.tar.gz
Upload date: Apr 21, 2026
Size: 35.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etherealogic_aetheriaforge-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`0aa89d80202037636379bc14903b1227323921f6745269c85a3717ca890a1e28`
MD5	`d18c95753b76cb51aa8ebf4d13f899ad`
BLAKE2b-256	`5348420bec3627274f2bac28fd0e737f8ffbf0a6c34fc16552588efe3b705191`
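
A downloaded file can be checked against the published SHA256 value with the standard library. This is a generic sketch; the helper names are ours, and the expected digest is whatever value is listed for the file you downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a local file's digest with the value published on the index."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("etherealogic_aetheriaforge-0.1.6.tar.gz"), "<SHA256 from the table above>")` should return True for an intact download.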

Provenance

The following attestation bundles were made for etherealogic_aetheriaforge-0.1.6.tar.gz:

Publisher: publish.yml on Org-EthereaLogic/AetheriaForge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: etherealogic_aetheriaforge-0.1.6.tar.gz
- Subject digest: 0aa89d80202037636379bc14903b1227323921f6745269c85a3717ca890a1e28
- Sigstore transparency entry: 1353209945
- Sigstore integration time: Apr 21, 2026
Source repository:
- Permalink: Org-EthereaLogic/AetheriaForge@4e813514f6098faee841fe402f5478d091429078
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/Org-EthereaLogic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4e813514f6098faee841fe402f5478d091429078
- Trigger Event: release

File details

Details for the file etherealogic_aetheriaforge-0.1.6-py3-none-any.whl.

File metadata

Download URL: etherealogic_aetheriaforge-0.1.6-py3-none-any.whl
Upload date: Apr 21, 2026
Size: 59.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etherealogic_aetheriaforge-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e077bb94c8bb61ce7813d400201b5a86fd5303e3f5469b3df4c6d3c8b58eac8`
MD5	`b31360349f9ec221bcd58b3b6d37ea0d`
BLAKE2b-256	`19c3294c366f146e068308f0961385f948bd64cda3487cee9e0d27d03533cb03`

Provenance

The following attestation bundles were made for etherealogic_aetheriaforge-0.1.6-py3-none-any.whl:

Publisher: publish.yml on Org-EthereaLogic/AetheriaForge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: etherealogic_aetheriaforge-0.1.6-py3-none-any.whl
- Subject digest: 1e077bb94c8bb61ce7813d400201b5a86fd5303e3f5469b3df4c6d3c8b58eac8
- Sigstore transparency entry: 1353210014
- Sigstore integration time: Apr 21, 2026
Source repository:
- Permalink: Org-EthereaLogic/AetheriaForge@4e813514f6098faee841fe402f5478d091429078
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/Org-EthereaLogic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4e813514f6098faee841fe402f5478d091429078
- Trigger Event: release
