
Platform-neutral semantic core for contract-first data ingestion.


ContractForge


Define ingestion intent once. Run it natively anywhere.



ContractForge is a multi-runtime, contract-first ingestion platform. It turns governed ingestion intent into native platform execution and evidence while keeping the contract vocabulary stable across Databricks, AWS, Snowflake and future adapters.

The product is ContractForge; contractforge-core, contractforge-databricks, contractforge-aws, contractforge-snowflake and contractforge-ai are functional package boundaries, not separate products.

It is built for data consultants, platform teams and engineering groups that need repeatable governed ingestion across different client runtimes without rewriting the framework for every platform.

Figure: ContractForge flow from contract to semantic core, capability matcher, platform adapter and native artifacts.

Why ContractForge

Capability | What it means
Contract-first ingestion | Source, target, write mode, schema policy, transforms, quality, access, operations and evidence live in reviewed YAML contracts.
Honest portability | The planner returns SUPPORTED, SUPPORTED_WITH_WARNINGS, REVIEW_REQUIRED or UNSUPPORTED; it does not silently downgrade semantics.
Native adapters | Databricks and AWS translate the same intent into native runtime behavior instead of forcing a lowest-common-denominator engine.
Evidence as product surface | Runs, errors, quality, quarantine, schema changes, lineage, governance actions and cost signals are tracked consistently.
Reusable connections | Shared connection.yaml files centralize connector defaults; ingestion contracts override only dataset-specific fields.
AI-assisted project design | ContractForge AI turns prompts and schemas into reviewable projects, then validates them through the core and adapter planners.

ContractForge is not a scheduler, a dbt replacement, a closed ingestion runtime or a universal Spark wrapper. It is the semantic contract and adapter layer for repeatable governed ingestion.

How It Works

Contract YAML
  -> Semantic Core
  -> Capability Matcher
  -> Abstract Execution Plan
  -> Platform Adapter
  -> Native Runtime + Evidence

The core owns portable semantics. Adapters own platform behavior. The core does not import Spark, Databricks SDK, boto3, Azure SDK, Fabric SDK or Snowflake clients.
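One way to keep that boundary honest is a test that parses the core's source files and fails on forbidden imports. The sketch below uses Python's standard `ast` module. The helper name `forbidden_imports` is hypothetical, and the top-level module names in `FORBIDDEN` are assumptions chosen to mirror the SDKs named above; adjust them to the SDKs you actually need to exclude.

```python
import ast

# Illustrative top-level module names for platform SDKs the core must not import.
# These names are assumptions, not a list published by ContractForge.
FORBIDDEN = {"pyspark", "databricks", "boto3", "botocore", "azure", "snowflake"}


def forbidden_imports(source: str, forbidden: set[str] = FORBIDDEN) -> list[str]:
    """Return the forbidden top-level modules imported by a Python source string."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute "from x.y import z"; relative imports stay inside the package.
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in forbidden:
                found.append(top)
    return found


core_module = "import json\nfrom dataclasses import dataclass\n"
adapter_module = "from pyspark.sql import SparkSession\nimport boto3.session\n"

print(forbidden_imports(core_module))     # []
print(forbidden_imports(adapter_module))  # ['pyspark', 'boto3']
```

In a repository you would apply it to every `*.py` file under the core package (for example via `Path.rglob("*.py")`) and assert that the result is empty for each one.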

See It In 30 Seconds

source:
  type: incremental_files
  path: s3://landing/orders
  format: json

target:
  catalog: main
  schema: bronze
  table: orders

mode: scd0_append
schema_policy: additive_only
quality_rules:
  not_null: [order_id]

Core planning result:

SUPPORTED

The Databricks adapter may render Delta/Auto Loader/Asset Bundle artifacts. The AWS adapter may render and deploy Glue Spark/Iceberg artifacts. Another adapter may return SUPPORTED_WITH_WARNINGS, REVIEW_REQUIRED or UNSUPPORTED if it cannot preserve the same semantics safely.
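To make the matching step concrete, here is a minimal sketch that checks the contract above against two capability profiles. The `Status` ordering, the `Capabilities` fields and the rules inside `match` are hypothetical stand-ins for illustration, not the core's real planner (`plan_contract`). The point it shows is that an unmet requirement raises the status instead of being dropped.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Ordered from best to worst, so the overall result is the maximum finding.
    SUPPORTED = 0
    SUPPORTED_WITH_WARNINGS = 1
    REVIEW_REQUIRED = 2
    UNSUPPORTED = 3


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical capability flags for illustration only.
    platform: str
    supports_append: bool
    supports_additive_schema: bool
    supports_native_quality: bool


def match(contract: dict, caps: Capabilities) -> tuple[Status, list[str]]:
    """Collect one finding per unmet requirement and return the worst status."""
    findings: list[tuple[Status, str]] = []
    if contract["mode"] == "scd0_append" and not caps.supports_append:
        findings.append((Status.UNSUPPORTED, "append writes are not available"))
    if contract.get("schema_policy") == "additive_only" and not caps.supports_additive_schema:
        findings.append((Status.REVIEW_REQUIRED, "additive schema changes need manual review"))
    if contract.get("quality_rules") and not caps.supports_native_quality:
        findings.append((Status.SUPPORTED_WITH_WARNINGS, "quality rules run as a post-write check"))
    status = max((s for s, _ in findings), default=Status.SUPPORTED)
    return status, [message for _, message in findings]


contract = {
    "source": {"type": "incremental_files", "path": "s3://landing/orders", "format": "json"},
    "target": {"catalog": "main", "schema": "bronze", "table": "orders"},
    "mode": "scd0_append",
    "schema_policy": "additive_only",
    "quality_rules": {"not_null": ["order_id"]},
}

full = Capabilities("full_featured", True, True, True)
partial = Capabilities("limited", True, False, False)

for caps in (full, partial):
    status, notes = match(contract, caps)
    print(caps.platform, status.name, notes)
```

Taking the maximum of ordered statuses means a single REVIEW_REQUIRED finding cannot be hidden by several SUPPORTED ones.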

Status And Roadmap

Area | Status | Notes
Core semantic model | Active | Contract models, semantic normalization, capability matching, abstract planning and evidence models are implemented.
Databricks adapter | Reference implementation | Delta, Unity Catalog, Auto Loader, Lakeflow planning, Asset Bundles, control tables, quality, governance, lineage, cost and dashboards are implemented inside the adapter boundary.
AWS adapter | Alpha with real E2E validation | Glue Spark/Iceberg planning, source support, quality/evidence, Lake Formation review/apply helpers, annotations, operations, S3 artifact publication, one-command Glue deployment and Glue job helper APIs are implemented.
ContractForge AI | Active | Deterministic review, project generation, diagnostics, provider routing and optional model-backed enrichment over the same core contract semantics.
Snowflake adapter | Alpha with real Snowflake validation | SQL warehouse runtime, hosted Snowpark procedure library runner, table/staged-file/SQL sources, write modes, quality, schema policy, governance, evidence, lineage and cost reconciliation are implemented and validated by live smoke tests.
Fabric adapter | Planned | Future adapters must depend on the core and declare platform capabilities explicitly.

See roadmap for adapter maturity and release criteria.

Compared With Alternatives

Alternative | Difference
dbt | dbt models data after it lands. ContractForge defines how governed data arrives, is written, validated and evidenced.
Airbyte/Fivetran | They provide managed ingestion runtimes. ContractForge provides the contract and lets adapters execute natively in your platform.
Data contract tools | Validation is one slice. ContractForge covers source, write semantics, schema policy, quality, governance, evidence and native execution artifacts.
Platform-specific frameworks | ContractForge keeps platform implementations in adapters so the same semantics can be evaluated for other runtimes.

Install

From GitHub:

pip install "git+https://github.com/marquesantero/contractforge-core.git"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/databricks"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/aws"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/snowflake"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=ai"

Local development:

uv sync --all-extras
uv run pytest

Build wheels independently:

uv build --wheel
cd adapters/databricks && uv build --wheel
cd ../aws && uv build --wheel
cd ../snowflake && uv build --wheel
cd ../../ai && uv build --wheel

Release package names:

pip install contractforge-core contractforge-databricks contractforge-aws contractforge-snowflake contractforge-ai

Project Shape

A complete ContractForge project keeps runtime concerns separate from contract semantics:

project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  supabase.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
      b_products.access.yaml
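A tool that consumes this layout has to group each dataset's companion files. The following is a minimal sketch, assuming the `<dataset>.<kind>.yaml` naming shown above is the convention; `discover_contracts` is a hypothetical helper, not a ContractForge API, and the real project loader may resolve files from project.yaml instead.

```python
import tempfile
from pathlib import Path

# Companion file kinds taken from the layout above; a fixed set is an assumption.
KINDS = ("ingestion", "annotations", "operations", "access")


def discover_contracts(project_root: Path) -> dict[str, dict[str, Path]]:
    """Group YAML files under contracts/ by dataset name, e.g. b_products -> {kind: path}."""
    datasets: dict[str, dict[str, Path]] = {}
    for path in sorted((project_root / "contracts").rglob("*.yaml")):
        # "b_products.ingestion.yaml" -> stem "b_products.ingestion" -> ("b_products", "ingestion")
        name, _, kind = path.stem.rpartition(".")
        if name and kind in KINDS:
            datasets.setdefault(name, {})[kind] = path
    return datasets


# Build a throwaway copy of the bronze/b_products folder and scan it.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "contracts" / "bronze" / "b_products"
    folder.mkdir(parents=True)
    for kind in KINDS:
        (folder / f"b_products.{kind}.yaml").write_text("{}\n")
    found = discover_contracts(Path(tmp))
    print(sorted(found["b_products"]))  # ['access', 'annotations', 'ingestion', 'operations']
```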

Example shared connection:

source:
  type: connector
  connector: postgres
  system: supabase
  options:
    url: "{{ secret:supabase/jdbc_url }}"
auth:
  type: basic
  username: "{{ secret:supabase/user }}"
  password: "{{ secret:supabase/password }}"
read:
  fetchsize: 20000
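The `{{ secret:... }}` placeholders keep credentials out of the YAML. Which component resolves them, and against which secret store, is not specified here. The sketch below assumes a resolver that walks the parsed mapping and asks a pluggable `lookup` callable for each `scope/key` reference; `resolve_secrets` and the in-memory store are hypothetical.

```python
import re
from typing import Any, Callable

# Matches "{{ secret:scope/key }}" with optional inner whitespace.
SECRET_REF = re.compile(r"\{\{\s*secret:([^}\s]+)\s*\}\}")


def resolve_secrets(value: Any, lookup: Callable[[str], str]) -> Any:
    """Return a copy of value with every secret placeholder replaced by lookup(scope/key)."""
    if isinstance(value, dict):
        return {k: resolve_secrets(v, lookup) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(v, lookup) for v in value]
    if isinstance(value, str):
        return SECRET_REF.sub(lambda m: lookup(m.group(1)), value)
    return value


# Stand-in secret store; a real adapter would query its platform's secret manager.
fake_store = {
    "supabase/jdbc_url": "jdbc:postgresql://db.example:5432/postgres",
    "supabase/user": "ingest",
    "supabase/password": "s3cret",
}

connection = {
    "source": {"type": "connector", "connector": "postgres", "options": {"url": "{{ secret:supabase/jdbc_url }}"}},
    "auth": {"type": "basic", "username": "{{ secret:supabase/user }}", "password": "{{ secret:supabase/password }}"},
    "read": {"fetchsize": 20000},
}

resolved = resolve_secrets(connection, fake_store.__getitem__)
print(resolved["auth"]["username"])  # ingest
```

Using `dict.__getitem__` as the lookup raises `KeyError` for a missing secret, so a misconfigured connection fails at resolution time rather than later with a confusing authentication error.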

Example ingestion override:

source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    partition_column: product_id
    num_partitions: 8

The core resolves the connection before adapters plan or execute. Ingestion values override global connection defaults.
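The override rule can be read as a recursive merge in which dataset-specific values win over connection defaults. Here is a minimal sketch, assuming a plain dictionary deep merge; `deep_merge` is hypothetical, and how the core maps the connection's top-level `read` block onto the ingestion's `source.read` block is not shown.

```python
from copy import deepcopy
from typing import Any


def deep_merge(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new mapping: nested dicts merge key by key, override values win."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale, never concatenated.
            merged[key] = deepcopy(value)
    return merged


connection_defaults = {"read": {"fetchsize": 20000}, "options": {"url": "{{ secret:supabase/jdbc_url }}"}}
ingestion_fields = {"table": "public.products", "read": {"partition_column": "product_id", "num_partitions": 8}}

effective = deep_merge(connection_defaults, ingestion_fields)
print(effective["read"])
# {'fetchsize': 20000, 'partition_column': 'product_id', 'num_partitions': 8}
```

Replacing lists instead of appending to them keeps an override from silently inheriting stale entries from the shared connection.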

Platform Adapters

Adapter | Package | Status | Native responsibilities
Databricks | contractforge-databricks | Reference implementation | Delta, Unity Catalog, Auto Loader, Lakeflow planning, Jobs, Asset Bundles, control tables, governance, lineage, cost and dashboards.
AWS | contractforge-aws | Alpha with real E2E validation | Glue Spark, Iceberg, Glue Catalog, Lake Formation review/apply helpers, S3 artifacts, Glue jobs, Athena/Iceberg evidence and cost records.
Fabric | contractforge-fabric | Planned | OneLake, Lakehouse tables, Data Pipelines, Dataflow Gen2 and Purview/Fabric metadata.
Snowflake | contractforge-snowflake | Alpha with real Snowflake validation | SQL warehouse runtime, hosted Snowpark procedure library runner with staged ZIP imports, table/staged-file/SQL sources, append/overwrite/upsert/hash-diff writes, quality, schema policy, governance, evidence/control tables, lineage, cost reconciliation and project deployment. A live smoke test of task graphs still requires task grants. See the Snowflake adapter guide.

Use the same project model for adapter deployment:

contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml --target dev
contractforge-aws deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml --dry-run --summary-only

ContractForge AI

ContractForge AI is the planning and review companion. It can generate project scaffolds from prompts and schemas, validate project folders, compare adapter planning and produce clear HTML approval reports.

contractforge-ai guided-project \
  --intent "Create a Supabase medallion project for AWS and Databricks daily at 6 Sao Paulo time." \
  --schema schemas/products.json \
  --target contractforge-yaml \
  --allow-review-required \
  --output-dir generated/supabase

contractforge-ai validate-project-structure generated/supabase \
  --adapter databricks \
  --adapter aws \
  --format html > generated/supabase/project_validation.html

Model providers are optional. Deterministic validation and adapter planners remain the source of truth; providers can explain or enrich, but they cannot invent support status.
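One way to enforce that rule is an allow-list: the provider's answer is merged field by field, and the status field is never on the list. A minimal sketch follows; `Finding`, `ENRICHABLE` and `apply_enrichment` are hypothetical names, not the actual interfaces of ContractForge AI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Finding:
    feature: str
    status: str  # set only by deterministic validation and adapter planners
    explanation: str = ""


# Fields a model provider may fill in; everything else belongs to deterministic planning.
ENRICHABLE = {"explanation"}


def apply_enrichment(finding: Finding, suggestion: dict) -> Finding:
    """Copy only allow-listed fields from a provider suggestion; ignore the rest."""
    allowed = {k: v for k, v in suggestion.items() if k in ENRICHABLE}
    return replace(finding, **allowed)


planned = Finding(feature="schema_policy=additive_only", status="REVIEW_REQUIRED")
suggestion = {"status": "SUPPORTED", "explanation": "Target table may need a manual schema-evolution check."}

enriched = apply_enrichment(planned, suggestion)
print(enriched.status)  # REVIEW_REQUIRED
```

Silently dropping disallowed keys is a choice made for this sketch; a stricter variant could log or reject any suggestion that tries to set the status.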

Core Planning Example

from contractforge_core.capabilities import PlatformCapabilities
from contractforge_core.contracts import semantic_contract_from_mapping, validate_contract
from contractforge_core.planner import plan_contract

# Validate the raw contract mapping (the same contract as the YAML example above).
contract = validate_contract(
    {
        "source": {"type": "incremental_files", "path": "s3://landing/orders", "format": "json"},
        "target": {"catalog": "main", "schema": "bronze", "table": "orders"},
        "mode": "scd0_append",
        "schema_policy": "additive_only",
        "quality_rules": {"not_null": ["order_id"]},
    }
)

# Normalize the validated contract into the core's semantic model.
semantic = semantic_contract_from_mapping(contract)

# Describe an example platform that supports append and overwrite writes but not merge.
capabilities = PlatformCapabilities(
    platform="example",
    supports_append=True,
    supports_overwrite=True,
    supports_merge=False,
    evidence_stores=("audit_tables",),
)

# Match the semantic contract against the platform capabilities and print the portability status.
result = plan_contract(semantic, capabilities)
print(result.status)

Package Boundaries

Layer | Package | Responsibility
Semantic core | contractforge-core | Contract models, validation, semantic normalization, capability matching, abstract plans, portability diagnostics and neutral evidence models.
Databricks adapter | contractforge-databricks | Databricks capabilities, rendering, runtime execution, governance, evidence filling and deployment helpers.
AWS adapter | contractforge-aws | AWS capabilities, Glue/Iceberg planning, runtime helpers, S3 publication, deployment helpers and evidence filling.
AI companion | contractforge-ai | Deterministic review, project generation, diagnostics, provider routing, report generation and optional model-backed enrichment.

Publication stays split: each package builds its own wheel. The core wheel contains only the contractforge_core package; adapter wheels such as contractforge-databricks contain their own adapter package and declare an explicit dependency on contractforge-core, as future adapters must too.

See publication packaging.

Documentation

Topic | Link
Online site | marquesantero.github.io/contractforge-core
Documentation index | docs/README.md
Quick start | docs/quickstart.md
Architecture | docs/architecture.md
Contracts | docs/contracts.md
Project YAML | docs/project-yaml.md
Connection YAML | docs/connection-yaml.md
Adapters | docs/adapters.md
Databricks adapter | docs/databricks.md
AWS adapter | docs/adapters/aws.md
Test contracts across adapters | docs/adapters/test-contracts-across-adapters.md
Connectors | docs/connectors.md
Operations and evidence | docs/operations.md
ContractForge AI | ai/README.md
Security | docs/security.md
Adapter authoring | docs/specs/adapter-authoring.md

Architecture contracts live under docs/specs, and decisions live under docs/adrs.

Non-Goals

ContractForge is not:

  • a scheduler;
  • a universal Spark wrapper;
  • a replacement for Databricks, Glue, Fabric, Snowflake or other runtimes;
  • a promise that every contract runs everywhere;
  • a dbt replacement;
  • an orchestration engine;
  • a GUI product in the core.

License

MIT. See LICENSE.



Download files


Source Distribution

contractforge_core-0.1.0.tar.gz (352.7 kB)

Uploaded Source

Built Distribution


contractforge_core-0.1.0-py3-none-any.whl (129.5 kB)

Uploaded Python 3

File details

Details for the file contractforge_core-0.1.0.tar.gz.

File metadata

  • Download URL: contractforge_core-0.1.0.tar.gz
  • Size: 352.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contractforge_core-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c4d7eab11e791f951579110ca069f506274fd9e00a76b596e01dab074b292c0d
MD5 a2f68ac745082b5a5ff6c7dcd937f15b
BLAKE2b-256 1f8713f49612f6776870f26b81c0902f5ef14d0c1eaecd5d405385d54cd42937


File details

Details for the file contractforge_core-0.1.0-py3-none-any.whl.


File hashes

Hashes for contractforge_core-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2fd57f45f951958aca54749f19d458a86ca3b15f7d5889671123437104f2792b
MD5 662fcb0b9cc9c633705536f80304553c
BLAKE2b-256 dc38ec0134ac1d947633e2196b2b25367709d993921c09f9553b33ed767c5489

