Platform-neutral semantic core for contract-first data ingestion.
ContractForge
Define ingestion intent once. Run it natively anywhere.
ContractForge is a multi-runtime, contract-first ingestion platform. It turns governed ingestion intent into native platform execution and evidence while keeping the contract vocabulary stable across Databricks, AWS and future adapters.
The product remains ContractForge. contractforge-core,
contractforge-databricks, contractforge-aws and contractforge-ai are
functional package boundaries, not separate products.
It is built for data consultants, platform teams and engineering groups that need repeatable governed ingestion across different client runtimes without rewriting the framework for every platform.
Why ContractForge
| Capability | What it means |
|---|---|
| Contract-first ingestion | Source, target, write mode, schema policy, transforms, quality, access, operations and evidence live in reviewed YAML contracts. |
| Honest portability | The planner returns SUPPORTED, SUPPORTED_WITH_WARNINGS, REVIEW_REQUIRED or UNSUPPORTED; it does not silently downgrade semantics. |
| Native adapters | Databricks and AWS translate the same intent into native runtime behavior instead of forcing a lowest-common-denominator engine. |
| Evidence as product surface | Runs, errors, quality, quarantine, schema changes, lineage, governance actions and cost signals are tracked consistently. |
| Reusable connections | Shared connection.yaml files centralize connector defaults; ingestion contracts override only dataset-specific fields. |
| AI-assisted project design | ContractForge AI turns prompts and schemas into reviewable projects, then validates them through Core and adapter planners. |
ContractForge is not a scheduler, a dbt replacement, a closed ingestion runtime or a universal Spark wrapper. It is the semantic contract and adapter layer for repeatable governed ingestion.
How It Works
Contract YAML
-> Semantic Core
-> Capability Matcher
-> Abstract Execution Plan
-> Platform Adapter
-> Native Runtime + Evidence
The core owns portable semantics. Adapters own platform behavior. The core does not import Spark, Databricks SDK, boto3, Azure SDK, Fabric SDK or Snowflake clients.
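One way to keep that boundary checkable is a test that scans core source files for platform imports. The following is a minimal sketch and not part of ContractForge: the function name `forbidden_imports` and the top-level module names in `FORBIDDEN_ROOTS` are assumptions chosen to mirror the SDKs listed above.

```python
import ast

# Assumed top-level module names for the SDKs the core must not import.
# Fabric is left out because no module name is assumed for it here.
FORBIDDEN_ROOTS = {"pyspark", "databricks", "boto3", "azure", "snowflake"}


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden top-level packages imported by a module's source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            modules = [node.module]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]
            if root in FORBIDDEN_ROOTS:
                found.add(root)
    return found


sample = "import json\nimport boto3\nfrom pyspark.sql import SparkSession\n"
print(sorted(forbidden_imports(sample)))  # ['boto3', 'pyspark']
```

A test suite could run such a function over every file in the core package and fail when the returned set is not empty.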
See It In 30 Seconds
source:
  type: incremental_files
  path: s3://landing/orders
  format: json
target:
  catalog: main
  schema: bronze
  table: orders
mode: scd0_append
schema_policy: additive_only
quality_rules:
  not_null: [order_id]
Core planning result:
SUPPORTED
The Databricks adapter may render Delta/Auto Loader/Asset Bundle artifacts. The
AWS adapter may render and deploy Glue Spark/Iceberg artifacts. Another adapter
may return SUPPORTED_WITH_WARNINGS, REVIEW_REQUIRED or UNSUPPORTED if it
cannot preserve the same semantics safely.
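A pipeline that consumes planner results has to decide what each status means for a deployment. The sketch below shows one possible policy. The function `gate` and its blocking rules are assumptions made for illustration; only the four status names come from ContractForge.

```python
# Status names as listed in this README; the grouping below is an assumed policy.
OK_STATUSES = {"SUPPORTED", "SUPPORTED_WITH_WARNINGS"}
REVIEW_STATUSES = {"REVIEW_REQUIRED"}
BLOCKING_STATUSES = {"UNSUPPORTED"}


def gate(statuses: dict[str, str], allow_review_required: bool = False) -> list[str]:
    """Return the adapters that should stop a deployment, sorted by name."""
    blocked = []
    for adapter, status in statuses.items():
        if status in OK_STATUSES:
            continue
        if status in REVIEW_STATUSES:
            # Review-required plans pass only when the caller opts in.
            if not allow_review_required:
                blocked.append(adapter)
        elif status in BLOCKING_STATUSES:
            blocked.append(adapter)
        else:
            # Unknown statuses are rejected rather than treated as success.
            raise ValueError(f"unknown planning status: {status!r}")
    return sorted(blocked)


results = {"databricks": "SUPPORTED", "aws": "REVIEW_REQUIRED", "other": "UNSUPPORTED"}
print(gate(results))                              # ['aws', 'other']
print(gate(results, allow_review_required=True))  # ['other']
```

Raising on an unknown status keeps the gate from quietly accepting a value it does not understand.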
Status And Roadmap
| Area | Status | Notes |
|---|---|---|
| Core semantic model | Active | Contract models, semantic normalization, capability matching, abstract planning and evidence models are implemented. |
| Databricks adapter | Reference implementation | Delta, Unity Catalog, Auto Loader, Lakeflow planning, Asset Bundles, control tables, quality, governance, lineage, cost and dashboards are implemented inside the adapter boundary. |
| AWS adapter | Alpha with real E2E validation | Glue Spark/Iceberg planning, source support, quality/evidence, Lake Formation review/apply helpers, annotations, operations, S3 artifact publication, one-command Glue deployment and Glue job helper APIs are implemented. |
| ContractForge AI | Active | Deterministic review, project generation, diagnostics, provider routing and optional model-backed enrichment over the same core contract semantics. |
| Snowflake adapter | Alpha with real Snowflake validation | SQL warehouse runtime, hosted Snowpark procedure library runner, table/staged-file/SQL sources, write modes, quality, schema policy, governance, evidence, lineage and cost reconciliation are implemented and live-smoked. |
| Fabric adapter | Planned | Future adapters must depend on the core and declare platform capabilities explicitly. |
See roadmap for adapter maturity and release criteria.
Compared With Alternatives
| Alternative | Difference |
|---|---|
| dbt | dbt models data after it lands. ContractForge defines how governed data arrives, is written, validated and evidenced. |
| Airbyte/Fivetran | They provide managed ingestion runtimes. ContractForge provides the contract and lets adapters execute natively in your platform. |
| Data contract tools | Validation is one slice. ContractForge covers source, write semantics, schema policy, quality, governance, evidence and native execution artifacts. |
| Platform-specific frameworks | ContractForge keeps platform implementations in adapters so the same semantics can be evaluated for other runtimes. |
Install
From GitHub:
pip install "git+https://github.com/marquesantero/contractforge-core.git"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/databricks"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/aws"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=adapters/snowflake"
pip install "git+https://github.com/marquesantero/contractforge-core.git#subdirectory=ai"
Local development:
uv sync --all-extras
uv run pytest
Build wheels independently:
uv build --wheel
cd adapters/databricks && uv build --wheel
cd ../aws && uv build --wheel
cd ../snowflake && uv build --wheel
cd ../../ai && uv build --wheel
Release package names:
pip install contractforge-core contractforge-databricks contractforge-aws contractforge-snowflake contractforge-ai
Project Shape
A complete ContractForge project keeps runtime concerns separate from contract semantics:
project.yaml
environments/
  databricks.environment.yaml
  aws.environment.yaml
connections/
  supabase.yaml
contracts/
  bronze/
    b_products/
      b_products.ingestion.yaml
      b_products.annotations.yaml
      b_products.operations.yaml
      b_products.access.yaml
Example shared connection:
source:
  type: connector
  connector: postgres
  system: supabase
  options:
    url: "{{ secret:supabase/jdbc_url }}"
  auth:
    type: basic
    username: "{{ secret:supabase/user }}"
    password: "{{ secret:supabase/password }}"
  read:
    fetchsize: 20000
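The `{{ secret:... }}` placeholders in the connection example keep credentials out of the YAML file. How ContractForge resolves them is not described here. As a hedged illustration, the sketch below only lists the secret names a parsed connection refers to, which can be useful for a pre-deployment check. The regular expression and the function `find_secret_refs` are assumptions, not ContractForge APIs.

```python
import re

# Assumed placeholder shape: "{{ secret:<name> }}" with optional inner spaces.
SECRET_REF = re.compile(r"\{\{\s*secret:([^\s}]+)\s*\}\}")


def find_secret_refs(node) -> list[str]:
    """Collect secret names from nested dicts, lists and strings, in order."""
    found = []
    if isinstance(node, str):
        found.extend(SECRET_REF.findall(node))
    elif isinstance(node, dict):
        for value in node.values():
            found.extend(find_secret_refs(value))
    elif isinstance(node, (list, tuple)):
        for item in node:
            found.extend(find_secret_refs(item))
    return found


connection = {
    "options": {"url": "{{ secret:supabase/jdbc_url }}"},
    "auth": {
        "type": "basic",
        "username": "{{ secret:supabase/user }}",
        "password": "{{ secret:supabase/password }}",
    },
    "read": {"fetchsize": 20000},
}
print(find_secret_refs(connection))
# ['supabase/jdbc_url', 'supabase/user', 'supabase/password']
```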
Example ingestion override:
source:
  type: connection
  connection_path: project://connections/supabase.yaml
  table: public.products
  read:
    partition_column: product_id
    num_partitions: 8
The core resolves the connection before adapters plan or execute. Ingestion values override global connection defaults.
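The override rule can be pictured as a recursive merge in which ingestion values win and untouched connection defaults survive. This is a sketch under that assumption: the README does not say whether the core merges nested sections key by key, and `merge_overrides` is a hypothetical helper. The `type` and `connection_path` keys of the ingestion contract are omitted from the sample for brevity.

```python
from copy import deepcopy


def merge_overrides(defaults: dict, overrides: dict) -> dict:
    """Return defaults updated with overrides, merging nested dicts key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Scalars, lists and new keys replace the default outright.
            merged[key] = deepcopy(value)
    return merged


connection_source = {
    "connector": "postgres",
    "system": "supabase",
    "read": {"fetchsize": 20000},
}
ingestion_source = {
    "table": "public.products",
    "read": {"partition_column": "product_id", "num_partitions": 8},
}

resolved = merge_overrides(connection_source, ingestion_source)
print(resolved["read"])
# {'fetchsize': 20000, 'partition_column': 'product_id', 'num_partitions': 8}
```

Copying before merging leaves the shared connection mapping unchanged, so several contracts can reuse it safely.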
Platform Adapters
| Adapter | Package | Status | Native responsibilities |
|---|---|---|---|
| Databricks | contractforge-databricks | Reference implementation | Delta, Unity Catalog, Auto Loader, Lakeflow planning, Jobs, Asset Bundles, control tables, governance, lineage, cost and dashboards. |
| AWS | contractforge-aws | Alpha with real E2E validation | Glue Spark, Iceberg, Glue Catalog, Lake Formation review/apply helpers, S3 artifacts, Glue jobs, Athena/Iceberg evidence and cost records. |
| Fabric | contractforge-fabric | Planned | OneLake, Lakehouse tables, Data Pipelines, Dataflow Gen2 and Purview/Fabric metadata. |
| Snowflake | contractforge-snowflake | Alpha with real Snowflake validation | SQL warehouse runtime, hosted Snowpark procedure library runner with staged ZIP imports, table/staged-file/SQL sources, append/overwrite/upsert/hash-diff writes, quality, schema policy, governance, evidence/control tables, lineage, cost reconciliation and project deployment. Task graph live smoke still needs task grants. See Snowflake adapter guide. |
Use the same project model for adapter deployment:
contractforge-databricks deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml --target dev
contractforge-aws deploy-project examples/real-world/supabase-jdbc-medallion/project.yaml --dry-run --summary-only
ContractForge AI
ContractForge AI is the planning and review companion. It can generate project scaffolds from prompts and schemas, validate project folders, compare adapter planning and produce clear HTML approval reports.
contractforge-ai guided-project \
  --intent "Create a Supabase medallion project for AWS and Databricks daily at 6 Sao Paulo time." \
  --schema schemas/products.json \
  --target contractforge-yaml \
  --allow-review-required \
  --output-dir generated/supabase

contractforge-ai validate-project-structure generated/supabase \
  --adapter databricks \
  --adapter aws \
  --format html > generated/supabase/project_validation.html
Model providers are optional. Deterministic validation and adapter planners remain the source of truth; providers can explain or enrich, but they cannot invent support status.
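The rule that providers cannot invent support status can be enforced structurally: the status field is written only by the deterministic planner, and provider output is copied into a separate text field. The sketch below is an assumption about how such a guard might look; `PlannerFinding` and `enrich` are hypothetical names, not ContractForge AI APIs.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlannerFinding:
    """Hypothetical record of one adapter's deterministic planning result."""
    adapter: str
    status: str            # set only by the deterministic planner
    explanation: str = ""  # free text a model provider may fill in


def enrich(finding: PlannerFinding, provider_output: dict) -> PlannerFinding:
    """Attach provider text and ignore any status the provider proposes."""
    text = str(provider_output.get("explanation", "")).strip()
    return replace(finding, explanation=text)


finding = PlannerFinding(adapter="aws", status="REVIEW_REQUIRED")
provider_output = {"status": "SUPPORTED", "explanation": "Merge semantics need review."}

enriched = enrich(finding, provider_output)
print(enriched.status)       # REVIEW_REQUIRED
print(enriched.explanation)  # Merge semantics need review.
```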
Core Planning Example
from contractforge_core.capabilities import PlatformCapabilities
from contractforge_core.contracts import semantic_contract_from_mapping, validate_contract
from contractforge_core.planner import plan_contract

contract = validate_contract(
    {
        "source": {"type": "incremental_files", "path": "s3://landing/orders", "format": "json"},
        "target": {"catalog": "main", "schema": "bronze", "table": "orders"},
        "mode": "scd0_append",
        "schema_policy": "additive_only",
        "quality_rules": {"not_null": ["order_id"]},
    }
)
semantic = semantic_contract_from_mapping(contract)

capabilities = PlatformCapabilities(
    platform="example",
    supports_append=True,
    supports_overwrite=True,
    supports_merge=False,
    evidence_stores=("audit_tables",),
)

result = plan_contract(semantic, capabilities)
print(result.status)
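To make the idea of capability matching concrete, here is a toy matcher written for this README. It does not reproduce `plan_contract`: `Status`, `ToyCapabilities`, `ToyPlan` and `match` are hypothetical names, and treating an undeclared schema policy as REVIEW_REQUIRED is an assumption made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUPPORTED = "SUPPORTED"
    SUPPORTED_WITH_WARNINGS = "SUPPORTED_WITH_WARNINGS"
    REVIEW_REQUIRED = "REVIEW_REQUIRED"
    UNSUPPORTED = "UNSUPPORTED"


@dataclass(frozen=True)
class ToyCapabilities:
    """Hypothetical stand-in for what an adapter declares it can do."""
    platform: str
    write_modes: frozenset
    schema_policies: frozenset


@dataclass(frozen=True)
class ToyPlan:
    status: Status
    reasons: tuple


def match(contract: dict, caps: ToyCapabilities) -> ToyPlan:
    """Compare requested semantics with declared capabilities.

    The requested write mode is never swapped for a weaker one: a missing
    capability is reported instead of being silently downgraded.
    """
    blocking, review = [], []
    mode = contract["mode"]
    if mode not in caps.write_modes:
        blocking.append(f"write mode {mode!r} is not declared by {caps.platform}")
    policy = contract.get("schema_policy")
    if policy is not None and policy not in caps.schema_policies:
        review.append(f"schema policy {policy!r} needs review on {caps.platform}")
    if blocking:
        return ToyPlan(Status.UNSUPPORTED, tuple(blocking + review))
    if review:
        return ToyPlan(Status.REVIEW_REQUIRED, tuple(review))
    return ToyPlan(Status.SUPPORTED, ())


contract = {"mode": "scd0_append", "schema_policy": "additive_only"}
full = ToyCapabilities("example", frozenset({"scd0_append"}), frozenset({"additive_only"}))
partial = ToyCapabilities("other", frozenset({"overwrite"}), frozenset())

print(match(contract, full).status.value)     # SUPPORTED
print(match(contract, partial).status.value)  # UNSUPPORTED
```

Returning the reasons alongside the status lets a reviewer see why a platform was rejected without rerunning the planner.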
Package Boundaries
| Layer | Package | Responsibility |
|---|---|---|
| Semantic core | contractforge-core | Contract models, validation, semantic normalization, capability matching, abstract plans, portability diagnostics and neutral evidence models. |
| Databricks adapter | contractforge-databricks | Databricks capabilities, rendering, runtime execution, governance, evidence filling and deployment helpers. |
| AWS adapter | contractforge-aws | AWS capabilities, Glue/Iceberg planning, runtime helpers, S3 publication, deployment helpers and evidence filling. |
| AI companion | contractforge-ai | Deterministic review, project generation, diagnostics, provider routing, report generation and optional model-backed enrichment. |
Publication stays split: each package builds its own wheel. The core wheel owns only contractforge_core; adapter wheels such as contractforge-databricks own their adapter package and depend explicitly on contractforge-core, as future adapters must.
Documentation
| Topic | Link |
|---|---|
| Online site | marquesantero.github.io/contractforge-core |
| Documentation index | docs/README.md |
| Quick start | docs/quickstart.md |
| Architecture | docs/architecture.md |
| Contracts | docs/contracts.md |
| Project YAML | docs/project-yaml.md |
| Connection YAML | docs/connection-yaml.md |
| Adapters | docs/adapters.md |
| Databricks adapter | docs/databricks.md |
| AWS adapter | docs/adapters/aws.md |
| Test contracts across adapters | docs/adapters/test-contracts-across-adapters.md |
| Connectors | docs/connectors.md |
| Operations and evidence | docs/operations.md |
| ContractForge AI | ai/README.md |
| Security | docs/security.md |
| Adapter authoring | docs/specs/adapter-authoring.md |
Architecture contracts live under docs/specs, and decisions live under docs/adrs.
Non-Goals
ContractForge is not:
- a scheduler;
- a universal Spark wrapper;
- a replacement for Databricks, Glue, Fabric, Snowflake or other runtimes;
- a promise that every contract runs everywhere;
- a dbt replacement;
- an orchestration engine;
- a GUI product in the core.
License
MIT. See LICENSE.