Skip to main content

Standardised inputs and outputs based on contract files.

Project description

Contract Schema

Contract Schema is a lightweight Python package for schema-driven, structured inputs and outputs.

Table of Contents

Description

Contract Schema unifies disparate I/O layers under one authoritative contract. A single, versioned JSON file defines both the permissible inputs and the required outputs; the library takes care of CLI generation, default handling, deep validation, and rich, provenance-aware output documents.

  • Input parsing: Every field in the contract automatically becomes an argparse flag, a JSON key, and a CLI flag, with type enforcement, enums, and sensible defaults injected.
  • Output construction: The same contract defines the structure of your result document. A helper class injects run metadata, hashes inputs and findings for auditability, and captures execution environment details.
  • Meta-schema validation: A bundled meta-schema ensures every contract you write contains the minimal top-level keys (title, version, description, input, output) and that both input and output sections themselves declare a fields block. Contract mistakes are caught at load time.
  • No heavy dependencies: Only the Python standard library (>= 3.8) is required. pandas is optional and used only when you pass a DataFrame directly as an input.

The result is a single source of truth for I/O that works anywhere Python runs: CI pipelines, air-gapped analysis workstations, or serverless functions.

Dependencies

  • Required: Python >= 3.8

No other external packages are needed.

Installation

pip install contract-schema

Usage

Contract Schema supports two primary workflows: analytics (security/data analysis pipelines) and models (machine learning model training manifests). Both follow the same pattern:

  1. Load a contract from a bundled JSON schema
  2. Parse and validate inputs against the contract
  3. Build an output document that conforms to the contract
  4. Finalize and save the document

Quick Start

from contract_schema import Contract

# Load the analytic contract (bundled with the package)
contract = Contract.load("analytic_schema.json")

# Parse and validate inputs (from dict, JSON file, or CLI args)
inputs = contract.parse_and_validate_input({
    "start_dtg": "2025-01-01T00:00:00Z",
    "end_dtg": "2025-01-02T00:00:00Z",
    "data_source_type": "file",
    "data_source": "/path/to/data.csv",
})

# Create and populate an output document
doc = contract.create_document(
    input_schema_version=contract.version,
    output_schema_version=contract.version,
    author="Your Name",
    author_organization="Your Org",
    contact="you@example.com",
    license="MIT",
    documentation_link="https://example.com",
    status="success",
    exit_code=0,
    inputs=inputs,
    # ... additional required fields from the contract
)

# Finalize (validates output, computes hashes, captures environment)
doc.finalise()

# Save to file
doc.save("output_report.json")

Input Sources

Inputs can be provided in multiple formats:

# From a Python dict
inputs = contract.parse_and_validate_input({"key": "value"})

# From a JSON file path
inputs = contract.parse_and_validate_input("/path/to/config.json")

# From a JSON string
inputs = contract.parse_and_validate_input('{"key": "value"}')

# From CLI arguments
inputs = contract.parse_and_validate_input([
    "--start-dtg", "2025-01-01T00:00:00Z",
    "--end-dtg", "2025-01-02T00:00:00Z",
])

# From sys.argv (default when None is passed)
inputs = contract.parse_and_validate_input(None)

Bundled Contracts

The package includes two production-ready contracts:

  • analytic_schema.json - For security analytics and data analysis pipelines. Includes fields for findings, MITRE ATT&CK mappings, and observables.
  • model_schema.json - For ML model training manifests. Includes fields for metrics, hyperparameters, and model artifacts.

Both contracts share common metadata fields like execution environment, timestamps, and provenance hashes.

Creating Custom Contracts

Custom contracts must conform to the bundled meta-schema (contract_meta_schema.json), which requires:

  • title, version, description at the top level
  • input and output objects, each containing a fields object

See example_analytic.py and example_model.py for complete working examples.

Project structure

contract-schema/        # Repository root
|__ README.md            # This file
|
|__ contract_schema/     # Python package
|   |__ __init__.py
|   |__ contract.py      # High-level Contract class
|   |__ document.py      # Schema-aware Document builder
|   |__ loader.py        # JSON loader with resource fallback
|   |__ parser.py        # CLI / JSON / Mapping input parser
|   |__ validator.py     # Lightweight JSON-schema validator
|   |__ utils.py         # Shared helpers (hashing, timestamps, etc.)
|   |__ schemas/         # Bundled contracts
|       |__ analytic_schema.json
|       |__ model_schema.json
|       |__ contract_meta_schema.json
|
|__ example_analytic.py  # End-to-end demo script for the analytic contract
|__ example_model.py  # End-to-end demo script for the model contract
|
|__ tests/               # Unit tests
|   |__ analytic/
|   |__ model/
|   |__ meta/
|
|__ makefile             # Project makefile
|__ LICENSE.md           # License
|__ pyproject.toml       # Build metadata

Background and Motivation

Security analytics, ML pipelines, and data engineering jobs often reinvent the wheel for argument parsing and result emission. Over time, field names diverge, validation drifts, and downstream systems break.

Contract Schema solves this by treating the contract itself as code--version-controlled, validated, and consumed at runtime.

  • Uniformity - All tools speak the same language defined by the contract.
  • Reliability - Inputs and outputs are validated deeply; failures happen fast and loudly.
  • Traceability - Documents include run IDs, environment snapshots, and SHA-256 hashes of both inputs and outputs.
  • Extensibility - Write new contracts (e.g., for data ingestion) and they instantly get CLI generation, validation, and output helpers-no new code needed.

Contributing

Contributions are welcome from all, regardless of rank or position.

There are no system requirements for contributing to this project. To contribute via the web:

  1. Click GitLab's "Web IDE" button to open the online editor.
  2. Make your changes. Note: limit your changes to one part of one file per commit; for example, edit only the "Description" section here in the first commit, then the "Background and Motivation" section in a separate commit.
  3. Once finished, click the blue "Commit..." button.
  4. Write a detailed description of the changes you made in the "Commit Message" box.
  5. Select the "Create a new branch" radio button if you do not already have your own branch; otherwise, select your branch. The recommended naming convention for new branches is first.middle.last.
  6. Click the green "Commit" button.

You may also contribute to this project using your local machine by cloning this repository to your workstation, creating a new branch, committing and pushing your changes, and creating a merge request.

Contributors

This section lists project contributors. When you submit a merge request, remember to append your name to the bottom of the list below. You may also include a brief list of the sections to which you contributed.

  • Creator: Zachary Szewczyk

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can view the full text of the license in LICENSE.md. Read more about the license at the original author's website. Generally speaking, this license allows individuals to remix this work provided they release their adaptation under the same license and cite this project as the original, and prevents anyone from turning this work or its derivatives into a commercial product.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contract_schema-1.0.1.tar.gz (36.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

contract_schema-1.0.1-py3-none-any.whl (32.5 kB view details)

Uploaded Python 3

File details

Details for the file contract_schema-1.0.1.tar.gz.

File metadata

  • Download URL: contract_schema-1.0.1.tar.gz
  • Upload date:
  • Size: 36.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for contract_schema-1.0.1.tar.gz
Algorithm Hash digest
SHA256 08a0b7e6e36010779fb0590b2af7920eeaf11097c4fac0c5a4f1656a60da0df7
MD5 ce00c9e63b3f4df76f1cdbcffa4082e1
BLAKE2b-256 65dd2e6e1a64d96f8f2937b290c99e32dba2c8e31dce6c6a955b0da4f41d3fbf

See more details on using hashes here.

File details

Details for the file contract_schema-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for contract_schema-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e4b9b3a862dc786ebf9373c301fc8990bb0463ed95480f52efe9e6a54f090d0c
MD5 a311e923b5efb91484cbce3b90dc956f
BLAKE2b-256 66ffc5b90626d89d85039781aee6c90d9af3c810f0ac40747a2f7d357f2a5c6e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page