Provider-aware structured-output / JSON-Schema CI linter — fail CI before your schema 400s on OpenAI, Anthropic, Gemini, Mistral, or Cohere
schemafit
Provider-aware structured-output / JSON-Schema CI linter. Catch the schema
incompatibilities that make one provider 400 while another succeeds — before
they hit production, as a fast, offline CI check.
A JSON Schema / tool definition / response_format that works on OpenAI can
400 on Anthropic or Gemini (and vice versa): nested oneOf, a missing
additionalProperties: false, a default in a property, Anthropic-rejected
validation keywords (minLength, format, pattern, …), or the missing
anyOf/open-dict support in older Gemini models and SDKs. The API tells you the
request failed but not which constraint the schema violated, so teams hand-port
schemas and debug by trial and error at runtime.
schemafit encodes each provider's documented constraint surface as a
versioned, declarative rule pack and lints your schema statically — pointing
at the exact JSON-Pointer path, the keyword, and why — with a non-zero exit code
so CI fails the PR instead of prod.
Every rule is grounded in a cited primary source, either a real provider issue or
the provider's own documentation (see schemafit/rules/). It is not a runtime
client: it makes no model calls, needs no API key, and has zero runtime dependencies.
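To make the "declarative rule pack plus static lint" idea concrete, here is a minimal sketch, assuming a rule pack is plain data (rule id, banned keyword, severity, message) walked by one generic engine that reports RFC 6901 JSON-Pointer paths. The RULE_PACK layout, the lint and escape helpers, and the order schema are hypothetical illustrations, not schemafit's real rule format or API; only the two rule ids and messages mirror this README's example output.

```python
# Hypothetical rule pack: rules are plain data, the engine is generic.
# The layout is an assumption; ids/messages mirror this README's example output.
RULE_PACK = {
    "provider": "anthropic",
    "version": "example",
    "rules": [
        {"id": "anthropic-no-pattern", "keyword": "pattern", "severity": "error",
         "why": "Anthropic rejects the 'pattern' validation keyword (400 Bad Request)."},
        {"id": "anthropic-no-minimum", "keyword": "minimum", "severity": "error",
         "why": "Anthropic rejects the 'minimum' validation keyword (400 Bad Request)."},
    ],
}


def escape(token: str) -> str:
    """Escape one JSON-Pointer reference token (RFC 6901): '~' -> '~0', then '/' -> '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def lint(schema, pack, path="#"):
    """Yield (pointer, rule_id, severity, why) for every rule keyword found in the schema."""
    if not isinstance(schema, dict):
        return
    for rule in pack["rules"]:
        if rule["keyword"] in schema:
            yield f"{path}/{escape(rule['keyword'])}", rule["id"], rule["severity"], rule["why"]
    # For brevity this sketch only descends into 'properties' and object-form 'items'.
    for name, sub in schema.get("properties", {}).items():
        yield from lint(sub, pack, f"{path}/properties/{escape(name)}")
    yield from lint(schema.get("items"), pack, f"{path}/items")


# Hypothetical schema that would trigger both rules.
order = {
    "type": "object",
    "properties": {
        "sku": {"type": "string", "pattern": "^[A-Z]{3}-[0-9]+$"},
        "qty": {"type": "integer", "minimum": 1},
    },
    "required": ["sku", "qty"],
}

findings = list(lint(order, RULE_PACK))
for pointer, rule_id, severity, why in findings:
    print(f"{severity.upper()} {pointer} ({rule_id})\n  {why}")
exit_code = 1 if any(f[2] == "error" for f in findings) else 0  # non-zero fails CI
```

Keeping rules as data is what lets each pack be versioned and cited rule by rule while the walker stays provider-agnostic.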
Why this and not Instructor / BAML / LiteLLM / Vercel AI SDK?
Those are excellent runtime clients — they normalize, repair, or constrain a
schema at call-time. schemafit fills the gap they leave: a static,
pre-ship CI lint that fails the build before the schema ever reaches a
provider, over the raw schemas you already ship, with no DSL or codegen buy-in.
Install
# From PyPI:
pip install schemafit
# From source:
pip install "git+https://github.com/OrionArchitekton/schemafit"
# or build and run the container:
docker build -t schemafit . && docker run --rm schemafit demo
Tagged releases (v*) are published by the release workflow: the PyPI package via
Trusted Publishing, and a container image to GHCR
(docker run --rm ghcr.io/orionarchitekton/schemafit demo).
Usage
# Lint one schema against several providers (exit 1 if any error):
schemafit lint my-schema.json --provider openai,anthropic,gemini,mistral,cohere
# Machine-readable output for CI annotations:
schemafit lint my-schema.json --provider anthropic --format json
# SARIF 2.1.0 for GitHub code-scanning / the Security tab:
schemafit lint my-schema.json --provider openai,anthropic --format sarif > schemafit.sarif
# Confirm against the live provider (opt-in; MOCK unless a key is in the env):
schemafit lint my-schema.json --provider openai --live-verify
# Also fail on warnings (e.g. Gemini $ref recursion risk):
schemafit lint my-schema.json --provider gemini --strict
# Emit a best-effort provider-valid variant (lossy transforms are flagged):
schemafit repair my-schema.json --provider anthropic --out fixed.json
# List supported providers / run a hermetic end-to-end proof:
schemafit providers
schemafit demo
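This README describes repair only at a high level ("best-effort", "lossy transforms are flagged"). One plausible shape is sketched below, assuming that repair for Anthropic simply drops the 13 keywords the provider table lists as rejected and records each drop as a lossy transform. The function name, walk order, and the choice to drop constraints (rather than, say, move them into a description) are hypothetical, not schemafit's actual algorithm.

```python
import copy

# The 13 keywords this README lists as rejected on Anthropic's strict structured-output surface.
ANTHROPIC_REJECTED = {
    "minLength", "maxLength", "pattern", "format", "minimum", "maximum",
    "exclusiveMinimum", "exclusiveMaximum", "minItems", "maxItems",
    "uniqueItems", "minProperties", "maxProperties",
}
MAP_KEYS = ("properties", "$defs", "definitions")       # {name: subschema}
LIST_KEYS = ("anyOf", "oneOf", "allOf", "prefixItems")  # [subschema, ...]
SINGLE_KEYS = ("items", "additionalProperties", "not")  # subschema (or a bool)


def repair(schema: dict) -> tuple[dict, list[str]]:
    """Return (repaired copy, JSON Pointers of every lossy keyword removal)."""
    fixed = copy.deepcopy(schema)  # never mutate the caller's schema
    lossy: list[str] = []

    def visit(node, path):
        if not isinstance(node, dict):
            return  # booleans, tuple-form 'items', etc. are left alone
        for kw in sorted(node.keys() & ANTHROPIC_REJECTED):
            del node[kw]  # dropping a validation constraint loses information
            # RFC 6901 escaping of '~' and '/' is omitted for brevity.
            lossy.append("#/" + "/".join(path + [kw]))
        # Recurse only into subschema positions, so a property *named* 'pattern' survives.
        for key in MAP_KEYS:
            for name, sub in node.get(key, {}).items():
                visit(sub, path + [key, name])
        for key in LIST_KEYS:
            for i, sub in enumerate(node.get(key, [])):
                visit(sub, path + [key, str(i)])
        for key in SINGLE_KEYS:
            visit(node.get(key), path + [key])

    visit(fixed, [])
    return fixed, lossy


schema = {
    "type": "object",
    "properties": {
        "pattern": {"type": "string", "maxLength": 40},  # a property that happens to be named 'pattern'
        "tags": {"type": "array", "items": {"type": "string", "format": "uuid"},
                 "uniqueItems": True},
    },
}
fixed, lossy = repair(schema)
print(lossy)
# ['#/properties/pattern/maxLength', '#/properties/tags/uniqueItems', '#/properties/tags/items/format']
```

Reporting every removal by pointer is what makes a lossy repair reviewable: the caller can see exactly which constraints must now be enforced elsewhere.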
Example:
$ schemafit lint order.json --provider anthropic
[anthropic] FAIL — 2 error(s), 0 warning(s)
ERROR #/properties/sku/pattern (anthropic-no-pattern)
Anthropic rejects the 'pattern' validation keyword (400 Bad Request).
ref: https://github.com/vercel/ai/issues/13355
ERROR #/properties/qty/minimum (anthropic-no-minimum)
Anthropic rejects the 'minimum' validation keyword (400 Bad Request).
Use in CI
GitHub Actions (this repo ships a composite action):
- uses: OrionArchitekton/schemafit@v0.1.0
  with:
    schema: schemas/tool.json
    providers: openai,anthropic,gemini
Or as a pre-commit hook (the repo includes a .pre-commit-hooks.yaml); add this entry under repos: in your .pre-commit-config.yaml:
- repo: https://github.com/OrionArchitekton/schemafit
  rev: v0.1.0
  hooks:
    - id: schemafit
      args: ["--provider", "openai,anthropic,gemini"]
      files: '^schemas/.*\.json$'  # scope to YOUR LLM schemas, not every .json
Scope the hook with files: to the directory holding your LLM schemas. Otherwise the default types: [json] would lint every JSON file in the repo (package.json, tsconfig.json, lockfiles), none of which are LLM schemas.
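To check what the files: pattern selects, note that pre-commit treats it as a Python regular expression searched against each repository-relative path. A quick sketch with Python's re module; the sample paths are made up for illustration.

```python
import re

# The files: value from the hook entry, as pre-commit would compile it
# (YAML single quotes keep the backslash literal).
FILES = re.compile(r"^schemas/.*\.json$")

# Hypothetical repository paths.
paths = [
    "schemas/tool.json",
    "schemas/v2/order.json",
    "package.json",
    "tsconfig.json",
    "web/schemas/form.json",
]
selected = [p for p in paths if FILES.search(p)]
print(selected)  # ['schemas/tool.json', 'schemas/v2/order.json']
```

Because the pattern is anchored with ^, a nested directory such as web/schemas/ is excluded; widen the pattern if your schemas live deeper in the tree.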
GitHub code-scanning (SARIF)
--format sarif emits SARIF 2.1.0 so lint
findings show up as annotations in the Security → Code scanning tab, with the
exact JSON-Pointer path, the rule id, and the primary-source helpUri. SARIF is
written to stdout regardless of the exit code, so code-scanning still ingests the
artifact even when the gate fails (a clean schema produces a valid run with an
empty results array, which clears stale alerts):
- run: schemafit lint schemas/*.json --provider openai,anthropic,gemini --format sarif > schemafit.sarif
  continue-on-error: true  # let code-scanning ingest the report
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: schemafit.sarif
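For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF 2.1.0 log of the kind described above. It is an illustration, not schemafit's actual reporter: the finding dict layout is hypothetical, carrying the JSON Pointer in the result's properties bag is an assumption, and every finding is pinned to line 1 because the sketch does not map pointers to source lines.

```python
import json


def to_sarif(findings: list[dict], artifact_uri: str) -> dict:
    """Build a minimal SARIF 2.1.0 log from lint findings.

    Each finding is assumed (hypothetically) to be a dict with keys:
    pointer, rule_id, severity ('error' or 'warning'), message, help_uri.
    """
    rules: dict[str, dict] = {}
    results = []
    for f in findings:
        # One reportingDescriptor per rule; helpUri links the primary source.
        rules.setdefault(f["rule_id"], {"id": f["rule_id"], "helpUri": f["help_uri"]})
        results.append({
            "ruleId": f["rule_id"],
            "level": f["severity"],  # SARIF levels include 'error' and 'warning'
            "message": {"text": f"{f['pointer']}: {f['message']}"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": artifact_uri},
                # Placeholder: mapping a JSON Pointer to a line needs a position-aware parser.
                "region": {"startLine": 1},
            }}],
            "properties": {"jsonPointer": f["pointer"]},  # assumption: pointer kept in the property bag
        })
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "schemafit", "rules": list(rules.values())}},
            "results": results,  # [] is still a valid run for a clean schema
        }],
    }


clean = to_sarif([], "schemas/tool.json")
dirty = to_sarif([{
    "pointer": "#/properties/sku/pattern", "rule_id": "anthropic-no-pattern",
    "severity": "error", "message": "Anthropic rejects the 'pattern' validation keyword.",
    "help_uri": "https://github.com/vercel/ai/issues/13355",
}], "schemas/tool.json")
print(json.dumps(dirty, indent=2))
```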
Live verification (--live-verify, opt-in)
--live-verify turns "the docs forbid this" into "the provider actually
accepted/rejected it" by sending a minimal structured-output probe to each
provider and failing closed on a rejection. It is opt-in and key-gated:
- Default: MOCK. With no provider key in the environment it uses a deterministic, network-free client modeled on the static rule pack, so CI and the Docker image run it with no key and no network.
- Real call: only when the provider's key (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) is present in the environment. Real calls use the standard library (no new dependency); the [live] extra is reserved for optional provider SDKs.
- Tri-state: confirmed_by_provider is true (accepted), false (rejected → exit 1), or null (abstained: no key / rate-limited / network error). Abstaining is not a rejection and never fails CI.
Never commit an API key. The real path reads keys only from the environment and never echoes their value. Leave --live-verify out of default CI.
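The key-gating and tri-state rules above reduce to two small decisions, sketched here under stated assumptions: the function names are hypothetical, the env-var table covers only the three keys this README names (other providers are assumed to stay on the mock path), and the environment is passed in as a mapping so the sketch never touches real keys.

```python
from typing import Mapping, Optional

# Env vars named in this README; other providers are assumed to stay on the mock path.
KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY", "gemini": "GEMINI_API_KEY"}


def mode_for(provider: str, env: Mapping[str, str]) -> str:
    """'real' only when the provider's key is set and non-empty; otherwise 'mock'."""
    var = KEY_VARS.get(provider)
    return "real" if var and env.get(var) else "mock"


def live_verify_exit(confirmed: Mapping[str, Optional[bool]]) -> int:
    """Fail closed on an explicit rejection (False); True and None (abstain) both pass."""
    return 1 if any(v is False for v in confirmed.values()) else 0


env = {"OPENAI_API_KEY": "placeholder-not-a-real-key"}  # in practice: os.environ
print(mode_for("openai", env), mode_for("anthropic", env))  # real mock
print(live_verify_exit({"openai": True, "anthropic": None}))  # 0
print(live_verify_exit({"openai": True, "gemini": False}))  # 1
```

Testing `v is False` rather than falsiness is the important detail: it keeps an abstention (None) from being mistaken for a rejection.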
Supported providers (v0.3 — 5-provider matrix)
| Provider | Checks (grounded in) |
|---|---|
| openai | additionalProperties:false required; all properties required; no default; no oneOf in array items (openai-agents-python#474, claude-task-master#1522) |
| anthropic | 13 rejected validation keywords on the strict structured-output surface: minLength/maxLength/pattern/format/minimum/maximum/exclusiveMinimum/exclusiveMaximum/minItems/maxItems/uniqueItems/minProperties/maxProperties (vercel/ai#13355, anthropic-sdk-python#1034). The general Messages-API tool input_schema is more permissive, so run this pack against the schemas you send on the structured-output path. |
| gemini | Portability warnings (version-sensitive, non-failing by default): anyOf (rejected by ≤2.0 / old SDKs, supported by 2.5), oneOf, open dict (additionalProperties schema), $ref recursion. Gemini's schema support changed fast (anyOf Jan 2026, additionalProperties Nov 2025), so these warn; use --strict to gate on them. (python-genai#460, docs) |
| mistral (new in v0.3) | Strict custom structured-output conventions: additionalProperties:false required and every property listed in required. Thin pack: Mistral's docs do not enumerate a per-keyword unsupported list, so no keyword-blocklist rules are invented; these two rules are derived from the official sample. Notes (not lint rules): the request must use response_format: json_schema with strict:true, and all models except codestral-mamba are supported. (Mistral custom structured output) |
| cohere (new in v0.3) | Hard-error unsupported structured-output keywords from Cohere's keyword-support table: composition allOf/oneOf/not; numeric ranges minimum/maximum; array-length minItems/maxItems; string-length minLength/maxLength; uniqueItems (marked unsupported in both structured-output columns; allowed only under regular Tool Use with strict_tools=False). Supported and not flagged: anyOf, $ref/$def, enum, const, pattern. Caveat (not yet a rule): regex anchors and lookaheads (^, $, ?=, ?!) inside a pattern are unsupported; detecting them requires inspecting the pattern's value and is deferred to v0.4. (Cohere structured outputs) |
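As an illustration of how the OpenAI row's four structural checks could be expressed, here is a hypothetical walker. It assumes every object schema must carry additionalProperties: false and list every property in required, and it descends only into properties and object-form items; schemafit's real rule ids, messages, and traversal may differ.

```python
def openai_findings(node, path="#"):
    """Yield (pointer, problem) for the four OpenAI checks in the provider table."""
    if not isinstance(node, dict):
        return
    if "default" in node:
        yield f"{path}/default", "default is not allowed"
    props = node.get("properties")
    if node.get("type") == "object" or props is not None:
        # Assumption: every object schema must be closed and fully required.
        if node.get("additionalProperties") is not False:
            yield f"{path}/additionalProperties", "must be false"
        for name in sorted(set(props or {}) - set(node.get("required", []))):
            yield f"{path}/required", f"property '{name}' is not listed in required"
    items = node.get("items")
    if isinstance(items, dict) and "oneOf" in items:
        yield f"{path}/items/oneOf", "oneOf is not allowed in array items"
    # Recurse (no RFC 6901 escaping of property names, for brevity).
    for name, sub in (props or {}).items():
        yield from openai_findings(sub, f"{path}/properties/{name}")
    yield from openai_findings(items, f"{path}/items")


# Hypothetical schema that breaks all four checks.
schema = {
    "type": "object",
    "properties": {
        "tags": {"type": "array", "items": {"oneOf": [{"type": "string"}, {"type": "integer"}]}},
        "note": {"type": "string", "default": ""},
    },
    "required": ["tags"],
}
for pointer, problem in openai_findings(schema):
    print(pointer, "->", problem)
```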
Exit codes
| code | meaning |
|---|---|
| 0 | no errors (warnings allowed unless --strict) |
| 1 | at least one error, or any warning under --strict (CI fail) |
| 2 | bad input (unreadable / invalid JSON) |
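The table can be read as a pure function of the lint result. The sketch below is hypothetical (the function names are illustrative); it assumes that under --strict a warning yields exit code 1, the only failure code the table defines besides bad input.

```python
import json


def parse_schema(text: str):
    """Return the parsed schema, or None for invalid JSON (exit code 2 territory)."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def exit_code(errors: int, warnings: int, *, strict: bool = False, bad_input: bool = False) -> int:
    """Map lint results to the documented exit codes."""
    if bad_input:
        return 2  # unreadable / invalid JSON
    if errors or (strict and warnings):
        return 1  # CI fail; under --strict, warnings count too
    return 0  # clean, or warnings only without --strict
```

Keeping 2 distinct from 1 lets a CI step tell a broken input file apart from a genuine provider incompatibility.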
Scope and roadmap
In scope now: the lint + repair core, five provider rule packs
(OpenAI, Anthropic, Gemini, Mistral, Cohere), human/JSON and SARIF 2.1.0
reporters, an opt-in --live-verify confirmation mode, Docker image,
GitHub Action, pre-commit hook.
Shipped in v0.2: SARIF output for GitHub code-scanning; the --live-verify
opt-in live-confirmation mode (MOCK by default, key-gated real calls, fail-closed).
Shipped in v0.3: the Mistral and Cohere provider rule packs (provider matrix 3 → 5) — both static, network-free, and core-dependency-free.
Deferred (v0.4+): Cohere's structural rules (the top level must be an object;
every object must list at least one required property), which need new rule
kinds; automatic rule-pack drift detection (pairing --live-verify with the live
provider, built on the mock-client foundation); Bedrock/Vertex packs; a pydantic
source-model auto-fix mode; and an npm/ajv port plus a Zod source-model for the
JS/TS ecosystem.
License
MIT © 2026 Dan Mercede
File details
Details for the file schemafit-0.3.0.tar.gz.
File metadata
- Download URL: schemafit-0.3.0.tar.gz
- Size: 35.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a69c5f93de0d8e792df0816a68ee95a2752d45941275bf2bcb91559b9e665e8a |
| MD5 | 03fa360de34f0196435f66c485816ffb |
| BLAKE2b-256 | 623ddb60ee8437d97d229ed9ec75fc79c819af894b3fb00ac90954642d5ae6c6 |
Provenance
The following attestation bundles were made for schemafit-0.3.0.tar.gz:
Publisher: release.yml on OrionArchitekton/schemafit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: schemafit-0.3.0.tar.gz
- Subject digest: a69c5f93de0d8e792df0816a68ee95a2752d45941275bf2bcb91559b9e665e8a
- Sigstore transparency entry: 1901079763
- Permalink: OrionArchitekton/schemafit@0343e9ab1d7e9822860e9af9519587af28c1d2bf
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/OrionArchitekton
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0343e9ab1d7e9822860e9af9519587af28c1d2bf
- Trigger Event: push
File details
Details for the file schemafit-0.3.0-py3-none-any.whl.
File metadata
- Download URL: schemafit-0.3.0-py3-none-any.whl
- Size: 27.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f86b6abefbde5dfa613e40f9b6b675a388011bf25bcd467c0fd28729fdda4466 |
| MD5 | 77341ae7c8a330600bbca3b7a63d7289 |
| BLAKE2b-256 | 88d6651325a1a1c47bd406baf45b5d19c902d20167fdc4ee4a18037e85371812 |
Provenance
The following attestation bundles were made for schemafit-0.3.0-py3-none-any.whl:
Publisher: release.yml on OrionArchitekton/schemafit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: schemafit-0.3.0-py3-none-any.whl
- Subject digest: f86b6abefbde5dfa613e40f9b6b675a388011bf25bcd467c0fd28729fdda4466
- Sigstore transparency entry: 1901079927
- Permalink: OrionArchitekton/schemafit@0343e9ab1d7e9822860e9af9519587af28c1d2bf
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/OrionArchitekton
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0343e9ab1d7e9822860e9af9519587af28c1d2bf
- Trigger Event: push