Python SDK and AI agent toolkit for the OpenDataProducts.org standards family, supporting ODPS, ODPC, ODPG, ODPV, MCP, CLI workflows, and LLM-assisted generation

Open Data Products Python SDK for AI Agents

An AI-agent-first Python SDK for the OpenDataProducts.org standards family. It gives agents, agent hosts, and automation systems one consistent surface for loading, detecting, validating, explaining, searching, traversing, and summarizing documents across the ODPS, ODPC, ODPG, and ODPV specifications.

The package still includes developer-facing Python helpers, but the primary contract is agent-ready: structured validation results, lightweight artifact summaries, reference discovery, Data Contract orchestration, bundled retrieval resources, a unified CLI, an MCP stdio server, and an ARWS agent manifest.

Installation

pip install open-data-products

# Optional Data Contract validation adapter:
pip install "open-data-products[contracts]"

# For development:
pip install "open-data-products[dev]"

AI Agent-First SDK

Why Agent First

  • One cross-spec entry point: Agents can call load_document, validate_document, explain_document, and resolve_references across ODPS, ODPC, ODPG, and ODPV files.
  • Structured outputs: Validation, references, resources, summaries, and graph reasoning helpers return predictable objects that are easy for agents to inspect.
  • Small-context workflows: load_summary returns metadata, size, hash, spec, kind, and id without returning full document bodies.
  • Retrieval-ready resources: Bundled schemas, prompt templates, vocabulary records, catalog object records, and graph object records are discoverable through list_resources and MCP tools.
  • Agent-ready ODPC and ODPV helpers: Catalog building, catalog artifact checks, vocabulary term resolution, canonical term packets, relationship compatibility checks, and term context packets are available through Python, CLI, and MCP surfaces where safe.
  • Graph reasoning for agents: ODPG helpers support graph summaries, traversal, strategic analysis, and trusted focus-node context extraction.
  • Data Contract orchestration: Optional datacontract-cli integration validates external contracts while the SDK resolves ODPS contract references, extracts schemas, checks static product-contract alignment, and returns agent-ready reports.
  • Host integration: MCP-capable tools can launch open-data-products serve, while ARWS-compatible systems can read the generated manifest.
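
The small-context idea behind load_summary can be illustrated with the standard library alone. The sketch below is not the SDK's implementation: `summarize_file` and the keys it returns are hypothetical names chosen for illustration. It streams a file to report its size and SHA-256 hash, so the full body never has to be handed to an agent.

```python
import hashlib
import os
import tempfile


def summarize_file(path, chunk_size=65536):
    """Return a small summary dict (hypothetical shape) without keeping the body in memory."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Stream the file in chunks so large documents stay cheap to summarize.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"path": os.path.basename(path), "size": size, "sha256": digest.hexdigest()}


def demo():
    """Write a tiny throwaway document and summarize it."""
    with tempfile.NamedTemporaryFile("wb", suffix=".yaml", delete=False) as tmp:
        tmp.write(b"product:\n  id: demo\n")
        name = tmp.name
    try:
        return summarize_file(name)
    finally:
        os.remove(name)


if __name__ == "__main__":
    print(demo())
```

An agent can compare the hash against a cached value and skip reloading documents that have not changed.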

Unified Agent API

Use the top-level API when building AI agents, automation, validation pipelines, or tools that need to work across the Open Data Products standards family without knowing the spec namespace ahead of time:

from open_data_products import (
    explain_document,
    generate_local_artifact,
    generate_local_artifacts,
    load_generation_prompt,
    list_resources,
    load_document,
    resolve_references,
    validate_document,
)

document = load_document("examples/product.yaml")
result = validate_document(document)

print(result.valid, result.spec, result.kind)
print(explain_document(document))

for reference in resolve_references(document):
    print(reference.pointer, reference.ref)

for resource in list_resources():
    print(resource.id, resource.spec, resource.type)

prompt = load_generation_prompt("odps_data_product_fragment.md")
signal = generate_local_artifact(
    "signal",
    "open_data_products/generation/source_docs/turnaround-delay-signal.txt",
    "open_data_products/generation/fragments",
)
all_artifacts = generate_local_artifacts(
    "open_data_products/generation/source_docs",
    "open_data_products/generation/fragments",
)

The top-level CLI exposes the same workflow with machine-readable output:

open-data-products validate examples/product.yaml --json
open-data-products explain examples/product.yaml --json
open-data-products refs graph.yaml --json
open-data-products resources --json
open-data-products summary examples/product.yaml      # lightweight reference: size, hash, spec
open-data-products manifest --json           # ARWS agent manifest
open-data-products serve                     # MCP server over stdio

Data Contract support is optional and product-oriented. The SDK recognizes native ODPS /product/contract references ($ref, contractURL, and inline spec) as well as practical extension-style references such as extensions.dataContract.href. External contract lint/export uses datacontract-cli when installed; inline ODPS contract specs are used for static summaries and alignment without running live source tests.
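
To make the reference shapes above concrete, here is a minimal sketch that scans an already-parsed product dict for each of them. It is not the SDK's resolver: `find_contract_references` is a hypothetical helper, and placing `extensions` under `product` is an assumption made for illustration.

```python
def find_contract_references(document):
    """Collect contract references from a parsed ODPS-like dict.

    The key layout is an assumption based on the reference shapes named in
    the paragraph above; it is not the SDK's resolver.
    """
    found = []
    product = document.get("product", {})
    contract = product.get("contract", {})
    if "$ref" in contract:
        found.append(("/product/contract/$ref", "ref", contract["$ref"]))
    if "contractURL" in contract:
        found.append(("/product/contract/contractURL", "url", contract["contractURL"]))
    if "spec" in contract:
        # Inline specs carry the contract body itself, so there is nothing to fetch.
        found.append(("/product/contract/spec", "inline", None))
    # Assumed location of the extension-style reference.
    href = product.get("extensions", {}).get("dataContract", {}).get("href")
    if href:
        found.append(("/product/extensions/dataContract/href", "extension", href))
    return found


example = {
    "product": {
        "contract": {"contractURL": "https://example.com/contract.yaml"},
        "extensions": {"dataContract": {"href": "contracts/orders.yaml"}},
    }
}

for pointer, kind, target in find_contract_references(example):
    print(pointer, kind, target)
```

Returning a pointer alongside each target mirrors the pointer/href pairs printed by the SDK example that follows.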

from open_data_products import (
    check_product_contract_alignment,
    extract_contract_schema,
    generate_product_contract_report,
    resolve_product_contracts,
    summarize_contract,
    validate_contract,
)

for reference in resolve_product_contracts("examples/product.yaml"):
    print(reference.pointer, reference.href)

print(validate_contract("examples/contract.yaml").passed)
print(extract_contract_schema("examples/contract.yaml").field_count)
print(check_product_contract_alignment("examples/product.yaml", "examples/contract.yaml").summary)
print(generate_product_contract_report("examples/product.yaml").summary)

Agent Surface (MCP + ARWS)

Run open-data-products serve to expose the SDK as a local MCP server, or open-data-products manifest --json to render the ARWS manifest. See Agent surface for Codex/Claude Code setup, MCP tools, and bundled skills.
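
As a rough illustration of host integration, the sketch below builds a server entry that launches the stdio server. The `mcpServers` layout is a convention used by several MCP hosts and is an assumption here, as is the helper name `build_mcp_server_entry`; the Agent surface reference and your host's documentation are the authority on the exact schema.

```python
import json


def build_mcp_server_entry(name="open-data-products"):
    """Build a host config entry that launches the SDK's MCP server over stdio.

    The `mcpServers` layout is an assumed convention, not something this
    README specifies; check your host's documentation before using it.
    """
    return {"mcpServers": {name: {"command": "open-data-products", "args": ["serve"]}}}


config_text = json.dumps(build_mcp_server_entry(), indent=2)
print(config_text)
```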

Package Structure

Use open_data_products.<spec> namespaces for every standard:

| Namespace | Standard | Status |
|---|---|---|
| open_data_products.odps | Open Data Product Specification | Implemented |
| open_data_products.odpc | Open Data Product Catalog | Catalog helpers implemented |
| open_data_products.odpg | Open Data Product Graph | Graph helpers implemented |
| open_data_products.odpv | Open Data Product Vocabulary | Vocabulary tools implemented |

Capabilities at a Glance

| Area | What agents and developers can do |
|---|---|
| Cross-spec API | Detect, load, validate, explain, summarize, and resolve references across ODPS, ODPC, ODPG, and ODPV |
| MCP + ARWS | Run a local stdio MCP server, expose safe tools, and generate an ARWS agent manifest |
| ODPS | Create, load, validate, serialize, and inspect ODPS v4.1 data product documents |
| ODPC | Build catalogs from fragments, validate catalogs, explain catalog metadata, search bundled catalog object guidance, and generate/check derived catalog schema artifacts |
| ODPG | Validate graphs, summarize nodes and edges, traverse relationships, analyze governance/strategy signals, and extract agent context |
| ODPV | Load, validate, search, generate vocabulary artifacts, resolve terms and aliases, explain canonical term packets, check relationships, and produce agent context for shared ODP terminology |
| Data Contracts | Resolve ODPS contract references, validate external contracts through optional datacontract-cli, extract schemas, check static alignment, and generate product-level reports |
| Bundled resources | Discover schemas, examples, vocabulary records, catalog object records, and graph object records through the resource registry |

ODPS support is scoped to the 4.x generation of the specification. The SDK primarily targets ODPS v4.1 and keeps backward-compatible support for ODPS v4.0 documents.

ODPS field validation includes ISO language, country, currency, date/time, phone, email, and URI formats where those standards apply.
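
To give a feel for this kind of standards-aware checking, here is a minimal sketch under simplifying assumptions. It tests only the shape of a value (two lowercase letters for ISO 639-1, three uppercase letters for ISO 4217, a parseable ISO 8601 calendar date), not membership in the official code lists, and the field names `language`, `currency`, and `created` are hypothetical. It is not the SDK's validator.

```python
import re
from datetime import date

LANGUAGE_SHAPE = re.compile(r"[a-z]{2}")  # ISO 639-1 codes are two lowercase letters
CURRENCY_SHAPE = re.compile(r"[A-Z]{3}")  # ISO 4217 codes are three uppercase letters


def check_formats(record):
    """Return a list of human-readable problems found in a flat record of strings."""
    problems = []
    if not LANGUAGE_SHAPE.fullmatch(record.get("language", "")):
        problems.append(f"Invalid ISO 639-1 language code: {record.get('language')!r}")
    if not CURRENCY_SHAPE.fullmatch(record.get("currency", "")):
        problems.append(f"Invalid ISO 4217 currency code: {record.get('currency')!r}")
    try:
        # fromisoformat rejects impossible dates such as 30 February.
        date.fromisoformat(record.get("created", ""))
    except ValueError:
        problems.append(f"Invalid ISO 8601 date: {record.get('created')!r}")
    return problems


print(check_formats({"language": "xyz", "currency": "EUR", "created": "2024-02-30"}))
```

A shape check like this would accept an unassigned two-letter code, which is why a real validator also needs the published code lists.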

Usage Guide

This README is intentionally a short landing page. Use the focused references below for implementation details:

  • API reference: Agent API, spec helper namespaces, ODPS models, validators, serialization, and examples.
  • Agent surface: MCP server, ARWS manifest, and bundled skills for agent hosts.
  • Command guide: what each common CLI command does, what it reads, and what it writes.
  • LLM generation: workflow for turning source documents into ODPC fragments and ODPG graphs with Ollama or a configured external LLM.
  • Generation development notes: contributor-facing prompt pipeline, ODPS normalization, validation, repair, and testing guidance.
  • Development notes index: contributor-facing internals notes for complex SDK surfaces.
  • Data Contract workflows: ODPS contract resolution, optional datacontract-cli, alignment, and reports.
  • Capability drift reports: dated SDK alignment reports against upstream specification tooling.
  • Tooling development model: human-facing explanation of how spec-level scripts mature into consolidated SDK capabilities.
  • Functional test report: public API, CLI, and MCP functional coverage matrix.
  • Example scripts: runnable ODPS examples, including v4.1 strategy and MCP access examples.
  • Course-style guides: beginner Python setup, simple human SDK workflows, and LLM generation lessons.
  • Sample apps: independent CLIs built on top of the SDK.
  • Agent handoff: compact machine-readable routing for AI agents.

Common Workflows

Most commands print human-readable output by default; add --json when agents, CI jobs, or scripts need a stable machine-readable response. See the command guide for what each command reads, checks, and produces.
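
For scripts and CI jobs, one way to consume --json output is a thin subprocess wrapper. The sketch below is an assumption-laden illustration rather than part of the SDK: `run_json` is a hypothetical helper, and treating every non-zero exit code as an error is a guess, since the command guide is the authority on exit codes. A stand-in command is used so the sketch runs without the SDK installed.

```python
import json
import subprocess
import sys


def run_json(argv, timeout=60):
    """Run a command with --json appended and return its parsed stdout."""
    completed = subprocess.run(
        [*argv, "--json"], capture_output=True, text=True, timeout=timeout, check=False
    )
    if completed.returncode != 0:
        # Assumption: a non-zero exit means the output should not be trusted.
        raise RuntimeError(
            f"{argv[0]} exited with {completed.returncode}: {completed.stderr.strip()}"
        )
    return json.loads(completed.stdout)


# Stand-in command so the sketch runs anywhere; in practice argv would be
# something like ["open-data-products", "validate", "examples/product.yaml"].
stand_in = [sys.executable, "-c", "import json; print(json.dumps({'valid': True}))"]
print(run_json(stand_in))
```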

# Cross-spec validation and summaries
open-data-products validate examples/product.yaml --json
open-data-products explain examples/odpc_catalog.yaml --json
open-data-products refs open_data_products/odpg/data/graph/graph.yaml --json
open-data-products summary examples/product.yaml

# Bundled agent resources
open-data-products resources --json
open-data-products resources --id generation.prompt.system --json
open-data-products resources --id odpc.objects --json
open-data-products resources --id odpv.terms --json
open-data-products resources --id odpg.objects --json

The LLM generation commands require Ollama or configured provider credentials.

Use the bundled default config and bundled prompts as-is:

# LLM generation
open-data-products generate \
  --input source_docs/products/ \
  --kind product-reference \
  --output generated/ \
  --json

open-data-products generate \
  --input source_docs/turnaround-delay-signal.txt \
  --kind signal \
  --output generated/ \
  --json

Customize provider, model, or paths with a project-owned config:

open-data-products config generation --copy-to my-generation.config.yaml
open-data-products config generation --config my-generation.config.yaml --print
open-data-products config generation --config my-generation.config.yaml --check

open-data-products generate \
  --config my-generation.config.yaml \
  --input source_docs/products/ \
  --kind product-reference \
  --output generated/ \
  --json

The config check verifies required provider/model settings, catches common key typos, rejects secret-looking values, and confirms configured input and prompt paths exist before generation runs.
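
The four kinds of check described above can be sketched in a few lines. Everything here is illustrative: the key names, the secret-prefix heuristic, and the `check_config` helper are assumptions and do not reflect the SDK's actual config schema or rules.

```python
import difflib
import re
from pathlib import Path

KNOWN_KEYS = {"provider", "model", "input", "prompts", "output"}  # hypothetical key names
REQUIRED_KEYS = {"provider", "model"}
PATH_KEYS = ("input", "prompts")
SECRET_PREFIX = re.compile(r"^(sk-|gsk_|Bearer )")  # illustrative heuristic only


def check_config(config):
    """Return a list of problems for a flat config dict."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(config)):
        problems.append(f"missing required key: {key}")
    for key in sorted(set(config) - KNOWN_KEYS):
        # Suggest the closest known key to catch simple typos.
        guess = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
        hint = f" (did you mean {guess[0]!r}?)" if guess else ""
        problems.append(f"unknown key: {key}{hint}")
    for key, value in sorted(config.items()):
        if isinstance(value, str) and SECRET_PREFIX.search(value):
            problems.append(f"{key} looks like a secret; keep credentials out of the config file")
    for key in PATH_KEYS:
        if key in config and not Path(config[key]).exists():
            problems.append(f"{key} path does not exist: {config[key]}")
    return problems


print(check_config({"provider": "ollama", "modle": "qwen2.5", "input": "."}))
```

Collecting every problem before reporting lets a user fix a config in one pass instead of rerunning the check after each error.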

When installed from PyPI, the bundled generation config lives inside the package as a template. Copy it to a project-owned file before editing provider or model settings; do not edit files under site-packages. The name my-generation.config.yaml is only an example for your copied file. You can also pass a folder path, such as --copy-to config/, and missing folders are created automatically.

Override the configured provider or model for a single run when testing a different LLM:

open-data-products generate \
  --config my-generation.config.yaml \
  --provider lmstudio \
  --model any-local-model-loaded-in-the-server \
  --input source_docs/products/ \
  --kind product-reference \
  --output generated/ \
  --json

open-data-products generate \
  --config my-generation.config.yaml \
  --provider groq \
  --model openai/gpt-oss-120b \
  --input source_docs/products/ \
  --kind product-reference \
  --output generated/ \
  --json

open-data-products generate \
  --config my-generation.config.yaml \
  --provider claude \
  --model claude-sonnet-4-5 \
  --input source_docs/turnaround-delay-signal.txt \
  --kind signal \
  --output generated/ \
  --json

Generation uses bundled prompt templates by default. If you want to customize the prompts, copy them to a project-owned folder, edit the Markdown files, and pass that folder with --prompts:

open-data-products config generation --copy-prompts-to prompts/

open-data-products generate \
  --config my-generation.config.yaml \
  --prompts prompts/ \
  --input source_docs/signals/ \
  --kind signal \
  --output generated/ \
  --json

# Generated fragment artifacts
open-data-products validate open_data_products/generation/fragments/odpg_graph.yaml --json
open-data-products odpg-generate open_data_products/generation/fragments/odpg_graph.yaml --output /tmp/odp-generation-graph.html --json

# ODPC catalog helpers
open-data-products odpc-build examples/odpc_catalog_fragments/ --output /tmp/odp-catalog.yaml --json
open-data-products odpc-build examples/odpc_catalog_fragments/ --output /tmp/odp-catalog.yaml --html /tmp/odp-catalog.html --json
open-data-products odpc-summary /tmp/odp-catalog.yaml --json
open-data-products odpc-search "catalog data" --limit 3 --json

# ODPV vocabulary helpers
open-data-products odpv-summary --json
open-data-products odpv-search "governance policy risk" --limit 3 --json
open-data-products odpv-resolve "reusable data asset" --json
open-data-products odpv-explain DataProduct --json
open-data-products odpv-relationship DataProduct supports UseCase --json
open-data-products odpv-context DataProduct --json

# ODPG graph reasoning
open-data-products odpg-summary open_data_products/odpg/data/graph/graph.yaml
open-data-products odpg-traverse open_data_products/odpg/data/graph/graph.yaml --start AGENT-AVIATION-001 --depth 2
open-data-products odpg-analyze open_data_products/odpg/data/graph/graph.yaml
open-data-products odpg-agent-context open_data_products/odpg/data/graph/graph.yaml --node AGENT-AVIATION-001 --depth 2
open-data-products odpg-convert --input examples/graph.graphml --output /tmp/odp-converted-graph.yaml --json
open-data-products odpg-generate open_data_products/odpg/data/graph/graph.yaml --output /tmp/odp-graph-explorer.html --json

# Product-level Data Contract inspection
open-data-products product resolve-contracts examples/product.yaml --json
open-data-products product contract-schema examples/contract.yaml --json

See Data Contract workflows for product contract resolution, optional datacontract-cli integration, alignment checks, reports, and supported ODPS contract reference shapes. Live LLM generation requires Ollama or a configured provider API key; see LLM generation for runnable provider examples.

Spec-Specific Entry Points

  • open_data_products.generation: editable prompt templates and provider-backed generation helpers for ODPS, ODPC, and ODPG YAML artifacts. Defaults to local Ollama/Qwen 2.5 and can use copied config templates for external providers such as OpenAI.
  • open_data_products.odps: ODPS v4.1 models, standards-aware validation, YAML/JSON I/O, compliance helpers, and pricing_to_402.
  • open_data_products.odpc: ODPC catalog building, loading, validation, explanation, and object guidance search.
  • open_data_products.odpg: ODPG graph validation, summary, traversal, analysis, agent context, object search, external graph conversion, and graph explorer generation.
  • open_data_products.odpv: ODPV vocabulary loading, validation, search, and generated vocabulary artifacts.
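
The ODPG traversal helpers listed above take a start node and a depth. As a rough illustration of that idea, and not the SDK's algorithm, a depth-limited breadth-first search over directed edges can be sketched as follows. The `traverse` function, the edge-list representation, and every node id other than AGENT-AVIATION-001 are hypothetical.

```python
from collections import deque


def traverse(edges, start, depth):
    """Return {node: distance} for nodes reachable from start within depth hops."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


edges = [
    ("AGENT-AVIATION-001", "PRODUCT-A"),
    ("PRODUCT-A", "USECASE-B"),
    ("USECASE-B", "KPI-C"),
]
print(traverse(edges, "AGENT-AVIATION-001", depth=2))
```

Capping the depth keeps the extracted context small, which matters when the result is fed to an agent with a limited context window.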

Development

git clone https://github.com/Open-Data-Product-Initiative/odps-python
cd odps-python
pip install -e ".[dev]"
python examples/basic_usage.py

Dependencies

The library requires the following runtime packages:

  • PyYAML: YAML format support
  • jsonschema: ODPC and ODPG schema validation

Error Handling

The library provides detailed validation error messages that reference specific standards:

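# Assumes `odp` is an ODPS product object created or loaded earlier and that
# ODPSValidationError has been imported from the SDK.
try:
    odp.validate()
except ODPSValidationError as e:
    print(e)
    # Output: "Validation errors: Invalid ISO 639-1 language code: 'xyz';
    #          dataHolder email must be a valid RFC 5322 email address"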
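
The message format above suggests that all problems are collected and reported together. A minimal, self-contained sketch of that pattern follows; `AggregatedValidationError` and `validate_record` are hypothetical names, the checks are deliberately crude, and none of this is the SDK's own exception class or validation logic.

```python
class AggregatedValidationError(Exception):
    """Hypothetical exception that carries every validation problem at once."""

    def __init__(self, errors):
        self.errors = list(errors)
        super().__init__("Validation errors: " + "; ".join(self.errors))


def validate_record(record):
    """Collect all problems first, then raise once so callers see the full list."""
    errors = []
    if len(record.get("language", "")) != 2:
        errors.append(f"Invalid ISO 639-1 language code: {record.get('language')!r}")
    if "@" not in record.get("email", ""):
        errors.append("dataHolder email must be a valid RFC 5322 email address")
    if errors:
        raise AggregatedValidationError(errors)


try:
    validate_record({"language": "xyz", "email": "not-an-email"})
except AggregatedValidationError as exc:
    # The individual messages stay available for structured handling.
    for message in exc.errors:
        print("-", message)
```

Keeping the individual messages on the exception lets an agent act on each problem separately instead of parsing the joined string.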

Examples

ODPS v4.1 Example

See examples/odps_v41_example.py for a demonstration of key v4.1 features including:

  • ProductStrategy with business objectives
  • KPI definitions with targets and calculations
  • AI agent integration via MCP
  • Enhanced $ref support

Run the example:

python examples/odps_v41_example.py

Additional Examples

Generation Inputs And Outputs

See LLM generation for source documents, prompts, provider configuration, generated fragments, ODPG graph YAML, and graph explorer output.

Sample Apps

The examples/apps/ folder contains independent, runnable Python sample apps built on top of the SDK. Each app lives in its own folder with a cli.py entry point and can be run directly from the repository root.

  • ODP Document Inspector CLI: inspect any ODPS, ODPC, ODPG, or ODPV YAML/JSON document and print validation, explanation, references, and bundled resource metadata.
  • ODPV Vocabulary Finder CLI: search bundled ODPV terms by natural-language query and print definitions, scores, matched fields, and related terms.
  • ODPS Pricing 402 Builder CLI: build an HTTP 402 payment envelope from an ODPS product with pricing plans.

python examples/apps/document_inspector/cli.py examples/apps/pricing_402_builder/priced_product.yaml
python examples/apps/vocabulary_finder/cli.py "governance policy risk" --limit 5 --json
python examples/apps/pricing_402_builder/cli.py examples/apps/pricing_402_builder/priced_product.yaml --json

Acknowledgments

We extend our gratitude to the following:

Open Data Product Initiative Team - Special thanks to the team at opendataproducts.org for creating and maintaining the emerging Open Data Product standards family, including the Open Data Product Specification (ODPS), Open Data Product Catalog (ODPC), Open Data Product Graphs (ODPG), and Open Data Product Vocabulary (ODPV). Their vision of standardizing data product descriptions, catalogs, graphs, and shared vocabulary has made this SDK possible. These specifications represent years of collaborative effort from industry experts, data practitioners, and open source contributors who are driving the future of data standardization.

Chris Howard / Kitard - Special thanks to Chris Howard from Accenture for creating the original odps-python library. His foundational work made it possible to extend the project into the broader Open Data Products SDK and agent toolkit.

devlouie - Special thanks to devlouie for contributing the MCP layer and Agent Surface on top of the SDK, helping make the Open Data Products standards family easier to use from agentic tools and workflows.

Data Contract CLI - Special thanks to Stefan Negele, Jochen Christ, and Simon Harrer for creating Data Contract CLI, the open source execution engine this SDK can optionally use for external Data Contract validation, export, and ecosystem interoperability.

Python Community - For the exceptional ecosystem of libraries and tools that power this implementation, including PyYAML, jsonschema, and the countless other packages that make Python development a joy.

Data Community - For embracing open standards and driving the need for better data product specifications and tooling that benefits everyone in the data ecosystem.

Documentation Support - Documentation assistance provided by Claude (Anthropic).

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for guidelines, browse the open issues, and consider helping with new features, bug fixes, examples, documentation, or agent-facing workflow improvements.

License

Apache License 2.0 - see LICENSE file for details.
