Purpose-scoped ADK agents for SDC4 data operations

SDC Agents

Purpose-scoped ADK agents for producing SDC4-compliant data artifacts from existing datastores.

[Banner image: SDC Agents, the key to SDCStudio's self-assembling semantic infrastructure]


What is SDC Agents?

SDC Agents is an open-source suite of nine purpose-scoped agents built on Google's Agent Development Kit (ADK) that transform data from SQL databases, CSV files, and JSON sources into validated, multi-format SDC4 artifacts — without requiring the user to write XML, RDF, or GQL by hand.

Each agent is an ADK LlmAgent with a narrowly scoped BaseToolset, auditable activity, and enforced isolation boundaries. No single agent can reach across scope boundaries — a compromised or misbehaving agent has blast radius limited to its purpose.

MCP compatibility: Eight of the nine toolsets (all except Semantic Discovery, which is ADK-only) can also be exported as MCP servers for framework-agnostic integration with non-ADK clients.

[Figure: From Craftsman to Factory: Traditional RAG/ETL vs Axius SDC pipeline]


Architecture: Nine Agents

Agent | Purpose | Network | Datasource Access
Catalog Agent | Discover published SDC4 schemas and download artifacts from SDCStudio | HTTPS (optional token auth) | None
Introspect Agent | Examine customer datasources and extract structure (read-only) | None | Read-only
Mapping Agent | Suggest and manage column-to-component mappings | None | None
Generator Agent | Produce SDC4 XML instances from mapped data | None | Read-only
Validation Agent | Validate and sign XML instances via VaaS API | HTTPS (token auth) | None
Distribution Agent | Route artifact packages to customer-local destinations | Customer-local only | None
Knowledge Agent | Ingest customer context (CSV, JSON, TTL, Markdown, PDF, DOCX) into vector store | None | Read-only (files)
Assembly Agent | Discover components, propose hierarchies, assemble published models | HTTPS (Assembly API) | None
Semantic Discovery Agent | Search Vertex AI Search for SDC4 resources (ADK-only) | GCP (Vertex AI Search) | None
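
The Network and Datasource Access columns can be read as a small access matrix. The sketch below transcribes the rows above and checks that no agent holds both kinds of access. The `AgentScope` class, the short agent names, and the `violations` helper are illustrative assumptions, not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    name: str        # short name (hypothetical identifiers for this sketch)
    network: str     # value from the "Network" column
    datasource: str  # value from the "Datasource Access" column


# Rows transcribed from the architecture table.
SCOPES = [
    AgentScope("catalog", "HTTPS (optional token auth)", "None"),
    AgentScope("introspect", "None", "Read-only"),
    AgentScope("mapping", "None", "None"),
    AgentScope("generator", "None", "Read-only"),
    AgentScope("validation", "HTTPS (token auth)", "None"),
    AgentScope("distribution", "Customer-local only", "None"),
    AgentScope("knowledge", "None", "Read-only (files)"),
    AgentScope("assembly", "HTTPS (Assembly API)", "None"),
    AgentScope("semantic_discovery", "GCP (Vertex AI Search)", "None"),
]


def violations(scopes):
    """Return the names of agents that have both network and datasource access."""
    return [s.name for s in scopes if s.network != "None" and s.datasource != "None"]
```

A check like this could run in a test suite so that a new agent with overlapping access is flagged before release.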

Security Principles

  1. No agent has both datasource access and network access
  2. Read-only datasource access — no agent can write to customer data
  3. Tools are declarative Python functions — ADK derives schemas from type hints and docstrings
  4. Structured audit log — every tool call logged with agent, tool, inputs, outputs, timestamp
  5. No credential sharing — each BaseToolset receives only its own credential scope
  6. Fail closed — errors are returned, never retried with escalated privileges
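
To make principle 3 concrete, here is a minimal sketch of how a schema can be derived from a plain Python function's type hints and docstring using only the standard library. It illustrates the general idea and is not ADK's actual mechanism; `list_schemas_example`, its parameters, and `describe_tool` are hypothetical names.

```python
import inspect
from typing import get_type_hints


def list_schemas_example(query: str, limit: int = 20) -> list:
    """List published schemas matching a search query."""
    return []  # placeholder body; a real tool would call the Catalog API


def describe_tool(func) -> dict:
    """Build a simple schema dict from a function's signature and docstring."""
    hints = get_type_hints(func)
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        params[name] = {
            "type": getattr(hint, "__name__", str(hint)),
            # A parameter without a default value is treated as required.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": params,
    }
```

Because the schema comes from the function itself, the tool's documentation and its callable interface cannot drift apart.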

IEEE 7000-2021 Alignment

SDC Agents is designed consistent with IEEE 7000-2021 Value-based Engineering principles for ethical autonomous system design:

  • Transparency — append-only structured audit log records every tool invocation with agent, tool, inputs, outputs, timestamp, and duration
  • Traceability — all inter-agent handoffs are inspectable files on disk (.sdc-cache/), not opaque in-memory calls
  • Harm minimization — purpose-scoped isolation ensures no single agent can access both customer datasources and external networks; blast radius is confined to each agent's scope
  • Stakeholder value preservation — SDC4's curated, constraint-based semantic model (xsd:restriction only, immutable schemas) encodes data integrity and endurance as system-level guarantees, not optional features
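
The transparency property can be sketched as a wrapper that appends one JSON line per tool call. This is an illustration under assumptions: the field names follow the list above, but the `SENSITIVE_KEYS` set, the `"***"` mask, and the `audited_call` helper are hypothetical and may differ from the package's audit logger.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

# Assumed list of keys whose values should never reach the log.
SENSITIVE_KEYS = {"api_key", "token", "password", "auth", "connection_string"}


def redact(value):
    """Recursively mask values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            k: "***" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def audited_call(log_path: Path, agent: str, tool: str, func, **inputs):
    """Call func(**inputs) and append one audit record as a JSON line."""
    start = time.perf_counter()
    outputs = func(**inputs)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "inputs": redact(inputs),
        "outputs": redact(outputs),
        "duration_ms": round((time.perf_counter() - start) * 1000, 3),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds lines; existing records are not rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, default=str) + "\n")
    return outputs
```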

Data Flow

Agents communicate through files on disk, not direct calls. Every handoff is an inspectable, version-controllable artifact:

Catalog Agent → .sdc-cache/schemas/     ─┐
Introspect Agent → .sdc-cache/introspections/ ─┤
                                               ▼
                                    Mapping Agent → .sdc-cache/mappings/
                                               ▼
                                    Generator Agent → ./sdc-output/*.xml
                                               ▼
                                    Validation Agent → ./sdc-output/*.pkg.zip
                                               ▼
                                    Distribution Agent → customer destinations
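
A file-based handoff of this kind can be sketched in a few lines. The directory names mirror the diagram above; the JSON file format, the file naming, and the `write_handoff` / `read_handoff` helpers are assumptions made for illustration.

```python
import json
from pathlib import Path


def write_handoff(cache_root: Path, stage: str, name: str, payload: dict) -> Path:
    """Producer side: persist a result where the next agent can find it."""
    target = cache_root / stage / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and indentation keep the file diff-friendly under version control.
    target.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return target


def read_handoff(cache_root: Path, stage: str, name: str) -> dict:
    """Consumer side: load the artifact written by the previous agent."""
    path = cache_root / stage / f"{name}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, an introspection step might call `write_handoff(Path(".sdc-cache"), "introspections", "my_csv", result)` and a mapping step would later call `read_handoff` with the same stage and name. Neither step needs a reference to the other.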

SDCStudio API Dependencies

SDC Agents consumes two sets of endpoints from SDCStudio:

  • Catalog API (public, optional token auth) — schema discovery, component trees, skeleton templates, schema-level RDF, reference ontologies
  • VaaS API (token auth) — XML validation, signing, artifact package generation

Authenticated Catalog Lookups: When an API key is provided, catalog search results are filtered according to the Modeler's project preferences configured in SDCStudio. If the Modeler's prj_filter setting is enabled (the default), results are scoped to their default project. Without an API key, the catalog returns all published public schemas. This means the same catalog_list_schemas tool returns personalized results for authenticated users and broad results for anonymous browsing, with no change to the tool interface.
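
The scoping rule described above happens on the SDCStudio side, but its effect can be modelled locally. The sketch below is a simplified illustration: the record keys (`published`, `public`, `project`), the `visible_schemas` function, and the behaviour when `prj_filter` is disabled are assumptions, not the documented API contract.

```python
from typing import Optional


def visible_schemas(
    schemas: list,
    api_key: Optional[str] = None,
    prj_filter: bool = True,
    default_project: Optional[str] = None,
) -> list:
    """Model which schemas a catalog listing would return for a caller."""
    published = [s for s in schemas if s.get("published")]
    if not api_key:
        # Anonymous browsing: all published public schemas.
        return [s for s in published if s.get("public")]
    if prj_filter and default_project is not None:
        # Authenticated with the project filter on: scope to the default project.
        return [s for s in published if s.get("project") == default_project]
    # Assumption: with the filter off, an authenticated caller sees all published schemas.
    return published
```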

See docs/dev/SDC_AGENTS_PRD.md for the full API contract and agent specifications.


Quick Start

Prerequisites

  • Python 3.11+
  • Google ADK 1.25+ (pip install google-adk)

Installation

# Run from a clone of the repository (editable install with dev extras)
pip install -e ".[dev]"

Configuration

Copy sdc-agents.example.yaml to sdc-agents.yaml and fill in values:

sdcstudio:
  base_url: "https://sdcstudio.example.com"
  api_key: "${SDC_API_KEY}"          # Token auth (Catalog preferences + VaaS validation)

cache:
  root: ".sdc-cache"
  ttl_hours: 24

audit:
  path: ".sdc-cache/audit.jsonl"
  log_level: "standard"    # "standard" summarizes outputs; "verbose" logs full payloads

datasources:
  my_database:
    type: sql
    connection_string: "${DB_CONNECTION}"   # env var substitution
  my_csv:
    type: csv
    path: "/data/exports/records.csv"

output:
  directory: "./output"
  formats:
    - "xml"

destinations:
  triplestore:
    type: fuseki
    endpoint: "${FUSEKI_URL}"
    auth: "${FUSEKI_AUTH}"
  graph_database:
    type: neo4j
    endpoint: "${NEO4J_URL}"
    database: "sdc4"
  archive:
    type: filesystem
    path: "./archive/{ct_id}/{instance_id}/"
    create_directories: true

Environment variables use ${VAR} syntax. Missing variables cause an immediate KeyError (fail closed).
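
A minimal sketch of fail-closed `${VAR}` substitution, assuming variable names follow the usual shell identifier rules. The `substitute_env` helper is hypothetical and may not match the package's Pydantic-based implementation.

```python
import os
import re

# Matches ${NAME} where NAME is a shell-style identifier (assumed syntax rules).
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env=None) -> str:
    """Replace every ${VAR} in value; raise KeyError if a variable is unset."""
    env = os.environ if env is None else env
    # Indexing with env[...] (rather than env.get) is what makes this fail closed:
    # a missing variable raises immediately instead of yielding an empty string.
    return _VAR.sub(lambda m: env[m.group(1)], value)
```

Failing at load time means a misconfigured deployment stops before any agent runs with a blank credential or endpoint.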

Usage (ADK — Primary)

from sdc_agents.common.config import load_config
from sdc_agents.agents.catalog import create_catalog_agent
from sdc_agents.agents.introspect import create_introspect_agent
from sdc_agents.agents.mapping import create_mapping_agent
from sdc_agents.agents.generator import create_generator_agent
from sdc_agents.agents.validation import create_validation_agent
from sdc_agents.agents.distribution import create_distribution_agent

config = load_config("sdc-agents.yaml")

# Each factory returns an LlmAgent with its scoped BaseToolset
catalog_agent = create_catalog_agent(config)
introspect_agent = create_introspect_agent(config)
mapping_agent = create_mapping_agent(config)
generator_agent = create_generator_agent(config)
validation_agent = create_validation_agent(config)
distribution_agent = create_distribution_agent(config)

Or construct agents directly with toolsets:

from sdc_agents.common.config import load_config
from sdc_agents.toolsets.catalog import CatalogToolset
from google.adk.agents import LlmAgent

config = load_config("sdc-agents.yaml")

catalog_agent = LlmAgent(
    name="catalog",
    model="gemini-2.0-flash",
    description="Discovers SDC4 schemas from SDCStudio Catalog API.",
    instruction="Discover published SDC4 schemas and download artifacts.",
    tools=[CatalogToolset(config=config)],
)

Usage (MCP — Secondary)

Eight of the nine agents (all except Semantic Discovery, which is ADK-only) can be served as MCP stdio servers for non-ADK clients:

# Start the Catalog Agent as an MCP server
sdc-agents serve --mcp catalog

# Start the Introspect Agent as an MCP server
sdc-agents serve --mcp introspect

# Any of the 8 MCP agents: assembly, catalog, distribution, generator, introspect, knowledge, mapping, validation
sdc-agents serve --mcp validation

CLI Commands

# Show configuration summary and agent inventory
sdc-agents info
sdc-agents info --config path/to/sdc-agents.yaml

# Validate a config file (useful in CI)
sdc-agents validate-config
sdc-agents validate-config --config path/to/sdc-agents.yaml

# Inspect the audit log
sdc-agents audit show                        # last 50 records
sdc-agents audit show --agent catalog        # filter by agent
sdc-agents audit show --last 24h --limit 20  # recent records
sdc-agents audit show --audit-path ./logs/audit.jsonl  # custom path
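
The filters shown for `audit show` can be approximated over parsed JSONL records as follows. This is a sketch under assumptions: the `m`/`h`/`d` unit suffixes, the record keys, and the `parse_window` and `filter_records` helpers are hypothetical; only the `24h` form and the default of 50 records appear above.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}  # assumed unit suffixes


def parse_window(text: str) -> timedelta:
    """Turn a string such as '24h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text)
    if not match:
        raise ValueError(f"unsupported time window: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def filter_records(records, agent=None, last=None, limit=50, now=None):
    """Keep records matching the agent and time window, newest `limit` entries."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_window(last) if last else None
    kept = [
        r for r in records
        if (agent is None or r["agent"] == agent)
        and (cutoff is None or datetime.fromisoformat(r["timestamp"]) >= cutoff)
    ]
    # The log is append-only, so the newest records are at the end.
    return kept[-limit:]
```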

Docker

A single image serves all 8 MCP-servable agents. Select the agent at runtime with SDC_AGENT:

# Serve a single agent as an MCP server
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  -e SDC_AGENT=catalog \
  ghcr.io/semanticdatacharter/sdc-agents

# Run any CLI command
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents info

docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents validate-config

Build locally:

docker build -t sdc-agents .
docker run sdc-agents  # prints usage hint

CI/CD

  • CI (.github/workflows/ci.yml): Runs on push to dev and PRs to main. Lints with ruff, checks formatting with black, runs pytest with coverage across Python 3.11/3.12/3.13.
  • Docker (.github/workflows/docker.yml): Builds and pushes to GHCR on push to main and v* tags.
  • PyPI (.github/workflows/release.yml): Publishes to PyPI on v* tags via OIDC trusted publisher (no API tokens).

One-time setup (maintainer):

  1. Configure PyPI trusted publisher — owner: SemanticDataCharter, repo: SDC_Agents, workflow: release.yml, environment: pypi
  2. Create a pypi environment in GitHub repo settings (Settings > Environments)

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=sdc_agents

# Run specific test modules
pytest tests/toolsets/test_catalog.py
pytest tests/security/

Implementation Phases

Phase | Goal | Status
Phase 1 | Catalog, Introspect, and Mapping agents with shared infra | Complete
Phase 2 | Generator and Validation agents, Introspect extensions | Complete
Phase 3 | Distribution Agent with multi-destination delivery | Complete
Phase 4 | Production hardening: CLI, Docker, CI/CD, MCP export, documentation | Complete
Phase 5 | Knowledge Agent + Component Assembly Agent | Complete
Phase 5.5 | PDF/DOCX Knowledge Sources + Semantic Discovery Agent | Complete
Phase 6 | ADK ecosystem contributions (adk-sparql-tools, Integration Page) | Complete

What's Implemented (Phases 1–3)

Common infrastructure:

  • Pydantic config with ${VAR} substitution (fail closed), append-only JSONL audit logger with credential redaction, cache manager with path helpers

CatalogToolset (5 tools): catalog_list_schemas, catalog_get_schema, catalog_download_schema_rdf, catalog_download_skeleton, catalog_download_ontologies — httpx async, cache-first for immutable schemas, optional token auth for Modeler-scoped results

IntrospectToolset (5 tools): introspect_sql (SELECT-only enforcement), introspect_csv (type inference for 10 types), introspect_json (JSONPath extraction), introspect_mongodb (BSON-to-SDC4 type mapping), introspect_bigquery (BigQuery schema extraction via asyncio.to_thread)
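
One way to approach SELECT-only enforcement is a conservative allow-list check before a query reaches the database. The sketch below is an assumption about how such a guard could look, not the package's implementation. It deliberately over-rejects (for example, a column literally named `update`), and it treats `WITH ... SELECT` as acceptable.

```python
import re

# Keywords that indicate a write or DDL statement (assumed, non-exhaustive list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant|revoke|replace)\b",
    re.IGNORECASE,
)


def is_select_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT or WITH."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # more than one statement
    if not re.match(r"(?is)^(select|with)\b", stripped):
        return False
    return not _FORBIDDEN.search(stripped)
```

A keyword check is a first line of defence only. Connecting with a database role that has no write privileges provides the same guarantee at the database level.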

MappingToolset (3 tools): mapping_suggest (type compatibility + name similarity), mapping_confirm, mapping_list
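
Combining type compatibility with name similarity can be sketched with the standard library's `difflib`. Everything here is illustrative: the type names, the `COMPATIBLE` table, the 0.6 threshold, and the `suggest` function are assumptions and do not reflect the actual scoring used by `mapping_suggest`.

```python
import re
from difflib import SequenceMatcher

# Which component types a column type may map to (hypothetical type names).
COMPATIBLE = {
    "integer": {"integer", "decimal"},
    "decimal": {"decimal"},
    "string": {"string"},
}


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'patient_age' matches 'Patient Age'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest(columns: dict, components: dict, threshold: float = 0.6) -> list:
    """For each column, pick the best type-compatible component by name similarity."""
    suggestions = []
    for col_name, col_type in columns.items():
        best = None
        for comp_name, comp_type in components.items():
            if comp_type not in COMPATIBLE.get(col_type, set()):
                continue  # type gate comes first
            score = SequenceMatcher(None, normalize(col_name), normalize(comp_name)).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (comp_name, score)
        if best:
            suggestions.append(
                {"column": col_name, "component": best[0], "score": round(best[1], 2)}
            )
    return suggestions
```

Suggestions produced this way are candidates only; a separate confirmation step (as `mapping_confirm` implies) keeps a human or supervising agent in the loop.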

GeneratorToolset (3 tools): generate_instance, generate_batch, generate_preview — skeleton-based XML generation with placeholder substitution and optional element pruning
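
Skeleton-based generation with placeholder substitution and pruning can be illustrated with `xml.etree.ElementTree`. The `{{name}}` placeholder syntax, the element names, and the `fill_skeleton` function are assumptions for this sketch; real SDC4 skeletons are likely to use different markers and namespaces.

```python
import re
import xml.etree.ElementTree as ET

_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")  # assumed placeholder syntax


def fill_skeleton(skeleton_xml: str, values: dict, prune: bool = True) -> str:
    """Fill placeholder elements from values; optionally drop unfilled ones."""
    root = ET.fromstring(skeleton_xml)
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            match = _PLACEHOLDER.fullmatch((child.text or "").strip())
            if not match:
                continue
            key = match.group(1)
            if key in values:
                child.text = str(values[key])
            elif prune:
                parent.remove(child)  # optional element with no mapped value
    # Serialization escapes special characters such as '&' in the inserted text.
    return ET.tostring(root, encoding="unicode")
```

Working on the parsed tree rather than on the raw string means inserted values are escaped on output, so data containing `<` or `&` cannot break the document structure.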

ValidationToolset (3 tools): validate_instance, sign_instance, validate_batch — VaaS API with path confinement, token auth, artifact package (.pkg.zip) support
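
Path confinement is commonly implemented by resolving the candidate path and checking that it stays under an allowed base directory. The `confine` helper below is a hypothetical sketch of that technique, not the toolset's own code; it relies on `Path.is_relative_to`, which requires Python 3.9 or later.

```python
from pathlib import Path


def confine(base_dir: Path, candidate: str) -> Path:
    """Resolve candidate under base_dir; reject anything that escapes it."""
    base = base_dir.resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    # Joining an absolute candidate discards base, so it is rejected below
    # unless it already points inside base.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):
        raise PermissionError(f"path escapes {base}: {candidate!r}")
    return resolved
```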

DistributionToolset (5 tools): inspect_package, list_destinations, distribute_package, distribute_batch, bootstrap_triplestore — httpx-only connectors for SPARQL Graph Store, Neo4j HTTP, REST API, and filesystem

Agent factories: create_catalog_agent(), create_introspect_agent(), create_mapping_agent(), create_generator_agent(), create_validation_agent(), create_distribution_agent()

176+ tests, 82% coverage — 9 toolsets with 32 disjoint tools, security isolation tests (SQL write rejection, datasource name enforcement, path confinement, credential redaction, no cross-scope tool leakage)

Consumer-first: all tests use httpx.MockTransport — zero live SDCStudio, Fuseki, or Neo4j dependency


Related Projects

  • SDCStudio — SDC4 data model creation and management platform (provides Catalog and VaaS APIs)
  • SDCRM — SDC4 Reference Model specification
  • Form2SDCTemplate — PDF/DOCX to SDC template conversion
  • Google ADK — Agent Development Kit (agent framework)

License & Ownership

Copyright 2025-2026 Axius SDC, Inc.

Licensed under the Apache License 2.0 — see LICENSE for details.

SDC Agents is controlled and maintained by Axius SDC, Inc. The SemanticDataCharter GitHub organization hosts the open-source SDC4 ecosystem on behalf of Axius SDC, Inc.
