Purpose-scoped ADK agents for SDC4 data operations
SDC Agents
Purpose-scoped ADK agents for producing SDC4-compliant data artifacts from existing datastores.
What is SDC Agents?
SDC Agents is an open-source suite of nine purpose-scoped agents built on Google's Agent Development Kit (ADK) that transform data from SQL databases, CSV files, and JSON sources into validated, multi-format SDC4 artifacts — without requiring the user to write XML, RDF, or GQL by hand.
Each agent is an ADK LlmAgent with a narrowly scoped BaseToolset, auditable activity, and enforced isolation boundaries. No single agent can reach across scope boundaries, so the blast radius of a compromised or misbehaving agent is limited to its purpose.
MCP compatibility: Each toolset can also be exported as an MCP server for framework-agnostic integration with non-ADK clients.
Architecture: Nine Agents
| Agent | Purpose | Network | Datasource Access |
|---|---|---|---|
| Catalog Agent | Discover published SDC4 schemas and download artifacts from SDCStudio | HTTPS (optional token auth) | None |
| Introspect Agent | Examine customer datasources and extract structure (read-only) | None | Read-only |
| Mapping Agent | Suggest and manage column-to-component mappings | None | None |
| Generator Agent | Produce SDC4 XML instances from mapped data | None | Read-only |
| Validation Agent | Validate and sign XML instances via VaaS API | HTTPS (token auth) | None |
| Distribution Agent | Route artifact packages to customer-local destinations | Customer-local only | None |
| Knowledge Agent | Ingest customer context (CSV, JSON, TTL, Markdown, PDF, DOCX) into vector store | None | Read-only (files) |
| Assembly Agent | Discover components, propose hierarchies, assemble published models | HTTPS (Assembly API) | None |
| Semantic Discovery Agent | Search Vertex AI Search for SDC4 resources (ADK-only) | GCP (Vertex AI Search) | None |
Security Principles
- No agent has both datasource access and network access
- Read-only datasource access: no agent can write to customer data
- Tools are declarative Python functions: ADK derives schemas from type hints and docstrings
- Structured audit log: every tool call logged with agent, tool, inputs, outputs, timestamp
- No credential sharing: each BaseToolset receives only its own credential scope
- Fail closed: errors are returned, never retried with escalated privileges
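The first principle can be checked mechanically. The sketch below is illustrative only and is not part of the sdc_agents package: `AgentScope`, `SCOPES`, and `isolation_violations` are hypothetical names, and the boolean flags are transcribed by hand from the architecture table (any non-"None" entry counts as access).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical summary of one row of the architecture table."""
    name: str
    network: bool      # True if the Network column is anything other than "None"
    datasource: bool   # True if the Datasource Access column is anything other than "None"


# Transcribed from the table above; the identifiers are illustrative.
SCOPES = [
    AgentScope("catalog", network=True, datasource=False),
    AgentScope("introspect", network=False, datasource=True),
    AgentScope("mapping", network=False, datasource=False),
    AgentScope("generator", network=False, datasource=True),
    AgentScope("validation", network=True, datasource=False),
    AgentScope("distribution", network=True, datasource=False),
    AgentScope("knowledge", network=False, datasource=True),
    AgentScope("assembly", network=True, datasource=False),
    AgentScope("semantic_discovery", network=True, datasource=False),
]


def isolation_violations(scopes):
    """Return the names of agents that hold both network and datasource access."""
    return [s.name for s in scopes if s.network and s.datasource]


if __name__ == "__main__":
    print(isolation_violations(SCOPES))
```

A check of this shape could run in CI so that a new agent with both kinds of access fails the build rather than slipping through review.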
IEEE 7000-2021 Alignment
SDC Agents is designed to be consistent with IEEE 7000-2021 Value-based Engineering principles for ethical autonomous system design:
- Transparency: append-only structured audit log records every tool invocation with agent, tool, inputs, outputs, timestamp, and duration
- Traceability: all inter-agent handoffs are inspectable files on disk (.sdc-cache/), not opaque in-memory calls
- Harm minimization: purpose-scoped isolation ensures no single agent can access both customer datasources and external networks; blast radius is confined to each agent's scope
- Stakeholder value preservation: SDC4's curated, constraint-based semantic model (xsd:restriction only, immutable schemas) encodes data integrity and endurance as system-level guarantees, not optional features
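To make the transparency point concrete, here is a minimal sketch of an append-only JSONL audit writer with credential redaction. It is not the project's logger: the function names, the `duration_ms` field name, the `"***"` mask, and the `SENSITIVE_KEYS` list are all assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Assumed set of keys to mask; the real redaction rules may differ.
SENSITIVE_KEYS = {"api_key", "token", "password", "auth", "connection_string"}


def redact(value):
    """Recursively replace values stored under sensitive keys with a mask."""
    if isinstance(value, dict):
        return {
            k: "***" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def append_audit_record(path, agent, tool, inputs, outputs, duration_ms):
    """Append one JSON record per line; existing lines are never rewritten."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "inputs": redact(inputs),
        "outputs": redact(outputs),
        "duration_ms": duration_ms,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Because each record is a single line, the log can be tailed, grepped, or diffed with ordinary tools.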
Data Flow
Agents communicate through files on disk, not direct calls. Every handoff is an inspectable, version-controllable artifact:
Catalog Agent → .sdc-cache/schemas/ ─┐
Introspect Agent → .sdc-cache/introspections/ ─┤
▼
Mapping Agent → .sdc-cache/mappings/
▼
Generator Agent → ./sdc-output/*.xml
▼
Validation Agent → ./sdc-output/*.pkg.zip
▼
Distribution Agent → customer destinations
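The file-based handoff pattern can be sketched in a few lines. This is an illustration under assumptions, not the project's cache manager: the helper names are hypothetical, and JSON as the on-disk format for intermediate artifacts is an assumption (the source only states that handoffs are inspectable files under .sdc-cache/).

```python
import json
from pathlib import Path


def write_handoff(cache_root, stage, name, payload):
    """Write one stage's output as sorted, indented JSON under <cache_root>/<stage>/."""
    target = Path(cache_root) / stage / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and fixed indentation keep diffs small under version control.
    target.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return target


def read_handoff(cache_root, stage, name):
    """Read a previous stage's output; the producer is never called directly."""
    source = Path(cache_root) / stage / f"{name}.json"
    return json.loads(source.read_text(encoding="utf-8"))
```

In this sketch a producer would call `write_handoff(".sdc-cache", "mappings", "patients", {...})` and a later stage would call `read_handoff` with the same arguments, so the only coupling between the two is a file a human can open.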
SDCStudio API Dependencies
SDC Agents consumes two sets of endpoints from SDCStudio:
- Catalog API (public, optional token auth) — schema discovery, component trees, skeleton templates, schema-level RDF, reference ontologies
- VaaS API (token auth) — XML validation, signing, artifact package generation
Authenticated Catalog Lookups: When an API key is provided, catalog search results are filtered according to the Modeler's project preferences configured in SDCStudio. If the Modeler's prj_filter setting is enabled (the default), results are scoped to their default project. Without an API key, the catalog returns all published public schemas. This means the same catalog_list_schemas tool returns personalized results for authenticated users and broad results for anonymous browsing, with no change to the tool interface.
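The scoping rule described above can be modelled as a small pure function. This is only a model of the documented behaviour, not SDCStudio's server-side code: `visible_schemas`, the `project` field on each schema dict, and the treatment of a disabled prj_filter (no project scoping) are assumptions.

```python
from typing import Optional


def visible_schemas(schemas, api_key: Optional[str],
                    prj_filter: bool = True,
                    default_project: Optional[str] = None):
    """Return the schemas a caller would see under the described rules.

    schemas: list of dicts, each with a hypothetical "project" field.
    """
    if api_key and prj_filter and default_project is not None:
        # Authenticated, filter enabled: scope to the Modeler's default project.
        return [s for s in schemas if s["project"] == default_project]
    # Anonymous, or filter disabled (assumed): all published public schemas.
    return list(schemas)
```

The function signature stays the same for both kinds of caller, which mirrors the point that the tool interface does not change with authentication.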
See docs/dev/SDC_AGENTS_PRD.md for the full API contract and agent specifications.
Quick Start
Prerequisites
- Python 3.11+
- Google ADK 1.25+ (pip install google-adk)
Installation
pip install -e ".[dev]"
Configuration
Copy sdc-agents.example.yaml to sdc-agents.yaml and fill in values:
sdcstudio:
  base_url: "https://sdcstudio.example.com"
  api_key: "${SDC_API_KEY}"  # Token auth (Catalog preferences + VaaS validation)
cache:
  root: ".sdc-cache"
  ttl_hours: 24
audit:
  path: ".sdc-cache/audit.jsonl"
  log_level: "standard"  # "standard" summarizes outputs; "verbose" logs full payloads
datasources:
  my_database:
    type: sql
    connection_string: "${DB_CONNECTION}"  # env var substitution
  my_csv:
    type: csv
    path: "/data/exports/records.csv"
output:
  directory: "./output"
  formats:
    - "xml"
destinations:
  triplestore:
    type: fuseki
    endpoint: "${FUSEKI_URL}"
    auth: "${FUSEKI_AUTH}"
  graph_database:
    type: neo4j
    endpoint: "${NEO4J_URL}"
    database: "sdc4"
  archive:
    type: filesystem
    path: "./archive/{ct_id}/{instance_id}/"
    create_directories: true
Environment variables use ${VAR} syntax. Missing variables cause an immediate KeyError (fail closed).
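A minimal sketch of fail-closed `${VAR}` substitution follows. It is not the package's config loader: `substitute_env` and the regular expression for variable names are assumptions, and the only behaviour taken from the text is that a missing variable raises KeyError immediately.

```python
import os
import re

# Assumed variable-name grammar: letters, digits, underscores, not starting with a digit.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: str, env=None) -> str:
    """Replace every ${VAR} in value; raise KeyError if any variable is unset."""
    env = os.environ if env is None else env

    def _lookup(match):
        # Plain indexing (not .get) is what makes this fail closed.
        return env[match.group(1)]

    return _VAR_PATTERN.sub(_lookup, value)


if __name__ == "__main__":
    print(substitute_env("${HOME_DIR}/data", {"HOME_DIR": "/srv"}))
```

There is deliberately no default value: an unset credential stops the load instead of producing a config with an empty secret.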
Usage (ADK — Primary)
from sdc_agents.common.config import load_config
from sdc_agents.agents.catalog import create_catalog_agent
from sdc_agents.agents.introspect import create_introspect_agent
from sdc_agents.agents.mapping import create_mapping_agent
from sdc_agents.agents.generator import create_generator_agent
from sdc_agents.agents.validation import create_validation_agent
from sdc_agents.agents.distribution import create_distribution_agent
config = load_config("sdc-agents.yaml")
# Each factory returns an LlmAgent with its scoped BaseToolset
catalog_agent = create_catalog_agent(config)
introspect_agent = create_introspect_agent(config)
mapping_agent = create_mapping_agent(config)
generator_agent = create_generator_agent(config)
validation_agent = create_validation_agent(config)
distribution_agent = create_distribution_agent(config)
Or construct agents directly with toolsets:
from sdc_agents.common.config import load_config
from sdc_agents.toolsets.catalog import CatalogToolset
from google.adk.agents import LlmAgent
config = load_config("sdc-agents.yaml")
catalog_agent = LlmAgent(
    name="catalog",
    model="gemini-2.0-flash",
    description="Discovers SDC4 schemas from SDCStudio Catalog API.",
    instruction="Discover published SDC4 schemas and download artifacts.",
    tools=[CatalogToolset(config=config)],
)
Usage (MCP — Secondary)
Each agent can be served as an MCP stdio server for non-ADK clients:
# Start the Catalog Agent as an MCP server
sdc-agents serve --mcp catalog
# Start the Introspect Agent as an MCP server
sdc-agents serve --mcp introspect
# Any of the 8 MCP agents: assembly, catalog, distribution, generator, introspect, knowledge, mapping, validation
sdc-agents serve --mcp validation
CLI Commands
# Show configuration summary and agent inventory
sdc-agents info
sdc-agents info --config path/to/sdc-agents.yaml
# Validate a config file (useful in CI)
sdc-agents validate-config
sdc-agents validate-config --config path/to/sdc-agents.yaml
# Inspect the audit log
sdc-agents audit show # last 50 records
sdc-agents audit show --agent catalog # filter by agent
sdc-agents audit show --last 24h --limit 20 # recent records
sdc-agents audit show --audit-path ./logs/audit.jsonl # custom path
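For readers who want to post-process the audit log outside the CLI, the filters above can be approximated in plain Python. This sketch is not the CLI's implementation: the function names are hypothetical, the record fields `agent` and `timestamp` (ISO 8601 with offset) are assumed, and the accepted window units (m, h, d) are an assumption since only `24h` appears above.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_window(spec: str) -> timedelta:
    """Parse a window such as '30m', '24h', or '7d' (assumed unit set)."""
    units = {"m": "minutes", "h": "hours", "d": "days"}
    amount, unit = int(spec[:-1]), spec[-1]
    return timedelta(**{units[unit]: amount})


def filter_records(lines, agent=None, last=None, limit=50, now=None):
    """Return up to `limit` most recent matching records from JSONL lines."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_window(last) if last else None
    selected = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if agent and record.get("agent") != agent:
            continue
        if cutoff and datetime.fromisoformat(record["timestamp"]) < cutoff:
            continue
        selected.append(record)
    # The log is append-only, so the tail holds the newest records.
    return selected[-limit:]
```

Typical use would be `filter_records(open(".sdc-cache/audit.jsonl"), agent="catalog", last="24h", limit=20)`, with `limit` expected to be a positive integer.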
Docker
A single image serves all 8 MCP-servable agents. Select the agent at runtime with SDC_AGENT:
# Serve a single agent as an MCP server
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  -e SDC_AGENT=catalog \
  ghcr.io/semanticdatacharter/sdc-agents
# Run any CLI command
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents info
docker run -v ./sdc-agents.yaml:/home/sdc/sdc-agents.yaml:ro \
  ghcr.io/semanticdatacharter/sdc-agents validate-config
Build locally:
docker build -t sdc-agents .
docker run sdc-agents # prints usage hint
CI/CD
- CI (.github/workflows/ci.yml): Runs on push to dev and PRs to main. Lints with ruff, checks formatting with black, runs pytest with coverage across Python 3.11/3.12/3.13.
- Docker (.github/workflows/docker.yml): Builds and pushes to GHCR on push to main and v* tags.
- PyPI (.github/workflows/release.yml): Publishes to PyPI on v* tags via OIDC trusted publisher (no API tokens).
One-time setup (maintainer):
- Configure PyPI trusted publisher (owner: SemanticDataCharter, repo: SDC_Agents, workflow: release.yml, environment: pypi)
- Create a pypi environment in GitHub repo settings (Settings > Environments)
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=sdc_agents
# Run specific test modules
pytest tests/toolsets/test_catalog.py
pytest tests/security/
Documentation
- User Documentation — configuration, tool reference, MCP integration, workflow guides
- Product Requirements — full agent specifications, tools, security model, type mapping tables
- Contributing — development setup, coding standards, PR workflow
- Security Policy — vulnerability reporting, agent isolation model
- Changelog — release history
Implementation Phases
| Phase | Goal | Status |
|---|---|---|
| Phase 1 | Catalog, Introspect, and Mapping agents with shared infra | Complete |
| Phase 2 | Generator and Validation agents, Introspect extensions | Complete |
| Phase 3 | Distribution Agent with multi-destination delivery | Complete |
| Phase 4 | Production hardening: CLI, Docker, CI/CD, MCP export, documentation | Complete |
| Phase 5 | Knowledge Agent + Component Assembly Agent | Complete |
| Phase 5.5 | PDF/DOCX Knowledge Sources + Semantic Discovery Agent | Complete |
| Phase 6 | ADK ecosystem contributions (adk-sparql-tools, Integration Page) | Complete |
What's Implemented (Phases 1–3)
Common infrastructure:
- Pydantic config with ${VAR} substitution (fail closed), append-only JSONL audit logger with credential redaction, cache manager with path helpers
CatalogToolset (5 tools): catalog_list_schemas, catalog_get_schema, catalog_download_schema_rdf, catalog_download_skeleton, catalog_download_ontologies — httpx async, cache-first for immutable schemas, optional token auth for Modeler-scoped results
IntrospectToolset (5 tools): introspect_sql (SELECT-only enforcement), introspect_csv (type inference for 10 types), introspect_json (JSONPath extraction), introspect_mongodb (BSON-to-SDC4 type mapping), introspect_bigquery (BigQuery schema extraction via asyncio.to_thread)
MappingToolset (3 tools): mapping_suggest (type compatibility + name similarity), mapping_confirm, mapping_list
GeneratorToolset (3 tools): generate_instance, generate_batch, generate_preview — skeleton-based XML generation with placeholder substitution and optional element pruning
ValidationToolset (3 tools): validate_instance, sign_instance, validate_batch — VaaS API with path confinement, token auth, artifact package (.pkg.zip) support
DistributionToolset (5 tools): inspect_package, list_destinations, distribute_package, distribute_batch, bootstrap_triplestore — httpx-only connectors for SPARQL Graph Store, Neo4j HTTP, REST API, and filesystem
Agent factories: create_catalog_agent(), create_introspect_agent(), create_mapping_agent(), create_generator_agent(), create_validation_agent(), create_distribution_agent()
176+ tests, 82% coverage — 9 toolsets with 32 disjoint tools, security isolation tests (SQL write rejection, datasource name enforcement, path confinement, credential redaction, no cross-scope tool leakage)
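As an illustration of what SELECT-only enforcement and SQL write rejection can look like, here is a conservative keyword-based guard. It is a sketch and not the check used by introspect_sql: `is_select_only` and the forbidden-keyword list are assumptions, and a production guard would more likely rely on a SQL parser plus a read-only database role.

```python
import re

# Assumed deny-list of statement keywords; deliberately broad.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|"
    r"grant|revoke|replace|attach|pragma)\b",
    re.IGNORECASE,
)
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def is_select_only(sql: str) -> bool:
    """Return True only for a single SELECT (or WITH ... SELECT) statement."""
    stripped = _COMMENTS.sub(" ", sql).strip().rstrip(";").strip()
    if ";" in stripped:
        return False  # more than one statement
    if not re.match(r"(?i)^(select|with)\b", stripped):
        return False
    # Conservative: also rejects harmless queries whose literals contain a keyword.
    return _FORBIDDEN.search(stripped) is None
```

The guard errs on the side of rejection, so a query such as `SELECT 'please update me'` is refused even though it is read-only; that trade-off favours failing closed over convenience.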
Consumer-first: all tests use httpx.MockTransport — zero live SDCStudio, Fuseki, or Neo4j dependency
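Path confinement, mentioned among the security isolation tests, can be sketched with the standard library alone. The helper below is illustrative and not taken from the package: `confine` is a hypothetical name, and the choice to raise ValueError on escape is an assumption.

```python
from pathlib import Path


def confine(base_dir, candidate) -> Path:
    """Resolve candidate under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute candidate discards base, so the check below still applies.
    target = (base / candidate).resolve()
    if not target.is_relative_to(base):  # available since Python 3.9
        raise ValueError(f"path escapes {base}: {candidate}")
    return target
```

Resolving before comparing is the important step: it collapses `..` segments and symlinks, so `confine("./sdc-output", "../secrets.txt")` raises rather than returning a path outside the output directory.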
Related Projects
- SDCStudio — SDC4 data model creation and management platform (provides Catalog and VaaS APIs)
- SDCRM — SDC4 Reference Model specification
- Form2SDCTemplate — PDF/DOCX to SDC template conversion
- Google ADK — Agent Development Kit (agent framework)
License & Ownership
Copyright 2025-2026 Axius SDC, Inc.
Licensed under the Apache License 2.0 — see LICENSE for details.
SDC Agents is controlled and maintained by Axius SDC, Inc. The SemanticDataCharter GitHub organization hosts the open-source SDC4 ecosystem on behalf of Axius SDC, Inc.