SDCvalidator
XML Schema Validation with Semantic Data Charter (SDC4) ExceptionalValue Recovery
Overview
SDCvalidator is a specialized XML Schema validation library for Semantic Data Charter Release 4 (SDC4) data models. It extends standard XML Schema 1.1 validation by automatically injecting ExceptionalValue elements wherever validation errors occur, implementing the SDC4 "quarantine-and-tag" pattern.
When validation errors occur, SDCvalidator:
- Preserves the invalid data in the XML instance
- Inserts SDC4 ExceptionalValue elements to flag the errors
- Classifies errors into 15 ISO 21090-based ExceptionalValue types
- Enables data quality tracking and auditing workflows
This library is based on the excellent xmlschema library by Davide Brunato and SISSA.
Key Features
- SDC4 ExceptionalValue Recovery: Automatic error classification and injection
- Full XML Schema 1.1 Support: XSD 1.0 and 1.1 validation
- Data Quality Tracking: 15 ISO 21090 NULL Flavor-based ExceptionalValue types
- Quarantine-and-Tag Pattern: Preserves invalid data for forensic analysis
- Extensible Error Mapping: Customizable error-to-ExceptionalValue rules
- High-Level API: Simple SDC4Validator interface for common workflows
- Comprehensive Validation Reports: Detailed error summaries with ExceptionalValue classifications
Installation
pip install sdcvalidator
Quick Start
Basic SDC4 Validation with Recovery
from sdcvalidator import SDC4Validator
# Initialize validator with your SDC4 data model schema
validator = SDC4Validator('my_sdc4_datamodel.xsd')
# Validate XML instance and inject ExceptionalValues for errors
recovered_tree = validator.validate_with_recovery('my_instance.xml')
# Save the recovered XML with ExceptionalValue elements
validator.save_recovered_xml('recovered_instance.xml', 'my_instance.xml')
Generate Validation Reports
from sdcvalidator import SDC4Validator
validator = SDC4Validator('my_schema.xsd')
report = validator.validate_and_report('my_instance.xml')
print(f"Valid: {report['valid']}")
print(f"Error count: {report['error_count']}")
print(f"ExceptionalValue types: {report['exceptional_value_type_counts']}")
# Examine individual errors
for error in report['errors']:
    print(f"{error['xpath']}: {error['exceptional_value_type']} - {error['reason']}")
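Reports can be post-processed with the standard library alone. The sketch below groups errors by ExceptionalValue code, assuming `report['errors']` is a list of dicts with the keys used in the example above; the sample report and its XPaths are made up for illustration, and `summarize` is a hypothetical helper, not part of SDCvalidator.

```python
from collections import Counter, defaultdict

# Hypothetical sample report using the keys shown in the example above;
# the values are invented for illustration only.
report = {
    "valid": False,
    "error_count": 3,
    "errors": [
        {"xpath": "/AdultPopulation/xdcount-value",
         "exceptional_value_type": "INV", "reason": "not a valid xs:integer"},
        {"xpath": "/AdultPopulation/xdcount-units",
         "exceptional_value_type": "NI", "reason": "missing required element"},
        {"xpath": "/AdultPopulation/xdcount-value",
         "exceptional_value_type": "INV", "reason": "pattern mismatch"},
    ],
}


def summarize(report):
    """Return (error count per code, sorted unique XPaths per code)."""
    counts = Counter(e["exceptional_value_type"] for e in report["errors"])
    by_type = defaultdict(set)
    for e in report["errors"]:
        by_type[e["exceptional_value_type"]].add(e["xpath"])
    return counts, {code: sorted(paths) for code, paths in by_type.items()}


counts, by_type = summarize(report)
for code, n in counts.most_common():
    print(f"{code}: {n} error(s) at {', '.join(by_type[code])}")
```

Grouping by XPath makes it easy to see when several errors land on the same element, as with the two INV entries in the sample.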
Standard XML Schema Validation
SDCvalidator also supports traditional XML Schema validation:
from sdcvalidator import Schema, validate, is_valid
# Create schema (XSD 1.1 by default)
schema = Schema('my_schema.xsd')
# Validate instances
is_valid('my_instance.xml', schema)
validate('my_instance.xml', schema)
# Decode XML to dictionaries
data = schema.to_dict('my_instance.xml')
SDC4 ExceptionalValue Types
SDCvalidator maps validation errors to 15 ISO 21090 NULL Flavor-based ExceptionalValue types:
| Code | Name | Description | Typical Use Case |
|---|---|---|---|
| INV | Invalid | Value not a member of permitted data values | Type violations, pattern mismatches |
| OTH | Other | Value not in coding system | Enumeration violations |
| NI | No Information | Missing/omitted value | Missing required elements |
| NA | Not Applicable | No proper value applicable | Unexpected content |
| UNC | Unencoded | Raw source information | Encoding/format errors |
| UNK | Unknown | Proper value applicable but not known | - |
| ASKU | Asked but Unknown | Information sought but not found | - |
| ASKR | Asked and Refused | Information sought but refused | - |
| NASK | Not Asked | Information not sought | - |
| NAV | Not Available | Information not available | - |
| MSK | Masked | Information masked for privacy/security | - |
| DER | Derived | Derived or calculated value | - |
| PINF | Positive Infinity | Positive infinity | - |
| NINF | Negative Infinity | Negative infinity | - |
| TRC | Trace | Trace amount detected | - |
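Code that inspects recovered documents often needs these codes as data. A minimal sketch that mirrors the table as a Python `Enum`; `EVCode` and `lookup` are hypothetical names for illustration, not SDCvalidator's own `ExceptionalValueType`, whose members may be defined differently.

```python
from enum import Enum


class EVCode(Enum):
    """Hypothetical mirror of the ExceptionalValue table (code -> name).

    Illustrative only; this is not sdcvalidator's ExceptionalValueType.
    """
    INV = "Invalid"
    OTH = "Other"
    NI = "No Information"
    NA = "Not Applicable"
    UNC = "Unencoded"
    UNK = "Unknown"
    ASKU = "Asked but Unknown"
    ASKR = "Asked and Refused"
    NASK = "Not Asked"
    NAV = "Not Available"
    MSK = "Masked"
    DER = "Derived"
    PINF = "Positive Infinity"
    NINF = "Negative Infinity"
    TRC = "Trace"


def lookup(code: str) -> EVCode:
    """Map a code string such as 'INV' to its member; raises KeyError if unknown."""
    return EVCode[code]


print(lookup("MSK").value)
```

Because every name is distinct, no members become aliases, so `len(EVCode)` matches the 15 rows of the table.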
ExceptionalValue Injection Example
When validation errors occur, SDCvalidator inserts ExceptionalValue elements while preserving the invalid data:
Input XML (invalid):
<sdc4:AdultPopulation>
<label>Adult Population</label>
<xdcount-value>not_a_number</xdcount-value>
<xdcount-units>
<label>Count Units</label>
<xdstring-value>people</xdstring-value>
</xdcount-units>
</sdc4:AdultPopulation>
Output XML (after recovery):
<sdc4:AdultPopulation>
<label>Adult Population</label>
<!-- ExceptionalValue inserted to flag the error -->
<sdc4:INV>
<sdc4:ev-name>Invalid</sdc4:ev-name>
<!-- Validation error: not a valid value for type xs:integer -->
</sdc4:INV>
<!-- Invalid value preserved for auditing -->
<xdcount-value>not_a_number</xdcount-value>
<xdcount-units>
<label>Count Units</label>
<xdstring-value>people</xdstring-value>
</xdcount-units>
</sdc4:AdultPopulation>
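The same transformation can be sketched with the standard library's `xml.etree.ElementTree`. This illustrates the quarantine-and-tag idea only and is not SDCvalidator's implementation: the namespace URI is a placeholder (the example above does not show the real SDC4 namespace declaration), and `tag_invalid_child` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the real SDC4 namespace comes from
# the SDCRM schemas and is not shown in the example above.
SDC4_NS = "https://example.org/sdc4"
ET.register_namespace("sdc4", SDC4_NS)

INPUT = f"""<sdc4:AdultPopulation xmlns:sdc4="{SDC4_NS}">
  <label>Adult Population</label>
  <xdcount-value>not_a_number</xdcount-value>
</sdc4:AdultPopulation>"""


def tag_invalid_child(parent, child_tag, code, name, message):
    """Hypothetical helper: insert an ExceptionalValue element just before the
    first child named child_tag, leaving that child and its text untouched."""
    for index, child in enumerate(parent):
        if child.tag == child_tag:
            ev = ET.Element(f"{{{SDC4_NS}}}{code}")
            ET.SubElement(ev, f"{{{SDC4_NS}}}ev-name").text = name
            ev.append(ET.Comment(f" Validation error: {message} "))
            parent.insert(index, ev)  # returns immediately, so no mutation while iterating
            return ev
    raise ValueError(f"no child {child_tag!r} found")


root = ET.fromstring(INPUT)
tag_invalid_child(root, "xdcount-value", "INV", "Invalid",
                  "not a valid value for type xs:integer")
print(ET.tostring(root, encoding="unicode"))
```

The invalid `<xdcount-value>` keeps its original text; only a sibling marker is added, which is what lets downstream tools audit the bad value later.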
Command-Line Interface
Validate and recover XML instances from the command line:
# Validate with ExceptionalValue recovery
sdcvalidate --recover my_instance.xml -o recovered.xml --schema my_schema.xsd
# Generate validation report
sdcvalidate --report my_instance.xml --schema my_schema.xsd
# Standard validation (no recovery)
sdcvalidate my_instance.xml --schema my_schema.xsd
Convert between XML and JSON:
# XML to JSON
sdcvalidator-xml2json my_instance.xml -o output.json --schema my_schema.xsd
# JSON to XML
sdcvalidator-json2xml my_data.json -o output.xml --schema my_schema.xsd
Advanced Usage
Custom Error Mapping Rules
from sdcvalidator import SDC4Validator, ErrorMapper, ExceptionalValueType
# Create custom error mapper
error_mapper = ErrorMapper()
# Add a custom rule for confidential-data errors
def is_confidential_error(error):
    return error.reason and 'confidential' in error.reason.lower()
# Prepend the rule so it is checked before the default rules (_rules is a private attribute)
error_mapper._rules.insert(0, (is_confidential_error, ExceptionalValueType.MSK))
# Use custom mapper
validator = SDC4Validator('my_schema.xsd', error_mapper=error_mapper)
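Inserting at index 0 matters because an ordered rule list is usually evaluated first-match-wins. A minimal, self-contained sketch of that pattern; `OrderedRuleMapper`, its keyword rules, and the `INV` fallback are hypothetical and only loosely follow the "Typical Use Case" column, not SDCvalidator's actual `ErrorMapper` rules.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs a predicate over the lower-cased error message with a code.
Rule = Tuple[Callable[[str], bool], str]


class OrderedRuleMapper:
    """Hypothetical stand-in for an error mapper built on an ordered rule list."""

    def __init__(self, default: str = "INV") -> None:
        # Invented keyword rules, tried in order; the first match wins.
        self.rules: List[Rule] = [
            (lambda r: "not a valid value" in r, "INV"),
            (lambda r: "enumeration" in r, "OTH"),
            (lambda r: "missing" in r or "required" in r, "NI"),
            (lambda r: "unexpected" in r, "NA"),
        ]
        self.default = default

    def classify(self, reason: Optional[str]) -> str:
        text = (reason or "").lower()
        for predicate, code in self.rules:
            if predicate(text):
                return code
        return self.default


mapper = OrderedRuleMapper()
# Prepending gives the new rule priority over every existing rule.
mapper.rules.insert(0, (lambda r: "confidential" in r, "MSK"))
print(mapper.classify("Unexpected child: confidential field"))  # MSK, not NA
```

Appending the same rule instead would leave it unreachable for any message that also mentions "unexpected", since the `NA` rule would match first.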
Filtering Valid Data for Analytics
To select only valid data (excluding elements with ExceptionalValues):
from xml.etree import ElementTree as ET
EV_CODES = {'INV', 'OTH', 'NI', 'NA', 'UNC', 'UNK', 'MSK',
            'ASKU', 'ASKR', 'NASK', 'NAV', 'DER',
            'PINF', 'NINF', 'TRC', 'QS'}
def local_name(tag):
    """Strip any '{namespace}' prefix; non-string tags (comments, PIs) give ''."""
    return tag.rpartition('}')[2] if isinstance(tag, str) else ''
def has_exceptional_value(element):
    """Check whether element has an ExceptionalValue as a direct child."""
    return any(local_name(child.tag) in EV_CODES for child in element)
# Keep elements without an ExceptionalValue child
# (descendants of a flagged element are not excluded by this check)
tree = ET.parse('recovered_instance.xml')
valid_elements = [elem for elem in tree.iter() if not has_exceptional_value(elem)]
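The filter above still returns the children of a flagged element, such as the preserved `<xdcount-value>not_a_number</xdcount-value>` in the injection example. A minimal sketch of a stricter variant that skips the whole quarantined subtree and the ExceptionalValue markers themselves; `clean_elements` is a hypothetical helper, and it assumes, as in the example output, that the ExceptionalValue is a direct child of the component it flags.

```python
import xml.etree.ElementTree as ET

# Same code list as the filter above (the table's 15 codes plus QS).
EV_CODES = {"INV", "OTH", "NI", "NA", "UNC", "UNK", "MSK", "ASKU", "ASKR",
            "NASK", "NAV", "DER", "PINF", "NINF", "TRC", "QS"}


def local_name(tag):
    """Return the tag without its '{namespace}' prefix ('' for comments/PIs)."""
    return tag.rpartition("}")[2] if isinstance(tag, str) else ""


def clean_elements(root):
    """Yield elements that neither carry an ExceptionalValue child nor sit
    inside an element that does; ExceptionalValue elements are skipped too."""
    def walk(elem):
        if local_name(elem.tag) in EV_CODES:
            return  # the marker itself is not data
        if any(local_name(child.tag) in EV_CODES for child in elem):
            return  # quarantined component: skip it and its whole subtree
        yield elem
        for child in elem:
            yield from walk(child)
    yield from walk(root)


doc = ET.fromstring("<root><ok><v>1</v></ok><bad><INV/><v>x</v></bad></root>")
print([e.tag for e in clean_elements(doc)])  # ['root', 'ok', 'v']
```

Pruning at the component level trades recall for safety: valid siblings inside a flagged component (like its `<label>`) are dropped along with the invalid value.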
Architecture
SDCvalidator consists of:
- Core Validation (sdcvalidator.core): Full XML Schema 1.0/1.1 validation engine
- SDC4 Module (sdcvalidator.sdc4): ExceptionalValue injection and error mapping
- Resources (sdcvalidator.resources): XML resource loading and caching
- Converters (sdcvalidator.converters): XML ↔ Python data conversion
- XPath (sdcvalidator.xpath): XPath-based element selection
Documentation
User Documentation
- API Reference - Comprehensive API documentation
- SDC4 Module Documentation - SDC4-specific features
- Semantic Data Charter - SDC4 specification
- SDCRM Repository - Reference model schemas
Developer Documentation
- CONTRIBUTING.md - How to contribute
- CLAUDE.md - Developer guidance and architecture
- SECURITY.md - Security policy and best practices
- CHANGELOG.rst - Version history
Development
Contributing
We welcome contributions! Please see our comprehensive guides:
- CONTRIBUTING.md - Contribution guidelines and workflow
- CLAUDE.md - Developer guide and architecture documentation
- GitHub Issues - Report bugs or request features
- GitHub Discussions - Ask questions and discuss ideas
Running Tests
# Run all tests
pytest
# Run SDC4 tests only
pytest tests/sdc4/ -v
# Run with coverage
pytest --cov=sdcvalidator --cov-report=html
# Run linters
flake8 sdcvalidator
mypy sdcvalidator
Development Setup
# Clone repository
git clone https://github.com/Axius-SDC/sdcvalidator.git
cd sdcvalidator
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
See CLAUDE.md for complete developer guide.
Credits
SDCvalidator is developed by Axius-SDC, Inc. and is based on the xmlschema library by:
- Davide Brunato (brunato@sissa.it)
- SISSA (International School for Advanced Studies)
The core XML Schema validation engine and much of the underlying architecture are from the xmlschema project.
License
This software is distributed under the terms of the MIT License.
Copyright (c) 2025, Axius-SDC, Inc.
Copyright (c) 2016-2024, SISSA (International School for Advanced Studies)
See the LICENSE file for details.
SDC4 Ecosystem
SDCvalidator is part of the SDC4 (Semantic Data Charter version 4) ecosystem:
- SDCRM v4.0.0 - Reference model and schemas
- SDCStudio v4.0.0 - Web application for model generation
- SDCvalidator v4.0.1 - This library (validation and recovery)
- Obsidian Template v4.0.0 - Markdown templates for dataset descriptions
All SDC4 projects use 4.x.x versioning - the MAJOR version (4) represents the SDC generation.
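Because the MAJOR version identifies the SDC generation, a quick compatibility check only needs the first version component. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings; `same_sdc_generation` is a hypothetical helper, not part of any SDC4 package.

```python
def same_sdc_generation(version: str, generation: int = 4) -> bool:
    """Return True if a MAJOR.MINOR.PATCH string belongs to the given SDC generation."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) == generation


print(same_sdc_generation("4.0.5"))
```

For an installed package, the version string can come from the standard library, e.g. `from importlib.metadata import version` followed by `same_sdc_generation(version("sdcvalidator"))`.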
Support
- Documentation: https://sdcvalidator.readthedocs.io/
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contact@axius-sdc.com
- Security: See SECURITY.md for security policy
Acknowledgments
Special thanks to:
- Davide Brunato and SISSA for the excellent xmlschema library
- The Semantic Data Charter community for the SDC4 specification
- All contributors to the project
Download files
File details
Details for the file sdcvalidator-4.0.5.tar.gz.
File metadata
- Download URL: sdcvalidator-4.0.5.tar.gz
- Size: 655.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a083a40cc43b58cc0eff64cf8139f7e0399490f2e19fff939ed68e490e4445ba |
| MD5 | 4352e35bdb96f833229d92ed1713d26b |
| BLAKE2b-256 | 7c0ad84db48ea55a197a3e0b0f86dfdad2a3f0bf521133d7ebebe1454bc258ca |
File details
Details for the file sdcvalidator-4.0.5-py3-none-any.whl.
File metadata
- Download URL: sdcvalidator-4.0.5-py3-none-any.whl
- Size: 487.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f2c420d8b7172c83ea6fcf1b88720382f607e17f29c6d0806b5d4da80b0b094 |
| MD5 | afc6b3673515a22da4d3febc80f01ae0 |
| BLAKE2b-256 | e3bf2d3afbb4debeb3c01ac1e352abdae3faca1b8a07c31dfd4094445028961b |
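To check a downloaded file against the published SHA256 digests, hashing it in chunks with `hashlib` is enough. A minimal sketch; `sha256_of` and `verify` are illustrative helper names, and the digests are the ones listed above for version 4.0.5.

```python
import hashlib

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "sdcvalidator-4.0.5.tar.gz":
        "a083a40cc43b58cc0eff64cf8139f7e0399490f2e19fff939ed68e490e4445ba",
    "sdcvalidator-4.0.5-py3-none-any.whl":
        "0f2c420d8b7172c83ea6fcf1b88720382f607e17f29c6d0806b5d4da80b0b094",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in fixed-size chunks so large archives are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, filename):
    """Return True if the file at path matches the published digest for filename."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

From the command line, `pip hash <file>` prints the same SHA256 digest.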