
Data processing toolkit: YAML/JSON to relational tables, schema comparison, and metadata management


Schema Sentinel


A comprehensive data processing and schema management toolkit for data engineers and analysts. Schema Sentinel provides powerful tools for transforming nested YAML/JSON data into relational structures, generating dynamic schemas, comparing data, and tracking database schema changes.

Perfect for data engineers, analytics teams, and DBAs who work with complex configuration files, API responses, and nested data structures, or who need to track schema changes across environments.

🎯 Key Features

YAML Shredder - Transform Nested Data

  • 🔄 Automatic Schema Generation - Dynamically infer JSON Schema from YAML/JSON files with auto-detection of types and patterns
  • 📊 Relational Table Conversion - Convert deeply nested YAML/JSON into normalized relational tables with automatic relationship mapping
  • 🗄️ Multi-Database DDL Generation - Generate SQL DDL for Snowflake, PostgreSQL, MySQL, and SQLite
  • ⚡ Data Loading - Load transformed data directly into SQLite databases with automatic indexing
  • 🔍 Structure Analysis - Analyze and identify nested structures, arrays, and potential table candidates
  • YAML Comparison - Compare two YAML files by converting them to databases and analyzing structural/data differences
  • 💻 CLI & Python API - Command-line interface and Python API for seamless integration
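The table-conversion idea can be sketched in a few lines of plain Python. This is an illustrative toy, not Schema Sentinel's actual algorithm: it flattens nested dicts and lists into per-table row lists, giving each nested object a synthetic id and parent_id (the table-naming and id schemes here are assumptions for the example):

```python
from itertools import count

def shred(obj, table="CONFIG", tables=None, parent=None, _ids=None):
    """Toy shredder: flatten a nested dict into per-table row lists."""
    if tables is None:
        tables, _ids = {}, {}
    row = {"id": next(_ids.setdefault(table, count(1)))}
    if parent is not None:
        row["parent_id"] = parent  # link back to the enclosing row
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested object -> child table named PARENT_KEY
            shred(value, f"{table}_{key}".upper(), tables, row["id"], _ids)
        elif isinstance(value, list):
            # Array -> one child row per element
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                shred(child, f"{table}_{key}".upper(), tables, row["id"], _ids)
        else:
            row[key] = value  # scalar stays in the current row
    tables.setdefault(table, []).append(row)
    return tables

tables = shred({"app": "web", "db": {"host": "localhost", "ports": [5432, 5433]}})
```

Here `shred` produces a CONFIG table, a CONFIG_DB child table, and a CONFIG_DB_PORTS table for the array, each child row carrying a parent_id, which is the shape of output the DDL generation and SQLite loading steps then consume.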

Schema Comparison (Bonus)

  • 📋 Metadata Extraction - Extract complete schema information from Snowflake databases
  • 💾 Version Control - Store metadata snapshots in SQLite for historical tracking
  • 🔎 Environment Comparison - Compare schemas between dev, staging, and production
  • 📝 Multiple Report Formats - Generate comparison reports in Markdown, HTML, and JSON
  • 🔒 Secure - Best practices for credential management and data security

🎓 Use Cases

YAML Shredder Use Cases

  • Configuration Management - Transform YAML configs into queryable database tables
  • API Response Processing - Convert nested JSON API responses into relational format
  • Data Pipeline Transformation - Normalize complex nested data for analytics
  • Schema Discovery - Automatically infer schemas from example data
  • Multi-Source Integration - Combine data from different YAML/JSON sources
  • Data Versioning - Track changes in configuration files over time
  • Configuration Drift Detection - Compare YAML configs across environments to identify differences

Schema Comparison Use Cases

  • Environment Synchronization - Ensure dev, staging, and production schemas are aligned
  • Change Tracking - Monitor database schema evolution over time
  • Deployment Validation - Verify schema changes after deployments
  • Compliance & Auditing - Maintain schema change history for compliance
  • Migration Planning - Identify schema differences before migrations
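At its core, a schema comparison reduces to set operations over object inventories. The sketch below is a hand-rolled toy over `{table: {column: type}}` dicts, not Schema Sentinel's actual metadata model, but it shows the kind of added/removed/changed report these use cases rely on:

```python
def diff_schemas(old, new):
    """Toy schema diff over {table: {column: type}} snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for table in old.keys() & new.keys():
        if old[table] != new[table]:
            changed[table] = {
                "added_cols": sorted(new[table].keys() - old[table].keys()),
                "removed_cols": sorted(old[table].keys() - new[table].keys()),
                # Columns present in both but with a different declared type
                "retyped": sorted(c for c in old[table].keys() & new[table].keys()
                                  if old[table][c] != new[table][c]),
            }
    return {"added": added, "removed": removed, "changed": changed}

dev = {"USERS": {"ID": "NUMBER", "EMAIL": "VARCHAR"}}
prod = {"USERS": {"ID": "NUMBER", "EMAIL": "TEXT", "CREATED": "TIMESTAMP"},
        "ORDERS": {"ID": "NUMBER"}}
report = diff_schemas(dev, prod)
```

For the sample snapshots above, the report flags ORDERS as a new table and USERS as changed (a new CREATED column and a retyped EMAIL), which is exactly the information a migration plan or deployment validation needs.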

📋 Requirements

  • Python 3.13 or higher
  • uv - Modern Python package manager
  • Snowflake account (optional, only for schema comparison features)

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/Igladyshev/schema-sentinel.git
cd schema-sentinel

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh  # Linux/macOS
# or
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows

# Set up environment and install dependencies
./setup.sh

# Or manually:
uv venv
source .venv/bin/activate  # Linux/macOS or .venv\Scripts\activate on Windows
uv pip install -e ".[dev,jupyter]"

Quick Start - YAML Processing

Command Line Interface

Schema Sentinel provides organized command groups for different tasks:

YAML Processing Commands (schema-sentinel yaml)

# Analyze YAML structure
uv run schema-sentinel yaml analyze config.yaml

# Generate JSON schema
uv run schema-sentinel yaml schema config.yaml -o schema.json

# Generate relational tables
uv run schema-sentinel yaml tables config.yaml -o output/ -f csv

# Generate SQL DDL
uv run schema-sentinel yaml ddl config.yaml -o schema.sql -d snowflake

# Load data into SQLite
uv run schema-sentinel yaml load config.yaml -db output.db -r CONFIG

# Complete workflow: analyze → tables → DDL → load
uv run schema-sentinel yaml shred config.yaml -db output.db -r CONFIG

# Compare two YAML files
uv run schema-sentinel yaml compare file1.yaml file2.yaml -o comparison.md
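Conceptually, comparing two YAML files boils down to flattening both documents into path/value pairs and diffing them. The real `yaml compare` command works through intermediate databases, but this stdlib-only sketch (with hypothetical `flatten` and `drift` helpers) shows the idea on dicts as they would come out of a YAML parser:

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: value} pairs."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}  # scalar leaf
    out = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        out.update(flatten(value, path))
    return out

def drift(a, b):
    """Return every path whose value differs (or exists on one side only)."""
    fa, fb = flatten(a), flatten(b)
    return sorted(k for k in fa.keys() | fb.keys() if fa.get(k) != fb.get(k))

# Two configs, as if parsed with a YAML loader
cfg1 = {"service": {"replicas": 2, "image": "app:1.0"}}
cfg2 = {"service": {"replicas": 3, "image": "app:1.0"}}
```

Running `drift(cfg1, cfg2)` pinpoints `service.replicas` as the only differing path, which is the essence of the configuration-drift reports the compare command writes out.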

Schema Management Commands (schema-sentinel schema)

# Extract Snowflake schema metadata
uv run schema-sentinel schema extract MY_DATABASE --env prod

# Compare two schema snapshots
uv run schema-sentinel schema compare snapshot1 snapshot2 -o report.md

Python API

import yaml  # PyYAML

from yaml_shredder import TableGenerator, DDLGenerator, SQLiteLoader

# Load YAML and convert it to relational tables
with open("config.yaml") as f:
    data = yaml.safe_load(f)

table_gen = TableGenerator()
tables = table_gen.generate_tables(data, root_table_name="CONFIG")

# Generate SQL DDL
ddl_gen = DDLGenerator(dialect="sqlite")
ddl = ddl_gen.generate_ddl(tables, table_gen.relationships)

# Load into SQLite
loader = SQLiteLoader("output.db")
loader.load_tables(tables)

YAML Comparison

Python API:

from pathlib import Path
from schema_sentinel.yaml_comparator import YAMLComparator

# Create comparator
comparator = YAMLComparator(output_dir=Path("./temp_dbs"))

# Compare YAML files
report = comparator.compare_yaml_files(
    yaml1_path=Path("config1.yaml"),
    yaml2_path=Path("config2.yaml"),
    output_report=Path("comparison.md"),
    keep_dbs=False,  # Clean up temporary databases
    root_table_name="root"
)

print(report)

Configuration (For Schema Comparison)

For Snowflake schema comparison features, create .env with credentials:

SNOWFLAKE_ACCOUNT=your_account
SNOWFLAKE_USER=your_username
SNOWFLAKE_PASSWORD=your_password
SNOWFLAKE_WAREHOUSE=your_warehouse
SNOWFLAKE_DATABASE=your_database
SNOWFLAKE_ROLE=your_role
SNOWFLAKE_SCHEMAS=PUBLIC,ANALYTICS  # Optional
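However the variables reach the process, it pays to validate them before opening a connection. The sketch below is a hypothetical `load_snowflake_config` helper, not part of the package's API, showing one standard-library way to collect and check these settings:

```python
import os

REQUIRED = ["SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD"]

def load_snowflake_config(env=os.environ):
    """Collect SNOWFLAKE_* settings, failing fast on missing required keys."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    # Lower-case keys without the prefix: account, user, password, ...
    cfg = {k.removeprefix("SNOWFLAKE_").lower(): v
           for k, v in env.items() if k.startswith("SNOWFLAKE_")}
    # SNOWFLAKE_SCHEMAS is a comma-separated list when present
    if "schemas" in cfg:
        cfg["schemas"] = [s.strip() for s in cfg["schemas"].split(",")]
    return cfg

cfg = load_snowflake_config({
    "SNOWFLAKE_ACCOUNT": "acct", "SNOWFLAKE_USER": "me",
    "SNOWFLAKE_PASSWORD": "pw", "SNOWFLAKE_SCHEMAS": "PUBLIC, ANALYTICS",
})
```

Failing fast with an explicit list of missing variables is friendlier than letting the Snowflake driver surface a generic authentication error later.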

📖 Documentation

YAML Shredder

  • YAML Shredder CLI Guide - Complete CLI reference and examples
  • Notebooks Guide - Jupyter notebooks for data comparison and analysis
  • Generic Table Comparison - See MPM Comparison and Migration.ipynb for examples

General Documentation

๐Ÿ› ๏ธ Development

Setup Development Environment

# Install with development dependencies
uv pip install -e ".[dev,jupyter]"

# Install pre-commit hooks
pre-commit install

# Run tests
make test

# Format code
make format

# Lint code
make lint

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=schema_sentinel --cov-report=html

# Run specific test file
pytest tests/test_metadata.py

Code Quality

# Format code with Ruff
ruff format .

# Lint code
ruff check .

# Type checking
mypy schema_sentinel/

# Run all pre-commit hooks
pre-commit run --all-files

๐Ÿ—๏ธ Architecture

schema-sentinel/
├── schema_sentinel/             # Main package
│   ├── __init__.py              # Package initialization
│   ├── config/                  # Configuration management
│   │   ├── __init__.py
│   │   └── manager.py           # ConfigManager class
│   ├── markdown_utils/          # Markdown report generation
│   │   └── markdown.py
│   └── metadata_manager/        # Core metadata management
│       ├── engine.py            # Database connection engines
│       ├── metadata.py          # Metadata extraction logic
│       ├── changeset.py         # Change detection and tracking
│       ├── enums.py             # Enumerations and constants
│       ├── utils.py             # Utility functions
│       ├── model/               # Data models
│       │   ├── database.py      # Database model
│       │   ├── schema.py        # Schema model
│       │   ├── table.py         # Table model
│       │   ├── column.py        # Column model
│       │   ├── view.py          # View model
│       │   ├── procedure.py     # Stored procedure model
│       │   ├── function.py      # Function model
│       │   ├── constraint.py    # Constraint models
│       │   └── ...              # Other object models
│       └── lookup/              # Reference data
│           └── sql_data_type.py
├── yaml_shredder/               # YAML/JSON processing toolkit
│   ├── __init__.py
│   ├── schema_generator.py      # Auto JSON Schema generation
│   ├── structure_analyzer.py    # Nested structure analysis
│   ├── table_generator.py       # Relational table conversion
│   ├── ddl_generator.py         # SQL DDL generation
│   └── data_loader.py           # SQLite data loading
├── resources/                   # Configuration and templates
│   ├── examples/                # Example files and configurations
│   │   ├── .env.example         # Environment variables template
│   │   ├── example_sqlite_workflow.py  # SQLite workflow example
│   │   └── ...                  # Other example files
│   ├── db.properties            # Database config template
│   ├── datacompy/templates/     # Report templates
│   ├── meta-db/                 # SQLite metadata storage
│   └── migrations-ddl/          # DDL migration procedures
├── tests/                       # Test suite
│   ├── test_config.py           # Configuration tests
│   ├── test_imports.py          # Import tests
│   └── ...                      # Other test files
├── docs/                        # API documentation (pdoc)
├── wiki/                        # Project wiki and guides
└── notebooks/                   # Jupyter notebooks
    ├── MPM Comparison and Migration.ipynb
    └── ...

Supported Database Objects

  • ✅ Databases
  • ✅ Schemas
  • ✅ Tables (with columns, data types, nullability)
  • ✅ Views
  • ✅ Materialized Views
  • ✅ Stored Procedures
  • ✅ Functions (UDFs)
  • ✅ Primary Keys
  • ✅ Foreign Keys
  • ✅ Unique Constraints
  • ✅ Streams
  • ✅ Tasks
  • ✅ Pipes
  • ✅ Stages

๐Ÿค Contributing

We welcome contributions! This is an open source project and we'd love your help to make it better.

How to Contribute

  1. Fork the repository
  2. Create a feature branch from dev (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for your changes
  5. Ensure tests pass (pytest)
  6. Format code (ruff format .)
  7. Commit changes (git commit -m 'feat: add amazing feature')
  8. Push to branch (git push origin feature/amazing-feature)
  9. Open a Pull Request to merge into dev branch

See CONTRIBUTING.md for detailed guidelines and BRANCHING.md for our branching strategy.

Development Guidelines

  • Follow PEP 8 style guide (enforced by Ruff)
  • Add tests for new features
  • Update documentation
  • Use conventional commit messages
  • Ensure CI passes before requesting review

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🔒 Security

Security is a top priority. Please see SECURITY.md for:

  • Reporting vulnerabilities
  • Security best practices
  • Credential management guidelines

Never commit credentials or sensitive data to the repository.

🌟 Acknowledgments

📊 Project Status

Current Status: Active Development 🚧

This project is being actively developed and prepared for production use. We're working towards v2.1.0 with:

  • ✅ Modern Python packaging (pyproject.toml)
  • ✅ Comprehensive testing framework
  • ✅ CI/CD pipelines
  • ✅ Documentation
  • 🚧 Enhanced metadata extraction
  • 🚧 Additional database support
  • 🚧 Web UI (planned)

Roadmap

  • v2.1.0 - Current release with uv support, modern tooling
  • v2.2.0 - DuckDB integration, enhanced data comparator, PostgreSQL & MySQL support
  • v2.3.0 - REST API, CLI interface, Oracle & SQL Server support
  • v3.0.0 - Web UI, multi-user support, RBAC, CI/CD integration

📋 See the detailed Future Development Plan for a comprehensive roadmap and planned features.

💬 Support & Community



Made with โค๏ธ for the data engineering community

If you find this project useful, please consider giving it a ⭐️ on GitHub!



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

schema_sentinel-3.0.2.tar.gz (903.7 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

schema_sentinel-3.0.2-py3-none-any.whl (70.0 kB)


File details

Details for the file schema_sentinel-3.0.2.tar.gz.

File metadata

  • Download URL: schema_sentinel-3.0.2.tar.gz
  • Upload date:
  • Size: 903.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for schema_sentinel-3.0.2.tar.gz:

  • SHA256: 762d2f26ab2d4acd21bdbd52f63604a99d2d11cfee6183df756b7a44e7d09afb
  • MD5: 0b8997210d98e9667f8936cb51fd7b06
  • BLAKE2b-256: a24b2985abfeed092e37b3decdd6bd9bc007dbaee208ab2ccaeb5f192ec3a0ed

See more details on using hashes here.

Provenance

The following attestation bundles were made for schema_sentinel-3.0.2.tar.gz:

Publisher: publish.yml on Igladyshev/schema-sentinel

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file schema_sentinel-3.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for schema_sentinel-3.0.2-py3-none-any.whl:

  • SHA256: ac8ae2d06e85365627ad08289b5b12f67acc78b64b828d348892e4352c584852
  • MD5: 665172ee2af80affd64d1ef7b3b5ba24
  • BLAKE2b-256: 16cbd9e40ab7e8d0c4bd66f88d84ece7ce127b3205e9d3eab073fa69f5138264


Provenance

The following attestation bundles were made for schema_sentinel-3.0.2-py3-none-any.whl:

Publisher: publish.yml on Igladyshev/schema-sentinel

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
