Secure YAML loader/dumper with !include support, change tracking, and round-trip preservation

Reason this release was yanked:

Superseded by 1.0.1 metadata-only release with corrected package metadata.

yaml_serializer

A secure YAML loader/dumper with !include support, change tracking, and round‑trip preservation
Part of the protocollab framework.

yaml_serializer is a Python library built on top of ruamel.yaml that provides a safe, production‑ready way to load, modify, and save YAML files. It is the foundation of protocollab's protocol definition handling, but can also be used independently in any Python project that needs robust YAML processing.


✨ Key Features

  • 🔒 Security‑first loading – protects against path traversal, billion laughs, and arbitrary code execution via YAML tags.
  • 🔗 !include tag – split large YAML files into reusable components.
  • 📝 Round‑trip preservation – comments, quotes, and formatting are kept intact when dumping.
  • 🔄 Change tracking – automatic dirty marking and hash‑based change detection for efficient saving.
  • 🧩 Easy modification – helper functions to modify YAML structures while maintaining parent links and dirty flags.
  • 🔀 Smart file renaming – automatically updates !include paths when files are renamed.
  • High test coverage (100%) – battle‑tested and ready for production use.

📦 Installation

Install the standalone package:

pip install yaml-serializer

Install the whole framework when you also need the generators and CLI:

pip install protocollab

For development directly from this repository, either install the full monorepo from the repository root or install this package in editable mode:

pip install -e src/yaml_serializer

After installation, import it as:

from yaml_serializer import SerializerSession

Note: yaml_serializer requires Python 3.10 or later.


🚀 Quick Start

from yaml_serializer import SerializerSession
from yaml_serializer.modify import add_to_dict

# Create a session (encapsulates all state — thread-safe and test-friendly)
session = SerializerSession()

# Load a YAML file (all !include references are resolved automatically)
data = session.load("path/to/file.yaml")

# Modify the structure (parent links and dirty flags are updated automatically)
add_to_dict(data, "new_key", "new_value")

# Save only changed files, preserving all comments and formatting
session.save()

📁 Module Structure

yaml_serializer/
├── __init__.py           # Public API exports
├── serializer.py         # SerializerSession, loading, saving, renaming
├── safe_constructor.py   # Restricted YAML constructor and safety limits
├── modify.py             # Helpers for mutating YAML trees with dirty tracking
├── utils.py              # Path checks, hashing, include helpers, dirty propagation
└── tests/                # Test suite for loading, includes, security, and sessions

📚 Detailed Examples

Working with !include

person.yaml

name: Alice
age: 30

main.yaml

team:
  lead: !include person.yaml

from yaml_serializer import SerializerSession

session = SerializerSession()
data = session.load("main.yaml")
print(data["team"]["lead"]["name"])  # prints "Alice"

Modifying nested structures

from yaml_serializer import SerializerSession
from yaml_serializer.modify import add_to_dict

session = SerializerSession()
data = session.load('protocol.yaml')

# Add a new field to a nested type
add_to_dict(data['types']['Message'], 'timestamp', 'u64')

# Add a new type definition (will mark the file as dirty)
add_to_dict(data['types'], 'NewType', {'field': 'value'})

# Save only changed files
session.save(only_if_changed=True)

Secure loading with custom limits

from yaml_serializer import SerializerSession

config = {
    'max_file_size': 5 * 1024 * 1024,   # 5 MB
    'max_struct_depth': 20,               # max YAML nesting depth (default 50)
    'max_include_depth': 20,              # max !include nesting depth (default 50)
    'max_imports': 50                      # max number of included files (default 100)
}

# Config can be given at construction time (applies to every load call) …
session = SerializerSession(config)
data = session.load('protocol.yaml')

# … or overridden per-load:
data = session.load('protocol.yaml', config={'max_imports': 10})

Renaming files with automatic !include updates

from yaml_serializer import SerializerSession

session = SerializerSession()
session.load('main.yaml')

# Rename an included file – all !include references are automatically updated
session.rename('old_name.yaml', 'new_name.yaml')

session.save()

Multiple independent sessions

from yaml_serializer import SerializerSession

# Two sessions can load the same (or different) files without interfering:
session_a = SerializerSession()
session_b = SerializerSession()

data_a = session_a.load('spec_v1.yaml')
data_b = session_b.load('spec_v2.yaml')

# Modifications to data_a are invisible to session_b and vice-versa.

📖 API Reference

SerializerSession (primary API)

from yaml_serializer import SerializerSession

Each instance is completely independent — thread-safe, reusable, and isolated from other sessions.

SerializerSession(config: Optional[dict] = None)

Create a session with optional default configuration.

Key                Default  Description
max_file_size      10 MB    Maximum file size in bytes
max_struct_depth   50       Maximum YAML nesting depth
max_include_depth  50       Maximum !include chain depth
max_imports        100      Maximum total included files
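
As a rough illustration of what a limit such as max_struct_depth guards against, here is a minimal sketch that measures nesting depth on plain dicts and lists. The names check_depth and DepthLimitError are hypothetical and not part of yaml_serializer; the library's own check may work differently.

```python
from typing import Any


class DepthLimitError(ValueError):
    """Hypothetical error raised when a structure is nested too deeply."""


def check_depth(node: Any, max_depth: int = 50) -> int:
    """Return the nesting depth of node, raising once max_depth is exceeded.

    An explicit stack is used instead of recursion so that hostile input
    cannot exhaust Python's recursion limit before the check fires.
    """
    deepest = 0
    stack = [(node, 1)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = list(current.values())
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars do not add a nesting level
        if depth > max_depth:
            raise DepthLimitError(f"nesting depth {depth} exceeds limit {max_depth}")
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children)
    return deepest
```

Here a mapping that holds a mapping that holds a list counts as depth 3, and a bare scalar as depth 0.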

session.load(path: str, config: Optional[dict] = None) -> CommentedMap

Load the file at path and resolve all of its !include references. If given, config overrides the session defaults for that call.
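
One way to picture how the three configuration layers combine is a plain dictionary merge. This is a sketch only: effective_config is a hypothetical helper, and reading the 10 MB default as 10 * 1024 * 1024 bytes is an assumption.

```python
from typing import Optional

# Defaults as documented for SerializerSession; the byte value for
# "10 MB" is an assumption.
DEFAULTS = {
    "max_file_size": 10 * 1024 * 1024,
    "max_struct_depth": 50,
    "max_include_depth": 50,
    "max_imports": 100,
}


def effective_config(session_config: Optional[dict] = None,
                     call_config: Optional[dict] = None) -> dict:
    """Layer per-call overrides over session settings over built-in defaults."""
    merged = dict(DEFAULTS)              # copy, so DEFAULTS is never mutated
    merged.update(session_config or {})  # values passed to the constructor
    merged.update(call_config or {})     # values passed to a single load call
    return merged
```

With a session configured for max_imports=50 and a call passing max_imports=10, the merged result uses 10 for that call, while keys not mentioned anywhere keep their defaults.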

session.save(only_if_changed: bool = True)

Write modified files back to disk.

session.rename(old_path: str, new_path: str)

Rename a file and update all !include references to it.
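
To make the idea of updating references concrete, the sketch below rewrites !include targets in raw YAML text. It is not how the library does it (the library works on loaded nodes). rewrite_includes is a hypothetical helper, include paths are assumed to be relative to the including file, and quoted paths are not handled.

```python
import posixpath
import re

# Matches the tag and its unquoted target, e.g. "!include person.yaml"
INCLUDE_RE = re.compile(r"(!include\s+)(\S+)")


def rewrite_includes(text: str, including_file: str,
                     old_path: str, new_path: str) -> str:
    """Return text with every !include of old_path pointing at new_path."""
    base_dir = posixpath.dirname(including_file)
    old_norm = posixpath.normpath(old_path)

    def replace(match: re.Match) -> str:
        # Resolve the target relative to the including file's directory
        target = posixpath.normpath(posixpath.join(base_dir, match.group(2)))
        if target != old_norm:
            return match.group(0)  # unrelated include, leave untouched
        new_rel = posixpath.relpath(new_path, base_dir or ".")
        return match.group(1) + new_rel

    return INCLUDE_RE.sub(replace, text)
```

Includes that point at other files pass through unchanged, so only references to the renamed file are touched.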

session.propagate_dirty(file_path: str)

Mark as dirty all files that !include file_path.
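
Conceptually, dirty propagation is a walk over the reversed include graph. The sketch below assumes the walk is transitive (a file that includes an includer is also marked), which the description above does not state. files_to_mark_dirty and the includes mapping are hypothetical.

```python
from collections import deque


def files_to_mark_dirty(includes: dict[str, set[str]], changed: str) -> set[str]:
    """Return every file that includes `changed`, directly or transitively.

    `includes` maps a file to the set of files it pulls in via !include.
    """
    # Invert the graph: for each file, which files include it?
    included_by: dict[str, set[str]] = {}
    for parent, children in includes.items():
        for child in children:
            included_by.setdefault(child, set()).add(parent)

    dirty: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in included_by.get(current, ()):
            if parent not in dirty:  # visit each includer only once
                dirty.add(parent)
                queue.append(parent)
    return dirty
```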

session.clear()

Reset all loaded state. Configuration defaults are preserved.


The public helper functions exported from yaml_serializer complement the session API and automatically update parent links and dirty flags.

  • new_commented_map(initial: Optional[dict] = None, parent: Optional[Node] = None) -> CommentedMap
  • new_commented_seq(initial: Optional[list] = None, parent: Optional[Node] = None) -> CommentedSeq
  • add_to_dict(target: CommentedMap, key: str, value: Any)
  • update_in_dict(target: CommentedMap, key: str, value: Any)
  • remove_from_dict(target: CommentedMap, key: str)
  • add_to_list(target: CommentedSeq, value: Any)
  • remove_from_list(target: CommentedSeq, index: int)
  • get_node_hash(node: Union[CommentedMap, CommentedSeq]) -> str – returns the node’s hash (recalculates if dirty).
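
The "recalculates if dirty" behaviour of get_node_hash can be pictured as a cached hash guarded by a dirty flag. The sketch below is an illustration under assumptions: TrackedNode is a hypothetical class, and the use of JSON canonicalisation with SHA-256 is a guess, since the library's actual hashing scheme is not described here.

```python
import hashlib
import json
from typing import Any


class TrackedNode:
    """Hypothetical wrapper pairing a value with a dirty flag and a cached hash."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.dirty = True   # nothing has been hashed yet
        self._hash = ""

    def set(self, value: Any) -> None:
        self.value = value
        self.dirty = True   # invalidate the cached hash

    def node_hash(self) -> str:
        if self.dirty:
            # sort_keys yields a canonical form, so key order does not matter
            canonical = json.dumps(self.value, sort_keys=True, separators=(",", ":"))
            self._hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
            self.dirty = False
        return self._hash
```

Comparing the hash taken at load time with the hash at save time is one way to decide whether a file needs to be written at all.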

The lower-level internals in safe_constructor.py and most of serializer.py are implementation details of the current codebase. When using the library directly, prefer SerializerSession plus the re-exported helpers from yaml_serializer.


🛡️ Public API Stability

The following functions from yaml_serializer.utils are part of the stable advanced-use API for yaml_serializer 1.0.0 and are covered by backward-compatibility guarantees for the yaml_serializer 1.x line:

  • canonical_repr
  • compute_hash
  • resolve_include_path
  • is_path_within_root
  • mark_node
  • mark_dirty
  • clear_dirty
  • update_file_attr
  • replace_included
  • mark_includes

These functions are exported via yaml_serializer.utils.__all__ and marked with the _stable_api metadata decorator in the source.

Helpers prefixed with _ are internal implementation details and may change without notice.


🛡️ Security

yaml_serializer was designed with security as a first‑class concern, addressing the shortcomings of many YAML libraries:

  • Restricted YAML tags – only the custom !include tag is allowed; all others (including dangerous Python‑specific tags) are rejected.
  • File size limit – prevents memory exhaustion attacks (configurable, default 10 MB).
  • Nesting depth limit – prevents stack overflow from deeply nested structures (default 50).
  • Path traversal protection – !include can only reference files inside the project root (or an explicitly allowed directory).
  • Circular import detection – prevents infinite recursion.
  • Import count limit – stops bomb‑style attacks with thousands of inclusions (default 100).
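
To illustrate the path traversal check from the list above, here is a minimal sketch using pathlib. The library exposes is_path_within_root in yaml_serializer.utils, but its signature is not shown here, so the sketch uses a differently named, hypothetical stand-in.

```python
from pathlib import Path


def is_within_root(candidate: str, root: str) -> bool:
    """Return True if candidate resolves to a location inside root."""
    root_path = Path(root).resolve()
    # Relative candidates are resolved against the root; an absolute
    # candidate replaces it. resolve() collapses ".." and follows symlinks.
    candidate_path = (root_path / candidate).resolve()
    return candidate_path == root_path or root_path in candidate_path.parents
```

Resolving before comparing is the important step: a plain string prefix test would accept paths such as sub/../../secret.yaml.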

These measures make yaml_serializer suitable for processing untrusted YAML files – a key advantage over many alternatives.
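
The include-related limits can be sketched as a single graph walk. This is an illustration under assumptions, not the library's code: walk_includes and IncludeError are hypothetical, the include graph is given as a plain dict, and whether a file included twice counts twice toward max_imports is a guess.

```python
class IncludeError(ValueError):
    """Hypothetical error for rejected include graphs."""


def walk_includes(includes: dict[str, list[str]], root: str,
                  max_include_depth: int = 50,
                  max_imports: int = 100) -> list[str]:
    """Visit root and everything it includes, enforcing the include limits."""
    order: list[str] = []
    imports = 0

    def visit(path: str, chain: tuple[str, ...]) -> None:
        nonlocal imports
        if path in chain:  # the file is already being loaded further up
            raise IncludeError("circular include: " + " -> ".join(chain + (path,)))
        if len(chain) > max_include_depth:
            raise IncludeError(f"include depth exceeds {max_include_depth}")
        if chain:  # the root file itself is not an import
            imports += 1
            if imports > max_imports:
                raise IncludeError(f"more than {max_imports} included files")
        order.append(path)
        for child in includes.get(path, []):
            visit(child, chain + (path,))

    visit(root, ())
    return order
```

Tracking the chain of files currently being loaded, rather than every file seen so far, lets two branches include the same file while still rejecting true cycles.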


🧪 Testing & Coverage

The module has an extensive test suite covering all critical paths.

  • Code coverage: 100% (yaml_serializer)
  • Structure: thematic test modules + conftest.py (shared fixtures)

To run tests locally from the package directory:

pytest tests/ --cov=yaml_serializer

For more detailed output:

pytest tests/ -v --cov=yaml_serializer --cov-report=term-missing

🔧 Development Setup

# Clone the repository (if not already done)
git clone https://github.com/cherninkiy/protocollab
cd protocollab/src/yaml_serializer

# Install the package in editable mode
pip install -e .

# Run tests
pytest tests/

🤝 Contributing

Contributions are welcome! Please read our Contributing Guidelines and Code of Conduct before submitting a pull request.

If you discover a security vulnerability, do not open a public issue; instead, please follow the steps outlined in our Security Policy.


📄 License

yaml_serializer is released under the Apache License 2.0. A local copy is available in LICENSE, and the repository root also contains the canonical project license text in ../../LICENSE.


🙏 Acknowledgements

Built on the shoulders of ruamel.yaml, pydantic, and the Python community.
