yaml_serializer

A secure YAML loader/dumper with !include support, change tracking, and round‑trip preservation.
Part of the protocollab framework.

yaml_serializer is a Python library built on top of ruamel.yaml that provides a safe, production‑ready way to load, modify, and save YAML files. It is the foundation of protocollab's protocol definition handling, but can also be used independently in any Python project that needs robust YAML processing.


✨ Key Features

  • 🔒 Security‑first loading – protects against path traversal, billion laughs, and arbitrary code execution via YAML tags.
  • 🔗 !include tag – split large YAML files into reusable components.
  • 📝 Round‑trip preservation – comments, quotes, and formatting are kept intact when dumping.
  • 🔄 Change tracking – automatic dirty marking and hash‑based change detection for efficient saving.
  • 🧩 Easy modification – helper functions to modify YAML structures while maintaining parent links and dirty flags.
  • 🔀 Smart file renaming – automatically updates !include paths when files are renamed.
  • High test coverage (100%) – battle‑tested and ready for production use.

📦 Installation

Install the standalone package:

pip install yaml-serializer

Install the whole framework when you also need the generators and CLI:

pip install protocollab

For development directly from this repository, either install the full monorepo from the repository root or install this package in editable mode:

pip install -e src/yaml_serializer

After installation, import it as:

from yaml_serializer import SerializerSession

Note: yaml_serializer requires Python 3.10 or later.


🚀 Quick Start

from yaml_serializer import SerializerSession
from yaml_serializer.modify import add_to_dict

# Create a session (encapsulates all state — thread-safe and test-friendly)
session = SerializerSession()

# Load a YAML file (all !include references are resolved automatically)
data = session.load("path/to/file.yaml")

# Modify the structure (parent links and dirty flags are updated automatically)
add_to_dict(data, "new_key", "new_value")

# Save only changed files, preserving all comments and formatting
session.save()

📁 Module Structure

yaml_serializer/
├── __init__.py           # Public API exports
├── serializer.py         # SerializerSession, loading, saving, renaming
├── safe_constructor.py   # Restricted YAML constructor and safety limits
├── modify.py             # Helpers for mutating YAML trees with dirty tracking
├── utils.py              # Path checks, hashing, include helpers, dirty propagation
└── tests/                # Test suite for loading, includes, security, and sessions

📚 Detailed Examples

Working with !include

person.yaml

name: Alice
age: 30

main.yaml

team:
  lead: !include person.yaml

Load it from Python:

from yaml_serializer import SerializerSession

session = SerializerSession()
data = session.load("main.yaml")
print(data["team"]["lead"]["name"])  # prints "Alice"

Modifying nested structures

from yaml_serializer import SerializerSession
from yaml_serializer.modify import add_to_dict

session = SerializerSession()
data = session.load('protocol.yaml')

# Add a new field to a nested type
add_to_dict(data['types']['Message'], 'timestamp', 'u64')

# Add a new type definition (will mark the file as dirty)
add_to_dict(data['types'], 'NewType', {'field': 'value'})

# Save only changed files
session.save(only_if_changed=True)
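
The idea of parent links and dirty flags can be illustrated without the library. The following is a minimal sketch, not yaml_serializer's implementation: `TrackedDict`, `mark_dirty_upwards`, and `add_key` are hypothetical names, and plain dict subclasses stand in for ruamel.yaml's CommentedMap.

```python
from typing import Any, Optional


class TrackedDict(dict):
    """Hypothetical dict that remembers its parent and a dirty flag."""

    def __init__(self, *args: Any, parent: Optional["TrackedDict"] = None) -> None:
        super().__init__(*args)
        self.parent = parent
        self.dirty = False


def mark_dirty_upwards(node: TrackedDict) -> None:
    """Flag the node and every ancestor up to the root."""
    current: Optional[TrackedDict] = node
    while current is not None:
        current.dirty = True
        current = current.parent


def add_key(target: TrackedDict, key: str, value: Any) -> None:
    """Insert a value, link nested dicts to their parent, and mark the path dirty."""
    if isinstance(value, dict) and not isinstance(value, TrackedDict):
        value = TrackedDict(value, parent=target)
    target[key] = value
    mark_dirty_upwards(target)


root = TrackedDict()
types = TrackedDict(parent=root)
root["types"] = types

add_key(types, "NewType", {"field": "value"})
print(root.dirty, types.dirty)  # True True
```

Because the flag travels up the parent chain, a saver only has to inspect the root of each file to decide whether that file needs rewriting.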

Secure loading with custom limits

from yaml_serializer import SerializerSession

config = {
    'max_file_size': 5 * 1024 * 1024,   # 5 MB
    'max_struct_depth': 20,               # max YAML nesting depth (default 50)
    'max_include_depth': 20,              # max !include nesting depth (default 50)
    'max_imports': 50                      # max number of included files (default 100)
}

# Config can be given at construction time (applies to every load call) …
session = SerializerSession(config)
data = session.load('protocol.yaml')

# … or overridden per-load:
data = session.load('protocol.yaml', config={'max_imports': 10})
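
The layering described above (built-in defaults, then session config, then per-load config) can be pictured as a simple dictionary merge. This is a sketch under assumptions: `DEFAULT_LIMITS` and `effective_limits` are hypothetical names, and treating 10 MB as 10 * 1024 * 1024 bytes is a guess that mirrors the 5 MB example.

```python
from typing import Mapping, Optional

# Defaults as documented in the API reference; the byte value is an assumption.
DEFAULT_LIMITS = {
    "max_file_size": 10 * 1024 * 1024,
    "max_struct_depth": 50,
    "max_include_depth": 50,
    "max_imports": 100,
}


def effective_limits(
    session_config: Optional[Mapping[str, int]] = None,
    load_config: Optional[Mapping[str, int]] = None,
) -> dict:
    """Merge defaults, session-level config, and per-load config (last one wins)."""
    merged = dict(DEFAULT_LIMITS)  # copy so the defaults are never mutated
    for layer in (session_config, load_config):
        if layer:
            merged.update(layer)
    return merged


limits = effective_limits({"max_struct_depth": 20}, {"max_imports": 10})
print(limits["max_struct_depth"], limits["max_include_depth"], limits["max_imports"])
# 20 50 10
```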

Renaming files with automatic !include updates

from yaml_serializer import SerializerSession

session = SerializerSession()
session.load('main.yaml')

# Rename an included file – all !include references are automatically updated
session.rename('old_name.yaml', 'new_name.yaml')

session.save()
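
To show what updating an !include reference amounts to, here is a text-level sketch. It is only an illustration: the library presumably works on the loaded node tree rather than raw text, `rewrite_include` is a hypothetical helper, and quoted paths are not handled.

```python
import re


def rewrite_include(text: str, old_ref: str, new_ref: str) -> str:
    """Replace `!include old_ref` with `!include new_ref`, matching whole paths only."""
    pattern = re.compile(
        r"(!include[ \t]+)" + re.escape(old_ref) + r"(?=\s|$)",
        re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in new_ref.
    return pattern.sub(lambda match: match.group(1) + new_ref, text)


source = (
    "team:\n"
    "  lead: !include old_name.yaml\n"
    "  backup: !include old_name.yaml.bak\n"
)
updated = rewrite_include(source, "old_name.yaml", "new_name.yaml")
print(updated)
```

The lookahead keeps `old_name.yaml.bak` untouched, so only exact references to the renamed file change.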

Multiple independent sessions

from yaml_serializer import SerializerSession

# Two sessions can load the same (or different) files without interfering:
session_a = SerializerSession()
session_b = SerializerSession()

data_a = session_a.load('spec_v1.yaml')
data_b = session_b.load('spec_v2.yaml')

# Modifications to data_a are invisible to session_b and vice-versa.

📖 API Reference

SerializerSession (primary API)

from yaml_serializer import SerializerSession

Each instance is completely independent — thread-safe, reusable, and isolated from other sessions.

SerializerSession(config: Optional[dict] = None)

Create a session with optional default configuration.

Key                Default  Description
max_file_size      10 MB    Maximum file size in bytes
max_struct_depth   50       Maximum YAML nesting depth
max_include_depth  50       Maximum !include chain depth
max_imports        100      Maximum total included files

session.load(path: str, config: Optional[dict] = None) -> CommentedMap

Load path and resolve all !include references. If given, config overrides the session's defaults for this call only.

session.save(only_if_changed: bool = True)

Write modified files back to disk.

session.rename(old_path: str, new_path: str)

Rename a file and update all !include references to it.

session.propagate_dirty(file_path: str)

Mark as dirty all files that !include file_path.
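
Dirty propagation can be thought of as a walk over the reverse include graph. The sketch below is not the library's code: `files_to_mark_dirty` is a hypothetical function, the include graph is a plain dict, and following includers transitively is an assumption (the description above only states that including files are marked).

```python
from collections import deque
from typing import Dict, Set


def files_to_mark_dirty(includes: Dict[str, Set[str]], changed: str) -> Set[str]:
    """Return every file that includes `changed`, directly or through other files."""
    # Invert "file -> files it includes" into "file -> files that include it".
    included_by: Dict[str, Set[str]] = {}
    for parent, children in includes.items():
        for child in children:
            included_by.setdefault(child, set()).add(parent)

    dirty: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in included_by.get(current, ()):
            if parent not in dirty:  # also guards against revisiting
                dirty.add(parent)
                queue.append(parent)
    return dirty


graph = {
    "main.yaml": {"team.yaml"},
    "team.yaml": {"person.yaml"},
    "other.yaml": set(),
}
print(sorted(files_to_mark_dirty(graph, "person.yaml")))
# ['main.yaml', 'team.yaml']
```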

session.clear()

Reset all loaded state. Configuration defaults are preserved.


The public helper functions exported from yaml_serializer complement the session API and automatically update parent links and dirty flags.

  • new_commented_map(initial: Optional[dict] = None, parent: Optional[Node] = None) -> CommentedMap
  • new_commented_seq(initial: Optional[list] = None, parent: Optional[Node] = None) -> CommentedSeq
  • add_to_dict(target: CommentedMap, key: str, value: Any)
  • update_in_dict(target: CommentedMap, key: str, value: Any)
  • remove_from_dict(target: CommentedMap, key: str)
  • add_to_list(target: CommentedSeq, value: Any)
  • remove_from_list(target: CommentedSeq, index: int)
  • get_node_hash(node: Union[CommentedMap, CommentedSeq]) -> str – returns the node’s hash (recalculates if dirty).
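
Hash-based change detection generally rests on hashing a canonical representation of a node. The following sketch shows the principle with the standard library only; the algorithm yaml_serializer actually uses is not stated here, and `canonical_text` and `content_hash` are hypothetical names operating on plain dicts and lists.

```python
import hashlib
import json
from typing import Any


def canonical_text(node: Any) -> str:
    """Serialize with sorted keys so key order does not affect the result."""
    return json.dumps(node, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hash(node: Any) -> str:
    """SHA-256 hex digest of the canonical form (the hash choice is an assumption)."""
    return hashlib.sha256(canonical_text(node).encode("utf-8")).hexdigest()


before = {"name": "Alice", "tags": ["lead", "dev"]}
reordered = {"tags": ["lead", "dev"], "name": "Alice"}
after = {"name": "Alice", "tags": ["lead"]}

print(content_hash(before) == content_hash(reordered))  # True
print(content_hash(before) == content_hash(after))      # False
```

Comparing a stored hash with a freshly computed one lets a saver skip files whose content ended up unchanged even though a dirty flag was set.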

The lower-level internals in safe_constructor.py and most of serializer.py are implementation details of the current codebase. When using the library directly, prefer SerializerSession plus the re-exported helpers from yaml_serializer.


🛡️ Public API Stability

The following functions from yaml_serializer.utils are part of the stable advanced-use API for yaml_serializer 1.0.0 and are covered by backward-compatibility guarantees for the yaml_serializer 1.x line:

  • canonical_repr
  • compute_hash
  • resolve_include_path
  • is_path_within_root
  • mark_node
  • mark_dirty
  • clear_dirty
  • update_file_attr
  • replace_included
  • mark_includes

These functions are exported via yaml_serializer.utils.__all__ and marked with the _stable_api metadata decorator in the source.

Helpers prefixed with _ are internal implementation details and may change without notice.


🛡️ Security

yaml_serializer was designed with security as a first‑class concern, addressing the shortcomings of many YAML libraries:

  • Restricted YAML tags – only the custom !include tag is allowed; all others (including dangerous Python‑specific tags) are rejected.
  • File size limit – prevents memory exhaustion attacks (configurable, default 10 MB).
  • Nesting depth limit – prevents stack overflow from deeply nested structures (default 50).
  • Path traversal protection – !include can only reference files inside the project root (or an explicitly allowed directory).
  • Circular import detection – prevents infinite recursion.
  • Import count limit – stops bomb‑style attacks with thousands of inclusions (default 100).
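
As an illustration of the path traversal check in the list above, a containment test can be written with pathlib. This is a generic sketch, not the library's code: `is_inside` is a hypothetical helper, and `project` is a made-up directory name.

```python
from pathlib import Path
from typing import Union


def is_inside(root: Union[str, Path], candidate: Union[str, Path]) -> bool:
    """True if `candidate`, resolved relative to `root`, stays within `root`."""
    root_resolved = Path(root).resolve()
    # Joining with an absolute candidate discards the root, so absolute
    # paths outside the root are rejected as well.
    candidate_resolved = (root_resolved / candidate).resolve()
    return candidate_resolved.is_relative_to(root_resolved)  # Python 3.9+


print(is_inside("project", "parts/person.yaml"))  # True
print(is_inside("project", "../secret.yaml"))     # False
```

Resolving both paths before comparing them is what defeats `..` segments and symlinks that point outside the root.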

These measures make yaml_serializer suitable for processing untrusted YAML files – a key advantage over many alternatives.
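
The include-related limits (chain depth, total count, and cycle detection) can be sketched together as one traversal. The example below runs over an in-memory graph rather than real files; `IncludeError` and `walk_includes` are hypothetical names, and the exact counting rules (root at depth 0, every include occurrence counted) are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


class IncludeError(Exception):
    """Raised when an include limit is exceeded or a cycle is found."""


def walk_includes(
    graph: Dict[str, Sequence[str]],
    root: str,
    max_include_depth: int = 50,
    max_imports: int = 100,
) -> List[str]:
    """Return included files in visit order, enforcing the limits above."""
    included: List[str] = []

    def visit(name: str, ancestors: Tuple[str, ...]) -> None:
        if name in ancestors:
            chain = " -> ".join(ancestors + (name,))
            raise IncludeError(f"circular include: {chain}")
        if len(ancestors) > max_include_depth:
            raise IncludeError(f"include depth exceeds {max_include_depth}")
        if ancestors:  # the root itself is not counted as an import
            included.append(name)
            if len(included) > max_imports:
                raise IncludeError(f"more than {max_imports} included files")
        for child in graph.get(name, ()):
            visit(child, ancestors + (name,))

    visit(root, ())
    return included


graph = {"main.yaml": ["a.yaml", "b.yaml"], "a.yaml": ["c.yaml"]}
print(walk_includes(graph, "main.yaml"))  # ['a.yaml', 'c.yaml', 'b.yaml']
```

Tracking the chain of ancestors, rather than a global visited set, lets the same file be included from two places while still rejecting true cycles.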


🧪 Testing & Coverage

The module has an extensive test suite covering all critical paths.

  • Code coverage: 100% (yaml_serializer)
  • Structure: thematic test modules + conftest.py (shared fixtures)

To run tests locally from the package directory:

pytest tests/ --cov=yaml_serializer

For more detailed output:

pytest tests/ -v --cov=yaml_serializer --cov-report=term-missing

🔧 Development Setup

# Clone the repository (if not already done)
git clone https://github.com/cherninkiy/protocollab
cd protocollab/src/yaml_serializer

# Install the package in editable mode
pip install -e .

# Run tests
pytest tests/

🤝 Contributing

Contributions are welcome! Please read our Contributing Guidelines and Code of Conduct before submitting a pull request.

If you discover a security vulnerability, do not open a public issue; instead, please follow the steps outlined in our Security Policy.


📄 License

yaml_serializer is released under the Apache License 2.0. A local copy is available in LICENSE, and the repository root also contains the canonical project license text in ../../LICENSE.


🙏 Acknowledgements

Built on the shoulders of ruamel.yaml, pydantic, and the Python community.
