Secure YAML loader/dumper with !include support, change tracking, and round-trip preservation
yaml_serializer
Part of the protocollab framework.
yaml_serializer is a Python library built on top of ruamel.yaml that provides a safe, production‑ready way to load, modify, and save YAML files. It is the foundation of protocollab's protocol definition handling, but can also be used independently in any Python project that needs robust YAML processing.
✨ Key Features
- 🔒 Security‑first loading – protects against path traversal, billion laughs, and arbitrary code execution via YAML tags.
- 🔗 !include tag – split large YAML files into reusable components.
- 📝 Round‑trip preservation – comments, quotes, and formatting are kept intact when dumping.
- 🔄 Change tracking – automatic dirty marking and hash‑based change detection for efficient saving.
- 🧩 Easy modification – helper functions to modify YAML structures while maintaining parent links and dirty flags.
- 🔀 Smart file renaming – automatically updates !include paths when files are renamed.
- ✅ High test coverage (100%) – battle‑tested and ready for production use.
📦 Installation
Install the standalone package:
```shell
pip install yaml-serializer
```
Install the whole framework when you also need the generators and CLI:
```shell
pip install protocollab
```
For development directly from this repository, either install the full monorepo from the repository root or install this package in editable mode:
```shell
pip install -e src/yaml_serializer
```
After installation, import it as:
```python
from yaml_serializer import SerializerSession
```
Note: yaml_serializer requires Python 3.10 or later.
🚀 Quick Start
```python
from yaml_serializer import SerializerSession
from yaml_serializer.modify import add_to_dict

# Create a session (it encapsulates all state, so it is thread-safe and test-friendly)
session = SerializerSession()

# Load a YAML file (all !include references are resolved automatically)
data = session.load("path/to/file.yaml")

# Modify the structure (parent links and dirty flags are updated automatically)
add_to_dict(data, "new_key", "new_value")

# Save only changed files, preserving all comments and formatting
session.save()
```
📁 Module Structure
```text
yaml_serializer/
├── __init__.py          # Public API exports
├── serializer.py        # SerializerSession, loading, saving, renaming
├── safe_constructor.py  # Restricted YAML constructor and safety limits
├── modify.py            # Helpers for mutating YAML trees with dirty tracking
├── utils.py             # Path checks, hashing, include helpers, dirty propagation
└── tests/               # Test suite for loading, includes, security, and sessions
```
📚 Detailed Examples
Working with !include
person.yaml:
```yaml
name: Alice
age: 30
```
main.yaml:
```yaml
team:
  lead: !include person.yaml
```
```python
from yaml_serializer import SerializerSession

session = SerializerSession()
data = session.load("main.yaml")
print(data["team"]["lead"]["name"])  # prints "Alice"
```
Modifying nested structures
```python
from yaml_serializer import SerializerSession
from yaml_serializer.modify import add_to_dict

session = SerializerSession()
data = session.load('protocol.yaml')

# Add a new field to a nested type
add_to_dict(data['types']['Message'], 'timestamp', 'u64')

# Add a new type definition (will mark the file as dirty)
add_to_dict(data['types'], 'NewType', {'field': 'value'})

# Save only changed files
session.save(only_if_changed=True)
```
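The statement that parent links and dirty flags are maintained automatically can be pictured with a toy tree: each node keeps a reference to its parent, and a mutation marks the node and all of its ancestors dirty so that a later save knows the file changed. The TrackedMap class and add_to_dict_sketch function are hypothetical and only mirror the idea; the real helpers operate on ruamel.yaml's CommentedMap.

```python
from typing import Any, Optional


class TrackedMap(dict):
    """Hypothetical dict that records its parent and a dirty flag."""

    def __init__(self, parent: Optional["TrackedMap"] = None) -> None:
        super().__init__()
        self.parent = parent
        self.dirty = False


def mark_dirty_upwards(node: Optional[TrackedMap]) -> None:
    # Walk the parent chain so every enclosing mapping is flagged as changed.
    while node is not None:
        node.dirty = True
        node = node.parent


def add_to_dict_sketch(target: TrackedMap, key: str, value: Any) -> None:
    # A plain dict value is wrapped (one level only) so the new child gets a parent link.
    if isinstance(value, dict) and not isinstance(value, TrackedMap):
        child = TrackedMap(parent=target)
        child.update(value)
        value = child
    target[key] = value
    mark_dirty_upwards(target)


root = TrackedMap()
types = TrackedMap(parent=root)
root["types"] = types
add_to_dict_sketch(types, "NewType", {"field": "value"})
print(root.dirty, types.dirty, types["NewType"].parent is types)  # True True True
```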
Secure loading with custom limits
```python
from yaml_serializer import SerializerSession

config = {
    'max_file_size': 5 * 1024 * 1024,  # 5 MB
    'max_struct_depth': 20,            # max YAML nesting depth (default 50)
    'max_include_depth': 20,           # max !include nesting depth (default 50)
    'max_imports': 50,                 # max number of included files (default 100)
}

# Config can be given at construction time (applies to every load call) ...
session = SerializerSession(config)
data = session.load('protocol.yaml')

# ... or overridden per load:
data = session.load('protocol.yaml', config={'max_imports': 10})
```
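One way to read the precedence described above is a three-layer merge: built-in defaults, then the session config, then the per-load config, with later layers winning key by key. The default values come from the API reference table; whether a per-load config is merged with the session config (as assumed here) or replaces it entirely is not stated, and effective_config is a hypothetical name.

```python
from typing import Optional

# Defaults from the API reference table; 10 MB is assumed to mean 10 * 1024 * 1024,
# matching the 5 * 1024 * 1024 used for "5 MB" in the example above.
DEFAULTS = {
    "max_file_size": 10 * 1024 * 1024,
    "max_struct_depth": 50,
    "max_include_depth": 50,
    "max_imports": 100,
}


def effective_config(session_config: Optional[dict] = None,
                     load_config: Optional[dict] = None) -> dict:
    """Hypothetical merge: per-load keys override session keys, which override defaults."""
    return {**DEFAULTS, **(session_config or {}), **(load_config or {})}


session_cfg = {"max_file_size": 5 * 1024 * 1024, "max_imports": 50}
cfg = effective_config(session_cfg, {"max_imports": 10})
print(cfg["max_imports"], cfg["max_file_size"], cfg["max_struct_depth"])  # 10 5242880 50
```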
Renaming files with automatic !include updates
```python
from yaml_serializer import SerializerSession

session = SerializerSession()
session.load('main.yaml')

# Rename an included file; all !include references to it are updated automatically
session.rename('old_name.yaml', 'new_name.yaml')
session.save()
```
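Conceptually, a rename has to find every loaded file whose !include points at the old path and rewrite that reference relative to the including file. The sketch below models loaded files as a mapping from file path to the include targets written in it; that data model and the function rename_includes_sketch are assumptions, and the sketch only edits the in-memory model, not files on disk.

```python
import os
from pathlib import PurePath
from typing import Dict, List


def rename_includes_sketch(includes: Dict[str, List[str]],
                           old_path: str, new_path: str) -> List[str]:
    """Hypothetical: rewrite include targets that point at old_path.

    `includes` maps each including file to the relative targets it references.
    Returns the including files whose references changed (they would need saving).
    """
    old_abs = os.path.abspath(old_path)
    new_abs = os.path.abspath(new_path)
    changed: List[str] = []
    for src, targets in includes.items():
        base = os.path.dirname(os.path.abspath(src))
        for i, target in enumerate(targets):
            if os.path.abspath(os.path.join(base, target)) == old_abs:
                # Keep the reference relative to the including file, with "/" separators.
                targets[i] = PurePath(os.path.relpath(new_abs, base)).as_posix()
                if src not in changed:
                    changed.append(src)
    return changed


includes = {"main.yaml": ["old_name.yaml", "person.yaml"]}
print(rename_includes_sketch(includes, "old_name.yaml", "new_name.yaml"))  # ['main.yaml']
print(includes["main.yaml"])  # ['new_name.yaml', 'person.yaml']
```

Returning the changed includers fits the change-tracking model: those are exactly the files a subsequent save() would have to rewrite.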
Multiple independent sessions
```python
from yaml_serializer import SerializerSession

# Two sessions can load the same (or different) files without interfering:
session_a = SerializerSession()
session_b = SerializerSession()
data_a = session_a.load('spec_v1.yaml')
data_b = session_b.load('spec_v2.yaml')
# Modifications to data_a are invisible to session_b and vice versa.
```
📖 API Reference
SerializerSession (primary API)
```python
from yaml_serializer import SerializerSession
```
Each instance is completely independent — thread-safe, reusable, and isolated from other sessions.
SerializerSession(config: Optional[dict] = None)
Create a session with optional default configuration. Configuration keys:

| Key | Default | Description |
|---|---|---|
| max_file_size | 10 MB | Maximum file size in bytes |
| max_struct_depth | 50 | Maximum YAML nesting depth |
| max_include_depth | 50 | Maximum !include chain depth |
| max_imports | 100 | Maximum total included files |
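To make max_struct_depth concrete, the sketch below measures the nesting depth of an already-loaded structure, counting each mapping or sequence level as one, and rejects values above the limit. How the library counts depth, and whether it checks during parsing rather than afterwards, is not documented here, so the counting rule and the names struct_depth and check_depth are assumptions.

```python
from typing import Any


def struct_depth(value: Any) -> int:
    """Return nesting depth: scalars are 0, each dict/list level adds 1.

    Uses an explicit stack instead of recursion, so deep input cannot exhaust
    Python's recursion limit while being measured. Assumes an acyclic tree
    (no self-referencing anchors).
    """
    max_depth = 0
    stack = [(value, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars do not add a level
        depth += 1
        max_depth = max(max_depth, depth)
        stack.extend((child, depth) for child in children)
    return max_depth


def check_depth(value: Any, max_struct_depth: int = 50) -> None:
    depth = struct_depth(value)
    if depth > max_struct_depth:
        raise ValueError(f"nesting depth {depth} exceeds limit {max_struct_depth}")


print(struct_depth({"a": {"b": [1, 2]}}))  # 3
```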
session.load(path: str, config: Optional[dict] = None) -> CommentedMap
Load path and resolve all of its !include references. If config is given, it overrides the session's default configuration for this call only.
session.save(only_if_changed: bool = True)
Write loaded files back to disk. With the default only_if_changed=True, only files that have changed are written.
session.rename(old_path: str, new_path: str)
Rename a file and update all !include references to it.
session.propagate_dirty(file_path: str)
Mark as dirty every file that includes file_path via !include.
session.clear()
Reset all loaded state. Configuration defaults are preserved.
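propagate_dirty can be thought of as a walk over the reverse include graph: starting from the changed file, every file that includes it is marked dirty. Whether the library follows this transitively (so that a grandparent file is also marked) is an assumption in this sketch, as are the graph representation and the name propagate_dirty_sketch.

```python
from collections import deque
from typing import Dict, Set


def propagate_dirty_sketch(includers: Dict[str, Set[str]], file_path: str) -> Set[str]:
    """Hypothetical: return every file that directly or transitively includes file_path.

    `includers` maps a file to the set of files that !include it.
    """
    dirty: Set[str] = set()
    queue = deque([file_path])
    while queue:
        current = queue.popleft()
        for parent in includers.get(current, set()):
            if parent not in dirty:  # the visited check also stops on include cycles
                dirty.add(parent)
                queue.append(parent)
    return dirty


includers = {"person.yaml": {"team.yaml"}, "team.yaml": {"main.yaml"}}
print(sorted(propagate_dirty_sketch(includers, "person.yaml")))  # ['main.yaml', 'team.yaml']
```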
The public helper functions exported from yaml_serializer complement the
session API and automatically update parent links and dirty flags.
- new_commented_map(initial: Optional[dict] = None, parent: Optional[Node] = None) -> CommentedMap
- new_commented_seq(initial: Optional[list] = None, parent: Optional[Node] = None) -> CommentedSeq
- add_to_dict(target: CommentedMap, key: str, value: Any)
- update_in_dict(target: CommentedMap, key: str, value: Any)
- remove_from_dict(target: CommentedMap, key: str)
- add_to_list(target: CommentedSeq, value: Any)
- remove_from_list(target: CommentedSeq, index: int)
- get_node_hash(node: Union[CommentedMap, CommentedSeq]) -> str – returns the node's hash (recalculates if dirty).
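Hash-based change detection of the kind behind get_node_hash generally works by turning a node into a canonical string and hashing it, so that a save can skip files whose hash is unchanged. The sketch below uses compact JSON as the canonical form and SHA-256 as the hash; both choices are assumptions, and the library's canonical_repr and compute_hash may produce different strings and digests.

```python
import hashlib
import json
from typing import Any


def canonical_repr_sketch(node: Any) -> str:
    # Fixed separators give a stable string; key order is kept because a
    # round-trip dump preserves (and therefore exposes) it.
    return json.dumps(node, separators=(",", ":"), ensure_ascii=False)


def compute_hash_sketch(node: Any) -> str:
    return hashlib.sha256(canonical_repr_sketch(node).encode("utf-8")).hexdigest()


saved_hash = compute_hash_sketch({"name": "Alice", "age": 30})
print(compute_hash_sketch({"name": "Alice", "age": 30}) == saved_hash)  # True: nothing to save
print(compute_hash_sketch({"name": "Alice", "age": 31}) == saved_hash)  # False: file must be written
```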
The lower-level internals in safe_constructor.py and most of serializer.py
are implementation details of the current codebase. When using the library
directly, prefer SerializerSession plus the re-exported helpers from
yaml_serializer.
🛡️ Public API Stability
The following functions from yaml_serializer.utils are part of the stable
advanced-use API for yaml_serializer 1.0.0 and are covered by
backward-compatibility guarantees for the yaml_serializer 1.x line:
- canonical_repr
- compute_hash
- resolve_include_path
- is_path_within_root
- mark_node
- mark_dirty
- clear_dirty
- update_file_attr
- replace_included
- mark_includes
These functions are exported via yaml_serializer.utils.__all__ and marked with
the _stable_api metadata decorator in the source.
Helpers prefixed with _ are internal implementation details and may change
without notice.
🛡️ Security
yaml_serializer was designed with security as a first‑class concern, addressing the shortcomings of many YAML libraries:
- Restricted YAML tags – only the custom !include tag is allowed; all others (including dangerous Python‑specific tags) are rejected.
- File size limit – prevents memory exhaustion attacks (configurable, default 10 MB).
- Nesting depth limit – prevents stack overflow from deeply nested structures (default 50).
- Path traversal protection – !include can only reference files inside the project root (or an explicitly allowed directory).
- Circular import detection – prevents infinite recursion.
- Import count limit – stops bomb‑style attacks with thousands of inclusions (default 100).
These measures make yaml_serializer suitable for processing untrusted YAML files – a key advantage over many alternatives.
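Several of these limits interact during include resolution. The sketch below expands an include graph depth-first and enforces three of them: circular-include detection, max_include_depth, and max_imports. The graph is an in-memory dict standing in for parsed files, and the exception type, the counting rule (each !include occurrence counts, so a file included twice counts twice) and the name walk_includes are assumptions rather than the library's behavior.

```python
from typing import Dict, List


class IncludeError(Exception):
    """Hypothetical error type for include-limit violations."""


def walk_includes(graph: Dict[str, List[str]], root: str,
                  max_include_depth: int = 50, max_imports: int = 100) -> int:
    """Expand includes as a tree and return the number of include occurrences.

    `graph` maps each file to the files it !includes (assumed already resolved).
    """
    count = 0

    def visit(path: str, chain: List[str]) -> None:
        nonlocal count
        if path in chain:  # the file is already on the current include chain
            raise IncludeError(" -> ".join(chain + [path]) + " (circular include)")
        if len(chain) > max_include_depth:  # len(chain) is this file's include depth
            raise IncludeError(f"include depth {len(chain)} exceeds {max_include_depth}")
        for child in graph.get(path, []):
            count += 1
            if count > max_imports:  # caps the total work, defusing include bombs
                raise IncludeError(f"more than {max_imports} includes")
            visit(child, chain + [path])

    visit(root, [])
    return count


graph = {"main.yaml": ["team.yaml", "team.yaml"], "team.yaml": ["person.yaml"]}
print(walk_includes(graph, "main.yaml"))  # 4
```

Counting occurrences rather than unique files is what stops a "diamond" bomb: ten files that each include the next one ten times would expand to billions of copies, but the counter aborts after max_imports.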
🧪 Testing & Coverage
The module has an extensive test suite covering all critical paths:
- Code coverage: 100% (yaml_serializer)
- Structure: thematic test modules plus conftest.py (shared fixtures)
To run tests locally from the package directory:
```shell
pytest tests/ --cov=yaml_serializer
```
For more detailed output:
```shell
pytest tests/ -v --cov=yaml_serializer --cov-report=term-missing
```
🔧 Development Setup
```shell
# Clone the repository (if not already done)
git clone https://github.com/cherninkiy/protocollab
cd protocollab/src/yaml_serializer

# Install the package in editable mode
pip install -e .

# Run tests
pytest tests/
```
🤝 Contributing
Contributions are welcome! Please read our Contributing Guidelines and Code of Conduct before submitting a pull request.
If you discover a security vulnerability, do not open a public issue; instead, please follow the steps outlined in our Security Policy.
📄 License
yaml_serializer is released under the Apache License 2.0. A local copy is
available in LICENSE, and the repository root also contains the
canonical project license text in ../../LICENSE.
🙏 Acknowledgements
Built on the shoulders of ruamel.yaml, pydantic, and the Python community.
File details
Details for the file yaml_serializer-1.0.1.tar.gz.
File metadata
- Download URL: yaml_serializer-1.0.1.tar.gz
- Upload date:
- Size: 42.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 309845dc2eacb297ce7903b798ddc7dde737043839f020cc53a7bed02b0ced4b |
| MD5 | bfeae2a8db23ae1e41acaf1d40a20c89 |
| BLAKE2b-256 | fb88fb5ca32d5f0ab428163aca973998c8f34145fb8a2c083401c013d9bfd28d |
File details
Details for the file yaml_serializer-1.0.1-py3-none-any.whl.
File metadata
- Download URL: yaml_serializer-1.0.1-py3-none-any.whl
- Upload date:
- Size: 19.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c41de581f26969e787d43eb98a78e8ae5f02057feb206521adfba14da7475d56 |
| MD5 | b1003f1a2f3dee6b3a79a645461f1759 |
| BLAKE2b-256 | 3034db5ef52c03bec7571c03faf1f8d5f9cf1bd550e93c70965e7fc7f8015822 |