Pure Python YAML 1.2 parser - drop-in replacement for PyYAML
PyYAML-Pure
A pure Python YAML 1.2 parser and emitter — a drop-in replacement for PyYAML with better spec compliance and comment preservation.
Why PyYAML-Pure?
| Feature | PyYAML | PyYAML-Pure |
|---|---|---|
| YAML Test Suite (Event Parsing) | 76.3% | 100% |
| YAML Test Suite (JSON Comparison) | 76.3% | 93.5% |
| Pure Python | No (C extension) | Yes |
| YAML 1.2 Compliance | Partial | Full |
| Comment Preservation | No | Yes |
| C Compiler Required | Yes | No |
Features
- 100% Pure Python: No native dependencies, works everywhere including PyPy and WebAssembly
- YAML 1.2 Compliant: Passes 100% of the official YAML test suite event parsing tests
- Comment Preservation: Load, modify, and save YAML while keeping comments intact
- Drop-in Replacement: Same API as PyYAML, just pip install and go
- Better Correctness: Passes 73 more tests than PyYAML on the official test suite
- No Build Required: Pure pip install with no C compiler needed
- Modern Python: Requires Python 3.10+, uses modern language features
Table of Contents
- Installation
- Quick Start
- Comment Preservation
- Usage Guide
- API Reference
- Performance
- Migration from PyYAML
- Development
Installation
pip install pyyaml-pure
Note: This package provides the yaml module. If you have PyYAML installed, you must uninstall it first:
pip uninstall pyyaml
pip install pyyaml-pure
Quick Start
import yaml
# Load YAML from a string
data = yaml.safe_load("""
name: John Doe
age: 30
skills:
- Python
- YAML
- APIs
""")
print(data)
# {'name': 'John Doe', 'age': 30, 'skills': ['Python', 'YAML', 'APIs']}
# Dump Python objects to YAML
print(yaml.safe_dump(data))
Comment Preservation
One of PyYAML-Pure's key features is the ability to preserve comments during round-trip loading and dumping. This is essential for configuration file editing where you want to maintain human-readable annotations.
Basic Usage
Use the comments=True parameter to enable comment preservation:
import yaml
# Load YAML with comments preserved
config = yaml.safe_load("""
# Database configuration
database:
  host: localhost  # primary database server
  port: 5432
# Application settings
app:
  debug: true  # set to false in production
  log_level: INFO
""", comments=True)
# Modify values - comments stay attached
config['database']['host'] = 'db.example.com'
config['app']['debug'] = False
# Save with comments preserved
output = yaml.safe_dump(config, comments=True)
print(output)
Output:
# Database configuration
database:
  host: db.example.com  # primary database server
  port: 5432
# Application settings
app:
  debug: false  # set to false in production
  log_level: INFO
What Gets Preserved
When comments=True:
- Header comments — Comments at the start of mappings/sequences
- Inline comments — Comments at the end of a line after a value
- Before-key comments — Comments on the line(s) before a key
CommentedMap and CommentedSeq
With comments=True, safe_load() returns special dict/list subclasses:
import yaml
from yaml import CommentedMap, CommentedSeq
data = yaml.safe_load("key: value", comments=True)
print(type(data)) # <class 'yaml.CommentedMap'>
# These behave exactly like dict/list
assert isinstance(data, dict) # True
data['new_key'] = 'new_value' # Works normally
Adding Comments Programmatically
You can also add comments to data structures:
import yaml
from yaml import CommentedMap, CommentGroup, Comment
# Create a commented map
config = CommentedMap({
    'host': 'localhost',
    'port': 5432
})
# Add a comment to a key
group = CommentGroup()
group.before.append(Comment('Database host'))
group.inline = Comment('primary server')
config.set_comment('host', group)
# Dump with comments
print(yaml.safe_dump(config, comments=True))
# Output:
# # Database host
# host: localhost # primary server
# port: 5432
Usage Guide
Loading YAML
import yaml
# Load from string
data = yaml.safe_load("key: value")
# Load from file
with open("config.yaml") as f:
    config = yaml.safe_load(f)
# Load with comment preservation
with open("config.yaml") as f:
    config = yaml.safe_load(f, comments=True)
# Load multiple documents
docs = list(yaml.safe_load_all("""
---
document: 1
---
document: 2
"""))
# Load multiple documents with comments
docs = list(yaml.safe_load_all(multi_doc_yaml, comments=True))
Dumping YAML
import yaml
data = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "myapp"
    },
    "features": ["auth", "api", "admin"]
}
# Dump to string
yaml_string = yaml.safe_dump(data)
# Dump to file
with open("config.yaml", "w") as f:
    yaml.safe_dump(data, f)
# Dump with options
yaml.safe_dump(
    data,
    default_flow_style=False,  # Use block style
    sort_keys=False,           # Preserve key order
    indent=4,                  # Custom indentation
    width=120                  # Line width
)
# Dump with comment preservation
yaml.safe_dump(data, comments=True)
# Dump multiple documents
docs = [{"doc": 1}, {"doc": 2}, {"doc": 3}]
yaml_string = yaml.safe_dump_all(docs)
Loader Security Levels
PyYAML-Pure provides multiple loaders with different security levels:
import yaml
# SafeLoader - RECOMMENDED for untrusted input
# Only loads basic Python types (dict, list, str, int, float, bool, None)
data = yaml.safe_load(yaml_string)
# FullLoader - For trusted input
# Loads standard Python types including dates
data = yaml.full_load(yaml_string)
# UnsafeLoader - DANGEROUS, only for fully trusted input
# Can construct arbitrary Python objects
data = yaml.unsafe_load(trusted_yaml_string)
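To make the difference between the loaders concrete, the sketch below feeds safe_load a document carrying a Python-specific tag. With PyYAML's SafeLoader that tag has no constructor, so a YAMLError subclass is raised. The sketch assumes PyYAML-Pure behaves the same way, which the text above implies but does not state. The helper name is_rejected_by_safe_load is hypothetical.

```python
import yaml

# A document that asks the loader to build a Python-specific object.
UNTRUSTED = "!!python/object/apply:os.getcwd []"


def is_rejected_by_safe_load(text: str) -> bool:
    """Return True if safe_load refuses the document instead of constructing it."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return True
    return False


print(is_rejected_by_safe_load(UNTRUSTED))
print(is_rejected_by_safe_load("name: John Doe"))
```

A check like this can double as a regression test that untrusted input never goes through a more permissive loader.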
Working with Anchors and Aliases
PyYAML-Pure fully supports YAML anchors and aliases:
defaults: &defaults
  adapter: postgres
  host: localhost
development:
  <<: *defaults
  database: dev_db
production:
  <<: *defaults
  database: prod_db
  host: prod.example.com
config = yaml.safe_load(yaml_string)
print(config['development']['adapter']) # 'postgres'
print(config['production']['host']) # 'prod.example.com'
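The merge key (<<) in the example above behaves like a dictionary merge in which explicitly written keys take precedence over the aliased defaults. The following sketch reproduces that result with plain dicts, without involving the parser. The merge helper is hypothetical and only illustrates the semantics.

```python
defaults = {"adapter": "postgres", "host": "localhost"}


def merge(base: dict, overrides: dict) -> dict:
    """Mimic '<<: *base': start from base, then let explicit keys win."""
    return {**base, **overrides}


development = merge(defaults, {"database": "dev_db"})
production = merge(defaults, {"database": "prod_db", "host": "prod.example.com"})

print(development["adapter"])  # postgres
print(production["host"])      # prod.example.com
```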
Custom Tags
import yaml
# Register a custom constructor
def construct_point(loader, node):
    value = loader.construct_mapping(node)
    return (value['x'], value['y'])
yaml.add_constructor('!point', construct_point, Loader=yaml.SafeLoader)
# Use the custom tag
data = yaml.safe_load("""
location: !point
  x: 10
  y: 20
""")
print(data['location']) # (10, 20)
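A constructor covers only the loading direction. For a round trip, PyYAML pairs it with a representer, registered through add_representer. The sketch below assumes PyYAML-Pure exposes add_representer and Dumper.represent_mapping as PyYAML does, which this README does not confirm. The Point class is hypothetical, and registering '!point' again replaces the tuple-returning constructor from the example above.

```python
import yaml


class Point:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y


def represent_point(dumper, point):
    # Emit the object as a mapping tagged with !point.
    return dumper.represent_mapping('!point', {'x': point.x, 'y': point.y})


def construct_point(loader, node):
    # Rebuild the object from the tagged mapping.
    value = loader.construct_mapping(node)
    return Point(value['x'], value['y'])


yaml.add_representer(Point, represent_point, Dumper=yaml.SafeDumper)
yaml.add_constructor('!point', construct_point, Loader=yaml.SafeLoader)

text = yaml.safe_dump({'location': Point(10, 20)})
restored = yaml.safe_load(text)['location']
print(restored.x, restored.y)
```

Both registrations modify the shared SafeDumper and SafeLoader classes, so they affect every later safe_dump and safe_load call in the process.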
API Reference
Loading Functions
| Function | Description |
|---|---|
| safe_load(stream, *, comments=False) | Load YAML safely (recommended) |
| safe_load_all(stream, *, comments=False) | Load multiple documents safely |
| full_load(stream) | Load with Python-specific types |
| full_load_all(stream) | Load multiple documents with Python types |
| unsafe_load(stream) | Load with arbitrary Python objects (dangerous) |
| unsafe_load_all(stream) | Load multiple documents unsafely |
| load(stream, Loader) | Load with specified Loader class |
| load_all(stream, Loader) | Load multiple documents with specified Loader |
Dumping Functions
| Function | Description |
|---|---|
| safe_dump(data, stream=None, *, comments=False, **kwargs) | Dump using safe representers |
| safe_dump_all(docs, stream=None, *, comments=False, **kwargs) | Dump multiple documents safely |
| dump(data, stream=None, Dumper=Dumper, **kwargs) | Dump with specified Dumper |
| dump_all(docs, stream=None, Dumper=Dumper, **kwargs) | Dump multiple documents |
Dump Options
| Option | Type | Default | Description |
|---|---|---|---|
| default_flow_style | bool | False | Use flow style for collections |
| default_style | str | None | Default scalar style |
| canonical | bool | False | Output in canonical YAML format |
| indent | int | 2 | Number of spaces for indentation |
| width | int | 80 | Maximum line width |
| allow_unicode | bool | True | Allow unicode characters |
| line_break | str | '\n' | Line break character |
| explicit_start | bool | None | Emit document start marker --- |
| explicit_end | bool | None | Emit document end marker ... |
| sort_keys | bool | True | Sort dictionary keys |
| comments | bool | False | Preserve comments |
Classes
Loaders
- SafeLoader: Safe loading (recommended)
- FullLoader: Load standard Python types
- UnsafeLoader: Load arbitrary Python objects
- Loader: Alias for FullLoader
Dumpers
- SafeDumper: Safe dumping
- Dumper: Standard dumping
Comment Support
- CommentedMap: Dict subclass that preserves comments
- CommentedSeq: List subclass that preserves comments
- Comment: Represents a single comment
- CommentGroup: Groups before/inline/after comments
Nodes
- Node: Base node class
- ScalarNode: Scalar value node
- SequenceNode: Sequence (list) node
- MappingNode: Mapping (dict) node
Events
- StreamStartEvent, StreamEndEvent
- DocumentStartEvent, DocumentEndEvent
- MappingStartEvent, MappingEndEvent
- SequenceStartEvent, SequenceEndEvent
- ScalarEvent, AliasEvent
Exceptions
- YAMLError: Base exception
- MarkedYAMLError: Exception with position information
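A minimal sketch of how these exceptions might be used when loading untrusted or hand-edited files. In PyYAML, the position carried by MarkedYAMLError is exposed as problem_mark with zero-based line and column attributes. The sketch assumes PyYAML-Pure uses the same names and therefore looks the attribute up defensively. The helper name load_or_report is hypothetical.

```python
import yaml


def load_or_report(text: str):
    """Return (data, None) on success or (None, message) on a YAML error."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        # problem_mark is PyYAML's name for the error position (assumed here).
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            return None, f"invalid YAML at line {mark.line + 1}, column {mark.column + 1}"
        return None, "invalid YAML"


# The flow sequence is never closed, so parsing fails.
data, error = load_or_report("items: [1, 2")
print(data, error)
```

Catching the base YAMLError keeps the handler working regardless of which specific parser or scanner error is raised.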
Low-Level Functions
# Parse to events
for event in yaml.parse(yaml_string):
    print(event)
# Compose to nodes
node = yaml.compose(yaml_string)
# Emit events
yaml_string = yaml.emit(events)
# Serialize nodes
yaml_string = yaml.serialize(node)
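As a small illustration of the event stream, the sketch below collects the class names of the events that yaml.parse yields for a one-key mapping. It relies only on yaml.parse and the event classes listed above. The helper name event_names is hypothetical.

```python
import yaml


def event_names(text: str) -> list[str]:
    """Return the class names of the events produced while parsing text."""
    return [type(event).__name__ for event in yaml.parse(text)]


names = event_names("key: value")
print(names)
```

The stream is bracketed by StreamStartEvent and StreamEndEvent, with the key and the value each appearing as a ScalarEvent inside the mapping.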
Performance
PyYAML-Pure is approximately 1.7x slower than PyYAML with its C extension. This is expected for pure Python vs C code.
| Benchmark | PyYAML | PyYAML-Pure | Ratio (PyYAML / PyYAML-Pure time) |
|---|---|---|---|
| Simple dict (100 keys) | 1.8 ms | 3.6 ms | 0.51x |
| Nested structures | 425 ms | 847 ms | 0.50x |
| Flow style | 9.3 ms | 12.5 ms | 0.74x |
| Dumping | 2.2 ms | 3.3 ms | 0.67x |
For typical use cases (configuration files, API responses), the performance difference is negligible.
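The ratio column in the table appears to be PyYAML's time divided by PyYAML-Pure's time. Inverting it gives the slowdown factor per benchmark, and the arithmetic mean of those factors comes out at about 1.7x. The README does not say how its 1.7x figure was derived, so treat the averaging below as an assumption.

```python
from statistics import mean

# (benchmark, PyYAML ms, PyYAML-Pure ms), copied from the table above.
BENCHMARKS = [
    ("Simple dict (100 keys)", 1.8, 3.6),
    ("Nested structures", 425.0, 847.0),
    ("Flow style", 9.3, 12.5),
    ("Dumping", 2.2, 3.3),
]

# Slowdown factor: how many times longer PyYAML-Pure takes than PyYAML.
slowdowns = {name: pure / c_ext for name, c_ext, pure in BENCHMARKS}

for name, factor in slowdowns.items():
    print(f"{name}: {factor:.2f}x slower")
print(f"mean slowdown: {mean(slowdowns.values()):.1f}x")
```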
When Performance Matters
- Choose PyYAML-Pure for configuration files, API responses, and most applications
- Choose PyYAML for processing multi-MB YAML files or millions of documents
Performance Optimizations
PyYAML-Pure includes several optimizations:
- Pre-compiled regex patterns
- Direct attribute access in hot paths
- Inlined critical parsing functions
- Efficient character set membership testing
Test Suite Compliance
PyYAML-Pure is tested against the official YAML Test Suite:
| Metric | PyYAML | PyYAML-Pure |
|---|---|---|
| Event Parsing | 235/308 (76.3%) | 308/308 (100%) |
| JSON Comparison | 213/279 (76.3%) | 261/279 (93.5%) |
PyYAML-Pure achieves a perfect score on event parsing, the primary correctness metric from matrix.yaml.info.
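The percentages, and the "73 more tests" figure from the feature list, follow directly from the pass counts in the table. A short check, using only the numbers shown there:

```python
# (metric, implementation) -> (passed, total), copied from the table above.
RESULTS = {
    ("Event Parsing", "PyYAML"): (235, 308),
    ("Event Parsing", "PyYAML-Pure"): (308, 308),
    ("JSON Comparison", "PyYAML"): (213, 279),
    ("JSON Comparison", "PyYAML-Pure"): (261, 279),
}


def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


rates = {key: pass_rate(*counts) for key, counts in RESULTS.items()}
for (metric, impl), rate in rates.items():
    print(f"{metric} / {impl}: {rate}%")

# Additional event-parsing tests passed by PyYAML-Pure.
extra_tests = RESULTS[("Event Parsing", "PyYAML-Pure")][0] - RESULTS[("Event Parsing", "PyYAML")][0]
print(extra_tests)  # 73
```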
Migration from PyYAML
Migration is straightforward — just change your installation:
pip uninstall pyyaml
pip install pyyaml-pure
Your existing code continues to work unchanged:
import yaml # Now uses PyYAML-Pure
# All your existing code works
data = yaml.safe_load(yaml_string)
output = yaml.safe_dump(data)
New Features Available After Migration
After migrating, you can optionally use comment preservation:
# Before (works the same)
data = yaml.safe_load(config_yaml)
# After (new feature available)
data = yaml.safe_load(config_yaml, comments=True)
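Code that has to run against either library during a gradual migration can detect the new keyword before using it. The sketch below inspects the signature of safe_load and falls back to a plain load when comments is not accepted, as is the case with PyYAML. The helper names supports_comments and load_config are hypothetical.

```python
import inspect

import yaml


def supports_comments() -> bool:
    """Return True if the installed yaml.safe_load accepts a comments keyword."""
    return "comments" in inspect.signature(yaml.safe_load).parameters


def load_config(text: str):
    """Load with comment preservation when available, plain safe_load otherwise."""
    if supports_comments():
        return yaml.safe_load(text, comments=True)
    return yaml.safe_load(text)


config = load_config("debug: true  # set to false in production")
print(config["debug"])
```

Checking the signature up front avoids catching TypeError around the call, which could hide unrelated errors raised during loading.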
When to Use PyYAML-Pure
Choose PyYAML-Pure when:
- Comment preservation needed — Editing config files while keeping comments
- Correctness matters — Parsing YAML that PyYAML handles incorrectly
- Pure Python required — No C compiler, restricted environments, WebAssembly
- YAML 1.2 compliance — Need full spec compliance
- Using PyPy — Benefits from JIT compilation
- Debugging — Clearer Python stack traces
Choose PyYAML when:
- Maximum speed critical — Processing very large YAML files
- High volume — Parsing millions of YAML documents
- C extensions OK — Standard deployment environment
Development
Running Tests
# Install development dependencies
pip install pytest pytest-cov
# Run tests
pytest tests/
# Run with coverage
pytest tests/ --cov=yaml --cov-report=html
# Run specific test file
pytest tests/test_comments.py -v
Running Benchmarks
# Install PyYAML for comparison
pip install pyyaml
# Run benchmarks
python benchmarks/run_benchmark.py 10
Project Structure
pyyaml-pure/
├── yaml/ # Main package
│ ├── __init__.py # Public API
│ ├── _parser.py # YAML 1.2 recursive descent parser
│ ├── _scanner.py # String scanner for lexical analysis
│ ├── _comments.py # Comment preservation classes
│ ├── roundtrip.py # Round-trip loader/dumper with comments
│ ├── constructor.py # Node to Python object construction
│ ├── representer.py # Python object to node representation
│ ├── emitter.py # YAML output generation
│ ├── loader.py # Loader classes
│ ├── dumper.py # Dumper classes
│ ├── nodes.py # Node classes
│ ├── events.py # Event classes
│ ├── resolver.py # Tag resolution
│ └── error.py # Exception classes
├── tests/ # Test suite
│ ├── test_comments.py # Comment preservation tests
│ └── ...
├── benchmarks/ # Performance benchmarks
└── README.md
About This Project
This package was built as a pure Python alternative to PyYAML, focusing on correctness, YAML 1.2 compliance, and comment preservation. It was developed based on the psych-pure Ruby implementation and tested against the official YAML test suite.
This entire package was generated using Claude Code, Anthropic's AI-powered coding assistant. From the initial parser implementation to performance optimizations, comment preservation, and documentation, Claude Code assisted in writing, testing, and refining every component.
License
MIT License — see LICENSE for details.
See Also
- PyYAML — The original PyYAML library
- YAML 1.2 Specification — Official YAML specification
- YAML Test Suite — Official test suite
- YAML Test Matrix — Implementation comparison
- psych-pure — Ruby implementation this was based on