
A YAML round-trip library that preserves comments and insertion order

yarutsk

Python 3.12+ | License: MIT

A Python YAML library that round-trips documents while preserving comments, insertion order, scalar styles, tags, anchors and aliases, blank lines, and explicit document markers.

What it does

Most YAML libraries silently drop comments on load. yarutsk keeps them attached to their keys — both inline (key: value # like this) and block-level (# above a key) — so a load → modify → dump cycle leaves the rest of the file intact.

import io
import yarutsk

doc = yarutsk.load(io.StringIO("""\
# database config
host: localhost  # primary
port: 5432
"""))

doc["port"] = 5433

out = io.StringIO()
yarutsk.dump(doc, out)
print(out.getvalue())
# # database config
# host: localhost  # primary
# port: 5433

YamlMapping is a subclass of dict and YamlSequence is a subclass of list, so they work everywhere a dict or list is expected:

import json
import yarutsk

doc = yarutsk.loads("name: Alice\nscores: [10, 20, 30]")

isinstance(doc, dict)           # True
isinstance(doc["scores"], list) # True
json.dumps(doc)                 # '{"name": "Alice", "scores": [10, 20, 30]}'

Round-trip fidelity

yarutsk reproduces the source text exactly for everything it understands. A loads followed by dumps gives back the original string byte-for-byte in the common case:

src = """\
defaults: &base
  timeout: 30
  retries: 3

service:
  name: api
  config: *base
"""
assert yarutsk.dumps(yarutsk.loads(src)) == src

Specifically preserved:

  • Scalar styles: plain, 'single-quoted', "double-quoted", literal block |, folded block >
  • Non-canonical scalars: yes/no/on/off, ~, Null, True/False, 0xFF, 0o77 are reproduced as written, not re-canonicalised to true/false/null/255
  • YAML tags: !!str, !!python/tuple, and any custom tag are emitted back verbatim
  • Anchors and aliases: &name on the anchor node and *name for references are preserved; the Python layer returns the resolved value transparently
  • Blank lines between mapping entries and sequence items
  • Explicit document markers: --- and ...

Installation

pip install yarutsk

To build from source (requires Rust 1.85+ and uv):

git clone --recurse-submodules https://github.com/theyugin/yarutsk
cd yarutsk
make setup

API

Loading and dumping

# Load from stream (StringIO / BytesIO) — reads in 8 KB chunks, no full-file buffering
doc  = yarutsk.load(stream)            # first document
docs = yarutsk.load_all(stream)        # all documents as a list

# Load from string
doc  = yarutsk.loads(text)
docs = yarutsk.loads_all(text)

# Lazy document iteration — yields one document at a time, never accumulates all docs
for doc in yarutsk.iter_load_all(stream):   # from IO stream
    process(doc)

for doc in yarutsk.iter_loads_all(text):    # from string
    process(doc)

# Dump to stream — writes directly to the stream without intermediate string buffering
yarutsk.dump(doc, stream)
yarutsk.dump_all(docs, stream)

# Dump to string
text = yarutsk.dumps(doc)
text = yarutsk.dumps_all(docs)

# Custom indentation (default is 2 spaces)
text = yarutsk.dumps(doc, indent=4)

load / loads return a YamlMapping, YamlSequence, or YamlScalar (for a top-level scalar document), or None for empty input. Nested container nodes are YamlMapping or YamlSequence; scalar leaves inside mappings and sequences are returned as native Python primitives (int, float, bool, str, bytes, datetime.datetime, datetime.date, or None).

iter_load_all and iter_loads_all return a YamlIter object — an iterator that drives the parser on demand and yields documents one at a time. This lets you process large multi-document streams without holding all documents in memory simultaneously:

import io
import yarutsk

# Process a multi-document stream lazily
stream = io.StringIO("---\na: 1\n---\nb: 2\n---\nc: 3\n")
for doc in yarutsk.iter_load_all(stream):
    print(doc)   # {'a': 1}, then {'b': 2}, then {'c': 3}

load / load_all also stream from IO in 8 KB chunks rather than reading the entire input into a string first, but they still build and return the full document tree(s).
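The difference between accumulating documents and yielding them lazily can be illustrated in plain Python. The sketch below is a deliberately naive toy, not yarutsk's parser: it reads a text stream in 8 KB chunks and splits on bare `---` lines, yielding each document's raw text as soon as the next separator (or the end of input) is seen. `iter_chunks` and `iter_documents` are hypothetical names.

```python
import io
from collections.abc import Iterator
from typing import TextIO

CHUNK_SIZE = 8192  # 8 KB, the chunk size quoted above


def iter_chunks(stream: TextIO, size: int = CHUNK_SIZE) -> Iterator[str]:
    """Read a text stream piece by piece instead of all at once."""
    while chunk := stream.read(size):
        yield chunk


def iter_documents(stream: TextIO, chunk_size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield the raw text of each document in a multi-document stream.

    Naive on purpose: it only splits on bare '---' lines and ignores '...',
    directives, and '---' inside block scalars, all of which a real YAML
    parser must handle.
    """
    pending = ""              # incomplete trailing line carried between chunks
    current: list[str] = []   # lines of the document being collected
    for chunk in iter_chunks(stream, chunk_size):
        pending += chunk
        *lines, pending = pending.split("\n")
        for line in lines:
            if line.rstrip() == "---":
                if current:
                    yield "\n".join(current) + "\n"
                current = []
            else:
                current.append(line)
    if pending:
        current.append(pending)
    if current:
        yield "\n".join(current) + "\n"


if __name__ == "__main__":
    demo = io.StringIO("---\na: 1\n---\nb: 2\n---\nc: 3\n")
    for doc_text in iter_documents(demo, chunk_size=3):
        print(repr(doc_text))
```

Only one document's lines are held at a time; a tiny `chunk_size` is used in the demo to show that documents are reassembled correctly across chunk boundaries.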

Type conversions

Implicit coercion

Plain YAML values (no tag) are converted to Python types automatically:

Value pattern                     Python type   Examples
Decimal integer                   int           42, -7
Hex / octal integer               int           0xFF → 255, 0o17 → 15
Float                             float         3.14, 1.5e2, .inf, -.inf, .nan
true / false (any case)           bool          True, FALSE
yes / no / on / off (any case)    bool          YAML 1.1 booleans
null, Null, NULL, ~, empty value  None
Anything else                     str           hello, "quoted"

Non-canonical forms are reproduced as written on dump — yes stays yes, 0xFF stays 0xFF, ~ stays ~.
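To make the table concrete, here is a hypothetical `resolve_plain` function that applies the rules above to the source text of a plain (untagged) scalar. It is an approximation written from this table and from the Limitations section (i64 range, no underscore separators), not yarutsk's actual resolver, and it ignores edge cases such as signed hex literals.

```python
import math
import re

# Patterns follow the implicit-coercion table; an illustrative approximation only.
_DEC = re.compile(r"[-+]?[0-9]+")
_HEX = re.compile(r"0x[0-9a-fA-F]+")
_OCT = re.compile(r"0o[0-7]+")
_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")
_SPECIAL_FLOATS = {".inf": math.inf, "+.inf": math.inf, "-.inf": -math.inf, ".nan": math.nan}
_TRUE = {"true", "yes", "on"}
_FALSE = {"false", "no", "off"}
_NULL = {"null", "Null", "NULL", "~", ""}
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def resolve_plain(text: str) -> object:
    """Map an untagged plain scalar's source text to a Python value."""
    if text in _NULL:
        return None
    lowered = text.lower()
    if lowered in _TRUE:          # YAML 1.1 booleans, any case
        return True
    if lowered in _FALSE:
        return False
    if _DEC.fullmatch(text):
        number = int(text, 10)
    elif _HEX.fullmatch(text):
        number = int(text[2:], 16)
    elif _OCT.fullmatch(text):
        number = int(text[2:], 8)
    else:
        number = None
    if number is not None:
        # Integers outside the i64 range stay strings (see Limitations).
        return number if I64_MIN <= number <= I64_MAX else text
    if lowered in _SPECIAL_FLOATS:
        return _SPECIAL_FLOATS[lowered]
    if _FLOAT.fullmatch(text):
        return float(text)
    # No pattern above accepts '_', so "1_000" falls through to str.
    return text
```

A round-tripping loader would additionally keep `text` alongside the resolved value so that `0xFF` can be written back as `0xFF` rather than `255`.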

Explicit tags

A !!tag overrides implicit coercion and controls which Python type is returned:

Tag          Python type                         Notes
!!str        str                                 Forces string even if the value looks like an int, bool, or null
!!int        int                                 Parses decimal, hex (0xFF), and octal (0o17)
!!float      float                               Promotes integer literals (!!float 1 → 1.0)
!!bool       bool
!!null       None                                Forces null regardless of content (!!null "" → None)
!!binary     bytes                               Base64-decoded on load; base64-encoded on dump
!!timestamp  datetime.datetime or datetime.date  Date-only values return date; datetime values return datetime

Tags are preserved through the round-trip: load → dump reproduces the original tag and source text exactly.

import datetime
import yarutsk

# !!binary
doc = yarutsk.loads("data: !!binary aGVsbG8=\n")
doc["data"]                            # b'hello'
yarutsk.dumps(doc)                     # 'data: !!binary aGVsbG8=\n'

# !!timestamp
doc = yarutsk.loads("ts: !!timestamp 2024-01-15T10:30:00\n")
doc["ts"]                              # datetime.datetime(2024, 1, 15, 10, 30)

doc = yarutsk.loads("ts: !!timestamp 2024-01-15\n")
doc["ts"]                              # datetime.date(2024, 1, 15)

# !!float promotes integers
doc = yarutsk.loads("x: !!float 1\n")
doc["x"]                               # 1.0  (float, not int)

# !!str forces a string
doc = yarutsk.loads("x: !!str 42\n")
doc["x"]                               # '42'

# Dumping Python bytes / datetime automatically produces the right tag
mapping = yarutsk.loads("x: placeholder\n")
mapping["x"] = b"hello"
yarutsk.dumps(mapping)                 # 'x: !!binary aGVsbG8=\n'

mapping["x"] = datetime.datetime(2024, 1, 15, 10, 30)
yarutsk.dumps(mapping)                 # 'x: !!timestamp 2024-01-15T10:30:00\n'

Schema — custom types

Schema lets you register loaders (tag → Python object, fired on load) and dumpers (Python type → tag + data, fired on dump). Pass it as a keyword argument to any load or dump function.

Mapping types

The loader receives a YamlMapping (dict-like); the dumper returns a (tag, dict) tuple:

import yarutsk

class Point:
    def __init__(self, x, y): self.x, self.y = x, y
    def __repr__(self): return f"Point({self.x}, {self.y})"

schema = yarutsk.Schema()
schema.add_loader("!point", lambda d: Point(d["x"], d["y"]))
schema.add_dumper(Point, lambda p: ("!point", {"x": p.x, "y": p.y}))

doc = yarutsk.loads("origin: !point\n  x: 0\n  y: 0\n", schema=schema)
doc["origin"]                          # Point(0, 0)

doc["pos"] = Point(3, 4)               # assigning custom objects works too
yarutsk.dumps(doc, schema=schema)
# origin: !point
#   x: 0
#   y: 0
# pos: !point
#   x: 3
#   y: 4

Scalar types

The loader receives the raw scalar string; the dumper returns a (tag, str) tuple:

import yarutsk

class Color:
    def __init__(self, r, g, b): self.r, self.g, self.b = r, g, b
    def __repr__(self): return f"Color({self.r}, {self.g}, {self.b})"

schema = yarutsk.Schema()
schema.add_loader("!color", lambda s: Color(*[int(x) for x in s.split(",")]))
schema.add_dumper(Color, lambda c: ("!color", f"{c.r},{c.g},{c.b}"))

doc = yarutsk.loads("bg: !color 255,0,128\n", schema=schema)
doc["bg"]                              # Color(255, 0, 128)
yarutsk.dumps(doc, schema=schema)      # 'bg: !color 255,0,128\n'

Controlling style from a dumper

A dumper's second return value can be a YamlScalar, YamlMapping, or YamlSequence to control emitted style. The tag from the first return value is stamped on top:

class Color:
    def __init__(self, r, g, b): self.r, self.g, self.b = r, g, b

schema = yarutsk.Schema()
schema.add_loader("!color", lambda s: Color(*[int(x) for x in s.split(",")]))

# Emit the value double-quoted
schema.add_dumper(Color, lambda c: (
    "!color",
    yarutsk.YamlScalar(f"{c.r},{c.g},{c.b}", style="double"),
))

doc = yarutsk.loads("bg: placeholder\n")
doc["bg"] = Color(255, 0, 128)
yarutsk.dumps(doc, schema=schema)      # 'bg: !color "255,0,128"\n'

Similarly, returning a YamlMapping(style="flow") or YamlSequence(style="flow") from a dumper emits the container in flow style.

Overriding built-in tags

Registering a loader for !!int, !!float, !!bool, !!null, or !!str bypasses the built-in coercion. The callable receives the raw YAML string rather than the already-converted Python value:

schema = yarutsk.Schema()
schema.add_loader("!!int", lambda raw: int(raw, 0))  # parses 0xFF, 0o77, etc.

doc = yarutsk.loads("x: !!int 0xFF\n", schema=schema)
doc["x"]                               # 255

Multiple dumpers for the same type are checked in registration order; the first isinstance match wins.
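First-match dispatch means a dumper registered for a base class shadows a later one for its subclass. The toy registry below (`MiniSchema`, `represent`, `Shape`, and `Circle` are hypothetical names, not part of yarutsk) mimics that rule, plus the (tag, data) return-shape check mentioned under Error handling, so the effect of registration order is easy to see.

```python
from collections.abc import Callable
from typing import Any


class MiniSchema:
    """Hypothetical dumper registry; mimics first-match dispatch, not yarutsk.Schema."""

    def __init__(self) -> None:
        self._dumpers: list[tuple[type, Callable[[Any], Any]]] = []

    def add_dumper(self, cls: type, fn: Callable[[Any], Any]) -> None:
        self._dumpers.append((cls, fn))          # registration order is preserved

    def represent(self, obj: Any) -> tuple[str, Any]:
        for cls, fn in self._dumpers:
            if isinstance(obj, cls):             # first isinstance match wins
                result = fn(obj)
                if not (isinstance(result, tuple) and len(result) == 2):
                    raise TypeError(f"dumper for {cls.__name__} must return (tag, data)")
                return result
        raise TypeError(f"no dumper registered for {type(obj).__name__}")


# Example types for the demonstration below.
class Shape:
    pass


class Circle(Shape):
    pass


if __name__ == "__main__":
    base_first = MiniSchema()
    base_first.add_dumper(Shape, lambda s: ("!shape", "generic"))
    base_first.add_dumper(Circle, lambda c: ("!circle", "round"))
    print(base_first.represent(Circle()))   # the Shape dumper matches first
```

Under this rule, registering the most specific subclass first is what lets it take effect.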

YamlScalar

Top-level scalar documents are wrapped in a YamlScalar node:

doc = yarutsk.loads("42")
doc.value                              # 42 (Python int)
doc.to_dict()                          # same as .value

# Scalar style
doc = yarutsk.loads("---\n'hello'\n")
doc.style                              # 'single'
doc.style = "double"                   # 'plain'|'single'|'double'|'literal'|'folded'

# YAML tag
doc = yarutsk.loads("!!str 42")
doc.tag                                # '!!str'
doc.tag = None                         # clear tag

# Anchor
doc = yarutsk.loads("value: &val 42\n")
doc.node("value").anchor               # 'val'  (anchor on a nested scalar node)
top = yarutsk.loads("42")
top.anchor = "root"                    # set anchor on a top-level scalar

# Explicit document markers
doc = yarutsk.loads("---\n42\n...")
doc.explicit_start                     # True
doc.explicit_end                       # True
doc.explicit_start = False
doc.explicit_end   = False

# Document-level directives (top-level node only)
doc = yarutsk.loads("%YAML 1.1\n---\n42\n")
doc.yaml_version                       # '1.1'
doc.yaml_version = "1.2"
doc.tag_directives                     # list of (handle, prefix) tuples
doc.tag_directives = [("!", "tag:example.com,2024:")]

YamlScalar can also be constructed directly to control how a value is emitted when assigned into a mapping or sequence:

import yarutsk

doc = yarutsk.loads("x: placeholder\n")

# Assign a double-quoted string
doc["x"] = yarutsk.YamlScalar("hello", style="double")
yarutsk.dumps(doc)                     # 'x: "hello"\n'

# Assign a plain string with a custom tag
doc["x"] = yarutsk.YamlScalar("42", tag="!!str")
yarutsk.dumps(doc)                     # 'x: !!str 42\n'

Constructor signature: YamlScalar(value, *, style="plain", tag=None)

  • value: a Python primitive (bool, int, float, str, or None)
  • style: "plain" (default), "single", "double", "literal", or "folded"
  • tag: YAML tag string, e.g. "!!str", "!mytag", or None

YamlMapping

YamlMapping is a subclass of dict with insertion-ordered keys. It can be constructed directly with a style and optional tag:

# Create a flow-style mapping and populate it
m = yarutsk.YamlMapping(style="flow")
m["x"] = 1
m["y"] = 2

doc = yarutsk.loads("point: placeholder\n")
doc["point"] = m
yarutsk.dumps(doc)                     # 'point: {x: 1, y: 2}\n'

Constructor signature: YamlMapping(mapping=None, *, style="block", tag=None)

  • mapping: optional initial data (a plain dict, another YamlMapping with its inner metadata preserved, or any mapping); if omitted, the mapping starts empty
  • style: "block" (default) or "flow"
  • tag: YAML tag string, or None

# From a plain dict
m = yarutsk.YamlMapping({"a": 1, "b": 2}, style="flow")
yarutsk.dumps(m)                       # '{a: 1, b: 2}\n'

# From an existing YamlMapping (preserves comments, styles, anchors)
src = yarutsk.loads("x: 1\ny: 2\n")
m = yarutsk.YamlMapping(src, style="flow")
m.style                                # 'flow'

All standard dict operations work directly:

# Standard dict interface (inherited)
doc["key"]                             # get (KeyError if missing)
doc["key"] = value                     # set (preserves position if key exists)
del doc["key"]                         # delete
"key" in doc                           # membership test
len(doc)                               # number of entries
for key in doc: ...                    # iterate over keys in order
doc.keys()                             # KeysView in insertion order
doc.values()                           # ValuesView in insertion order
doc.items()                            # ItemsView of (key, value) pairs
doc.get("key")                         # returns None if missing
doc.get("key", default)                # returns default if missing
doc.pop("key")                         # remove & return (KeyError if missing)
doc.pop("key", default)                # remove & return, or default
doc.setdefault("key", default)         # get or insert default
doc.update(other)                      # merge from dict or YamlMapping
doc == {"a": 1}                        # equality comparison

# Works with any dict-expecting library
isinstance(doc, dict)                  # True
json.dumps(doc)                        # works

# Conversion
doc.to_dict()                          # deep conversion to plain Python dict

# Comments — overload form (1-arg = get, 2-arg = set; pass None to clear)
doc.comment_inline("key")             # -> str | None
doc.comment_before("key")             # -> str | None
doc.comment_inline("key", text)
doc.comment_before("key", text)

# Comments — explicit form (same semantics, better IDE support)
doc.get_comment_inline("key")         # -> str | None
doc.get_comment_before("key")         # -> str | None
doc.set_comment_inline("key", text)   # pass None to clear
doc.set_comment_before("key", text)   # pass None to clear

# YAML tag
doc.tag                                # -> str | None  (e.g. '!!python/object:Foo')
doc.tag = "!!map"

# Anchor
doc.anchor                             # -> str | None  (e.g. 'base' from `base: &base ...`)
doc.anchor = "myanchor"               # set anchor; emits `&myanchor` before the mapping

# Explicit document markers (top-level node only)
doc.explicit_start                     # bool
doc.explicit_end                       # bool
doc.explicit_start = True
doc.explicit_end   = True

# Document-level directives (top-level node only)
doc.yaml_version                       # -> str | None  (e.g. '1.1')
doc.yaml_version = "1.2"
doc.tag_directives                     # -> list[tuple[str, str]]  (handle, prefix pairs)
doc.tag_directives = [("!", "tag:example.com,2024:")]

# Node access — returns YamlScalar/YamlMapping/YamlSequence preserving style/tag/anchor
node = doc.node("key")                # KeyError if absent

# Scalar style shortcut
# Raises TypeError if the value is not a scalar (use container_style() instead)
doc.scalar_style("key", "single")     # 'plain'|'single'|'double'|'literal'|'folded'

# Assign a styled scalar in one expression using YamlScalar
doc["key"] = yarutsk.YamlScalar("value", style="double")

# Container style (read from source; also settable to switch block ↔ flow)
doc.style                              # -> 'block' | 'flow'
doc.style = "flow"                     # emit as {key: value, ...}

# Assign a styled nested container using YamlMapping / YamlSequence
doc["nested"] = yarutsk.YamlMapping(style="flow")
doc["nested"]["x"] = 1

# Blank lines before a key (1-arg = get, 2-arg = set)
doc.blank_lines_before("key")         # -> int
doc.blank_lines_before("key", 2)      # emit 2 blank lines before this key
doc.trailing_blank_lines              # blank lines after all entries
doc.trailing_blank_lines = 1

# Sorting
doc.sort_keys()                        # alphabetical, in-place
doc.sort_keys(reverse=True)            # reverse alphabetical
doc.sort_keys(key=lambda k: len(k))    # custom key function on key strings
doc.sort_keys(recursive=True)          # also sort all nested mappings

# Aliases — check and mark values as YAML aliases
doc.alias_name("key")                  # -> str if value is an alias, else None; KeyError if absent
doc.set_alias("key", "anchor")         # mark "key"'s value as emitting *anchor; resolved value unchanged

# Normalize formatting (reset all cosmetic metadata to YAML defaults)
doc.format()                           # reset styles, comments, and blank lines
doc.format(comments=False)            # reset styles + blank lines, keep comments
doc.format(styles=False)              # clear comments + blank lines, keep styles

Aliases round-trip automatically — base: &anchor 1\nref: *anchor\n emits back intact. alias_name lets you detect which values are aliases programmatically, and set_alias lets you add new aliases in code:

# Detect parsed aliases
doc = yarutsk.loads("base: &val 1\nref: *val\n")
doc.alias_name("ref")                  # 'val'
doc.alias_name("base")                 # None  (has anchor, not alias)
doc["ref"]                             # 1  (resolved value always accessible)

# Create a new alias programmatically
doc = yarutsk.loads("base: &anchor hello\nother: hello\n")
doc.set_alias("other", "anchor")
doc.alias_name("other")                # 'anchor'
doc["other"]                           # 'hello'
yarutsk.dumps(doc)                     # 'base: &anchor hello\nother: *anchor\n'

YamlSequence

YamlSequence is a subclass of list. It can be constructed directly with a style and optional tag:

# Create a flow-style sequence
s = yarutsk.YamlSequence(style="flow")
s.append(1)
s.append(2)
s.append(3)

doc = yarutsk.loads("values: placeholder\n")
doc["values"] = s
yarutsk.dumps(doc)                     # 'values: [1, 2, 3]\n'

Constructor signature: YamlSequence(iterable=None, *, style="block", tag=None)

  • iterable: optional initial data (a YamlSequence with its inner metadata preserved, any other iterable, or None for an empty sequence)
  • style: "block" (default) or "flow"
  • tag: YAML tag string, or None

# From a plain list / iterable
s = yarutsk.YamlSequence([1, 2, 3], style="flow")
yarutsk.dumps(s)                       # '[1, 2, 3]\n'

s = yarutsk.YamlSequence(range(3))
list(s)                                # [0, 1, 2]

# From an existing YamlSequence (preserves comments, styles, anchors)
src = yarutsk.loads("- 1\n- 2\n")
s = yarutsk.YamlSequence(src, style="flow")
s.style                                # 'flow'

All standard list operations work directly:

# Standard list interface (inherited)
doc[0]                                 # get by index (negative indices supported)
doc[0] = value                         # set by index
del doc[0]                             # delete by index
value in doc                           # membership test
len(doc)                               # number of items
for item in doc: ...                   # iterate over items
doc.append(value)                      # add to end
doc.insert(idx, value)                 # insert before index
doc.pop()                              # remove & return last item
doc.pop(idx)                           # remove & return item at index
doc.remove(value)                      # remove first occurrence (ValueError if missing)
doc.extend(iterable)                   # append items from list or YamlSequence
doc.index(value)                       # index of first occurrence
doc.count(value)                       # number of occurrences
doc.reverse()                          # reverse in-place
doc == [1, 2, 3]                       # equality comparison

# Works with any list-expecting library
isinstance(doc, list)                  # True
json.dumps(doc)                        # works

# Conversion
doc.to_dict()                          # deep conversion to plain Python list

# Comments — overload form (1-arg = get, 2-arg = set; pass None to clear)
doc.comment_inline(idx)               # -> str | None
doc.comment_before(idx)               # -> str | None
doc.comment_inline(idx, text)
doc.comment_before(idx, text)

# Comments — explicit form (same semantics, better IDE support)
doc.get_comment_inline(idx)           # -> str | None
doc.get_comment_before(idx)           # -> str | None
doc.set_comment_inline(idx, text)     # pass None to clear
doc.set_comment_before(idx, text)     # pass None to clear

# YAML tag
doc.tag                                # -> str | None  (e.g. '!!python/tuple')
doc.tag = None

# Anchor
doc.anchor                             # -> str | None
doc.anchor = "mylist"                 # set anchor; emits `&mylist` before the sequence

# Explicit document markers (top-level node only)
doc.explicit_start                     # bool
doc.explicit_end                       # bool
doc.explicit_start = True
doc.explicit_end   = True

# Document-level directives (top-level node only)
doc.yaml_version                       # -> str | None  (e.g. '1.1')
doc.yaml_version = "1.2"
doc.tag_directives                     # -> list[tuple[str, str]]  (handle, prefix pairs)
doc.tag_directives = [("!", "tag:example.com,2024:")]

# Scalar style shortcut
# Raises TypeError if the item is not a scalar (use container_style() instead)
doc.scalar_style(0, "double")         # 'plain'|'single'|'double'|'literal'|'folded'

# Assign a styled scalar in one expression using YamlScalar
doc[0] = yarutsk.YamlScalar("item", style="single")

# Container style
doc.style                              # -> 'block' | 'flow'
doc.style = "flow"                     # emit as [item, ...]

# Blank lines before an item (1-arg = get, 2-arg = set)
doc.blank_lines_before(0)             # -> int
doc.blank_lines_before(0, 1)          # emit 1 blank line before item 0
doc.trailing_blank_lines              # blank lines after all items
doc.trailing_blank_lines = 0

# Sorting (preserves comment metadata)
doc.sort()                             # natural order, in-place
doc.sort(reverse=True)
doc.sort(key=lambda v: len(v))         # custom key function on item values
doc.sort(recursive=True)              # also sort nested mappings (by key) and sequences

# Aliases — check and mark items as YAML aliases
doc.alias_name(idx)                    # -> str if item is an alias, else None; IndexError if out of range
doc.set_alias(idx, "anchor")           # mark item at idx as emitting *anchor; resolved value unchanged

# Normalize formatting (reset all cosmetic metadata to YAML defaults)
doc.format()                           # reset styles, comments, and blank lines
doc.format(comments=False)            # reset styles + blank lines, keep comments
doc.format(styles=False)              # clear comments + blank lines, keep styles

Sorting preserves all comments — each entry or item carries its inline and before-key comments with it when reordered.

Normalizing formatting

format() strips all cosmetic metadata and resets the document to clean YAML defaults. Useful for diffing values without noise, canonicalizing config files, or stripping comments before committing:

src = """\
# Config
server:
  host: 'localhost'  # primary
  port: 8080

  debug: yes
"""

doc = yarutsk.loads(src)
doc.format()
print(yarutsk.dumps(doc))
# server:
#   host: localhost
#   port: 8080
#   debug: yes

format() is available on YamlMapping, YamlSequence, and YamlScalar. It recurses into all nested containers automatically.

Three keyword flags (all True by default) control what is reset:

Flag              Effect
styles=True       Scalar quoting → plain (multiline strings → literal block |); container style → block; non-canonical originals (0xFF, 1.5e10) cleared so they emit canonically
comments=True     comment_before and comment_inline cleared on every entry/item
blank_lines=True  blank_lines_before zeroed on every entry/item; trailing_blank_lines zeroed on containers

Tags, anchors, and document-level markers (explicit_start, yaml_version, etc.) are always preserved — they are semantic, not cosmetic.

# Preserve comments, reset only styles and blank lines
doc.format(comments=False)

# Preserve styles, clear only comments and blank lines
doc.format(styles=False)

# Clear only blank lines
doc.format(styles=False, comments=False)

Comparison

Feature                    yarutsk  ruamel.yaml  PyYAML
Comments preserved         Yes      Yes          No
Scalar styles preserved    Yes      Partial      No
Insertion order preserved  Yes      Yes          No
Blank lines preserved      Yes      Partial      No
Tags preserved             Yes      Yes          No
Anchors/aliases preserved  Yes      Yes          No
dict / list subclasses     Yes      No           No
Rust speed                 Yes      No           No
Python 3.12+ required      Yes      No           No

yarutsk focuses on round-trip fidelity: if you need to edit a config file and emit it back without touching the formatting, it keeps every comment, blank line, and scalar quote style exactly as written. ruamel.yaml offers similar round-trip support in pure Python. PyYAML is faster for load-only workloads where output formatting doesn't matter.

Error handling

yarutsk defines a small exception hierarchy rooted at YarutskError:

Exception
└── YarutskError         # base for all library errors
    ├── ParseError       # malformed YAML input
    ├── LoaderError      # schema loader callable raised
    └── DumperError      # schema dumper raised or returned wrong type

import yarutsk

# Malformed YAML → yarutsk.ParseError
# Message includes the error description and source position (line, column).
try:
    yarutsk.loads("key: [unclosed")
except yarutsk.ParseError as e:
    print(e)   # while parsing a flow sequence, expected ',' or ']' at byte ...

# Catch any library error with the base class
try:
    yarutsk.loads("key: [unclosed")
except yarutsk.YarutskError as e:
    print(e)

# Schema loader raises → yarutsk.LoaderError
# Message includes the tag name so you know which loader misbehaved.
schema = yarutsk.Schema()
schema.add_loader("!color", lambda s: s.split(","))  # expects a str, but receives a YamlMapping here
try:
    yarutsk.loads("bg: !color\n  r: 255\n  g: 0\n  b: 128\n", schema=schema)
except yarutsk.LoaderError as e:
    print(e)   # Schema loader for tag '!color' raised: AttributeError: ...

# Schema dumper raises or returns the wrong type → yarutsk.DumperError
# Message includes the Python type name.
class MyType:
    pass

schema = yarutsk.Schema()
schema.add_dumper(MyType, lambda x: "not-a-tuple")  # must return (tag, data)
try:
    yarutsk.dumps({"value": MyType()}, schema=schema)
except yarutsk.DumperError as e:
    print(e)   # Schema dumper for MyType must return (tag, data) tuple: ...

# Unsupported Python type (no schema dumper registered) → TypeError
try:
    yarutsk.dumps({"key": {1, 2}})  # sets are not supported
except TypeError as e:
    print(e)

# Missing key → KeyError  (standard dict behaviour)
doc = yarutsk.loads("a: 1")
doc["missing"]               # KeyError: 'missing'
doc.comment_inline("missing")  # KeyError: 'missing'

# Bad index → IndexError  (standard list behaviour)
seq = yarutsk.loads("- 1\n- 2")
seq[99]                      # IndexError

Limitations

  • Integer range: integers are stored as 64-bit signed (i64). Values outside [-9223372036854775808, 9223372036854775807] are loaded as strings.
  • Underscore separators: 1_000 is not parsed as an integer — it is loaded as the string "1_000" (and round-tripped faithfully as such).
  • Blank line cap: at most 255 blank lines before any entry are tracked; runs longer than that are clamped to 255 on load.
  • Block only by default: the emitter writes block-style YAML. Flow containers ({...} / [...]) from the source are preserved if they were already flow-style, but there is no option to force everything to flow on dump.
  • Memory per document: each individual document must fit in memory. load / load_all stream from IO in 8 KB chunks so the raw source text is not buffered as a whole, but the resulting Python objects still live in memory. For large multi-document streams use iter_load_all / iter_loads_all to process one document at a time without accumulating the full list.
  • YAML version: the scanner implements YAML 1.1 boolean/null coercion (yes/no/on/off/~). Most YAML 1.2-only documents load correctly, but inputs that rely on strict YAML 1.2 semantics may differ.
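The blank-line cap amounts to counting the blank lines that precede each entry and clamping the count at 255. A simplified, hypothetical counter for flat mappings (one `key: value` per line at column 0) shows the idea; treating comment lines as neither counting nor resetting the run is an assumption, not documented yarutsk behaviour.

```python
MAX_BLANK_LINES = 255  # cap described in the Limitations list


def blank_lines_before_keys(text: str) -> dict[str, int]:
    """Count blank lines preceding each top-level key of a flat mapping.

    Illustrative only: assumes `key: value` lines at column 0 and ignores
    nesting, block scalars, and flow collections.
    """
    counts: dict[str, int] = {}
    pending = 0
    for line in text.splitlines():
        if not line.strip():
            pending += 1                      # remember blanks for the next key
        elif line.startswith("#"):
            continue                          # assumed: comments don't reset the run
        else:
            key = line.split(":", 1)[0]
            counts[key] = min(pending, MAX_BLANK_LINES)  # clamp long runs
            pending = 0
    return counts
```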

Benchmarks

Compare load, dump, and round-trip performance against PyYAML and ruamel.yaml across small, medium, and large inputs:

make bench

Running tests

You need Rust 1.85+ and Python 3.12+ with uv. Python 3.12 is the minimum because YamlSequence subclasses list, and PyO3's extends = PyList support requires Python 3.12 or later.

# 1. Clone with the yaml-test-suite submodule
git clone --recurse-submodules https://github.com/theyugin/yarutsk
cd yarutsk

# 2. Install dependencies and build
make setup

# 3. Run the suites
make test        # core library tests (fast)
make test-all    # all tests including yaml-test-suite compliance

test_yaml_suite.py requires the yaml-test-suite submodule. Test cases marked fail: true in the suite metadata (invalid YAML that a conformant parser must reject) are marked xfail(strict=True): test_parse is expected to fail on them, and an unexpected pass is treated as an error. Cases marked skip: true in the suite metadata are skipped entirely.

Other useful targets:

make build          # debug build
make build-release  # optimised build
make lint           # ruff + cargo clippy
make fmt            # auto-format Python and Rust
make typecheck      # mypy strict check on stubs
make test-roundtrip # round-trip fidelity tests only

Run make help for the full list.

Internals

The scanner and parser are vendored from yaml-rust2 (MIT licensed) with one targeted modification: the comment-skipping loop in the scanner now emits Comment tokens instead of discarding them. Everything else — block/flow parsing, scalar type coercion, multi-document support — comes from yaml-rust2 unchanged. The builder layer wires those tokens to the data model, and a hand-written block-style emitter serialises it back out.

YamlMapping and YamlSequence are PyO3 pyclasses that extend Python's built-in dict and list types. A Rust inner field stores the full YAML data model (including comments); the parent dict/list is kept in sync on every mutation so that all standard Python operations work transparently.
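Keeping a parent dict in sync is easy to underestimate, because dict's built-in methods do not call each other: `pop`, for example, does not go through `__delitem__`. A pure-Python analogue illustrates this. `CommentedDict` is hypothetical and, unlike yarutsk, keeps values in the dict itself and only comments in a side table; it shows that every method that can remove a key needs its own override.

```python
import json

_MISSING = object()  # sentinel so pop() can tell "no default" from None


class CommentedDict(dict):
    """dict subclass that stores an inline comment per key in a side table."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._inline: dict = {}

    def get_comment_inline(self, key):
        if key not in self:
            raise KeyError(key)
        return self._inline.get(key)

    def set_comment_inline(self, key, text) -> None:
        if key not in self:
            raise KeyError(key)
        if text is None:
            self._inline.pop(key, None)   # None clears the comment
        else:
            self._inline[key] = text

    # dict's built-in methods do not route through one another, so each
    # method that can drop a key is overridden to drop its comment as well.
    def __delitem__(self, key) -> None:
        super().__delitem__(key)
        self._inline.pop(key, None)

    def pop(self, key, default=_MISSING):
        if key in self:
            self._inline.pop(key, None)
            return super().pop(key)
        if default is _MISSING:
            raise KeyError(key)
        return default

    def popitem(self):
        key, value = super().popitem()
        self._inline.pop(key, None)
        return key, value

    def clear(self) -> None:
        super().clear()
        self._inline.clear()


if __name__ == "__main__":
    d = CommentedDict(host="localhost", port=5432)
    d.set_comment_inline("host", "primary")
    print(isinstance(d, dict), json.dumps(d))
```

Because the object really is a dict, `json.dumps` and `isinstance` checks work unchanged, which is the same property the README describes for YamlMapping.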

Disclaimer

This library was created with Claude Code (Anthropic). The design, implementation, tests, and this README were written by Claude under human direction.


Download files

Source Distribution

yarutsk-0.3.6.tar.gz (183.0 kB)

Uploaded Source

Built Distributions

yarutsk-0.3.6-cp314-cp314-win_amd64.whl (549.7 kB)

Uploaded CPython 3.14, Windows x86-64

yarutsk-0.3.6-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (662.8 kB)

Uploaded CPython 3.14, manylinux: glibc 2.17+ x86-64

yarutsk-0.3.6-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.2 MB)

Uploaded CPython 3.14, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

yarutsk-0.3.6-cp313-cp313-win_amd64.whl (554.2 kB)

Uploaded CPython 3.13, Windows x86-64

yarutsk-0.3.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (670.2 kB)

Uploaded CPython 3.13, manylinux: glibc 2.17+ x86-64

yarutsk-0.3.6-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.3 MB)

Uploaded CPython 3.13, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

yarutsk-0.3.6-cp312-cp312-win_amd64.whl (554.7 kB)

Uploaded CPython 3.12, Windows x86-64

yarutsk-0.3.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (671.2 kB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64

yarutsk-0.3.6-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.3 MB)

Uploaded CPython 3.12, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

File details

Details for the file yarutsk-0.3.6.tar.gz.

File metadata

  • Download URL: yarutsk-0.3.6.tar.gz
  • Upload date:
  • Size: 183.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.3.6.tar.gz
Algorithm Hash digest
SHA256 d5bf6a5cf2a37f477581ab1e9ca1980f50214a4270f5470cc4a4250b5c22f4fa
MD5 d8c5a7df4ece5494210700baf911e9ef
BLAKE2b-256 4bbf8a7053f28f4461099f264f82201adbbeedfb434ab72f990c3eb14e764796
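The SHA256 digest can be checked locally after downloading. Below is a minimal sketch using only the standard library's `hashlib`. It assumes the sdist was saved as `yarutsk-0.3.6.tar.gz` in the current directory. `sha256_of` and `matches_published_digest` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for yarutsk-0.3.6.tar.gz.
PUBLISHED_SHA256 = "d5bf6a5cf2a37f477581ab1e9ca1980f50214a4270f5470cc4a4250b5c22f4fa"


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected: str = PUBLISHED_SHA256) -> bool:
    # hexdigest() is lowercase; normalise the expected value the same way.
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("yarutsk-0.3.6.tar.gz")
    if archive.exists():
        print("OK" if matches_published_digest(archive) else "MISMATCH")
```

pip can run the same check during installation in hash-checking mode, enabled with `--require-hashes` and `--hash=sha256:...` entries in a requirements file.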

Provenance

The following attestation bundles were made for yarutsk-0.3.6.tar.gz:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: yarutsk-0.3.6-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 549.7 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.3.6-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 1b877583e83fa4734f857f55e1c539254c159dea9ab5c491a4d1ecbe6f3e34a6
MD5 a2a44db25f55d70bbd32a1a9f9495c22
BLAKE2b-256 8a4091dc173308f21eaf77f7f9544460cd1102e38279c13468ede197a9a79a19

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp314-cp314-win_amd64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for yarutsk-0.3.6-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0174e14294bdd1af02098f8329d8c36f6300b28843ff76f1a1e1ebd7d8393b6e
MD5 2daf08a2ccae7c14f2ca4fca33f4cfab
BLAKE2b-256 2b06d9cdfa6e06551ddcdef1d430e713f9444a7b55565499e6d4065bd8a865be

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File hashes

Hashes for yarutsk-0.3.6-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 a8d0a566c3ce5af8603c4950d41cc680bf85aeccaaa5cab2d9f241a04f16a4fb
MD5 991551bbd5bf0b511121ad109f4c846d
BLAKE2b-256 2ed63c3b1fbb578ed205b399b402340c55b55f74232826032b0bb2a6b87cd921

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: yarutsk-0.3.6-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 554.2 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.3.6-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 a78fc07c5bcc0e3adacaf5d680ff7a65b73b53066b126734be1b084421fe10d5
MD5 bc6780be92bbf8653fee46e8f41d1f41
BLAKE2b-256 55f6efb5e5838e01a17573b3e92910a5388accbb7e274d4d12802aee7bade604

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp313-cp313-win_amd64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for yarutsk-0.3.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8e4e00d6034d33b8b07b7e6d5a9926f1e030361a39a877e887937e346e00a95b
MD5 d1844ae9672b6e44f1cb817141a9a5fe
BLAKE2b-256 d8119032c05509b9177e3d3651b43507825414b87c757701c373958045125cf3

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for yarutsk-0.3.6-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 c85cf200ecde0e401f67be9afee0ead4873fdd20530c55bc5768531a0cc80cc4
MD5 66d037dd2b70d06b708f99f3b113cdb0
BLAKE2b-256 debae2212a4cfef2ad2cbe04fc7bfa55b7302f2654aa6fa81f6d7747aef8dfd8

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: yarutsk-0.3.6-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 554.7 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.3.6-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 8acaa024c8cde7b43bdd399a00dda3ec6336a11e22066ceebbfd4ec6fe8e6ced
MD5 ce695a8c390f876baad5c1e91c957dc5
BLAKE2b-256 3046e8cd29395fe845217ab04504bb4b43c0c16c0cf6e14f67315b103465e017

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp312-cp312-win_amd64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for yarutsk-0.3.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2e52a641488a5fcc8ee4c48354a65e2267d4f73cc689a547f9a2a6dc8cb720de
MD5 e1c40c457edc857e85a924c8998d338b
BLAKE2b-256 657723f5f9354dcc875a7c367d1407d748e2dc5316bfb4004b7b8879a43022cc

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.3.6-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File hashes

Hashes for yarutsk-0.3.6-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 85beb8f5456f9464feb4b4f3de350177d75ba71f5b9511668a475dea5cd65cce
MD5 34dfa3341e17d16fbf27e43f08ec7085
BLAKE2b-256 f60b926d516c2b6e3f2d3beb0938cca3a38d762d8d0b6883cceeb6bea5f8dd0a

Provenance

The following attestation bundles were made for yarutsk-0.3.6-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
