Skip to main content

A YAML round-trip library that preserves comments and insertion order

Project description

yarutsk

A Python YAML library that round-trips documents while preserving comments, insertion order, scalar styles, tags, anchors and aliases, blank lines, and explicit document markers.

What it does

Most YAML libraries silently drop comments on load. yarutsk keeps them attached to their keys — both inline (key: value # like this) and block-level (# above a key) — so a load → modify → dump cycle leaves the rest of the file intact.

import io
import yarutsk

doc = yarutsk.load(io.StringIO("""
# database config
host: localhost  # primary
port: 5432
"""))

doc["port"] = 5433

out = io.StringIO()
yarutsk.dump(doc, out)
print(out.getvalue())
# # database config
# host: localhost  # primary
# port: 5433

YamlMapping is a subclass of dict and YamlSequence is a subclass of list, so they work everywhere a dict or list is expected:

import json

doc = yarutsk.loads("name: Alice\nscores: [10, 20, 30]")

isinstance(doc, dict)           # True
isinstance(doc["scores"], list) # True
json.dumps(doc)                 # '{"name": "Alice", "scores": [10, 20, 30]}'

Round-trip fidelity

yarutsk reproduces the source text exactly for everything it understands. A loads followed by dumps gives back the original string byte-for-byte in the common case:

src = """\
defaults: &base
  timeout: 30
  retries: 3

service:
  name: api
  config: *base
"""
assert yarutsk.dumps(yarutsk.loads(src)) == src

Specifically preserved:

  • Scalar styles — plain, 'single-quoted', "double-quoted", literal block |, folded block >
  • Non-canonical scalarsyes/no/on/off, ~, Null, True/False, 0xFF, 0o77 — reproduced as written, not re-canonicalised to true/false/null/255
  • YAML tags!!str, !!python/tuple, and any custom tag are emitted back verbatim
  • Anchors and aliases&name on the anchor node and *name for references are preserved; the Python layer returns the resolved value transparently
  • Blank lines between mapping entries and sequence items
  • Explicit document markers--- and ...

Installation

Built with Maturin. From the repo root:

pip install maturin
maturin develop

API

Loading and dumping

# Load from stream (StringIO / BytesIO)
doc  = yarutsk.load(stream)            # first document
docs = yarutsk.load_all(stream)        # all documents as a list

# Load from string
doc  = yarutsk.loads(text)
docs = yarutsk.loads_all(text)

# Dump to stream
yarutsk.dump(doc, stream)
yarutsk.dump_all(docs, stream)

# Dump to string
text = yarutsk.dumps(doc)
text = yarutsk.dumps_all(docs)

load / loads return a YamlMapping, YamlSequence, YamlScalar, or None (for empty input). Nested nodes are also YamlMapping or YamlSequence; scalar leaves are returned as native Python primitives (int, float, bool, str, bytes, datetime.datetime, datetime.date, or None).

Type tags

Standard YAML type tags affect the Python type returned from a load:

Tag Python type Notes
!!str str Forces string even if value looks like an int/bool/null
!!int int Overrides if inference gave a different type
!!float float Promotes integer literals (!!float 11.0)
!!bool bool Overrides if inference gave a different type
!!null None Forces null regardless of value (e.g. !!null ""None)
!!binary bytes Base64-decoded on load; base64-encoded on dump
!!timestamp datetime.datetime or datetime.date Date-only values (2024-01-15) return date; datetime values return datetime

Tags are preserved for round-trip: load → dump reproduces the original tag and source text.

# !!binary
doc = yarutsk.loads("data: !!binary aGVsbG8=\n")
doc["data"]                            # b'hello'
yarutsk.dumps(doc)                     # 'data: !!binary aGVsbG8=\n'

# Dumping Python bytes / datetime creates the appropriate tag automatically
import datetime
mapping = yarutsk.loads("x: placeholder\n")
mapping["x"] = b"hello"
yarutsk.dumps(mapping)                 # 'x: !!binary aGVsbG8=\n'

mapping["x"] = datetime.datetime(2024, 1, 15, 10, 30)
yarutsk.dumps(mapping)                 # 'x: !!timestamp 2024-01-15T10:30:00\n'

# !!timestamp
doc = yarutsk.loads("ts: !!timestamp 2024-01-15T10:30:00\n")
doc["ts"]                              # datetime.datetime(2024, 1, 15, 10, 30)

doc = yarutsk.loads("ts: !!timestamp 2024-01-15\n")
doc["ts"]                              # datetime.date(2024, 1, 15)

# Standard schema tag coercion
doc = yarutsk.loads("x: !!float 1\n")
doc["x"]                               # 1.0 (float, not int)

doc = yarutsk.loads('x: !!null ""\n')
doc["x"]                               # None

YamlScalar

Top-level scalar documents are wrapped in a YamlScalar node:

doc = yarutsk.loads("42")
doc.value                              # 42 (Python int)
doc.to_dict()                          # same as .value

# Scalar style
doc = yarutsk.loads("---\n'hello'\n")
doc.style                              # 'single'
doc.style = "double"                   # 'plain'|'single'|'double'|'literal'|'folded'

# YAML tag
doc = yarutsk.loads("!!str 42")
doc.get_tag()                          # '!!str'
doc.set_tag(None)                      # clear tag

# Explicit document markers
doc = yarutsk.loads("---\n42\n...")
doc.explicit_start                     # True
doc.explicit_end                       # True
doc.explicit_start = False
doc.explicit_end   = False

YamlMapping

YamlMapping is a subclass of dict with insertion-ordered keys. All standard dict operations work directly:

# Standard dict interface (inherited)
doc["key"]                             # get (KeyError if missing)
doc["key"] = value                     # set (preserves position if key exists)
del doc["key"]                         # delete
"key" in doc                           # membership test
len(doc)                               # number of entries
for key in doc: ...                    # iterate over keys in order
doc.keys()                             # KeysView in insertion order
doc.values()                           # ValuesView in insertion order
doc.items()                            # ItemsView of (key, value) pairs
doc.get("key")                         # returns None if missing
doc.get("key", default)                # returns default if missing
doc.pop("key")                         # remove & return (KeyError if missing)
doc.pop("key", default)                # remove & return, or default
doc.setdefault("key", default)         # get or insert default
doc.update(other)                      # merge from dict or YamlMapping
doc == {"a": 1}                        # equality comparison

# Works with any dict-expecting library
isinstance(doc, dict)                  # True
json.dumps(doc)                        # works

# Conversion
doc.to_dict()                          # deep conversion to plain Python dict

# Comments
doc.get_comment_inline("key")          # -> str | None
doc.get_comment_before("key")          # -> str | None
doc.set_comment_inline("key", text)
doc.set_comment_before("key", text)

# YAML tag
doc.get_tag()                          # -> str | None  (e.g. '!!python/object:Foo')
doc.set_tag("!!map")

# Explicit document markers
doc.explicit_start                     # bool
doc.explicit_end                       # bool
doc.explicit_start = True
doc.explicit_end   = True

# Node access — returns YamlScalar/YamlMapping/YamlSequence preserving style/tag/anchor
node = doc.get_node("key")            # KeyError if absent

# Scalar style shortcut (equivalent to: doc.get_node("key").style = "single")
doc.set_scalar_style("key", "single") # 'plain'|'single'|'double'|'literal'|'folded'

# Sorting
doc.sort_keys()                        # alphabetical, in-place
doc.sort_keys(reverse=True)            # reverse alphabetical
doc.sort_keys(key=lambda k: len(k))    # custom key function on key strings
doc.sort_keys(recursive=True)          # also sort all nested mappings

YamlSequence

YamlSequence is a subclass of list. All standard list operations work directly:

# Standard list interface (inherited)
doc[0]                                 # get by index (negative indices supported)
doc[0] = value                         # set by index
del doc[0]                             # delete by index
value in doc                           # membership test
len(doc)                               # number of items
for item in doc: ...                   # iterate over items
doc.append(value)                      # add to end
doc.insert(idx, value)                 # insert before index
doc.pop()                              # remove & return last item
doc.pop(idx)                           # remove & return item at index
doc.remove(value)                      # remove first occurrence (ValueError if missing)
doc.extend(iterable)                   # append items from list or YamlSequence
doc.index(value)                       # index of first occurrence
doc.count(value)                       # number of occurrences
doc.reverse()                          # reverse in-place
doc == [1, 2, 3]                       # equality comparison

# Works with any list-expecting library
isinstance(doc, list)                  # True
json.dumps(doc)                        # works

# Conversion
doc.to_dict()                          # deep conversion to plain Python list

# Comments (addressed by integer index)
doc.get_comment_inline(idx)            # -> str | None
doc.get_comment_before(idx)            # -> str | None
doc.set_comment_inline(idx, text)
doc.set_comment_before(idx, text)

# YAML tag
doc.get_tag()                          # -> str | None  (e.g. '!!python/tuple')
doc.set_tag(None)

# Explicit document markers
doc.explicit_start                     # bool
doc.explicit_end                       # bool
doc.explicit_start = True
doc.explicit_end   = True

# Sorting (preserves comment metadata)
doc.sort()                             # natural order, in-place
doc.sort(reverse=True)
doc.sort(key=lambda v: len(v))         # custom key function on item values

Sorting preserves all comments — each entry or item carries its inline and before-key comments with it when reordered.

Benchmarks

Compare load, dump, and round-trip performance against PyYAML and ruamel.yaml across small, medium, and large inputs:

uv sync --group benchmark
uv run maturin develop --release
uv run pytest benchmarks/ -v --benchmark-sort=name

Running tests

You need Rust 1.85+ and Python 3.12+ with uv. Python 3.12 is the minimum — YamlSequence subclasses list, which requires PyO3's extends = PyList support introduced in Python 3.12.

# 1. Clone with the yaml-test-suite submodule
git clone --recurse-submodules https://github.com/theyugin/yarutsk
cd yarutsk

# 2. Create a virtual environment and install dev dependencies
uv sync --group dev

# 3. Build the extension in dev (debug) mode
uv run maturin develop

# 4. Run the suites
uv run pytest tests/ --ignore=tests/test_yaml_suite.py -v  # core library tests
uv run pytest tests/test_yaml_suite.py -q                   # yaml-test-suite compliance

test_yaml_suite.py requires the yaml-test-suite submodule. Tests that fail due to known YAML normalisation differences are marked xfail and do not count as failures.

Internals

The scanner and parser are vendored from yaml-rust2 (MIT licensed) with one targeted modification: the comment-skipping loop in the scanner now emits Comment tokens instead of discarding them. Everything else — block/flow parsing, scalar type coercion, multi-document support — comes from yaml-rust2 unchanged. The builder layer wires those tokens to the data model, and a hand-written block-style emitter serialises it back out.

YamlMapping and YamlSequence are PyO3 pyclasses that extend Python's built-in dict and list types. A Rust inner field stores the full YAML data model (including comments); the parent dict/list is kept in sync on every mutation so that all standard Python operations work transparently.

Disclaimer

This library was created with Claude Code (Anthropic). The design, implementation, tests, and this README were written by Claude under human direction.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yarutsk-0.1.2.tar.gz (121.6 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

yarutsk-0.1.2-cp314-cp314-win_amd64.whl (399.8 kB view details)

Uploaded CPython 3.14Windows x86-64

yarutsk-0.1.2-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (509.0 kB view details)

Uploaded CPython 3.14manylinux: glibc 2.17+ x86-64

yarutsk-0.1.2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (959.0 kB view details)

Uploaded CPython 3.14macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

yarutsk-0.1.2-cp313-cp313-win_amd64.whl (402.0 kB view details)

Uploaded CPython 3.13Windows x86-64

yarutsk-0.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (512.0 kB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

yarutsk-0.1.2-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (963.4 kB view details)

Uploaded CPython 3.13macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

yarutsk-0.1.2-cp312-cp312-win_amd64.whl (402.5 kB view details)

Uploaded CPython 3.12Windows x86-64

yarutsk-0.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (512.9 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

yarutsk-0.1.2-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (964.1 kB view details)

Uploaded CPython 3.12macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

File details

Details for the file yarutsk-0.1.2.tar.gz.

File metadata

  • Download URL: yarutsk-0.1.2.tar.gz
  • Upload date:
  • Size: 121.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.1.2.tar.gz
Algorithm Hash digest
SHA256 5227854eddb8bb491e0173c04fbb418188bf0da75e4c74c60630777c9751e546
MD5 3e7003c6c8b3ba83e342040025640c08
BLAKE2b-256 376b93519618aad06871465abe07b423cddef26c398d242baf6d64ba571d3d8f

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2.tar.gz:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: yarutsk-0.1.2-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 399.8 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.1.2-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 2db3c9c61d626fb59da3fe4d1b95825c44fb1a6cfde2f967def3a0311d5d8eda
MD5 72bc51e566d37be9b1cb94a4c17e049d
BLAKE2b-256 11331dcad5945d26418223ba9b585d65d3a1e26e2e0fd02916ce17d3664f71d9

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp314-cp314-win_amd64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for yarutsk-0.1.2-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 647bed088483ace551a8505910bfa254b3168d9ef15b8dce5b68a6b7a4c5d9ed
MD5 56af8c05e5d579a975edb8d06196546d
BLAKE2b-256 6e6165c7383fcb73497470ea58179a6e0a7e1fa63aab95d08b692c7791e10d18

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for yarutsk-0.1.2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 decd791823ed96ce56c9e1ff80dcd40129e20a359de57f2fc05bede65282544e
MD5 d3c2522779332f8e27306f647d6316f4
BLAKE2b-256 fb8ce3f52eafdfb130b735d460bd4a1e71bac6556a9a8ff960ac7d0bad60e74c

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp314-cp314-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: yarutsk-0.1.2-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 402.0 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.1.2-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 62e4c9be6ad12f6c5ee19710944a83c21e9abb073be7fe0792d3f0f9c24c6cd6
MD5 1f58d4e370a7e5483093048542ec94cd
BLAKE2b-256 90af284b8b6fb555890e6348912c82da5cba95b1d6189a2c99c3ef021b6de919

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp313-cp313-win_amd64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for yarutsk-0.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7eed30563fe4561b2fdc663eee21525e4a7deb38ac1687165118b61037a0d958
MD5 b9fe68db84034b23ba755857e6195983
BLAKE2b-256 56609425b9d10e2174499999cd0b59647c68c362d6b5c22a5b7e422ad5c67404

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for yarutsk-0.1.2-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 d45b4f0c8d58846db04082775f08744964c3f8de0b391e6be7f08dbcb0a1b519
MD5 a01a8c05fec52ad2c2e3ecbfb88e490d
BLAKE2b-256 738c4350da3d704847e09aa00f371505c8ec3b304c78b8ff771d9334bc540c6d

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: yarutsk-0.1.2-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 402.5 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yarutsk-0.1.2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c3d84fec1ac87537944a782f92d4bd5f68c276528155909e63b3c336d3b517ca
MD5 1797dc0dc4e1b483cb49beaaca9644f9
BLAKE2b-256 40ea0e6f0acc1e216aa380f51dea1bf193aa11e130116c21c7fc48a981f8f8a8

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp312-cp312-win_amd64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for yarutsk-0.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8fcdbfd268f55ffc4ac7a190d87dda79083b1135aa35f6cd7775562dfea34299
MD5 f6e83241b25582c82e9fcef347e7e4b6
BLAKE2b-256 710fd4c64fc57a772e12840b16a8a7ba996aafaaa758c4148589e292dce3ae5c

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yarutsk-0.1.2-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for yarutsk-0.1.2-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 678e5b6e5d694127049a395f862099ffd2300ab18ade67d068ecccc671319699
MD5 9cd5ba91171f9648a7ff675e98195d76
BLAKE2b-256 97970d8bf643de3ddd5a9ba372b56946d337fab231a6392f7a5debe7f0822f82

See more details on using hashes here.

Provenance

The following attestation bundles were made for yarutsk-0.1.2-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: ci.yml on theyugin/yarutsk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page